Hyperight

Sculpting Data for Machine Learning – Jigyasa Grover & Rishabh Misra, Twitter

Session Outline

In machine learning, “data is the new oil.” For ML algorithms to work their magic, they need a strong foundation of relevant data. Volumes of crude data are available on the web today; what is needed are the skills to identify and extract meaningful datasets from it. This talk, given at the Data Innovation Summit 2023, presents the power of the most fundamental aspect of machine learning, dataset curation, which often does not get its due limelight. It walks the audience through the process of constructing good-quality datasets as done in formal settings, using a simple hands-on Pythonic example. The goal is to instill the importance of data, especially when it is in a usable format, and of the effect it has on building smart learning algorithms.

Key Takeaways:

  • Significance of data in Machine Learning
  • Identification of relevant data signals
  • End-to-end process of data collection and dataset construction
  • Overview of extraction tools like BeautifulSoup and Selenium
  • Synopsis of Data Preprocessing and Feature Engineering
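
The takeaways above can be illustrated with a minimal sketch of the collect, clean, and featurize flow. It is not the example used in the talk. It assumes a hypothetical HTML fragment of news headlines (SAMPLE_HTML), and it swaps in Python's standard-library html.parser for BeautifulSoup so that it runs without extra installs; the class name HeadlineParser, the function build_dataset, and the feature names are all illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for HTML fetched from the web.
SAMPLE_HTML = """
<ul>
  <li class="headline">  Scientists Discover New Planet </li>
  <li class="headline">Local Man Wins Lottery Twice!</li>
  <li class="ad">Buy now</li>
  <li class="headline">  Scientists Discover New Planet </li>
</ul>
"""


class HeadlineParser(HTMLParser):
    """Collect the text of every <li class="headline"> element."""

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # Identify the relevant signal: only list items marked as headlines.
        if tag == "li" and dict(attrs).get("class") == "headline":
            self._in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            self.headlines[-1] += data


def build_dataset(html):
    """Extract headlines, clean them, and attach simple features."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()

    seen = set()
    rows = []
    for raw in parser.headlines:
        # Preprocessing: collapse whitespace, skip empties and exact duplicates.
        text = " ".join(raw.split())
        if not text or text in seen:
            continue
        seen.add(text)
        # Feature engineering: a couple of hand-crafted features.
        rows.append({
            "text": text,
            "num_words": len(text.split()),
            "has_exclamation": "!" in text,
        })
    return rows


dataset = build_dataset(SAMPLE_HTML)
for row in dataset:
    print(row)
```

On real pages the same three stages apply, but the extraction step would more likely use BeautifulSoup for static HTML or Selenium for pages rendered by JavaScript, as the takeaways suggest.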
