Sharing the techniques that worked (and did not work) in the competition organized by Andrew Ng & DeepLearning.AI
Link to Medium writeup: https://towardsdatascience.com/data-centric-ai-competition-tips-and-tricks-of-a-top-5-finish-9cacc254626e
Data is food for AI, and shifting from a model-centric to a data-centric approach offers vast potential for improving model performance. That is the motivation behind the recent Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI.
In this repo, I unveil the methods (and code) behind my Top 5% submission (~84% accuracy, ranked 24th), including the various techniques that worked and did not work for me. Do check out the Medium article for a more in-depth look at my thought process and the methods behind the submission.
- Link to competition page: https://https-deeplearning-ai.github.io/data-centric-comp/
- A collaboration between DeepLearning.AI and Landing AI, the Data-Centric AI Competition aims to elevate data-centric approaches to improving the performance of machine learning models.
- In most machine learning competitions, you are asked to build a high-performance model given a fixed dataset.
- However, machine learning has matured to the point that high-performance model architectures are widely available, while approaches to engineering datasets have lagged.
- The Data-Centric AI Competition inverts the traditional format and instead asks you to improve a dataset given a fixed model. We will provide you with a dataset to improve by applying data-centric techniques such as fixing incorrect labels, adding examples that represent edge cases, applying data augmentation, etc.
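One of the techniques mentioned above, data augmentation, can be illustrated with a minimal sketch. The example below is not the competition's or my submission's actual pipeline; it assumes grayscale images represented as plain lists of pixel rows and applies a small random translation, a common augmentation for MNIST-style digit images:

```python
import random

def shift_image(img, dx, dy, fill=0):
    """Shift a 2D grayscale image (list of pixel rows) by (dx, dy),
    filling newly exposed pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def random_shift(img, max_shift=2):
    """Randomly translate an image by up to `max_shift` pixels per axis."""
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    return shift_image(img, dx, dy)

# Tiny 3x3 example: shift a single bright pixel one step to the right
img = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(shift_image(img, 1, 0))  # [[0, 0, 0], [0, 0, 255], [0, 0, 0]]
```

In practice, a library such as torchvision or Albumentations would handle this, but the idea is the same: each augmented copy is added to the training set alongside the original.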
- Full_Notebook_Best_Submission.ipynb (Complete walkthrough code for my best submission to the competition)
- experiment_tracker.csv (Spreadsheet tracker I used to monitor my various experiments)
- /data (Public Roman MNIST dataset released by the competition)