Sample predictions of the model on test set images. The prediction threshold in this sample is set to 0.9 to filter out false positives and low-confidence predictions.
This repository fine-tunes a pre-trained PyTorch Faster R-CNN model for object detection on the Drinks Dataset. The fine-tuned model predicts the bounding box and class of each detected drink. It detects three classes: Summit (water bottle), Coke (red soda can), and Pine Juice (green can of pineapple juice).
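A minimal sketch of how a pre-trained torchvision Faster R-CNN can be adapted to the three drink classes. It assumes the `fasterrcnn_resnet50_fpn` variant (the README does not say which backbone the scripts use) and assumes class ids 1 to 3 in the order listed above; id 0 is reserved for the background class, as torchvision's detection models expect. `CLASS_NAMES`, `NUM_CLASSES`, and `build_model` are illustrative names, not taken from train.py.

```python
"""Sketch: adapt a pre-trained torchvision Faster R-CNN to the drink classes."""

# Assumed id-to-name mapping; id 0 is the background class in torchvision detectors.
CLASS_NAMES = {1: "Summit", 2: "Coke", 3: "Pine Juice"}
NUM_CLASSES = len(CLASS_NAMES) + 1  # three drinks + background


def build_model(num_classes: int = NUM_CLASSES, pretrained: bool = True):
    """Return a Faster R-CNN whose box predictor outputs `num_classes` classes."""
    # Imported lazily so the label map above is usable without torchvision installed.
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # "DEFAULT" downloads COCO-trained weights; None builds an untrained network.
    weights = "DEFAULT" if pretrained else None
    model = fasterrcnn_resnet50_fpn(weights=weights, weights_backbone=None)

    # Replace the COCO classification/box-regression head with one sized to our classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
```

Only the final head is replaced; the backbone, region proposal network, and ROI pooling keep their pre-trained weights, which is what makes fine-tuning on a small dataset feasible.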
The main scripts in the repository can be set up on a personal machine (Run locally) or online (Run on Kaggle). The Kaggle option is recommended if you run into issues setting up the prerequisites. Make sure there is enough free disk space (~3 GB) to store the dataset and model: the scripts (train.py and test.py) automatically download both the dataset and the pre-trained model at runtime.
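Because train.py and test.py download the dataset and model at runtime, a quick pre-flight check can catch a full disk before a long download fails. This is a standalone sketch using only the standard library; `has_enough_space` is a hypothetical helper, not part of the repository, and the 3 GB default is the approximate requirement stated above (treated as GiB here).

```python
import shutil


def has_enough_space(path: str = ".", required_gb: float = 3.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```

Calling `has_enough_space()` from the repository root before running the scripts checks the disk the downloads will land on.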
- Install dependencies
Set up CUDA on your machine so that PyTorch can use the GPU.
pip install -r requirements.txt
- Train the model
NOTE: This step is optional. If no locally trained model is found, the test script uses the provided fine-tuned model instead.
python train.py
- Evaluate the model on the test dataset
python test.py
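For the first step (installing dependencies), a quick way to confirm that PyTorch can actually see the GPU after CUDA is set up. This assumes PyTorch is among the packages in requirements.txt (the scripts use a torch model); `describe_device` is an illustrative helper, not part of the repository.

```python
import importlib.util


def describe_device() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed; run `pip install -r requirements.txt` first."
    import torch

    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. the card CUDA was set up for.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; PyTorch will only see the CPU."


print(describe_device())
```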
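The fallback described in the training step's note can be pictured as a simple path check. This is a hypothetical sketch: the checkpoint path and the `download` callback are placeholders, since the README does not say where train.py saves its weights or where test.py fetches the fine-tuned model from.

```python
from pathlib import Path
from typing import Callable


def resolve_checkpoint(local_path: Path, download: Callable[[], Path]) -> Path:
    """Prefer a locally trained checkpoint; otherwise fetch the provided fine-tuned one."""
    if local_path.is_file():
        return local_path
    # No local training run found, so fall back to the published fine-tuned weights.
    return download()
```

Passing the download step as a callback keeps the check itself free of network code, so it runs instantly when a local checkpoint exists.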
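For the evaluation step: the README does not say which metric test.py reports. Detection metrics such as mean average precision decide whether a predicted box matches a ground-truth box by their intersection over union (IoU). A minimal sketch for boxes in (xmin, ymin, xmax, ymax) format, the layout torchvision detection models use; `box_iou` is an illustrative name, not the repository's code.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Overlap rectangle; width/height clamp to 0 when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.142857
```

Under the PASCAL VOC convention, a prediction counts as a match when its IoU with a ground-truth box is at least 0.5; COCO-style mAP averages over IoU thresholds from 0.5 to 0.95.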
NOTE: The Kaggle notebook runs train.py and test.py as scripts to mimic running them on a personal machine. It also contains sample code for plotting predictions on some test set images.
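When plotting, low-confidence detections are usually dropped before drawing; the sample predictions at the top of this README use a 0.9 threshold. A minimal sketch, assuming each prediction is a dict with `boxes`, `labels`, and `scores` entries, which is the output format of torchvision's detection models in eval mode. Plain lists are used here so the example runs without PyTorch, and `filter_predictions` is an illustrative name, not the notebook's code.

```python
from typing import Dict, List, Tuple


def filter_predictions(pred: Dict[str, list], threshold: float = 0.9) -> List[Tuple[list, int, float]]:
    """Keep detections whose score is at least `threshold`, as (box, label id, score)."""
    kept = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if score >= threshold:
            # int()/float() also unwrap 0-d tensors if torchvision outputs are passed in.
            kept.append((box, int(label), float(score)))
    return kept


pred = {
    "boxes": [[10, 20, 110, 220], [50, 60, 90, 140]],
    "labels": [2, 3],
    "scores": [0.97, 0.42],
}
print(filter_predictions(pred))  # [([10, 20, 110, 220], 2, 0.97)]
```

A high threshold like 0.9 trades recall for cleaner plots: fewer spurious boxes are drawn, but some correct, lower-scoring detections are hidden too.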
- Register and log in to Kaggle to access custom environment settings such as the accelerator and internet access.
- Go to the Kaggle notebook. Press the "Copy & Edit" button.
- Start the notebook with a GPU accelerator and with internet access turned on. The settings in the right panel should look similar to the image below:
- Run all the cells
The method above uses Kaggle, but the same steps can be adapted to run the notebook on Google Colab.
The ntbks folder contains the initial exploratory code. Unlike train.py and test.py, the notebooks in this directory require the dataset and/or trained model to be pre-loaded before they will run. However, the notebooks include write-ups that guide the user through setting up and configuring the directories.
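Because the notebooks expect the dataset and trained model to already be in place, it can help to resolve their locations in one spot. A hypothetical sketch: `/kaggle/input` (read-only inputs) and `/kaggle/working` (writable output) are Kaggle's standard notebook directories, but the environment-variable names and subfolder names below are placeholders to be matched to the notebooks' own write-ups.

```python
import os
from pathlib import Path


def resolve_dirs(local_root: str = ".") -> dict:
    """Pick dataset, model, and output directories for Kaggle or a local checkout."""
    on_kaggle = Path("/kaggle/input").is_dir()
    input_root = Path("/kaggle/input") if on_kaggle else Path(local_root)
    output_root = Path("/kaggle/working") if on_kaggle else Path(local_root)
    return {
        # Hypothetical env vars and folder names; adjust to the notebook's setup notes.
        "dataset": Path(os.environ.get("DRINKS_DATA_DIR", input_root / "drinks")),
        "model": Path(os.environ.get("DRINKS_MODEL_DIR", input_root / "model")),
        "output": output_root,
    }
```

The environment-variable override lets the same notebook cell work when the dataset was uploaded under a different name.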
- ntbks/ee298z-assignment-2-object-detection-train.ipynb
- Contains code for the custom dataloader, training and saving the model, and sample inference on test images.
- ntbks/ee298z-assignment-2-object-detection-video-gen.ipynb
- Contains code for loading the trained model, reading frames from a video file, and applying the object detection model to locate the drinks in each frame and predict their classes.
- NOTE: Since Kaggle notebooks cannot access a live camera feed, a video file (.mp4) was used as the input for the demo submission instead. The generated demo file has a resolution of 640x480 at 30 frames per second.
- ee298z-hw2-object-detection.ipynb
- Contains the code from the Kaggle notebook.
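The custom dataloader in the training notebook has to turn per-box annotation rows into the per-image targets (`boxes`, `labels`) that torchvision's Faster R-CNN takes during training. A minimal, PyTorch-free sketch of that grouping step, assuming CSV rows with `frame, xmin, xmax, ymin, ymax, class_id` columns (check the dataset's label files before relying on this column order); the file names in the sample are made up. The `collate_fn` is the usual pattern for batching variable-size detection targets with a PyTorch `DataLoader`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_targets(csv_text: str) -> Dict[str, Dict[str, list]]:
    """Group annotation rows by image into {'boxes': [[x1, y1, x2, y2], ...], 'labels': [...]}."""
    targets: Dict[str, Dict[str, list]] = defaultdict(lambda: {"boxes": [], "labels": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Reorder to (xmin, ymin, xmax, ymax), the box layout torchvision uses.
        box = [float(row["xmin"]), float(row["ymin"]), float(row["xmax"]), float(row["ymax"])]
        targets[row["frame"]]["boxes"].append(box)
        targets[row["frame"]]["labels"].append(int(row["class_id"]))
    return dict(targets)


def collate_fn(batch: List[tuple]) -> tuple:
    """Batch (image, target) pairs without stacking, since box counts differ per image."""
    return tuple(zip(*batch))


sample = """frame,xmin,xmax,ymin,ymax,class_id
0000000.jpg,10,110,20,220,2
0000000.jpg,150,200,30,140,1
0000001.jpg,5,60,15,90,3
"""
print(load_targets(sample)["0000000.jpg"]["labels"])  # [2, 1]
```

In a real dataset class, each target's lists would be converted with `torch.as_tensor` (float boxes, int64 labels) before being returned alongside the image tensor.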