This repository contains our solution for the 2023 DataSaur Hackathon, built for the Kaggle competition "Classify images of Automobiles whether they are authentic or fictitious". We train several image classification models and ensemble them to distinguish authentic automobile images from fictitious ones.
We trained multiple models and then ensembled them for better performance. The training process is documented in 'train.py'. The ensemble combines four networks: resnet18, efficientnet_b0, efficientnet_b2, and efficientnet_b3.
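The text above does not say how the four networks are combined, so the following is only a minimal sketch of one common option, soft voting, in which each model's class probabilities are averaged. The function names, the logit values, and the class ordering are all illustrative assumptions and are not taken from 'train.py'.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average class probabilities across models; return (label, mean_probs)."""
    per_model_probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    mean_probs = [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=mean_probs.__getitem__)
    return label, mean_probs


# Hypothetical logits for a single image from the four networks.
# Class 0 = authentic, class 1 = fictitious (this ordering is an assumption).
logits = {
    "resnet18": [1.2, 0.3],
    "efficientnet_b0": [0.1, 0.9],
    "efficientnet_b2": [2.0, -0.5],
    "efficientnet_b3": [0.8, 0.4],
}
label, probs = soft_vote(list(logits.values()))
print(label, [round(p, 3) for p in probs])
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one, which is one reason soft voting is a frequent default for small ensembles.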
On the provided test set, our model achieved the following metrics:
- Accuracy: 97.43%
- Precision: 96.67%
- Recall: 96.67%
- F1 Score: 97.29%
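For reference, the four metrics above have standard definitions in terms of true and false positives and negatives. The sketch below computes them for a binary task; the function name, the toy labels, and the choice of which class counts as "positive" are assumptions for illustration, not the evaluation code used for the reported numbers.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```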
Clone the repository and set up the environment by executing the following commands:
git clone https://github.com/AlimTleuliyev/datasaur2023.git
cd datasaur2023
pip install -r requirements.txt
To use our pre-trained models, follow these steps:
- Create a 'models' directory within the cloned repository.
- Access and download the pre-trained weights (depending on your task: binary or multiclass classification) via this link.
- Place the downloaded weights into the 'models' directory.
For inference, images must be organized in a specific folder structure. Place the images inside a folder within your main data directory. The name of this enclosing folder is treated as the class name.
Required Folder Structure:
data_directory_with_images_to_classify
├── image1.jpeg
├── image2.jpeg
├── ...
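Before running inference, it can help to confirm that the directory really contains the images you expect. The sketch below lists image files directly inside a folder laid out as above. The helper name and the set of accepted extensions are assumptions; 'inference.py' may apply different rules.

```python
import tempfile
from pathlib import Path

# Assumed set of accepted extensions; the real script may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(image_dir):
    """Return the image files found directly inside image_dir, sorted by path."""
    root = Path(image_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("image1.jpeg", "image2.jpeg", "notes.txt"):
        (Path(tmp) / name).touch()
    images = list_images(tmp)
    print(f"{len(images)} images found:", [p.name for p in images])
```

To check your own data, call list_images with the path you intend to pass as the image directory.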
Use 'inference.py' to run inference. The script accepts five arguments:
- task (required): What task to perform: binary or multiclass classification.
- image_dir (required): The path to the directory containing images to classify.
- batch_size: The batch size utilized during inference. Default value is 8.
- num_workers: The number of workers for the dataloader. Default value is 4.
- output_name: The desired name of the output file. It should be a CSV file. Default is labels.csv.
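As an illustration of the interface described above, here is a minimal argparse sketch with the same five arguments and defaults. It is not the actual contents of 'inference.py'; the help strings, the use of `choices`, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the five documented arguments."""
    parser = argparse.ArgumentParser(description="Classify automobile images.")
    # Restricting values with `choices` is an assumption about the real script.
    parser.add_argument("--task", required=True, choices=["binary", "multiclass"],
                        help="Task to perform: binary or multiclass classification.")
    parser.add_argument("--image_dir", required=True,
                        help="Directory containing images to classify.")
    parser.add_argument("--batch_size", type=int, default=8,
                        help="Batch size used during inference.")
    parser.add_argument("--num_workers", type=int, default=4,
                        help="Number of dataloader workers.")
    parser.add_argument("--output_name", default="labels.csv",
                        help="Name of the output CSV file.")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--task", "binary", "--image_dir", "data_directory_with_images_to_classify"]
)
print(args)
```

Only the two required arguments are passed here, so the remaining three fall back to the documented defaults.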
Command Example:
python inference.py --task binary --image_dir data_directory_with_images_to_classify --batch_size 32 --num_workers 4 --output_name results.csv
OR
python inference.py --task multiclass --image_dir data_directory_with_images_to_classify --batch_size 32 --num_workers 4 --output_name results.csv
Your results will be saved in a structured CSV file with the specified name.
We extend our heartfelt thanks to the incredible team members who contributed their expertise and hard work to make this project a success.
- Alim Tleuliyev: [email protected]
- Alikhan Nurkamal: [email protected]
- Beksultan Tleutayev: [email protected]
Feel free to reach out to any of the contributors for questions or feedback concerning the project. We are committed to fostering an open, collaborative environment and welcome any contributions or insights from the community.