This repository contains the codebase developed by the CUDA_Libre team for the Neural Wave Hackathon 2024, where our solution earned 1st place. The project automates the verification of steel bar alignment in a rolling mill using state-of-the-art AI methodologies to enhance operational efficiency and minimize human error.
Manual inspection of steel bar alignment is a labor-intensive task that can lead to errors due to operator fatigue. Our solution automates this verification, allowing plant operators to focus on more critical aspects of the production process. The workflow can be divided into two key stages:
- Semi-Supervised Labeling Workflow
- Model Training and Inference
```mermaid
graph TD
    subgraph Input["Initial Dataset"]
        A1[Raw Images] --> B1[Labeling Phase]
    end
    subgraph Labeling["Semi-Supervised Labeling"]
        C1[DINO v2 Feature Extraction]
        C1 --> D1[FAISS Index & KNN Search]
        D1 --> E1[Label Assignment]
    end
    B1 --> Labeling
    subgraph Training["Model Training"]
        F1[Train EfficientNet B0]
        F1 --> G1[Trained Model]
    end
    E1 --> Training
    subgraph Inference["Inference Phase"]
        H1[New Image Input]
        H1 --> I1[Predict Alignment Status]
    end
    G1 --> Inference
    subgraph Output["Results"]
        I1 --> J1[Alignment Status Output]
        I1 --> K1[Performance Metrics]
    end
```
Given the large, mostly unlabeled dataset of 15,630 images, we developed an efficient labeling strategy combining manual and automated techniques:
- Manual Labeling: We labeled an initial subset of 5,000 images, creating a foundation for reliable training data.
- DINO v2 for Embeddings: We used DINO v2, a self-supervised vision transformer model, to generate high-dimensional embeddings of the images. These embeddings capture complex semantic features that make it possible to measure image similarity effectively.
- K-Nearest Neighbors (KNN) with FAISS: We applied FAISS for fast, scalable similarity searches within the embedding space. For each unlabeled image, we identified its K-nearest neighbors and assigned a label based on a majority vote of their known labels.
- Cosine Similarity: To ensure robust label assignment, we utilized cosine similarity as the metric for calculating distances in the embedding space:
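For two embedding vectors $A$ and $B$, the similarity is

$$\text{sim}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

with values close to 1 indicating near-identical visual content.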
This method enabled us to expand the labeled dataset efficiently, without requiring manual effort for every image.
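As a rough illustration of this workflow, the sketch below loads DINOv2 from `torch.hub`, L2-normalizes the embeddings so that inner-product search in FAISS is equivalent to cosine similarity, and propagates labels by majority vote. The function names (e.g. `propagate_labels`) are illustrative and are not the repository's actual API.

```python
import faiss                      # pip install faiss-cpu
import numpy as np
import torch
from collections import Counter

# Load DINOv2 (ViT-S/14); any DINOv2 variant can be swapped in.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

@torch.no_grad()
def embed(batch: torch.Tensor) -> np.ndarray:
    """Return L2-normalized embeddings so that inner product equals cosine similarity."""
    feats = dinov2(batch)                                  # (N, 384) CLS-token features
    feats = torch.nn.functional.normalize(feats, dim=-1)
    return feats.cpu().numpy().astype("float32")

def propagate_labels(labeled_emb, labels, unlabeled_emb, k=5):
    """Assign each unlabeled embedding the majority label of its k nearest labeled neighbors."""
    index = faiss.IndexFlatIP(labeled_emb.shape[1])        # inner product on unit vectors = cosine
    index.add(labeled_emb)
    _, neighbors = index.search(unlabeled_emb, k)          # (M, k) indices into the labeled set
    return [Counter(labels[i] for i in row).most_common(1)[0][0] for row in neighbors]
```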
The expanded dataset was used to train an EfficientNet B0 model, chosen for its balance of accuracy and computational efficiency:
- Preprocessing: Images were resized to 256 pixels, center-cropped to 224 pixels, and normalized with the ImageNet mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225] (see the sketch after this list).
- Model Architecture: We fine-tuned EfficientNet B0, replacing its final classification layer with a two-class head for the alignment status.
- Training Details: The model was trained for 30 epochs, with the optimal performance observed at epoch 10. Key performance metrics included:
- Accuracy: 93.40%
- Precision: 94.37%
- Recall: 95.82%
- F1 Score: 95.09%
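A minimal sketch of the preprocessing pipeline and classifier head described above, using torchvision; the actual training loop, augmentation, and checkpointing live in `train.py`, so treat this as an outline rather than the project's verbatim code.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Preprocessing described above: resize, center crop, ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# EfficientNet B0 with the classifier head replaced by a two-class output.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # matches the example training command below
```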
The model demonstrated reliable classification capabilities with a mean inference time of 0.0298 seconds per image, meeting the real-time requirement of under 0.5 seconds per image.
| Inference Time Statistic | Time (seconds) |
|---|---|
| Mean Time | 0.0298 |
| 25th Percentile | 0.0111 |
| Median (50th Percentile) | 0.0117 |
| 75th Percentile | 0.0128 |
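For reference, per-image latency statistics like those above can be collected with a simple timing loop. This is a generic sketch, not the repository's benchmarking script, and it assumes preprocessed single-image tensors and an available GPU.

```python
import time
import numpy as np
import torch

@torch.no_grad()
def time_inference(model, images, device="cuda"):
    """Measure per-image forward-pass latency and summarize it."""
    model.eval().to(device)
    times = []
    for img in images:                      # each img: preprocessed (3, 224, 224) tensor
        x = img.unsqueeze(0).to(device)
        torch.cuda.synchronize()
        start = time.perf_counter()
        model(x)
        torch.cuda.synchronize()
        times.append(time.perf_counter() - start)
    t = np.asarray(times)
    return {"mean": t.mean(),
            "p25": np.percentile(t, 25),
            "median": np.percentile(t, 50),
            "p75": np.percentile(t, 75)}
```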
The project includes the following main components:
- `data/`: Dataset handling scripts, including `DufercoDataset.py` and preprocessing utilities.
- `dino/`: Code for feature extraction and FAISS-based similarity searches.
- `labeling_workflow/`: GUI (`gui_labeler.py`) for manual labeling support.
- `models/`: Training scripts for EfficientNet and related models.
- `train.py` and `test.py`: Scripts for training and evaluating the model.
- `requirements.txt`: List of dependencies.
Install the required packages with:
```bash
pip install -r requirements.txt
```
Run the training script directly:
```bash
python train.py --data_config_path "dataset/processed_augmented_split.json" --batch_size 32 --num_epochs 30 --learning_rate 0.0001 --checkpoint_path "checkpoints/efficient_net"
```
Evaluate the model using:
```bash
python test.py --data_config_path "dataset/split.json" --batch_size 16 --model_path "checkpoints/efficient_net/20241027_083453/model_epoch_10.pt"
```
This project is licensed under the MIT License - see the LICENSE file for details.