Developed models to classify mixed patterns of proteins in microscope images
During training I used several manual steps (a rough sketch of the loop follows this list):
- I stopped training when the model started to overfit, then reduced the learning rate and continued training.
- I used checkpointing to save the best weights and loaded them as the initial weights for the next training run.
- I hand-picked different training and validation sets instead of using cross-validation.
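The sketch below outlines that manual workflow, assuming a PyTorch `model`, `train_loader`, and `val_loader` already exist; it is only an illustration of the steps above, not the exact notebook code. The checkpoint path reuses `best_checkpoint/weight.pth` from this repo.

```python
import torch

def train_stage(model, train_loader, val_loader, lr, epochs, device="cuda",
                ckpt_path="best_checkpoint/weight.pth", resume=False):
    """One manual training stage: train, watch the validation loss for overfitting,
    keep the best weights, and optionally warm-start from the previous stage."""
    if resume:
        model.load_state_dict(torch.load(ckpt_path))  # start from the best weights so far

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.BCEWithLogitsLoss()           # multi-label targets
    best_val = float("inf")

    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()

        # Validation pass: a validation loss that keeps rising while the training
        # loss falls is the overfitting signal used to stop the stage.
        model.eval()
        val_loss, n_batches = 0.0, 0
        with torch.no_grad():
            for images, targets in val_loader:
                images, targets = images.to(device), targets.to(device)
                val_loss += criterion(model(images), targets).item()
                n_batches += 1
        val_loss /= max(n_batches, 1)

        if val_loss < best_val:
            best_val = val_loss
            torch.save(model.state_dict(), ckpt_path)  # checkpoint the best weights

# Stage 1: train from scratch; stage 2: reload the best checkpoint with a lower LR.
# train_stage(model, train_loader, val_loader, lr=1e-3, epochs=10)
# train_stage(model, train_loader, val_loader, lr=1e-4, epochs=10, resume=True)
```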
`image_classification.ipynb` is for training the classification model.
`ensemble.ipynb` is for merging multiple predicted probabilities into the final submission.
Below is the solution I used for the final submission.
- EfficientNet-B0 vs EfficientNet-B1 vs EfficientNet-B2 vs EfficientNet-B4 vs ResNet101 vs DenseNet121: results show that the EfficientNets perform better than the ResNets and DenseNets.
- EfficientNet-B1 vs EfficientNet-B2: I used both of them in the ensemble.
- I didn't use EfficientNet-B4 because the score dropped when I resized the images.
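For reference, the compared backbones can be created with the `timm` library as below; this is a hedged sketch rather than the notebook's exact setup, and `NUM_CLASSES` is a placeholder for the number of protein labels.

```python
import timm

NUM_CLASSES = 28  # placeholder: set to the actual number of protein pattern labels

# Backbones compared during model selection. Each head outputs one logit per class
# so the models can be trained with a multi-label (sigmoid) loss.
candidates = [
    "efficientnet_b0", "efficientnet_b1", "efficientnet_b2", "efficientnet_b4",
    "resnet101", "densenet121",
]
models = {name: timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)
          for name in candidates}
```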
- AdamW vs Adam: the AdamW optimizer converges faster than Adam. See more details in *Why AdamW matters* and *AdamW and Super-convergence is now the fastest way to train neural nets*.
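A minimal sketch of the two optimizers; the learning rate and weight decay here are illustrative values, not necessarily the ones used in the notebook.

```python
import torch

model = torch.nn.Linear(512, 28)  # placeholder model for illustration

# Adam folds weight decay into the gradient (L2 penalty), while AdamW decouples it
# from the adaptive update, which in practice converged faster in this project.
adam  = torch.optim.Adam(model.parameters(),  lr=1e-4, weight_decay=1e-2)
adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
```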
- OneCycle vs CosineAnnealingWarmRestarts:
  - CosineAnnealingWarmRestarts converges faster, but the better F1 scores (>0.82) come from OneCycle with an initial learning rate of 0.0001 and 26 epochs.
  - I used CosineAnnealingWarmRestarts for finding the best model, then OneCycle for generating a single submission result before ensembling.
  - This article explains why choosing the right number of epochs and learning rate matters for OneCycle.
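Below is a hedged sketch of how the two schedulers can be configured in PyTorch with the settings mentioned above (learning rate 1e-4, 26 epochs); the steps-per-epoch value is a placeholder for `len(train_loader)`.

```python
import torch

model = torch.nn.Linear(512, 28)     # placeholder model for illustration
EPOCHS = 26                          # epoch count that gave the best OneCycle scores
STEPS_PER_EPOCH = 100                # placeholder: len(train_loader) in practice

# OneCycleLR: ramps the LR up to max_lr and back down once over the whole run.
opt_a = torch.optim.AdamW(model.parameters(), lr=1e-4)
one_cycle = torch.optim.lr_scheduler.OneCycleLR(
    opt_a, max_lr=1e-4, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH)

# CosineAnnealingWarmRestarts: cosine decay with periodic restarts (every 5 epochs here).
opt_b = torch.optim.AdamW(model.parameters(), lr=1e-4)
cosine_restarts = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    opt_b, T_0=5 * STEPS_PER_EPOCH)

# Both schedulers are stepped once per batch in this setup:
#   optimizer.step(); scheduler.step()
```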
- I use the augmentations below during training and turn them off at validation and test time (see the sketch after this list).
- RandomHorizontalFlip
- RandomVerticalFlip
- RandomRotation
- ColorJitter(brightness=0.2, saturation=0.2, contrast=0.2)
- Resize() (used only when searching for hyperparameters; I get higher scores without using Resize())
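A sketch of this augmentation pipeline with torchvision transforms; the rotation angle is a placeholder since the original value isn't listed, and Resize() is left commented out because it lowered the score.

```python
from torchvision import transforms

# Training-time augmentation (the rotation angle is a placeholder value).
train_tfms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, saturation=0.2, contrast=0.2),
    # transforms.Resize(256),  # only while searching hyperparameters; hurt the final score
    transforms.ToTensor(),
])

# Validation/test time: augmentation is turned off, only tensor conversion remains.
eval_tfms = transforms.Compose([
    transforms.ToTensor(),
])
```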
- Choose a loss function that aligns with the F1-score metric: binary cross-entropy or focal loss. When the focal loss decreases, the F1 score doesn't increase much; I get better F1 scores from binary cross-entropy.
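The two candidate losses, sketched for multi-label logits; the focal loss below is a common formulation and may differ in detail from the variant tried in the notebook.

```python
import torch
import torch.nn.functional as F

def bce_loss(logits, targets):
    """Multi-label binary cross-entropy, the loss that gave the better F1 scores."""
    return F.binary_cross_entropy_with_logits(logits, targets)

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """A common multi-label focal loss; down-weights easy examples via (1 - p_t)**gamma."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true label
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()
```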
- I ensemble the 6 best checkpoints of my model.
- I set 0.5 as the threshold for the predicted classes, then fill missing classes using lower thresholds of 0.46 and 0.445 respectively; argmax() is used for predicting the rest of the missing classes (see the sketch below).
- At first I filled the missing classes with the mode class (class 4), then changed to filling them with the argmax of the probabilities the model generated.
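A hedged sketch of this post-processing, assuming each checkpoint produces an array of per-class probabilities; only one fallback threshold is shown for brevity, so the exact handling of the 0.46/0.445 pair may differ from the notebook.

```python
import numpy as np

def ensemble_and_threshold(prob_list, main_thr=0.5, fallback_thr=0.46):
    """Average the checkpoints' probabilities, threshold at 0.5, then fill samples
    that end up with no predicted class: first with a lower threshold, then argmax.
    `prob_list` is a list of (num_images, num_classes) arrays, one per checkpoint."""
    probs = np.mean(prob_list, axis=0)              # merge the checkpoint predictions
    preds = (probs > main_thr).astype(int)

    empty = preds.sum(axis=1) == 0                  # images with no class above 0.5
    preds[empty] = (probs[empty] > fallback_thr).astype(int)

    still_empty = preds.sum(axis=1) == 0            # fall back to the most likely class
    preds[still_empty, probs[still_empty].argmax(axis=1)] = 1
    return preds
```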
`best_checkpoint/weight.pth` stores the best model weights saved during training.