1st Place Solution of Kaggle Happywhale Competition

This is knshnb's part of the Preferred Dolphin's solution for Happywhale - Whale and Dolphin Identification. This repository is a fork of the winning solution by knshnb and tyamaguchi17, described in Patton et al. (2023), to be archived at Zenodo.

Dataset

Please prepare the dataset according to input/README.md and place it under input/.

$ ls -F input
fullbody_test_charm.csv   pseudo_labels/          test_backfin.csv*  train_images/
fullbody_test.csv         README.md               test_images/       yolov5_test.csv
fullbody_train_charm.csv  sample_submission.csv*  train2.csv         yolov5_train.csv
fullbody_train.csv        species.npy*            train_backfin.csv
individual_id.npy*        test2.csv               train.csv

Reproducing the winning score

Before the final training round, we repeated Steps 1-2 twice for pseudo labeling. By default, input/pseudo_labels/round2.csv (the pseudo labels we created) is specified in the config files, so you can skip the first two rounds. To train from scratch instead, set the pseudo_label field to None in the config files. A sketch of the full loop is given at the end of this section.

Step 1: Training and inference

With src/train.py, we

  1. train a model on the whole training data, and
  2. run inference on the test data and save the results under result/{exp_name}/-1/.

Several examples of config files are located in config/.

Example: Training and inference with efficientnet_b6 and efficientnet_b7

python -m src.train --config_path config/efficientnet_b6.yaml --exp_name b6
python -m src.train --config_path config/efficientnet_b7.yaml --exp_name b7

Step 2: Postprocess and ensemble

With src/ensemble.py, we

  1. calculate the mean of the knn and logit predictions for each model,
  2. ensemble the predictions of the models specified by --model_dirs,
  3. save the prediction as submission/{out_prefix}-{new_ratio}-{threshold}.csv, and
  4. save the pseudo labels as submission/pseudo_label_{out_prefix}.csv.

Predictions generated by charmq's repository are saved in the same format, so you can ensemble them by just specifying paths to model directories.

Example: Ensemble b6 and b7

python -m src.ensemble --model_dirs result/b6/-1 result/b7/-1 --out_prefix b6-b7
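
Predictions from charmq's repository can be included in the same call. The charmq result directory below is a hypothetical path used for illustration; substitute the actual directory produced by that repository:

python -m src.ensemble --model_dirs result/b6/-1 result/b7/-1 /path/to/charmq-result/b7/-1 --out_prefix b6-b7-charmq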

In our post-competition submissions, a single model (efficientnet_b7) achieved a score that would have ranked 3rd on the final leaderboard. We also confirmed that an ensemble of only two models (efficientnet_b6 and efficientnet_b7) would have taken 1st place. Ensembling more backbones and charmq's models achieves even better results.
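
For reference, below is a minimal sketch of the full pseudo-labeling loop described above. It assumes a single efficientnet_b7 model per pseudo-labeling round and uses illustrative exp_name/out_prefix values; the pseudo_label field must be edited in the config file between rounds, and none of these names are prescribed by the repository.

# Round 1: train with pseudo_label set to None, then generate pseudo labels.
python -m src.train --config_path config/efficientnet_b7.yaml --exp_name b7-round1
python -m src.ensemble --model_dirs result/b7-round1/-1 --out_prefix round1
# -> submission/pseudo_label_round1.csv

# Round 2: set pseudo_label in the config to submission/pseudo_label_round1.csv, then repeat.
python -m src.train --config_path config/efficientnet_b7.yaml --exp_name b7-round2
python -m src.ensemble --model_dirs result/b7-round2/-1 --out_prefix round2
# -> submission/pseudo_label_round2.csv

# Final round: train with the round-2 pseudo labels (the default
# input/pseudo_labels/round2.csv lets you start here directly) and ensemble.
python -m src.train --config_path config/efficientnet_b6.yaml --exp_name b6
python -m src.train --config_path config/efficientnet_b7.yaml --exp_name b7
python -m src.ensemble --model_dirs result/b6/-1 result/b7/-1 --out_prefix b6-b7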

Citation

@article{patton2023deep,
  title={A deep learning approach to photo-identification demonstrates high performance on two dozen cetacean species},
  author={Patton, Philip T and Cheeseman, Ted and Abe, Kenshin and Yamaguchi, Taiki and Reade, Walter and Southerland, Ken and Howard, Addison and Oleson, Erin M and Allen, Jason B and Ashe, Erin and others},
  journal={Methods in Ecology and Evolution},
  volume={14},
  number={10},
  pages={2611--2625},
  year={2023},
  publisher={Wiley Online Library}
}
