# From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels (ICCV 2023)
## Train

```bash
# single GPU
python tools/train.py configs/distillers/imagenet/res18_sd_img.py

# multi GPU
bash tools/dist_train.sh configs/distillers/imagenet/res34_distill_res18_img.py 8
```
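For reference, the distiller configs above optimize the paper's NKD objective: a target term weighted by the teacher's target probability, plus a normalized non-target term. Below is a minimal PyTorch sketch of that loss; the function name `nkd_loss`, the `gamma`/`temp` defaults, and the epsilon constant are illustrative assumptions, not the repo's exact implementation.

```python
import torch
import torch.nn.functional as F

def nkd_loss(logit_s, logit_t, target, gamma=1.5, temp=1.0):
    """Sketch of NKD: target term + normalized non-target term (assumed defaults)."""
    n, c = logit_s.shape
    mask = F.one_hot(target, num_classes=c).bool()

    # Target part: the teacher's target probability softly weights the
    # student's target log-probability (computed without temperature).
    s = F.softmax(logit_s, dim=1)
    t = F.softmax(logit_t, dim=1)
    loss_target = -(t[mask] * torch.log(s[mask] + 1e-7)).mean()

    # Non-target part: drop the target class, renormalize both distributions
    # so they sum to 1, then take a temperature-scaled cross entropy between them.
    s_nt = F.softmax(logit_s / temp, dim=1)[~mask].view(n, c - 1)
    t_nt = F.softmax(logit_t / temp, dim=1)[~mask].view(n, c - 1)
    s_nt = s_nt / s_nt.sum(dim=1, keepdim=True)
    t_nt = t_nt / t_nt.sum(dim=1, keepdim=True)
    loss_non_target = -gamma * (temp ** 2) * (t_nt * torch.log(s_nt + 1e-7)).sum(dim=1).mean()

    return loss_target + loss_non_target
```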
## Transfer

```bash
# Transfer the distillation checkpoint into a plain mmcls model
python pth_transfer.py --dis_path $dis_ckpt --output_path $new_mmcls_ckpt
```
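Conceptually, this step unwraps the student from the distillation checkpoint so that plain mmcls tools can load it. The sketch below assumes the student weights are stored under a `student.` key prefix; the actual prefix and checkpoint layout in pth_transfer.py may differ.

```python
# Illustrative sketch only: the "student." prefix is an assumption, not the
# repo's verified key layout -- check pth_transfer.py for the real mapping.
import torch

def transfer(dis_path, output_path, prefix="student."):
    ckpt = torch.load(dis_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # Keep only the student's parameters and strip the wrapper prefix so the
    # resulting checkpoint matches the plain mmcls model's key names.
    student = {k[len(prefix):]: v for k, v in state.items()
               if k.startswith(prefix)}
    torch.save({"state_dict": student}, output_path)
```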
## Test

```bash
# single GPU
python tools/test.py configs/resnet/resnet18_8xb32_in1k.py $new_mmcls_ckpt --metrics accuracy

# multi GPU
bash tools/dist_test.sh configs/resnet/resnet18_8xb32_in1k.py $new_mmcls_ckpt 8 --metrics accuracy
```
## NKD results on ImageNet

Model | Teacher | Baseline (Top-1 Acc) | +NKD (Top-1 Acc) | dis_config | weight |
---|---|---|---|---|---|
ResNet18 | ResNet34 | 69.90 | 71.96 (+2.06) | config | baidu/one drive |
MobileNet | ResNet50 | 69.21 | 72.58 (+3.37) | config | baidu/one drive |
DeiT-Tiny | DeiT III-Small | 74.42 | 76.68 (+2.26) | config | |
DeiT-Base | DeiT III-Large | 81.76 | 84.96 (+3.20) | config | |
## tf-NKD (teacher-free) results on ImageNet

Model | Baseline (Top-1 Acc) | +tf-NKD (Top-1 Acc) | dis_config |
---|---|---|---|
MobileNet | 69.21 | 70.38 (+1.17) | config |
MobileNetV2 | 71.86 | 72.41 (+0.55) | config |
ShuffleNetV2 | 69.55 | 70.30 (+0.75) | config |
ResNet18 | 69.90 | 70.79 (+0.89) | config |
ResNet50 | 76.55 | 77.07 (+0.52) | config |
ResNet101 | 77.97 | 78.54 (+0.57) | config |
RegNetX-1.6GF | 76.84 | 77.30 (+0.46) | config |
Swin-Tiny | 81.18 | 81.49 (+0.31) | config |
DeiT-Tiny | 74.42 | 74.97 (+0.55) | config |
## Citation

```
@inproceedings{yang2023knowledge,
title={From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels},
author={Yang, Zhendong and Zeng, Ailing and Yuan, Chun and Li, Yu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={17185--17194},
year={2023}
}
```