Paper: ViTKD: Practical Guidelines for ViT feature knowledge distillation
# Multi-GPU training (here with 4 GPUs)
bash tools/dist_train.sh configs/distillers/imagenet/deit-s3_distill_deit-t_img.py 4
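If you only have one GPU, repositories built on MMClassification normally also ship a single-GPU entry point; a minimal sketch, assuming this repo keeps the standard `tools/train.py` script:

```bash
# Single-GPU training (assumes the standard mmcls tools/train.py entry point)
python tools/train.py configs/distillers/imagenet/deit-s3_distill_deit-t_img.py
```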
# Transfer the distilled checkpoint into a standard mmcls model checkpoint
python pth_transfer.py --dis_path $dis_ckpt --output_path $new_mmcls_ckpt
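Here `$dis_ckpt` is the checkpoint written by the distillation run and `$new_mmcls_ckpt` is the destination for the extracted student weights. A filled-in example (both paths are illustrative, not files shipped with the repo):

```bash
# Illustrative paths; point --dis_path at your own work_dir checkpoint
python pth_transfer.py \
    --dis_path work_dirs/deit-s3_distill_deit-t_img/latest.pth \
    --output_path ckpts/deit-tiny_vitkd_student.pth
```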
# Multi-GPU evaluation (here with 8 GPUs)
bash tools/dist_test.sh configs/deit/deit-tiny_pt-4xb256_in1k.py $new_mmcls_ckpt 8 --metrics accuracy
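For a quick sanity check on a single GPU, a sketch assuming the standard mmcls `tools/test.py` script is available with the same `--metrics` flag:

```bash
# Single-GPU evaluation (assumes the standard mmcls tools/test.py entry point)
python tools/test.py configs/deit/deit-tiny_pt-4xb256_in1k.py $new_mmcls_ckpt --metrics accuracy
```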
Student | Teacher | Teacher weight | Baseline Top-1 (%) | ViTKD Top-1 (%) | ViTKD weight | ViTKD+NKD Top-1 (%) | ViTKD+NKD weight | Distill config |
---|---|---|---|---|---|---|---|---|
DeiT-Tiny | DeiT III-Small | baidu/one drive | 74.42 | 76.06 (+1.64) | baidu/one drive | 77.78 (+3.36) | baidu/one drive | config |
DeiT-Small | DeiT III-Base | baidu/one drive | 80.55 | 81.95 (+1.40) | baidu/one drive | 83.59 (+3.04) | baidu/one drive | config |
DeiT-Base | DeiT III-Large | baidu/one drive | 81.76 | 83.46 (+1.70) | baidu/one drive | 85.41 (+3.65) | baidu/one drive | config |
@article{yang2022vitkd,
title={ViTKD: Practical Guidelines for ViT feature knowledge distillation},
author={Yang, Zhendong and Li, Zhe and Zeng, Ailing and Li, Zexian and Yuan, Chun and Li, Yu},
journal={arXiv preprint arXiv:2209.02432},
year={2022}
}