HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation
Chengpeng Wu, Guangxing Tan*, Chunyu Li
With the code contained in this repo, you should be able to reproduce the following results.
Results on the MPII val and test-dev sets (PCKh@0.5):

Method | Test set | Input size | Params | GFLOPs | Hea | Sho | Elb | Wri | Hip | Kne | Ank | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|
HEViTPose-T | MPII val | 256×256 | 3.21M | 1.75G | 95.9 | 94.9 | 87.4 | 81.6 | 87.4 | 81.6 | 77.2 | 87.2 |
HEViTPose-S | MPII val | 256×256 | 5.88M | 3.64G | 96.3 | 95.2 | 88.7 | 83.3 | 88.5 | 83.9 | 79.5 | 88.5 |
HEViTPose-B | MPII val | 256×256 | 10.63M | 5.58G | 96.5 | 95.6 | 89.5 | 84.5 | 89.1 | 85.7 | 81.1 | 89.4 |
HEViTPose-T | MPII test-dev | 256×256 | 3.21M | 1.75G | 97.6 | 95.1 | 89.0 | 83.6 | 89.1 | 83.9 | 79.1 | 88.7 |
HEViTPose-S | MPII test-dev | 256×256 | 5.88M | 3.64G | 97.8 | 95.9 | 90.5 | 86.0 | 89.7 | 86.0 | 81.7 | 90.1 |
HEViTPose-B | MPII test-dev | 256×256 | 10.63M | 5.58G | 98.0 | 96.1 | 91.3 | 86.5 | 90.2 | 86.6 | 83.0 | 90.7 |
Results on the COCO val and test-dev sets (OKS-based AP/AR):

Method | Test set | Input size | AP | AP@0.5 | AP@0.75 | AP (M) | AP (L) | AR |
---|---|---|---|---|---|---|---|---|
HEViTPose-B | COCO val | 256×256 | 75.4 | 93.6 | 83.5 | 72.4 | 79.6 | 78.2 |
HEViTPose-B | COCO test-dev | 256×256 | 72.6 | 92.0 | 80.9 | 69.2 | 78.2 | 78.0 |
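As a quick sanity check of the Params column, the sketch below loads a released checkpoint and counts its weights. The checkpoint path is an assumption (point it at wherever you downloaded the weights), and the `state_dict` key follows the usual MMPose convention; the count also includes buffers, so it may differ slightly from the table.

```python
import torch

# Hypothetical path; point this at wherever the released checkpoint was downloaded.
ckpt = torch.load('/work_dir/HEViTPose/HEViTPose-B.pth', map_location='cpu')

# MMPose checkpoints usually keep weights under 'state_dict'; fall back to the raw dict otherwise.
state_dict = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt

# Note: buffers (e.g. BatchNorm running stats) are counted too, so the total can
# differ slightly from the Params column above.
num_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f'parameters: {num_params / 1e6:.2f}M')
```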
Example predictions of the HEViTPose model on the MPII (top) and COCO (bottom) datasets, covering challenging cases such as occlusion, multiple people, and viewpoint and appearance changes.
git clone https://github.com/T1sweet/HEViTPose
cd ./HEViTPose
conda create -n HEViTPose python=3.9
conda activate HEViTPose
Our models are trained on GPU platforms and rely on the following versions: torch==1.10.1+cu113, torchvision==0.11.2+cu113.
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
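To confirm that the installed versions match the ones listed above, a quick check:

```python
import torch
import torchvision

print(torch.__version__)          # expected: 1.10.1+cu113
print(torchvision.__version__)    # expected: 0.11.2+cu113
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # True on a working CUDA setup
```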
Our code is based on the MMPose 0.29.0 codebase, and dependencies can be installed following the instructions provided by MMPose. Install MMCV using MIM:
pip install -U openmim
mim install mmcv-full==1.4.5
Install the other dependencies:
pip install -r requirements.txt
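A minimal check that MMCV and MMPose import correctly, assuming MMPose 0.29.0 was installed via requirements.txt or from source:

```python
import mmcv
import mmpose

print(mmcv.__version__)    # expected: 1.4.5
print(mmpose.__version__)  # expected: 0.29.0
```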
Download MPII and COCO from their official websites and extract them under the data directory following the structure below; (xxx.json) denotes the original file name.
./data
├── coco
│   ├── annotations
│   │   ├── coco_train.json (person_keypoints_train2017.json)
│   │   ├── coco_val.json (person_keypoints_val2017.json)
│   │   └── coco_test.json (image_info_test-dev2017.json)
│   └── images
│       ├── train2017
│       │   └── 000000000009.jpg
│       ├── val2017
│       │   └── 000000000139.jpg
│       └── test2017
│           └── 000000000001.jpg
├── mpii
│   ├── annotations
│   │   ├── mpii_train.json (refer to DEKR, link: https://github.com/HRNet/DEKR)
│   │   ├── mpii_val.json
│   │   ├── mpii_test.json
│   │   └── mpii_gt_val.mat
│   └── images
│       └── 100000.jpg
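Before training, a small script like the following (file names taken from the tree above, run from the repository root) can verify that the annotation files are in place and readable:

```python
import json
import os

# Paths follow the directory tree above; run from the repository root.
REQUIRED = [
    'data/coco/annotations/coco_train.json',
    'data/coco/annotations/coco_val.json',
    'data/mpii/annotations/mpii_train.json',
    'data/mpii/annotations/mpii_val.json',
]

for path in REQUIRED:
    if not os.path.isfile(path):
        print(f'missing: {path}')
        continue
    with open(path) as f:
        ann = json.load(f)
    # COCO-style files are dicts with an 'images' list; MPII annotation files may be plain lists.
    size = len(ann['images']) if isinstance(ann, dict) and 'images' in ann else len(ann)
    print(f'{path}: {size} entries')
```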
Change the checkpoint path by modifying `pretrained` in HEViTPose-B_mpii_256x256.py, and run the following commands:
python tools/test.py config checkpoint

- `config`: the configuration file, which must be set.
- `checkpoint`: the trained weight file, which must be set.
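For reference, a hypothetical excerpt of what the `pretrained` field mentioned above might look like; MMPose configs are plain Python, but the exact key layout in HEViTPose-B_mpii_256x256.py may differ, and the weight path is only an assumed example:

```python
# Hypothetical excerpt of an MMPose-style config; the actual keys live in
# configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-B_mpii_256x256.py.
model = dict(
    type='TopDown',
    pretrained='/work_dir/HEViTPose/pretrained/HEViTPose-B_backbone.pth',  # assumed local path
    # backbone=..., keypoint_head=..., train_cfg=..., test_cfg=...  (as defined in the repo)
)
```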
# evaluate HEViTPose-B on mpii val set
python tools/test.py ../configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-B_mpii_256x256.py /work_dir/HEViTPose/HEViTPose-B.pth
# evaluate HEViTPose-S on mpii val set
python tools/test.py ../configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-S_mpii_256x256.py /work_dir/HEViTPose/HEViTPose-S.pth
# evaluate HEViTPose-T on mpii val set
python tools/test.py ../configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-T_mpii_256x256.py /work_dir/HEViTPose/HEViTPose-T.pth
# evaluate HEViTPose-B on coco val set
python tools/test.py ../configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-B_coco_256x256.py /work_dir/HEViTPose/HEViTPose-B_coco.pth
Change the pretrained checkpoint path by modifying `pretrained` in HEViTPose-B_mpii_256x256.py, and run the following commands:
# train HEViTPose-B on the MPII dataset
python tools/train.py ../configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-B_mpii_256x256.py
# train HEViTPose-B on the COCO dataset
python tools/train.py ../configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-B_coco_256x256.py
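If you want to inspect or tweak a config programmatically before launching training, mmcv's `Config` API can be used as sketched below; the config path assumes you run from the repository root, and the override shown is purely illustrative:

```python
from mmcv import Config

# Run from the repository root; path matches the training command above.
cfg = Config.fromfile(
    'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/HEViTPose-B_mpii_256x256.py')

print(cfg.pretty_text)  # dump the fully resolved config for inspection

# Illustrative override; actual field names depend on the repo's config files.
cfg.work_dir = './work_dirs/hevitpose_b_mpii'
```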
If you have any questions about this code or paper, feel free to contact me at [email protected].
If you find this code useful for your research, please cite our paper:
@misc{wu2024hevitpose,
      title = {HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation},
      author = {Chengpeng Wu and Guangxing Tan and Chunyu Li},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2023},
      eprint = {2311.13615},
      archivePrefix = {arXiv},
      primaryClass = {cs.CV}
}
This algorithm is built on the MMPose codebase, and its main ideas are inspired by EfficientViT, PVTv2, Swin Transformer, and other papers.
@misc{mmpose2020,
title={OpenMMLab Pose Estimation Toolbox and Benchmark},
author={MMPose Contributors},
howpublished = {\url{https://github.com/open-mmlab/mmpose}},
year={2020}
}