The PyTorch implementation for "Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser" (AAAI 2024).
Make sure you have the following dependencies installed (Python):
- pytorch >= 0.4.0
- matplotlib == 3.1.0
- einops
- timm
- tensorboard

You will also need MATLAB if you want to evaluate our model on the MPI-INF-3DHP dataset.
Our model is evaluated on Human3.6M and MPI-INF-3DHP datasets.
We set up the Human3.6M dataset in the same way as VideoPose3D. You can download the processed data from here. data_2d_h36m_gt.npz contains the ground-truth 2D keypoints, data_2d_h36m_cpn_ft_h36m_dbb.npz contains the 2D keypoints obtained by CPN, and data_3d_h36m.npz contains the ground-truth 3D human joints. Put them in the ./data directory.
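After downloading, a quick sanity check like the following can confirm the files are where the code expects them. The file names come from this README; the check itself is only an illustrative sketch:

```python
import os

# File names taken from the README; ./data is the directory it specifies.
required = [
    "data_2d_h36m_gt.npz",               # ground-truth 2D keypoints
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",  # 2D keypoints obtained by CPN
    "data_3d_h36m.npz",                  # ground-truth 3D human joints
]
missing = [f for f in required if not os.path.isfile(os.path.join("data", f))]
if missing:
    print("Missing files in ./data:", missing)
else:
    print("All Human3.6M data files found.")
```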
We set up the MPI-INF-3DHP dataset following D3DP. You can download the processed data from here. Put it in the ./data directory.
You can download our pre-trained models for Human3.6M (from here) and MPI-INF-3DHP (from here). Put them in the ./checkpoint directory.
To evaluate our model with JPMA using the 2D keypoints obtained by CPN as inputs, run the following to compare with deterministic methods:

python main.py -k cpn_ft_h36m_dbb -c checkpoint/best_h36m_model -gpu 0 --evaluate best_epoch.bin -num_proposals 1 -sampling_timesteps 1 -b 4 --p2
To compare with probabilistic methods, run:

python main.py -k cpn_ft_h36m_dbb -c checkpoint/best_h36m_model -gpu 0 --evaluate best_epoch.bin -num_proposals 20 -sampling_timesteps 10 -b 4 --p2
You can balance efficiency and accuracy by adjusting -num_proposals (the number of hypotheses) and -sampling_timesteps (the number of iterations).
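As an illustration of this trade-off, the settings below are hypothetical examples spanning the range between the two commands above (the flag names are the ones used in this README):

```python
# Hypothetical settings spanning the speed/accuracy trade-off:
# (1, 1) matches the deterministic-style command above, (20, 10) the probabilistic one.
configs = [(1, 1), (5, 5), (20, 10)]  # (num_proposals, sampling_timesteps)
commands = [
    "python main.py -k cpn_ft_h36m_dbb -c checkpoint/best_h36m_model -gpu 0 "
    f"--evaluate best_epoch.bin -num_proposals {p} -sampling_timesteps {t} -b 4 --p2"
    for p, t in configs
]
for cmd in commands:
    print(cmd)
```

Larger values are slower but usually more accurate.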
To evaluate our model with JPMA on MPI-INF-3DHP, using the ground-truth 2D poses as inputs, please run:

python main_3dhp.py -c checkpoint/best_3dhp_model -gpu 0 --evaluate best_epoch.bin -num_proposals 5 -sampling_timesteps 5 -b 4 --p2
After that, the predicted 3D poses under the P-Best, P-Agg, J-Best, and J-Agg settings are saved as four .mat files in ./checkpoint. To get the MPJPE, AUC, and PCK metrics, evaluate the predictions by running the MATLAB script ./3dhp_test/test_util/mpii_test_predictions_ori_py.m (change 'aggregation_mode' in line 29 to get results under different settings). The evaluation results are then saved in ./3dhp_test/test_util/mpii_3dhp_evaluation_sequencewise_ori_{setting name}_t{iteration index}.csv. You can manually average the three metrics in these files over the six sequences to get the final results. An example is shown in ./3dhp_test/test_util/H20_K10/mpii_3dhp_evaluation_sequencewise_ori_J_Best_t10.csv.
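The manual averaging step can also be scripted. The snippet below is a sketch over made-up sample values; the real CSV files in ./3dhp_test/test_util/ may use different column headers:

```python
import csv
import io
from statistics import mean

# Made-up sequence-wise results standing in for one of the generated CSV files;
# the metric names follow the README (PCK, AUC, MPJPE), one row per test sequence.
sample_csv = """sequence,PCK,AUC,MPJPE
TS1,98.1,68.2,41.0
TS2,97.5,66.9,43.2
TS3,96.8,65.4,45.7
TS4,97.9,67.1,42.5
TS5,95.2,63.8,48.9
TS6,94.7,62.5,50.3
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
# Average each metric over the six sequences.
averages = {m: mean(float(r[m]) for r in rows) for m in ("PCK", "AUC", "MPJPE")}
print({m: round(v, 2) for m, v in averages.items()})
```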
To train our model using the 2D keypoints obtained by CPN as inputs, please run:
python main.py -k cpn_ft_h36m_dbb -c checkpoint/model_ddhpose_h36m -gpu 0
To train our model on MPI-INF-3DHP using the ground-truth 2D poses as inputs, please run:

python main_3dhp.py -c checkpoint/model_ddhpose_3dhp -gpu 0
@inproceedings{cai2024disentangled,
title={Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser},
author={Cai, Qingyuan and Hu, Xuecai and Hou, Saihui and Yao, Li and Huang, Yongzhen},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={2},
pages={882--890},
year={2024}
}
Our code builds on the following repositories. We thank the authors for releasing their code.