Cédric Rommel, Victor Letzelter, Nermin Samet, Renaud Marlet, Matthieu Cord, Patrick Pérez, Eduardo Valle
[arXiv]
We propose ManiPose, a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting. We provide theoretical and empirical evidence that, due to the depth ambiguity inherent to monocular 3D human pose estimation, traditional regression models suffer from pose-topology consistency issues, which standard evaluation metrics (MPJPE, P-MPJPE and PCK) fail to assess. ManiPose addresses depth ambiguity by proposing multiple candidate 3D poses for each 2D input, each with its estimated plausibility. Unlike previous multi-hypothesis approaches, ManiPose forgoes generative models, greatly facilitating its training and usage. By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses, in contrast to previous works. We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency by a large margin while being very competitive on the MPJPE metric.
The code requires Python 3.7 or later. The file requirements.txt contains the full list of required Python modules.
pip install -r requirements.txt
You may also optionally install MLFlow for experiment tracking:
pip install mlflow
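If you want to keep dependencies isolated, you can optionally create a virtual environment first (standard Python tooling, not specific to this repository):
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt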
The Human3.6M dataset was set up following the AnyGCN repository. Please refer to it for setup instructions.
The MPI-INF-3DHP data can be obtained from the official website or, alternatively, using the MMPose library.
Consider adding the path where the data is stored to the data.data_dir field in the conf/config.yaml file. Alternatively, this information can also be passed directly to the training/test command line if preferred, as explained below.
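For example, the relevant entry in conf/config.yaml would look roughly like this (a sketch; the exact structure of the file may differ, but the field corresponds to the data.data_dir override used in the commands below):
data:
  data_dir: /PATH/TO/DATA/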
You can download checkpoints of the pretrained models from the assets of the latest code release and put them inside the checkpoints folder.
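Judging from the checkpoint paths used in the evaluation commands below, the expected layout is roughly as follows (assumed file names, matching the release assets):
hpe/
└── checkpoints/
    ├── manipose_h36m.pth
    └── manipose_3dhp.pth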
Once the checkpoints and data are in place, you can evaluate the models by running the commands below:
python hpe/main_h36m_lifting.py \
run.checkpoint_model=hpe/checkpoints/manipose_h36m.pth \
run.train=False \
run.test=True \
data.data_dir=/PATH/TO/H36M/DATA/
python hpe/main_3dhp.py \
run.checkpoint_model=hpe/checkpoints/manipose_3dhp.pth \
+data=mpi_inf_3dhp \
train.batch_size=30 \
train.batch_size_test=30 \
run.train=False \
run.test=True \
data.data_dir=/PATH/TO/MPI/DATA/
Note that you can omit the data.data_dir part of the command if you filled the corresponding field in conf/config.yaml beforehand.
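For example, with data.data_dir already set in the config, the H36M evaluation command shortens to:
python hpe/main_h36m_lifting.py \
run.checkpoint_model=hpe/checkpoints/manipose_h36m.pth \
run.train=False \
run.test=True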
Given a pre-trained model checkpoint, you can visualize the predicted poses using the script viz.py. For example:
python viz.py \
run.checkpoint_model=hpe/checkpoints/manipose_h36m.pth \
run.train=False \
viz.viz_action=greeting \
viz.viz_subject=S11 \
viz.viz_limit=600
The resulting mp4 or gif will be saved inside a new figures folder.
In case you have access to the videos from the official Human3.6M dataset, you can set them as the background of the visualization by passing their path through the additional command-line argument viz.viz_video=/PATH/TO/VIDEO.
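For instance, reusing the visualization command above (the video path is a placeholder):
python viz.py \
run.checkpoint_model=hpe/checkpoints/manipose_h36m.pth \
run.train=False \
viz.viz_action=greeting \
viz.viz_subject=S11 \
viz.viz_limit=600 \
viz.viz_video=/PATH/TO/VIDEO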
Other visualization configurations can be changed within the viz field in conf/config.yaml.
To train ManiPose from scratch on H36M, run:
python hpe/main_h36m_lifting.py \
run.train=True \
run.test=True \
data.data_dir=/PATH/TO/H36M/DATA/
Likewise, for MPI-INF-3DHP:
python hpe/main_3dhp.py \
+data=mpi_inf_3dhp \
run.train=True \
run.test=True \
train.batch_size=30 \
train.batch_size_test=30 \
data.data_dir=/PATH/TO/MPI/DATA/
Figure 4 can be reproduced in ~2 minutes on a GPU with ~10 GB of memory. Just run the following command:
cd toy_experiment ; python plotting_script.py
You should find the corresponding figures in the ./figures folder.
Table 1 can be reproduced using the following command:
bash toy_experiment/quantitative_comparison_toy2d.sh
Figure 8 from the 2D-to-3D toy experiment can be reproduced and saved in toy_experiment/images by running:
python
Table 6 can also be reproduced using the command:
bash toy_experiment/quantitative_comparison_toy3d.sh
The basis of the human pose lifting code was borrowed from DiffHPE, which builds on top of several other repositories, including:
The baseline MixSTE model was adapted from its official repository.
@inproceedings{rommel2024manipose,
  title     = {ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation},
  author    = {Cédric Rommel and Victor Letzelter and Nermin Samet and Renaud Marlet and Matthieu Cord and Patrick Pérez and Eduardo Valle},
  booktitle = {Advances in Neural Information Processing Systems},
  publisher = {Curran Associates, Inc.},
  volume    = {37},
  year      = {2024}
}