This is the official code release for our paper "". [Paper] [Project Page]
Poseiden (Pose In Dynamic Environment) is a stereo-based 3D human pose estimation model that provides absolute-scale 3D human poses from stereo image pairs. To overcome the limitations of dynamic environments such as underwater scenes, where 3D ground truths are extremely challenging to acquire, the model only requires 2D ground truths for training.
This repository is the implementation of the stereo-based 3D human pose estimation model proposed in the paper. For more information about the auto-refinement pipeline proposed in the paper, please refer to DiverPose-AutoRefinement (Coming Soon). Note that the model in this repository has been re-trained; its performance is close to the model reported in the paper but does not match it exactly.
- Build the Docker image:
docker build -t diverpose docker/
All code in this repository should run inside this Docker environment.
- Run the Docker container:
  - Change the $WORKSPACE_DIR variable in run_container.sh to the path where you store this repository before running the following command.
bash run_container.sh
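For reference, a minimal run_container.sh might look like the sketch below; the mount point, image name, and GPU flag are assumptions, and the actual script in this repository may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of run_container.sh (the real script may differ).
WORKSPACE_DIR=/path/to/DiverPose   # <-- edit this to where you cloned the repository

docker run -it --rm --gpus all \
    -v "${WORKSPACE_DIR}":/workspace \
    -w /workspace \
    diverpose bash
```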
- Poseiden requires a pretraining process to enhance the feature representations in its transformer layers.
- Download the 2017 train and val images and annotations from COCO Keypoints Dataset.
- Move the data folder into data/ and structure it as follows:
data/
└── coco/
    ├── annotations/
    ├── train2017/
    └── val2017/
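If you want to double-check the layout, a small sanity check like the following (not part of the repository) can confirm that the expected folders are in place:

```python
# Hypothetical sanity check for the expected COCO folder layout (not part of this repository).
from pathlib import Path

root = Path("data/coco")
for sub in ("annotations", "train2017", "val2017"):
    if not (root / sub).is_dir():
        raise FileNotFoundError(f"Missing expected folder: {root / sub}")
print("COCO folder layout looks correct.")
```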
- Download MADS_depth and MADS_multiview from MADS: Martial Arts, Dancing, and Sports Dataset
- Run extract_mads_data.py to extract images from the videos:
python extract_mads_data.py \
--depth_data_path <PATH_TO_MADS_depth> \
--multiview_data_path <PATH_TO_MADS_multiview> \
--output_path data/MADS_extract \
--rectify
- Note: the root value in conf/dataset/mads.yaml should be the same as the directory set for --output_path (data/MADS_extract by default); see the YAML sketch below.
- (Optional) Visualize the data to check that it is loaded correctly:
python helpers/display_data_3d.py --config-name train_stereo dataset=mads
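For the note above, a hypothetical excerpt of conf/dataset/mads.yaml could look like the following; the actual file may contain additional settings beyond root.

```yaml
# Hypothetical excerpt of conf/dataset/mads.yaml: keep `root` in sync with --output_path.
root: data/MADS_extract
```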
- One of the key contributions of this paper is automatically refining human annotations from stereo keypoints. Please refer to DiverPose-AutoRefinement for more details and guidelines for downloading the data.
- Once the images and annotations are extracted, put the data folder under data/.
- (Optional) Visualize the data to check that it is loaded correctly:
python helpers/display_data_3d.py --config-name train_stereo dataset=diver
We use the Hydra library to manage configurations. For more information, please refer to the Hydra documentation.
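As a rough illustration of how the overrides in the commands below are resolved (this is not the repository's train.py; the config path and name are assumptions based on the commands in this README):

```python
# Minimal Hydra sketch: command-line overrides such as `dataset=mads model.dmin=5`
# replace the corresponding fields of the composed configuration.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="train_stereo", version_base=None)
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # inspect the fully composed config

if __name__ == "__main__":
    main()
```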
python train.py --config-name train_mono name=<CUSTOM_NAME_FOR_MODEL> dataset=coco
python train.py \
--config-name train_stereo \
name=<CUSTOM_NAME_FOR_MODEL> \
model.backbone=gelanS \
dataset=mads \
model.pretrained=<PATH_TO_PRETRAIN_MODEL_WEIGHT> \
model.dmin=5 \
model.dmax=30
python train.py \
--config-name train_stereo \
name=<CUSTOM_NAME_FOR_MODEL> \
model.backbone=gelanS \
dataset=diver \
model.pretrained=<PATH_TO_PRETRAIN_MODEL_WEIGHT> \
model.dmin=2 \
model.dmax=15
- Note that you can also set model.pretrained="" to avoid loading weights from a pretrained model.
Make sure the model configuration (dmin, dmax, backbone, etc.) used for testing is the same as the one used for training.
python test.py \
--config-name test_stereo \
dataset=mads \
model.backbone=gelanS \
model.dmin=5 \
model.dmax=30 \
model_weight=<PATH_TO_MODEL_WEIGHT> \
visualize=False
- Set visualize=True to visualize the estimations and the ground truths.
- Note: a model weight is also provided here for demo purposes.
- Due to the difficulty of collecting 3D ground truths underwater, we collect pseudo ground-truth data for validation instead. Please refer to the paper for more details.
- Download test data (coming soon)
- Run:
python test_diver.py \
--config-name test_diver \
model.backbone=gelanS \
model.dmin=2 \
model.dmax=15 \
data_path=<PATH_TO_TEST_DATA> \
model_weight=<PATH_TO_MODEL_WEIGHT> \
yolo_weight=<PATH_TO_ONNX_MODEL>
- Note: a model weight is also provided here for demo purposes.
- The YOLOv7 ONNX model is used here to locate the diver and crop the corresponding region from the full image. Please refer to DiverPose-AutoRefinement for more details, or download the model weight here for convenience.
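For intuition, the sketch below shows one way to run a YOLOv7 ONNX detector with onnxruntime and crop the detected region; the input size, output layout, and file names are assumptions, and the actual pre/post-processing in test_diver.py may differ.

```python
# Hypothetical sketch of detecting the diver with a YOLOv7 ONNX model and cropping the box.
# The output layout (one detection per row as [x1, y1, x2, y2, score, ...]) is an assumption.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov7_diver.onnx", providers=["CPUExecutionProvider"])

image = cv2.imread("left_frame.png")                      # one stereo frame (BGR)
h, w = image.shape[:2]
inp = cv2.resize(image, (640, 640)).astype(np.float32) / 255.0
inp = inp.transpose(2, 0, 1)[None]                        # HWC -> NCHW, add batch dimension

outputs = session.run(None, {session.get_inputs()[0].name: inp})
x1, y1, x2, y2 = outputs[0][0][:4]                        # first detection, box in 640x640 space

# Scale the box back to the original resolution and crop the diver region for Poseiden.
sx, sy = w / 640.0, h / 640.0
crop = image[int(y1 * sy):int(y2 * sy), int(x1 * sx):int(x2 * sx)]
```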
Several functions in this repository are adapted and modified from TransPose, YOLOv7, and mmpose.
If you use this code or the DiverPose dataset for your research, please cite: