The details of the person image animation task are provided here.
Person image animation aims to generate a video clip from a source person image and a sequence of target pose skeletons. Compared with the pose-guided person image generation task, this task additionally requires modeling temporal consistency. Therefore, we modify the model in two ways: the noisy poses extracted by popular pose extraction methods are first preprocessed by a Motion Extraction Network to obtain clean poses; then we generate the final animation results in a recurrent manner. The technical details are provided in the paper.
From Left to Right: Skeleton Sequences; Preprocessed Skeleton Sequences; Animation Results.
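To make the recurrent scheme concrete, the following minimal sketch generates the sequence frame by frame, with each frame conditioned on the previously generated one. The function and module names (animate, motion_net, generator) are hypothetical placeholders, not the actual interfaces of this repository.

# Minimal sketch of the recurrent generation loop (hypothetical interfaces).
import torch

def animate(generator, motion_net, source_image, noisy_poses):
    # generator:    network producing one frame from (source, pose, previous frame)
    # motion_net:   Motion Extraction Network that cleans a noisy pose sequence
    # source_image: tensor of shape (1, 3, H, W)
    # noisy_poses:  tensor of shape (T, C, H, W) holding the extracted pose maps
    with torch.no_grad():
        clean_poses = motion_net(noisy_poses)      # preprocess the noisy skeletons
        previous = source_image                    # start the recurrence from the source
        frames = []
        for pose in clean_poses:
            frame = generator(source_image, pose.unsqueeze(0), previous)
            frames.append(frame)
            previous = frame                       # feed the result back in
    return frames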
Two datasets are used in this task: the FashionVideo dataset and the iPER dataset.
- Download the videos of the datasets.
- We provide the AlphaPose extraction results for these datasets, as well as the preprocessed clean poses. Please use the following script to download these resources:
./script/download_animation_skeletons.sh
- Extract the image frames and resize them to 256 x 256 using the following command:
python ./script/extract_video_frames.py \
--frame_root=[path to write the video frames] \
--video_path=[path to the mp4 files] \
--anno_path=[path to the previously downloaded skeletons]
Note: you can also extract the skeletons on your own. Please use the AlphaPose algorithm and output the results in the OpenPose format.
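For reference, each OpenPose-format file is a JSON dictionary whose people list stores flattened (x, y, confidence) triplets under pose_keypoints_2d. A small parsing sketch is given below; the file name is only an example.

import json
import numpy as np

# Parse one OpenPose-format skeleton file (example file name).
with open("frame_000001_keypoints.json") as f:
    annotation = json.load(f)

for person in annotation["people"]:
    # pose_keypoints_2d is a flat list of (x, y, confidence) triplets.
    keypoints = np.array(person["pose_keypoints_2d"]).reshape(-1, 3)
    print(keypoints.shape)   # e.g. (18, 3) for a COCO-style skeleton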
Download the trained weights from FashionVideo and iPER. Put the obtained checkpoints under ./result/dance_fashion_checkpoints and ./result/dance_iper_checkpoints, respectively.
Run the following commands to obtain the animation results.
# test on the FashionVideo dataset
python test.py \
--name=dance_fashion_checkpoints \
--model=dance \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=dance \
--sub_dataset=fashion \
--dataroot=./dataset/danceFashion \
--results_dir=./eval_results/dance_fashion \
--checkpoints_dir=result
# test on the iPER dataset
python test.py \
--name=dance_iper_checkpoints \
--model=dance \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=dance \
--sub_dataset=iper \
--dataroot=./dataset/iPER \
--results_dir=./eval_results/dance_iper \
--checkpoints_dir=result
If you want to train the model on your own dataset, you first need to extract the skeletons using the AlphaPose pose extraction algorithm, and then obtain clean skeletons from the noisy data using the Motion Extraction Network.
This network is used to preprocess the noisy skeletons extracted by pose extraction models. We train this model using the Human3.6M dataset. The training ground-truth labels data_2d_h36m_gt.npz are downloaded from here. The corresponding input labels data_2d_h36m_detectron_pt_coco.npz are downloaded from here.
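If you want to check the downloaded files before training, they can be inspected with NumPy. The key names mentioned in the comments below (positions_2d, metadata) are what these archives usually contain, but please verify them against your own download.

import numpy as np

# Inspect the Human3.6M 2D keypoint archives (key names are an assumption).
gt = np.load("data_2d_h36m_gt.npz", allow_pickle=True)
detections = np.load("data_2d_h36m_detectron_pt_coco.npz", allow_pickle=True)

print(gt.files)           # typically ['positions_2d', 'metadata']
print(detections.files)

# positions_2d is usually a nested dict: subject -> action -> per-camera arrays.
positions = gt["positions_2d"].item()
print(sorted(positions.keys()))   # e.g. ['S1', 'S11', 'S5', ...]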
Use the following command to train this model:
python train.py \
--name=keypoint \
--model=keypoint \
--gpu_id=2 \
--dataset_mode=keypoint \
--continue_train
We also provide trained weights for this network. Assuming that you want to smooth the skeleton sequences of the iPER training set, you can use the following command:
python test.py \
--name=dance_keypoint_checkpoints \
--model=keypoint \
--gpu_id=2 \
--dataset_mode=keypointtest \
--dataroot=[root path of your dataset] \
--sub_dataset=iper \
--results_dir=[path to save the results] \
--eval_set=[train/test/val]
After obtaining the clean skeletons, you can train our model on your dataset using the following command. (Note: you need to modify dance_dataset.py to add your dataset as a sub_dataset; an illustrative sketch follows the command below.)
python train.py \
--name=[name_of_the_experiment] \
--model=dance \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0,1 \
--dataset_mode=dance \
--sub_dataset=[iper/fashion/your_dataset_name] \
--dataroot=[your_dataset_root] \
--continue_train
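As noted above, your dataset has to be registered as a sub_dataset in dance_dataset.py. The sketch below only illustrates the kind of branch you might add; the actual structure of dance_dataset.py and the directory names differ, so adapt it to the real file.

# Illustrative only: registering a new sub_dataset (folder names are placeholders).
import os

def get_sub_dataset_dirs(opt):
    # Return the frame and skeleton directories under opt.dataroot
    # for the chosen sub_dataset.
    if opt.sub_dataset in ("fashion", "iper"):
        frame_dir = os.path.join(opt.dataroot, "train_256")
        skeleton_dir = os.path.join(opt.dataroot, "train_alphapose")
    elif opt.sub_dataset == "your_dataset_name":      # new branch for your data
        frame_dir = os.path.join(opt.dataroot, "frames")
        skeleton_dir = os.path.join(opt.dataroot, "clean_skeletons")
    else:
        raise ValueError("Unknown sub_dataset: %s" % opt.sub_dataset)
    return frame_dir, skeleton_dir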