This repository contains the implementation of our AAAI'23 oral paper *Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning*.
Use the following instructions to create the corresponding conda environment:

```bash
conda create -n hico python=3.9 anaconda
conda activate hico
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 -c pytorch
pip3 install tensorboard
```
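To verify the environment, the following minimal check (ours, not part of the repo's scripts) confirms the pinned versions and GPU visibility:

```python
# Sanity check for the environment; not part of the repo's scripts.
import torch
import torchvision

print(torch.__version__)          # expected: 1.12.1
print(torchvision.__version__)    # expected: 0.13.1
print(torch.cuda.is_available())  # should be True for GPU training
```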
- Download the raw NTU-RGB+D 60 and 120 skeleton data and save it to the `./data` folder:
```
- data/
  - nturgbd_raw/
    - nturgb+d_skeletons/
      ...
    - samples_with_missing_skeletons.txt
  - nturgbd_raw_120/
    - nturgb+d_skeletons/
      ...
    - samples_with_missing_skeletons.txt
```
- Preprocess the data with `data_gen/ntu_gendata.py`:

```bash
cd data_gen
python ntu_gendata.py
```
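After preprocessing, a quick way to sanity-check the generated files is to load them back. The paths and the `(N, C, T, V, M)` layout below follow the common NTU preprocessing convention and are assumptions, so adjust them to the actual output of `ntu_gendata.py`:

```python
# Hedged sanity check; the output paths below are assumptions -- match them
# to what ntu_gendata.py actually writes in your checkout.
import pickle
import numpy as np

data = np.load('../data/ntu60/cross_view/train_data.npy', mmap_mode='r')
with open('../data/ntu60/cross_view/train_label.pkl', 'rb') as f:
    sample_names, labels = pickle.load(f)

# NTU preprocessing conventionally yields (N, C, T, V, M): N samples,
# C=3 coordinates, T frames, V=25 joints, M=2 bodies.
print(data.shape, len(labels))
assert data.shape[0] == len(labels)
```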
HiCo consumes fewer resources (due to its smaller encoders and queues), so we only implemented single-GPU training.
- Run the following script for pretraining. It will save the checkpoints to `./checkpoints/$TEST_NAME/`:

```bash
./run_pretraining.sh $CUDA_DEVICE $TEST_NAME $DATASET $PROTOCOL $REPRESENTATION
```
  - `$CUDA_DEVICE` is the ID of the GPU to use.
  - `$TEST_NAME` is the name of the folder where the checkpoints are saved.
  - `$DATASET` is the dataset to use for unsupervised pretraining (`ntu60` or `ntu120`).
  - `$PROTOCOL` is the training protocol (`cross_subject`/`cross_view` for ntu60, and `cross_subject`/`cross_setup` for ntu120).
  - `$REPRESENTATION` is the input skeleton representation (`joint`, `bone`, or `motion`).
- An example of pretraining on the NTU-60 x-view joint stream:

```bash
./run_pretraining.sh 0 ntu60_xview_joint ntu60 cross_view joint
```
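If you want to inspect a saved checkpoint afterwards, something like the snippet below works; the file name and dictionary keys (`epoch`, `state_dict`) are assumptions based on common MoCo-style training scripts, so check what `run_pretraining.sh` actually writes:

```python
# Inspect a pretraining checkpoint; file name and keys are assumptions.
import torch

ckpt = torch.load('checkpoints/ntu60_xview_joint/checkpoint_last.pth.tar',
                  map_location='cpu')
print(ckpt.keys())                         # e.g. 'epoch', 'state_dict', ...
for name in list(ckpt['state_dict'])[:5]:  # peek at a few parameter names
    print(name)
```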
- Task 1: Skeleton-based action recognition. Train a linear classifier on the pretrained query encoder (a conceptual sketch follows below). The parameters have the same meaning as above.

```bash
./run_action_classification.sh $CUDA_DEVICE $TEST_NAME $DATASET $PROTOCOL $REPRESENTATION
```

It automatically evaluates the checkpoint of the last pretraining epoch. The following example evaluates the previous pretraining on the NTU-60 x-view joint stream:

```bash
./run_action_classification.sh 0 ntu60_xview_joint ntu60 cross_view joint
```
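Conceptually, linear evaluation freezes the pretrained query encoder and trains a single linear layer on top of its features. The sketch below illustrates the idea only; `encoder`, `feat_dim`, and the data loader are placeholders, and the repo's script implements the actual protocol:

```python
# Illustrative linear-probe sketch; placeholders, not the repo's script.
import torch
import torch.nn as nn

def linear_probe(encoder, feat_dim, num_classes, train_loader, epochs=50):
    encoder.eval()                         # freeze the pretrained encoder
    for p in encoder.parameters():
        p.requires_grad = False
    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                feats = encoder(x)         # features stay frozen
            loss = loss_fn(clf(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```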
- Task 2: Skeleton-based action retrieval. Apply a KNN classifier on the pretrained query encoder; it is similar to action recognition (a conceptual sketch follows below). Here is an example:

```bash
./run_action_retrieval.sh 0 ntu60_xview_joint ntu60 cross_view joint
```
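For intuition, KNN retrieval compares the frozen encoder's features by cosine similarity. A minimal sketch on pre-extracted features (the arrays in the usage example are random placeholders):

```python
# KNN retrieval sketch on pre-extracted features; placeholders only.
import torch
import torch.nn.functional as F

def knn_classify(train_feats, train_labels, test_feats, k=1):
    # Cosine similarity = dot product of L2-normalized features.
    train_feats = F.normalize(train_feats, dim=1)
    test_feats = F.normalize(test_feats, dim=1)
    sim = test_feats @ train_feats.t()      # (num_test, num_train)
    nearest = sim.topk(k, dim=1).indices    # indices of k nearest samples
    votes = train_labels[nearest]           # (num_test, k) label votes
    return votes.mode(dim=1).values         # majority vote per test sample

# Usage with random placeholder features (256-d, 60 classes):
preds = knn_classify(torch.randn(1000, 256), torch.randint(0, 60, (1000,)),
                     torch.randn(10, 256))
```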
We release several pretrained models:
- HiCo-GRU on NTU-60 and NTU-120: released_model
- HiCo-LSTM on NTU-60 and NTU-120: released_model
- HiCo-Transformer on NTU-60 and NTU-120: released_model
Expected performance on skeleton-based action recognition:
| Model | NTU 60 xsub (%) | NTU 60 xview (%) | NTU 120 xsub (%) | NTU 120 xset (%) |
|---|---|---|---|---|
| HiCo-GRU | 80.6 | 88.6 | 72.5 | 73.8 |
| HiCo-LSTM | 81.4 | 88.8 | 73.7 | 74.5 |
| HiCo-Transformer | 81.1 | 88.6 | 72.8 | 74.1 |
We utilize t-SNE to visualize the learned action representations of different granularities obtained by our HiCo-Transformer model on NTU-60 xsub.
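A plot along those lines can be reproduced from extracted features with scikit-learn; in this sketch, `features.npy` and `labels.npy` are placeholder files holding the representations and their action labels:

```python
# t-SNE visualization sketch; the input files are placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

feats = np.load('features.npy')   # (N, D) learned representations
labels = np.load('labels.npy')    # (N,) action class ids

emb = TSNE(n_components=2, init='pca', random_state=0).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=3, cmap='tab20')
plt.savefig('tsne.png', dpi=300)
```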
If you find this repository useful, please consider citing our paper:
```
@inproceedings{hico2023,
  title={Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning},
  author={Jianfeng Dong and Shengkai Sun and Zhonglin Liu and Shujie Chen and Baolong Liu and Xun Wang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023}
}
```
The framework of our code is based on skeleton-contrast.
This work was supported by the NSFC (61902347, 62002323, 61976188), the Public Welfare Technology Research Project of Zhejiang Province (LGF21F020010), the Open Projects Program of the National Laboratory of Pattern Recognition, and the Fundamental Research Funds for the Provincial Universities of Zhejiang.