Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition. In GCNs, graph topology dominates feature aggregation and therefore is the key to extracting representative features. In this work, we propose a novel Channel-wise Topology Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies and effectively aggregate joint features in different channels for skeleton-based action recognition. The proposed CTR-GC models channel-wise topologies through learning a shared topology as a generic prior for all channels and refining it with channel-specific correlations for each channel. Our refinement method introduces few extra parameters and significantly reduces the difficulty of modeling channel-wise topologies. Furthermore, via reformulating graph convolutions into a unified form, we find that CTR-GC relaxes strict constraints of graph convolutions, leading to stronger representation capability. Combining CTR-GC with temporal modeling modules, we develop a powerful graph convolutional network named CTR-GCN which notably outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
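As a toy illustration of the channel-wise refinement idea described above (a shared topology serving as a generic prior, refined per channel and then used to aggregate joint features), here is a small NumPy sketch. All sizes, variable names (`A`, `Q`, `alpha`), and the additive refinement form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch of channel-wise topology refinement (not the paper's code).
rng = np.random.default_rng(0)
V, C, T = 5, 4, 3                        # joints, channels, frames (toy sizes)

A = rng.standard_normal((V, V))          # shared topology: generic prior for all channels
Q = rng.standard_normal((C, V, V))       # channel-specific correlations
alpha = 0.1                              # refinement strength (assumed scalar here)
X = rng.standard_normal((C, T, V))       # joint features, one V-dim slice per channel

# Channel-wise refined topologies: R[c] = A + alpha * Q[c]
R = A[None] + alpha * Q                  # shape (C, V, V)

# Channel-wise aggregation: each channel uses its own topology,
# Y[c] = X[c] @ R[c].T
Y = np.einsum('ctv,cwv->ctw', X, R)      # shape (C, T, V)
```

Because the refinement only adds a small channel-specific term on top of one shared matrix, the extra parameter cost stays low while each channel still aggregates with its own topology.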
```BibTeX
@inproceedings{chen2021channel,
  title={Channel-wise topology refinement graph convolution for skeleton-based action recognition},
  author={Chen, Yuxin and Zhang, Ziqi and Yuan, Chunfeng and Li, Bing and Deng, Ying and Hu, Weiming},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13359--13368},
  year={2021}
}
```
We release checkpoints trained with various modalities and annotations on NTURGB+D and NTURGB+D 120. The Top-1 accuracy of each modality links to the corresponding weight file.
Dataset | Annotation | Joint Top1 | Bone Top1 | Joint Motion Top1 | Bone Motion Top1 | Two-Stream Top1 | Four-Stream Top1 |
---|---|---|---|---|---|---|---|
NTURGB+D XSub | Official 3D Skeleton | joint_config: 69.3 | bone_config: 63.5 | joint_motion_config: 64.3 | bone_motion_config: 65.8 | 71.2 | 74.1 |
NTURGB+D XView | Official 3D Skeleton | joint_config: 75.6 | bone_config: 72.6 | joint_motion_config: 73.0 | bone_motion_config: 71.9 | 77.5 | 80.4 |
NTURGB+D 120 XSub | Official 3D Skeleton | joint_config: 57.7 | bone_config: 58.7 | joint_motion_config: 54.8 | bone_motion_config: 54.8 | 61.6 | 63.2 |
NTURGB+D 120 XSet | Official 3D Skeleton | joint_config: 61.6 | bone_config: 60.2 | joint_motion_config: 58.2 | bone_motion_config: 56.2 | 64.3 | 66.0 |
We also provide checkpoints trained with BFL (Balanced Representation Learning) on NTURGB+D. The Top-1 accuracy of each modality links to the corresponding weight file.
Dataset | Annotation | Joint Top1 | Bone Top1 | Skip Top1 | Joint Motion Top1 | Bone Motion Top1 | Skip Motion Top1 | Two-Stream Top1 | Four-Stream Top1 | Six-Stream Top1 |
---|---|---|---|---|---|---|---|---|---|---|
NTURGB+D XSub | Official 3D Skeleton | joint_config: 76.9 | bone_config: 77.3 | skip_config: 76.7 | joint_motion_config: 73.0 | bone_motion_config: 72.9 | skip_motion_config: 73.3 | 80.3 | 81.2 | 81.8 |
NTURGB+D XView | Official 3D Skeleton | joint_config: 80.7 | bone_config: 79.8 | skip_config: 79.9 | joint_motion_config: 78.9 | bone_motion_config: 75.9 | skip_motion_config: 77.2 | 83.1 | 84.7 | 85.0 |
**Note**
- We use the linear-scaling learning rate (Initial LR ∝ Batch Size). If you change the training batch size, remember to change the initial LR proportionally.
- For Two-Stream results, we adopt the 1 (Joint):1 (Bone) fusion. For Four-Stream results, we adopt the 2 (Joint):2 (Bone):1 (Joint Motion):1 (Bone Motion) fusion. For Six-Stream results, we adopt the 2 (Joint):2 (Bone):2 (Skip):1 (Joint Motion):1 (Bone Motion):1 (Skip Motion) fusion.
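The linear-scaling rule above can be sketched as a one-line helper; the reference batch size and learning rate used below are illustrative, not the repo's defaults.

```python
# Minimal sketch of linear learning-rate scaling (Initial LR proportional to Batch Size).
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Scale the initial LR proportionally with the total training batch size."""
    return base_lr * batch_size / base_batch_size

# Halving the batch size halves the initial LR (example values are assumptions):
print(scale_lr(0.1, 128, 64))  # -> 0.05
```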
You can use the following command to train a model.
```shell
bash tools/dist_train.sh ${CONFIG_FILE} ${NUM_GPUS} [optional arguments]
# For example: train CTRGCN on NTURGB+D XSub (Joint Modality) with one GPU, with validation, and test the last and the best (with best validation metric) checkpoint.
bash tools/dist_train.sh configs/ctrgcn/ntu60_xsub_LT_ctrgcn/j.py 1 --validate --test-last --test-best
```
You can use the following command to test a model.
```shell
bash tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${NUM_GPUS} [optional arguments]
# For example: test CTRGCN on NTURGB+D XSub (Joint Modality) with metrics `top_k_accuracy`, and dump the result to `result.pkl`.
bash tools/dist_test.sh configs/ctrgcn/ntu60_xsub_LT_ctrgcn/j.py checkpoints/SOME_CHECKPOINT.pth 1 --eval top_k_accuracy --out result.pkl
```
You can use the following command to ensemble the results of different modalities.
```shell
cd ./tools
python ensemble.py
```
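The core of multi-stream ensembling is a weighted sum of per-stream class scores, as in the fusion ratios listed in the notes above (e.g. 2:2:1:1 for Four-Stream). The sketch below shows that idea in isolation; the actual `tools/ensemble.py` may load scores from dumped `.pkl` files and differ in details, and all names here are assumptions.

```python
import numpy as np

def fuse(scores, weights):
    """Weighted sum of per-stream class-score arrays (illustrative helper)."""
    return sum(w * s for w, s in zip(weights, scores))

# Fake class scores for four streams: joint, bone, joint motion, bone motion.
rng = np.random.default_rng(0)
num_samples, num_classes = 6, 60
streams = [rng.random((num_samples, num_classes)) for _ in range(4)]

# Four-Stream fusion with the 2 (Joint) : 2 (Bone) : 1 (Joint Motion) : 1 (Bone Motion) weights.
fused = fuse(streams, weights=[2, 2, 1, 1])
pred = fused.argmax(axis=1)  # fused Top-1 predictions per sample
```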