STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

This repository contains the code for the following CVPR'23 paper:

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

Introduction

We study the problem of human action recognition using motion capture (MoCap) sequences. Unlike existing techniques that take multiple manual steps to derive standardized skeleton representations as model input, we propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences. The model uses a hierarchical transformer with intra-frame offset attention and inter-frame self-attention. The attention mechanism allows the model to freely attend between any two vertex patches to learn non-local relationships in the spatial-temporal domain. Masked vertex modeling and future frame prediction are used as two self-supervised tasks to fully activate the bi-directional and auto-regressive attention in our hierarchical transformer. The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models on common MoCap benchmarks.
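
For intuition, below is a minimal PyTorch sketch of an intra-frame offset-attention block in the spirit described above (following the offset-attention idea popularized by Point Cloud Transformer): standard self-attention is computed over the vertex-patch tokens of a frame, the offset between the input and the attention output is passed through a linear-norm-ReLU block, and the result is added back as a residual. The class name, layer sizes, and normalization choices are illustrative assumptions, not the implementation in this repository.

import torch
import torch.nn as nn


class OffsetAttention(nn.Module):
    # Illustrative intra-frame offset-attention block.
    # Names and dimensions are assumptions for the sketch, not taken from this repo.
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lbr = nn.Sequential(          # linear -> norm -> ReLU applied to the offset
            nn.Linear(dim, dim),
            nn.LayerNorm(dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # x: (batch, num_vertex_patches, dim) tokens from a single frame
        attn_out, _ = self.attn(x, x, x)   # vanilla self-attention between vertex patches
        offset = x - attn_out              # offset between the input and the attention output
        return x + self.lbr(offset)        # residual connection


# Quick shape check
tokens = torch.randn(2, 64, 256)           # 2 frames, 64 vertex patches, 256-dim features
print(OffsetAttention()(tokens).shape)     # torch.Size([2, 64, 256])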

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Zhu_2023_CVPR,
    author    = {Zhu, Xiaoyu and Huang, Po-Yao and Liang, Junwei and de Melo, Celso M. and Hauptmann, Alexander G.},
    title     = {STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {1526-1536}
}

Installation

Requirements

Clone the AMASS repo and run the following from the root folder:

pip install -r requirements.txt
python setup.py develop

Dataset

  • Please download the KIT and BABEL datasets from this link.

  • The dataset split is available at Google Drive.

Code

Training and Inference

To perform training and inference, please run:

$ python3 main.py --root_path ./data/kit_pt_processed_dataset --save_root_dir ./ckpt/stmt_training --framenum 24

Pre-Trained Model

Please use this link to download our pre-trained model: Google Drive

License

Our code and models are only for ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY.

Acknowledgements

Our STMT is based on AMASS, Point-Spatio-Temporal-Convolution, P4Transformer, and SequentialPointNet.

Contact

Feel free to email me if you have any questions.
