This is the official repository of the paper **Progression-Guided Temporal Action Detection in Videos**. Our model achieves 58% mAP@0.5 on THUMOS14 in an end-to-end manner.
```shell
# environment tested on NVIDIA 2080 Ti GPUs
conda create -n open-mmlab -y
conda activate open-mmlab
conda install pytorch torchvision -c pytorch
conda install pandas h5py scipy
pip install openmim future tensorboard timm pytorchvideo
mim install mmengine mmaction2 mmdet
```
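Before touching the data, it can help to confirm the environment is sound. The snippet below is just a minimal sanity check (not part of this repo): it imports the core packages and reports whether CUDA is visible:

```python
# Minimal environment check: verify the core packages import and CUDA is visible.
import torch
import mmengine
import mmaction

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmengine:", mmengine.__version__)
print("mmaction2:", mmaction.__version__)
```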
Download the pre-processed THUMOS14 raw frames and the annotations (APN format), and put them under the repo root. We suggest storing the data elsewhere (an SSD is best) and creating a symbolic link here that points to the data path. The folder structure should look like:
```text
APN
|-- configs
|-- ...
|-- my_data
|   |-- thumos14
|   |   |-- annotations
|   |   |   |-- apn
|   |   |   |   |-- apn_train.csv
|   |   |   |   |-- apn_val.csv
|   |   |   |   |-- apn_test.csv
|   |   |-- rawframes
|   |   |   |-- train
|   |   |   |   |-- v_BaseballPitch_g01_c01
|   |   |   |   |   |-- img_00000.jpg
|   |   |   |   |   |-- img_00001.jpg
|   |   |   |   |   |-- ...
|   |   |   |   |   |-- img_00106.jpg
|   |   |   |   |   |-- flow_x_00000.jpg
|   |   |   |   |   |-- flow_x_00001.jpg
|   |   |   |   |   |-- ...
|   |   |   |   |   |-- flow_x_00105.jpg
|   |   |   |   |   |-- flow_y_00000.jpg
|   |   |   |   |   |-- flow_y_00001.jpg
|   |   |   |   |   |-- ...
|   |   |   |   |   |-- flow_y_00105.jpg
|   |   |   |   |-- ...
|   |   |   |-- val
|   |   |   |   |-- video_validation_0000051
|   |   |   |   |-- ...
|   |   |   |-- test
|   |   |   |   |-- video_test_0000004
|   |   |   |   |-- ...
```
- Optical flow (TVL1) and RGB frames are included.
- Only videos with temporal annotations (20 classes) are kept.
- Some wrongly annotated videos are removed.
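With the data in place, a quick check like the sketch below (not part of the repo; the paths follow the tree above, and the annotation column layout is left uninspected since it is not documented here) confirms the annotation files exist and counts the frames of one example clip:

```python
from pathlib import Path

import pandas as pd

root = Path("my_data/thumos14")

# The three APN annotation files from the tree above.
for split in ("train", "val", "test"):
    csv_path = root / "annotations" / "apn" / f"apn_{split}.csv"
    assert csv_path.is_file(), f"missing {csv_path}"
    # Column layout is not documented here, so read without a header and just report the shape.
    df = pd.read_csv(csv_path, header=None)
    print(split, df.shape)

# Count RGB and flow frames of one clip from the tree above.
# Note: TVL1 flow has one frame fewer than RGB (00105 vs 00106 in the tree).
clip = root / "rawframes" / "train" / "v_BaseballPitch_g01_c01"
if clip.is_dir():
    n_rgb = len(list(clip.glob("img_*.jpg")))
    n_flow_x = len(list(clip.glob("flow_x_*.jpg")))
    n_flow_y = len(list(clip.glob("flow_y_*.jpg")))
    print(f"{clip.name}: {n_rgb} RGB, {n_flow_x}/{n_flow_y} flow_x/flow_y frames")
```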
As an example, let's train APN on THUMOS14 optical flow with an I3D backbone at a temporal resolution of 32 frames with a stride of 4:
```shell
train.sh configs/localization/apn/apn_r3dsony_32x4_10e_thumos14_flow.py 2
```

*Replace the `2` with the number of GPUs you want to use.*
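`train.sh` reads the config file given as its first argument. If you want to inspect or tweak a config programmatically before launching, mmengine's `Config` loader can be used; this is only a sketch, and `work_dir` is printed only if the config happens to set it:

```python
from mmengine.config import Config

# Load the training config used above and inspect it before launching a run.
cfg = Config.fromfile("configs/localization/apn/apn_r3dsony_32x4_10e_thumos14_flow.py")
print(cfg.pretty_text[:500])          # first part of the resolved config
print(cfg.get("work_dir", "unset"))   # output directory, if the config sets one
```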
After training finishes, you can test the trained checkpoint with the command below.
```shell
test.sh configs/localization/apn/apn_r3dsony_32x4_10e_thumos14_flow.py work_dirs/apn_r3dsony_32x4_10e_thumos14_flow/latest.pth 2
```

*Replace the `2` with the number of GPUs you want to use.*
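Note that `latest.pth` is typically a convenience link to the newest `epoch_*.pth` file; if your setup does not produce it, a small hypothetical helper like this can locate the newest checkpoint in the work directory:

```python
from pathlib import Path

def newest_checkpoint(work_dir: str) -> Path:
    """Return the epoch_*.pth file with the highest epoch number (hypothetical helper)."""
    ckpts = sorted(
        Path(work_dir).glob("epoch_*.pth"),
        key=lambda p: int(p.stem.split("_")[1]),
    )
    if not ckpts:
        raise FileNotFoundError(f"no epoch_*.pth found in {work_dir}")
    return ckpts[-1]

print(newest_checkpoint("work_dirs/apn_r3dsony_32x4_10e_thumos14_flow"))
```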
Our code is based on MMAction2.
If you find our work useful, please cite:
```bibtex
@article{lu2023progression,
  title={Progression-Guided Temporal Action Detection in Videos},
  author={Lu, Chongkai and Mak, Man-Wai and Li, Ruimin and Chi, Zheru and Fu, Hong},
  journal={arXiv preprint arXiv:2308.09268},
  year={2023}
}
```