
Official PyTorch implementation of Relational Self-Attention, NeurIPS 2021


KimManjin/RSA


Relational Self-Attention: What's Missing in Attention for Video Understanding

This repository is the official implementation of "Relational Self-Attention: What's Missing in Attention for Video Understanding" by Manjin Kim*, Heeseung Kwon*, Chunyu Wang, Suha Kwak, and Minsu Cho (*equal contribution).


Requirements

  • Python: 3.7.9
  • PyTorch: 1.6.0
  • TorchVision: 0.2.1
  • CUDA: 10.1
  • Conda environment: environment.yml

To install the requirements:

    conda env create -f environment.yml
    conda activate rsa
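After activating the environment, a quick check that the installed packages match the versions listed above can save debugging time later. The sketch below is a hypothetical helper, not part of the repository: it reads installed versions with `importlib.metadata` (on the pinned Python 3.7 this assumes the `importlib_metadata` backport is installed) and treats `torch` as PyTorch's package name.

```python
import platform

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7 (pinned above) needs the `importlib_metadata` backport from PyPI.
    from importlib_metadata import PackageNotFoundError, version

# Versions listed under Requirements; "torch" is PyTorch's distribution name on PyPI.
EXPECTED = {"torch": "1.6.0", "torchvision": "0.2.1"}
EXPECTED_PYTHON = "3.7.9"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED):
    """Map every package whose installed version differs from `expected` to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        found = installed_version(name)
        # Ignore local build tags such as "+cu101" when comparing.
        if found is None or found.split("+")[0] != want:
            mismatches[name] = (want, found)
    return mismatches


def report():
    """Print the Python version, any package mismatches, and PyTorch's CUDA build version."""
    print(f"python {platform.python_version()} (expected {EXPECTED_PYTHON})")
    for name, (want, found) in find_mismatches().items():
        print(f"{name}: expected {want}, found {found}")
    try:
        import torch
        print(f"torch built with CUDA {torch.version.cuda}; GPU available: {torch.cuda.is_available()}")
    except ImportError:
        print("torch is not importable in this environment")


if __name__ == "__main__":
    report()
```

An empty mismatch list plus "CUDA 10.1" in the last line suggests the environment matches the listed requirements.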

Dataset Preparation

  1. Download the Something-Something v1 and v2 (SSv1 and SSv2) datasets and extract RGB frames. Download URLs: SSv1, SSv2
  2. Create txt files that define the training and validation splits. Each line of a split file has the format [video_path] [#frames] [class_label]. See the txt files in the ./data directory for examples.
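The split format above can be written and read back with a few lines of Python. This is a sketch under stated assumptions, with hypothetical helper names: the frames of each video sit in their own directory (`frames_root/<video_id>/`), class labels are integer indices supplied by the caller, and the full frame-directory path is written as `video_path`. Check the txt files in ./data for whether the paths there are absolute or relative to a dataset root.

```python
from pathlib import Path

# Assumed image extensions for the extracted RGB frames.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_frames(video_dir):
    """Count the frame images inside one video's frame directory."""
    return sum(1 for p in Path(video_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def format_split_line(video_path, num_frames, class_label):
    """One split-file line: [video_path] [#frames] [class_label]."""
    return f"{video_path} {num_frames} {class_label}"


def parse_split_line(line):
    """Inverse of format_split_line; returns (video_path, num_frames, class_label)."""
    # Split on whitespace from the right so a path containing spaces stays intact.
    video_path, num_frames, class_label = line.rsplit(maxsplit=2)
    return video_path, int(num_frames), int(class_label)


def write_split_file(frames_root, labels, out_file):
    """Write a split file for `labels` (video_id -> integer class index).

    Hypothetical layout: frames_root/<video_id>/<frame images>.
    """
    lines = []
    for video_id, label in sorted(labels.items()):
        video_dir = Path(frames_root) / video_id
        lines.append(format_split_line(video_dir, count_frames(video_dir), label))
    Path(out_file).write_text("\n".join(lines) + "\n")
```

Counting frames from disk, rather than trusting metadata, keeps the [#frames] field consistent with what was actually extracted.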

Training

To train RSANet-R50 on SSv1 or SSv2 as in the paper, run:

    # For SSv1
    ./scripts/train_Something_v1.sh <run_name> <num_frames>
    # example: ./scripts/train_Something_v1.sh RSA_R50_SSV1_16frames 16
    
    # For SSv2
    ./scripts/train_Something_v2.sh <run_name> <num_frames>
    # example: ./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16
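Both training scripts take the same two arguments, so a small Python wrapper can assemble the command and reuse the run-name pattern from the examples. The helper names below are hypothetical; the wrapper only builds the argument list documented above and assumes it runs from the repository root (or is given `repo_root`) with the scripts marked executable.

```python
import subprocess

# Script paths as documented above, keyed by dataset version.
TRAIN_SCRIPTS = {
    "v1": "./scripts/train_Something_v1.sh",
    "v2": "./scripts/train_Something_v2.sh",
}


def default_run_name(version, num_frames):
    """Mirror the example naming, e.g. RSA_R50_SSV1_16frames."""
    return f"RSA_R50_SS{version.upper()}_{num_frames}frames"


def build_train_cmd(version, num_frames, run_name=None):
    """Argument list for the SSv1 ('v1') or SSv2 ('v2') training script."""
    if version not in TRAIN_SCRIPTS:
        raise ValueError(f"unknown dataset version: {version!r}")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    run_name = run_name or default_run_name(version, num_frames)
    return [TRAIN_SCRIPTS[version], run_name, str(num_frames)]


def run_training(version, num_frames, run_name=None, repo_root="."):
    """Launch training; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(build_train_cmd(version, num_frames, run_name), cwd=repo_root, check=True)
```

For example, `run_training("v2", 16)` launches `./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16`.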

Evaluation

To evaluate RSANet-R50 on SSv1 or SSv2 as in the paper, run:

    # For SSv1
    ./scripts/test_Something_v1.sh <run_name> <ckpt_name> <num_frames>
    # example: ./scripts/test_Something_v1.sh RSA_R50_SSV1_16frames resnet_rgb_model_best.pth.tar 16
    
    # For SSv2
    ./scripts/test_Something_v2.sh <run_name> <ckpt_name> <num_frames>
    # example: ./scripts/test_Something_v2.sh RSA_R50_SSV2_16frames resnet_rgb_model_best.pth.tar 16
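The test scripts take `<run_name> <ckpt_name> <num_frames>` in that order. The hypothetical helper below assembles that argument list. Its default checkpoint name and frame count are copied from the examples above. Passing the same run name that was used for training is likewise an assumption based on those examples.

```python
import subprocess

# Script paths as documented above, keyed by dataset version.
TEST_SCRIPTS = {
    "v1": "./scripts/test_Something_v1.sh",
    "v2": "./scripts/test_Something_v2.sh",
}


def build_test_cmd(version, run_name, ckpt_name, num_frames):
    """Argument list in the documented order: <run_name> <ckpt_name> <num_frames>."""
    if version not in TEST_SCRIPTS:
        raise ValueError(f"unknown dataset version: {version!r}")
    return [TEST_SCRIPTS[version], run_name, ckpt_name, str(num_frames)]


def run_evaluation(version, run_name, ckpt_name="resnet_rgb_model_best.pth.tar",
                   num_frames=16, repo_root="."):
    """Launch evaluation from the repository root; raises if the script exits non-zero."""
    subprocess.run(build_test_cmd(version, run_name, ckpt_name, num_frames),
                   cwd=repo_root, check=True)
```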

Results

Our model achieves the following performance on Something-Something-V1 and Something-Something-V2:

Model      | Dataset | Frames | Top-1 / Top-5 | Logs  | Checkpoints
---------- | ------- | ------ | ------------- | ----- | ------------
RSANet-R50 | SSv1    | 16     | 54.0% / 81.1% | [log] | [checkpoint]
RSANet-R50 | SSv2    | 16     | 66.0% / 89.9% | [log] | [checkpoint]
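Top-1 and top-5 accuracy count a video as correct when its ground-truth class is the highest-scoring prediction or among the five highest-scoring predictions, respectively. A minimal plain-Python sketch of the metric follows. It assumes one score vector per video and is not the repository's evaluation code, which may, for instance, operate on batched PyTorch tensors or aggregate scores over several clips.

```python
def topk_accuracy(scores, labels, ks=(1, 5)):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per video; labels: true class indices.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of score rows and labels")
    hits = {k: 0 for k in ks}
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if label in ranked[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(labels) for k in ks}
```

With `scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]`, `labels = [1, 1]` and `ks=(1, 2)`, the first video is a top-1 hit and the second only a top-2 hit, giving `{1: 50.0, 2: 100.0}`.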

Qualitative Results

[Figure: kernel visualization]
