# Error Detection in Egocentric Procedural Task Videos

This is the official implementation of Error Detection in Egocentric Procedural Task Videos.

Please cite our CVPR 2024 paper if our paper/implementation is helpful for your research:

```bibtex
@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Shih-Po and Lu, Zijia and Zhang, Zekun and Hoai, Minh and Elhamifar, Ehsan},
    title     = {Error Detection in Egocentric Procedural Task Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {18655-18666}
}
```

## Preparation

Set up the conda environment:

```bash
conda env create -f environment.yml
```
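
Then activate the environment before running anything; the name `egoper` below is a placeholder, use whatever name `environment.yml` defines:

```bash
conda activate egoper  # placeholder: use the env name from environment.yml
```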

Run setup.py to generate the required directories.
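
Presumably invoked from the repository root:

```bash
python setup.py
```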

Visit our project page for more details about our dataset.

Please send an email with the following information to [email protected] to download our datasets and annotations. The shared link will expire in two weeks.

- Your Full Name
- Institution/Organization
- Advisor/Supervisor Name
- Current Position/Title
- Email Address (with institutional domain name)
- Purpose

The dataset contains the following files:

- Annotations
  - `annotation.json`: the annotation file for the 5 tasks, containing time stamps, step names, step descriptions, and action types (see the loading sketch after this list).
  - `active_object.json`: the annotation file for the 5 tasks, containing frame-wise object bounding boxes, object categories, and whether each object is active.
- Dataset
  - `{task_name}_videos.zip`: contains the trimmed RGB videos.
  - `{task_name}_other_modalities.zip`: contains other modalities such as depth, audio, gaze, hand tracking, etc.
  - `training.txt`, `validation.txt`, `test.txt`: the splits for training, validation, and test.
  - `trim_start_end.txt`: the start and end times used to trim the original videos.
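
A minimal sketch of inspecting `annotation.json` with Python; the exact key names are assumptions, since the schema is only summarized above:

```python
import json

# Load the step-level annotations; the schema is assumed from the summary above.
with open("data/EgoPER/annotation.json") as f:
    ann = json.load(f)

# Inspect the top-level structure first, since the exact schema is not documented here.
print(type(ann))
print(list(ann)[:5])  # top-level keys (if a dict) or first entries (if a list)
```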

## Preprocessing

Create a dataset folder for the task you want to work with:

```bash
mkdir data
mkdir data/EgoPER
mkdir data/EgoPER/pinwheels
```

Download `annotation.json`, `active_object.json`, `mean.npy`, and `std.npy`, and put them under `data/EgoPER`.
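
To sanity-check the download, the normalization statistics can be loaded with NumPy; their shapes are not documented here, so printing them is just a quick inspection:

```python
import numpy as np

# Feature normalization statistics shipped with the dataset.
mean = np.load("data/EgoPER/mean.npy")
std = np.load("data/EgoPER/std.npy")
print(mean.shape, std.shape)  # shapes depend on the feature dimensionality
```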

Create a video folder and a frame folder, extract `pinwheels_videos.zip` into the video folder, and then extract frames from the videos:

```bash
mkdir data/EgoPER/pinwheels/frames_10fps
mkdir data/EgoPER/pinwheels/trim_videos
cd preprocessing
python extract_frames.py
```
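
If you prefer to extract frames manually, an ffmpeg call along the following lines matches the folder layout; the 10 fps rate is inferred from the `frames_10fps` directory name, and the exact frame naming used by `extract_frames.py` may differ:

```bash
# <video> is a placeholder for one trimmed video's basename.
mkdir -p data/EgoPER/pinwheels/frames_10fps/<video>
ffmpeg -i data/EgoPER/pinwheels/trim_videos/<video>.mp4 -vf fps=10 \
  data/EgoPER/pinwheels/frames_10fps/<video>/%06d.jpg
```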

Generate I3D features from the video frames with the pre-trained weights.

Move the weights under `I3D_extractor/src/feature_extractor/pretrained_models`.

Change `root_dir` in `features_{task_name}.sh` to the correct path, e.g., `data/EgoPER/pinwheels`, and run:

```bash
mkdir data/EgoPER/pinwheels/features_10fps
cd I3D_extractor
./features_pin.sh
```
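
A quick way to confirm the extraction worked is to load one of the generated feature files; the `.npy` extension and one-file-per-video layout are assumptions about the extractor's output:

```python
import numpy as np

# Load one extracted I3D feature file; the path and extension are assumed.
feat = np.load("data/EgoPER/pinwheels/features_10fps/<video>.npy")
print(feat.shape)  # expected to be (num_snippets, feature_dim)
```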

## Training

- Modify `root_dir` in `libs/datasets/egoper.py` to the correct directory (a sketch of the expected line follows below).
- The action segmentation backbone is ActionFormer.
- The number of prototypes for each step is 2.

```bash
./run_EgoPER_train.sh
```
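
For the first bullet above, the edit is just pointing the dataset root at the folder created earlier; the exact form of the line in `egoper.py` may differ:

```python
# libs/datasets/egoper.py (assumed form of the setting)
root_dir = "./data/EgoPER"  # set to your dataset directory
```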

## Inference

- The code evaluates the performance of action segmentation and error detection.

```bash
./run_EgoPER_eval.sh
```