  • Teacher training

  1. Create the environment using the requirement.txt file in the teacher folder.

  2. Download the contents of the data folder from the Google Drive link.

        data
      	└───egtea_action_seq/
      	└───epic55_action_seq/
      	└───recipe1M_layers/
      	└───processed_data_dict.pt
      	└───vocab.txt
    
  3. Download the contents of the out folder from the Google Drive link. This also includes the different LMs pretrained (MLM) on the 1M-Recipe dataset.

        out
      	└───albert_pretrained/checkpoint-200000/
      	└───bert_pretrained/checkpoint-200000/
      	└───distilbert_pretrained/checkpoint-200000/
      	└───electra_pretrained/checkpoint-200000/
      	└───roberta_pretrained/checkpoint-200000/
      	└─...
      	.
    
  4. LM checkpoints pre-trained on action sequences derived from 1M recipe are in their respective folders

  5. Code for pretraining LM on 1M-Recipe is at code/{model_name}_pretraining.py

  6. To finetune model on EGTEA-GAZE+ dataset

      # -model_type: bert/roberta/distillbert/alberta/deberta/electra
      # -batch_size: batch size; -num_epochs: number of epochs
      # -max_len: maximum number of input tokens; tokens beyond this are truncated
      # -checkpoint_path: model checkpoint (for initialization) trained on 1M-Recipe through MLM
      # -weigh_classes: for imbalanced data; if True, the loss is weighted cross-entropy
      #   (will not work with EPIC-55, as a few classes have 0 data instances)
      # -hist_len: context length, i.e. how many past actions are conditioned on to predict the action after the anticipation time
      # -gappy_hist: EGTEA action sequences have gaps, as the action segments of a video are partitioned into train/test sets
      # -multi_task: instead of just predicting the action, also predict the verb and noun
      # -sort_seg: action segments in a training batch are sorted by their temporal order
      python ./code/egtea_finetuning.py \
          -model_type bert \
          -batch_size 16 \
          -num_epochs 5 \
          -max_len 512 \
          -checkpoint_path ./out/bert_pretrained/checkpoint-200000 \
          -weigh_classes True \
          -hist_len 15 \
          -gappy_hist True \
          -multi_task True \
          -sort_seg True
    
  7. To finetune model on EPIC-55 dataset

      python ./code/epic55_finetuning.py \
          -model_type distillbert \
          -batch_size  16 \
          -num_epochs 8 \
          -max_len 512 \
          -hist_len 5 \
          -checkpoint_path ./out/distilbert_pretrained/checkpoint-200000 \
          -weigh_classes False \
          -multi_task True
    
  8. For the EGTEA and EPIC-55 datasets, the arguments in the snippets above are the hyperparameters used to train the teacher and to report its performance.

  9. Sample Slurm scripts can be found in code/slurm scripts.

  10. The teacher model's predictions for the test data can be found at the Google Drive link.

  11. The predictions are saved as a list of dictionaries. Each element of the list has the keys UID (unique segment ID), action_logit, and LM_feature, along with other information associated with the segment (UID), such as actionID, action history, etc.

  12. These teacher predictions are then used to train the student, an Anticipative Video Transformer (AVT), through knowledge distillation.

  13. Reproducing teacher metrics reported in the paper: Download the model prediction folder teacher/teacher_student_Predictions/ from link. Calculate the teacher performance metric reported in the paper by running the notebook code/logit_analysis.ipynb
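
The prediction format described in step 11 can be pictured with a small sketch. Everything below is illustrative: the records are synthetic, the on-disk file format is not covered here, `index_by_uid` and `top1_accuracy` are hypothetical helpers, and treating `actionID` as a dictionary key holding the ground-truth class index is an assumption. Only the names UID, action_logit, LM_feature, and actionID come from the description above.

```python
from typing import Any, Dict, List

# Synthetic stand-in for the teacher predictions: a list of dictionaries.
# Key names follow the README; the values and vector sizes are made up.
predictions: List[Dict[str, Any]] = [
    {"UID": 101, "action_logit": [0.2, 1.5, -0.3], "LM_feature": [0.1, 0.4], "actionID": 1},
    {"UID": 102, "action_logit": [2.0, 0.1, 0.0], "LM_feature": [0.3, 0.9], "actionID": 2},
]


def index_by_uid(records: List[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Hypothetical helper: look up a segment's record by its UID."""
    return {record["UID"]: record for record in records}


def top1_accuracy(records: List[Dict[str, Any]]) -> float:
    """Hypothetical helper: fraction of segments whose arg-max logit
    matches actionID (assumed here to be the ground-truth class index)."""
    correct = 0
    for record in records:
        logits = record["action_logit"]
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == record["actionID"])
    return correct / len(records)


by_uid = index_by_uid(predictions)
accuracy = top1_accuracy(predictions)
```

Indexing by UID is convenient because a student-side dataloader would typically need to fetch the teacher's logits or LM features for one specific segment at a time.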

  • Student training

  1. The student training code is in the student folder. It is adapted from the AVT code base (link), so the data and the AVT model checkpoints should first be downloaded as explained in their documentation.

  2. Set up the avt.yml Python environment.

  3. Since most of the code is the same as AVT, their documentation can be used as a reference. Below we describe where in the codebase we made our additions.

  4. For all the experiments, we use exactly the same hyperparameters that the AVT authors used to train their model.

  5. Helper codes

  6. Config description

  7. Running Experiments

    - EK55 - https://github.com/sayontang/Action_Anticipation/blob/main/student/AVT-main/avt_ek55_ensemble_test.job
    - EGTEA - https://github.com/sayontang/Action_Anticipation/blob/main/student/AVT-main/avt_base_feat_egtea_test.job
    
  8. Reading teacher prediction and setting the path in student distillation training

  9. Dataloader - updates explained

  10. Knowledge distillation

  11. Reproducing student metrics reported in the paper: Download the model prediction folder teacher/teacher_student_Predictions/ from link. Calculate the student performance metrics reported in the paper by running the notebook code/logit_analysis.ipynb
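
Step 10 names knowledge distillation without spelling out the loss. As a generic illustration only, the sketch below implements the common temperature-scaled formulation: a KL divergence between the softened teacher and student distributions, mixed with cross-entropy on the true label. The temperature `T`, the mixing weight `alpha`, and all function names are assumptions made for this example; the loss used in the student code may differ.

```python
import math
from typing import List


def softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for stability."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            label: int, T: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label).

    T and alpha are illustrative defaults, not values from the repository.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    eps = 1e-12  # guards against log(0) if a probability underflows
    kl = sum(
        pt * (math.log(pt) - math.log(max(ps, eps)))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    ce = -math.log(max(softmax(student_logits)[label], eps))
    return alpha * (T ** 2) * kl + (1.0 - alpha) * ce


# Example: the student disagrees with the teacher about the top class.
loss = kd_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0], label=2)
```

In this setting the teacher logits would come from the `action_logit` field of the teacher predictions; the `T ** 2` factor is the usual rescaling that keeps the distillation term's gradient magnitude comparable across temperatures.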