This work evaluated the robustness of video-language models on text-to-video retrieval using a variety of video and/or text perturbations. For more information, check out our site.
Different real-world perturbations used in this study.
To generate text perturbations, use `generate_noisy_text.py`.
You can call this script from the command line, for example:

```bash
python generate_noisy_text.py msrvtt --meta_pth msvrtt_eval.csv --text_style --textflint
```

This runs the perturbations provided by the TextStyle and TextFlint packages on the MSRVTT dataset, using a CSV file that has (at minimum) columns for `video_id` and `text`.
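For reference, a minimal metadata CSV with the required columns might look like the following (the video IDs and captions here are made up for illustration):

```csv
video_id,text
video7010,a man is reviewing a car on a road
video7011,a woman mixes ingredients in a bowl
```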
The same procedure applies to multiple-choice (MC) VideoQA on MSRVTT via `generate_noisy_mc_videoqa.py`.
We provide both on-the-fly generation of perturbations in `video_perturbations.py`, which is useful when processing pre-extracted features, and generation of noisy video copies in `generate_noisy_videos.py`.
To run `generate_noisy_videos.py`, an example is:

```bash
python generate_noisy_videos.py msrvtt data/msrvtt/videos data/msrvtt/noisy_videos blur
```

This generates perturbed videos for MSRVTT, reading the original videos from `data/msrvtt/videos`, applying the `blur` perturbation, and saving the copies to `data/msrvtt/noisy_videos`.
Before running this command, you need to generate a file for the MSRVTT and YouCook2 datasets that maps each original video path (first column) to its target file (second column). It should be stored as `datasets/{youcook2, msrvtt}_videolist.csv`. Example:
```csv
YouCook2/validation/226/videos/xHr8X2Wpmno.mkv,robustness/youcook2/xHr8X2Wpmno.mkv
YouCook2/validation/105/videos/V53XmPeyjIU.mkv,robustness/youcook2/V53XmPeyjIU.mkv
YouCook2/validation/201/videos/mZwK0TBI1iY.mkv,robustness/youcook2/mZwK0TBI1iY.mkv
YouCook2/validation/310/videos/gEYyWqs1oL0.mp4,robustness/youcook2/gEYyWqs1oL0.mp4
```
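One possible way to build this mapping is sketched below; the source root, file extensions, and output prefix are assumptions, so adjust them to your own directory layout:

```python
import csv
from pathlib import Path

# Assumed locations -- adjust to wherever the original videos live and
# where the perturbed copies should be written.
SRC_ROOT = Path("YouCook2/validation")
DST_PREFIX = "robustness/youcook2"

# Collect (original_path, target_path) pairs for every video file found.
rows = []
for video in sorted(SRC_ROOT.rglob("*")):
    if video.suffix.lower() in {".mp4", ".mkv", ".webm"}:
        rows.append((video.as_posix(), f"{DST_PREFIX}/{video.name}"))

# Write the two-column mapping expected by generate_noisy_videos.py.
with open("datasets/youcook2_videolist.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```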
Use `video_perturbations.py` by creating a `VideoPerturbation` object, initializing it with the perturbation and severity. This is useful when modifying video feature extractor code from fairseq and VideoFeatureExtractor, as in the sketch below.
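A minimal sketch of how this might look; the exact constructor arguments and call signature of `VideoPerturbation` are assumptions here, so check `video_perturbations.py` for the actual interface:

```python
from video_perturbations import VideoPerturbation

# Assumed interface: construct with a perturbation name and a severity level,
# then apply it to decoded frames inside the feature extractor's loading loop.
perturb = VideoPerturbation(perturbation="blur", severity=3)

# frames = decode_video(path)      # hypothetical helper returning (T, H, W, C) frames
# noisy_frames = perturb(frames)   # assumed to be callable on a frame array
```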
The file `robustness_scores.py` provides sample code for calculating the robustness score for perturbation combinations. This is done by collecting model retrieval scores for R@5, R@10, and R@25 across different perturbations and severities. This particular function expects a `pandas.DataFrame`, since the results of models and their runs were collected in CSV files. An example of what this file may look like is:
| R@1 | R@5 | Median-R | Model | Dataset | Perturbation | Severity | Type | PerturbModality | Name | Train | R@1 Error | R@5 Error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shuffle_order | 0 | Positional | Text | ShuffleOrder | zs | 0 | 0 |
| 0.072 | 0.181 | 59 | VideoClip | MSRVTT | shuffle_order | 1 | Positional | Text | ShuffleOrder | zs | -0.031 | -0.046 |
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shot_noise | 0 | Noise | Video | ShotNoise | zs | 0 | 0 |
| 0.063 | 0.153 | 63.5 | VideoClip | MSRVTT | shot_noise | 1 | Noise | Video | ShotNoise | zs | -0.04 | -0.074 |
Each perturbation has a severity-0 row that represents the baseline (unperturbed) scores, which simplifies the calculation. Any severity greater than 0 indicates a perturbation was applied.
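As a rough sketch of how such a score could be computed from a results table like the one above, assuming robustness is measured as the relative drop from each perturbation's severity-0 baseline (`robustness_scores.py` contains the actual implementation):

```python
import pandas as pd

def relative_robustness(df: pd.DataFrame, metric: str = "R@5") -> pd.DataFrame:
    """Compute 1 - (baseline - perturbed) / baseline for each perturbed row.

    Assumes the CSV layout shown above, where the severity-0 row for each
    model/dataset/perturbation holds the unperturbed baseline score.
    """
    keys = ["Model", "Dataset", "Perturbation"]

    # Baseline score per (model, dataset, perturbation), taken from severity 0.
    base = (df[df["Severity"] == 0]
            .set_index(keys)[metric]
            .rename("baseline"))

    # Attach the baseline to every perturbed row and compute the relative score.
    out = df[df["Severity"] > 0].join(base, on=keys)
    out["robustness"] = 1 - (out["baseline"] - out[metric]) / out["baseline"]
    return out[keys + ["Severity", metric, "robustness"]]

# Example usage:
# scores = relative_robustness(pd.read_csv("results.csv"), metric="R@5")
```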
```bibtex
@inproceedings{schiappa2022robustness,
  title={Robustness Analysis of Video-Language Models Against Visual and Language Perturbations},
  author={Madeline Chantry Schiappa and Shruti Vyas and Hamid Palangi and Yogesh S Rawat and Vibhav Vineet},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=A79jAS4MeW9}
}
```
For examples, please see `EXAMPLES.md`.