This work evaluated the robustness of video-language models on text-to-video retrieval using a variety of video and/or text perturbations. For more information, check out our site.
Different real-world perturbations used in this study.
To generate text perturbations, use `generate_noisy_text.py`.
You can call this script from the command line, for example:

```bash
python generate_noisy_text.py msrvtt --meta_pth msvrtt_eval.csv --text_style --textflint
```

This runs the perturbations provided by the TextStyle and TextFlint packages on the MSRVTT dataset, using a CSV file that has (at minimum) columns for `video_id` and `text`.
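For reference, a minimal metadata CSV with the required columns might look like the following (the video IDs and captions here are made up for illustration):

```csv
video_id,text
video7010,a man is reviewing a car on a road
video7011,a woman mixes ingredients in a bowl
```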
The same procedure applies to multiple-choice (MC) VideoQA on MSRVTT via `generate_noisy_mc_videoqa.py`.
We provide both on-the-fly generation of perturbations in `video_perturbations.py`, which is useful when processing pre-extracted features, and generation of noisy video copies in `generate_noisy_videos.py`.
To run `generate_noisy_videos.py`, an example is:

```bash
python generate_noisy_videos.py msrvtt data/msrvtt/videos data/msrvtt/noisy_videos blur
```

This generates perturbed videos for MSRVTT, reading the original videos from `data/msrvtt/videos`, applying the `blur` perturbation, and saving the copies to `data/msrvtt/noisy_videos`.
Before running this command, you need to generate a file for the MSRVTT and YouCook2 datasets that maps each original video path (first column) to its target file (second column). It should be stored as `datasets/{youcook2, msrvtt}_videolist.csv`. Example:
```csv
YouCook2/validation/226/videos/xHr8X2Wpmno.mkv,robustness/youcook2/xHr8X2Wpmno.mkv
YouCook2/validation/105/videos/V53XmPeyjIU.mkv,robustness/youcook2/V53XmPeyjIU.mkv
YouCook2/validation/201/videos/mZwK0TBI1iY.mkv,robustness/youcook2/mZwK0TBI1iY.mkv
YouCook2/validation/310/videos/gEYyWqs1oL0.mp4,robustness/youcook2/gEYyWqs1oL0.mp4
```
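One possible way to build this mapping is sketched below; the source root, file extensions, and output prefix are assumptions, so adjust them to your own directory layout:

```python
import csv
from pathlib import Path

# Assumed locations -- adjust to wherever the original videos live and
# where the perturbed copies should be written.
SRC_ROOT = Path("YouCook2/validation")
DST_PREFIX = "robustness/youcook2"

# Collect (original_path, target_path) pairs for every video file found.
rows = []
for video in sorted(SRC_ROOT.rglob("*")):
    if video.suffix.lower() in {".mp4", ".mkv", ".webm"}:
        rows.append((video.as_posix(), f"{DST_PREFIX}/{video.name}"))

# Write the two-column mapping expected by generate_noisy_videos.py.
with open("datasets/youcook2_videolist.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```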
Use `video_perturbations.py` by creating a `VideoPerturbation` object, initializing it with the perturbation and severity. This is useful when modifying video feature extractor code from fairseq and VideoFeatureExtractor, as in the sketch below.
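A minimal sketch of how this might look; the exact constructor arguments and call signature of `VideoPerturbation` are assumptions here, so check `video_perturbations.py` for the actual interface:

```python
from video_perturbations import VideoPerturbation

# Assumed interface: construct with a perturbation name and a severity level,
# then apply it to decoded frames inside the feature extractor's loading loop.
perturb = VideoPerturbation(perturbation="blur", severity=3)

# frames = decode_video(path)      # hypothetical helper returning (T, H, W, C) frames
# noisy_frames = perturb(frames)   # assumed to be callable on a frame array
```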
The file `robustness_scores.py` provides sample code for calculating the robustness score for perturbation combinations. This is done by collecting model retrieval scores for R@5, R@10, and R@25 across different perturbations and severities. This particular function expects a `pandas.DataFrame`, since the results of models and their runs were collected in CSV files. An example of what this file may look like is:
| R@1 | R@5 | Median-R | Model | Dataset | Perturbation | Severity | Type | PerturbModality | Name | Train | R@1 Error | R@5 Error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shuffle_order | 0 | Positional | Text | ShuffleOrder | zs | 0 | 0 |
| 0.072 | 0.181 | 59 | VideoClip | MSRVTT | shuffle_order | 1 | Positional | Text | ShuffleOrder | zs | -0.031 | -0.046 |
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shot_noise | 0 | Noise | Video | ShotNoise | zs | 0 | 0 |
| 0.063 | 0.153 | 63.5 | VideoClip | MSRVTT | shot_noise | 1 | Noise | Video | ShotNoise | zs | -0.04 | -0.074 |
Each perturbation has a severity-0 row that represents the baseline (unperturbed) scores, which simplifies the calculation. Any severity greater than 0 indicates a perturbation was applied.
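As a rough sketch of how such a score could be computed from a results table like the one above, assuming robustness is measured as the relative drop from each perturbation's severity-0 baseline (`robustness_scores.py` contains the actual implementation):

```python
import pandas as pd

def relative_robustness(df: pd.DataFrame, metric: str = "R@5") -> pd.DataFrame:
    """Compute 1 - (baseline - perturbed) / baseline for each perturbed row.

    Assumes the CSV layout shown above, where the severity-0 row for each
    model/dataset/perturbation holds the unperturbed baseline score.
    """
    keys = ["Model", "Dataset", "Perturbation"]

    # Baseline score per (model, dataset, perturbation), taken from severity 0.
    base = (df[df["Severity"] == 0]
            .set_index(keys)[metric]
            .rename("baseline"))

    # Attach the baseline to every perturbed row and compute the relative score.
    out = df[df["Severity"] > 0].join(base, on=keys)
    out["robustness"] = 1 - (out["baseline"] - out[metric]) / out["baseline"]
    return out[keys + ["Severity", metric, "robustness"]]

# Example usage:
# scores = relative_robustness(pd.read_csv("results.csv"), metric="R@5")
```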
```bibtex
@inproceedings{schiappa2022robustness,
  title={Robustness Analysis of Video-Language Models Against Visual and Language Perturbations},
  author={Madeline Chantry Schiappa and Shruti Vyas and Hamid Palangi and Yogesh S Rawat and Vibhav Vineet},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=A79jAS4MeW9}
}
```
For examples, please see `EXAMPLES.md`.