[Benchmark] MVTamperBench #739
base: main
Conversation
…cstrings for methods
Hey @FangXinyu-0913 @kennymckormick - can you please have a look at the PR? We would like to release the benchmark. We would also like to check how we can share our results with you so they can be hosted on your leaderboard, as we have run an extensive benchmark of 45 MLLMs using VLMEvalKit.
Hi @amitbcp, I am trying to reproduce the results using InternVL2_5-8B, but there is a significant deviation from the results in the paper. These are the versions of the relevant libraries I am using. Could you tell me which versions you used, so that I can reproduce the results after making further modifications?
Hey @FangXinyu-0913 I am also re-running the evaluation to make sure the results are consistent.
Thank you @amitbcp. How many frames were used in the configuration when you tested the InternVL2.5 series?
Hey @FangXinyu-0913 We ran our benchmark before the video inference was refactored in aa9f50e. As I remember, the default was 8 frames before the refactor as well; please correct me if I'm wrong. Also, can you share the command that you ran for the benchmarking?
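For context, the discussion above is about a VLMEvalKit video-benchmark run along the following lines; the dataset key and frame-count flag in this sketch are assumptions for illustration, not confirmed in this thread.

```bash
# Minimal sketch (assumed dataset key and flags): evaluate InternVL2_5-8B on
# MVTamperBench, sampling 8 frames per clip as discussed above.
python run.py --data MVTamperBench --model InternVL2_5-8B --nframe 8 --verbose
```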
Thanks for open-sourcing this benchmarking tool, which enables the development of evaluations for different multimodal LLMs.
We release MVTamperBench - https://arxiv.org/abs/2412.19794v4 | https://amitbcp.github.io/MVTamperBench/
Details -
Multimodal Large Language Models (MLLMs), also known as Large Multimodal Models (LMMs), are a recent advancement of Vision-Language Models (VLMs) and have driven major advances in video understanding, yet their vulnerability to adversarial tampering and manipulation remains underexplored. To address this gap, we introduce **MVTamperBench**, a benchmark that systematically evaluates MLLM robustness against five prevalent tampering techniques: rotation, masking, substitution, repetition, and dropping. Built from 3.4K original videos, expanded to over 17K tampered clips spanning 19 video tasks, MVTamperBench challenges models to detect manipulations in spatial and temporal coherence.
We evaluate 45 recent MLLMs from 15+ model families, revealing substantial variability in resilience across tampering types and showing that larger parameter counts do not necessarily guarantee robustness. MVTamperBench sets a new benchmark for developing tamper-resilient MLLMs for safety-critical applications, including detecting clickbait, preventing harmful content distribution, and enforcing policies on media platforms. We release all code and data to foster open research in trustworthy video understanding.
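As an illustration of the kind of manipulations listed above, the sketch below shows what frame-level masking and dropping could look like; it is a minimal, assumed example and not the benchmark's actual generation code.

```python
# Illustrative sketch only -- not the MVTamperBench generation code.
# Demonstrates two of the five tampering types (masking, dropping) on a clip
# represented as a list of HxWx3 uint8 frames.
import numpy as np

def mask_segment(frames, start, end):
    """Black out a temporal segment of the clip, disrupting spatial content."""
    tampered = [f.copy() for f in frames]
    for i in range(start, min(end, len(tampered))):
        tampered[i][:] = 0  # overwrite the whole frame with black pixels
    return tampered

def drop_segment(frames, start, end):
    """Remove a temporal segment entirely, breaking temporal coherence."""
    return frames[:start] + frames[end:]

if __name__ == "__main__":
    # Dummy 16-frame "video" of random noise, standing in for a real clip.
    clip = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(16)]
    masked = mask_segment(clip, 4, 8)   # still 16 frames, 4 of them blacked out
    dropped = drop_segment(clip, 4, 8)  # 12 frames remain
    print(len(masked), len(dropped))
```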