🏠 LMMs-Lab Homepage | discord/lmms-eval | 🎓 Project Page | 📝 arXiv Paper | 🤗 Dataset
- [2025-2] 🎉🎉 We have updated the leaderboard with results for Qwen-2.5-VL-72B and mPLUG-Owl3-7B.
- [2025-1] 🎉🎉 We introduce VideoMMMU, a massive, multi-modal, multi-disciplinary video benchmark that evaluates how well models acquire knowledge from educational videos.
VideoMMMU is intended for academic research only; commercial use in any form is prohibited. The copyright of all videos belongs to their owners. Without prior approval, you may not distribute, publish, copy, disseminate, or modify VideoMMMU in whole or in part, and you must strictly comply with these restrictions. For further inquiries, please send an email to [email protected].
The evaluation of VideoMMMU is integrated into LMMs-Eval. Detailed instructions for running the evaluation are given below.
For standard usage, install the package from PyPI:
pip install lmms-eval
For development, install the package by cloning the repository and running the following commands:
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval
pip install -e .
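
You can optionally sanity-check the installation by listing the registered tasks and looking for the video_mmmu entries. This is only a sketch: the `--tasks list` flag is available in recent lmms-eval releases, so adjust the command if your version differs.

```bash
# Optional sanity check: the VideoMMMU tasks should appear in the task registry.
python3 -m lmms_eval --tasks list 2>&1 | grep video_mmmu
```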
If you want to evaluate LLaVA models, you will also need to clone the LLaVA-NeXT repository and install it:
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
pip install -e .
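
A quick, optional check that the editable install worked, assuming the package is importable as `llava` (which is how LLaVA-NeXT ships it):

```bash
# Should print the path of the freshly installed llava package.
python3 -c "import llava; print(llava.__file__)"
```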
We use LLaVA-OneVision-7B as an example in the following commands. You can change --model and --model_args to match the model you want to evaluate; an illustrative alternative is sketched after the command below.
Evaluation of LLaVA-OneVision on VideoMMMU (all 3 tracks)
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
--tasks video_mmmu \
--batch_size 1 \
--log_samples \
--log_samples_suffix debug \
--output_path ./logs/
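
As an illustrative sketch only, the same tasks can be run with a different backbone such as LLaVA-Video-7B. The model name and model_args below are assumptions drawn from other lmms-eval examples and may differ in your version; check the models registered under lmms_eval/models in your installation for the exact names and arguments.

```bash
# Illustrative: swap --model and --model_args to evaluate another model.
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
    --model llava_vid \
    --model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,max_frames_num=32 \
    --tasks video_mmmu \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix debug \
    --output_path ./logs/
```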
Evaluate a single track of VideoMMMU
Perception track:
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
--tasks video_mmmu_perception \
--batch_size 1 \
--log_samples \
--log_samples_suffix debug \
--output_path ./logs/
Comprehension track:
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
--tasks video_mmmu_comprehension \
--batch_size 1 \
--log_samples \
--log_samples_suffix debug \
--output_path ./logs/
Adaptation track:
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
--tasks video_mmmu_adaptation \
--batch_size 1 \
--log_samples \
--log_samples_suffix debug \
--output_path ./logs/
Evaluate the question_only track of VideoMMMU (Knowledge Acquisition Experiment, Δknowledge)
The "question_only" track consists of a 2-second video that contains only the image associated with the Adaptation Track question.
To evaluate this setting, you can use the following command:
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=1,torch_dtype=bfloat16 \
--tasks video_mmmu_adaptation_question_only \
--batch_size 1 \
--log_samples \
--log_samples_suffix debug \
--output_path ./logs/
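
Δknowledge reported in the leaderboard below is the normalized gain between the adaptation score obtained with the video and the question_only score obtained without it. A minimal sketch of that computation, assuming the normalized-gain formula from the paper and using placeholder scores:

```bash
# Δknowledge = (acc_with_video - acc_question_only) / (100 - acc_question_only) * 100
# Both inputs are placeholders; take them from your own video_mmmu_adaptation
# and video_mmmu_adaptation_question_only runs.
acc_with_video=55.67
acc_question_only=47.50
awk -v a="$acc_with_video" -v b="$acc_question_only" \
    'BEGIN { printf "Δknowledge = %+.1f%%\n", (a - b) / (100 - b) * 100 }'
```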
Adaptation Track setting
To ensure compatibility with LMMs-Eval, the image associated with the Adaptation Track question has been appended as the last frame of the video, and a prompt notifies the model that the question image is located in that final frame.
If you use an interleaved setting instead, you can manually insert the image (either the last frame of the video or "image 1" from the HF dataset) into the placeholder `<image 1>`; see the sketch below.
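
A minimal sketch for grabbing that final frame with ffmpeg so it can be supplied as image 1 in an interleaved prompt (file names are placeholders):

```bash
# Seek to one second before the end and keep overwriting the output image,
# so the last decoded frame is what remains on disk.
ffmpeg -sseof -1 -i adaptation_clip.mp4 -update 1 -q:v 1 question_image.jpg
```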
We evaluate various open-source and proprietary LMMs. The table below provides a detailed comparison. To submit your model results, please send an email to [email protected].
Model | Overall | Perception | Comprehension | Adaptation | Δknowledge |
---|---|---|---|---|---|
Human Expert | 74.44 | 84.33 | 78.67 | 60.33 | +33.1 |
Claude-3.5-Sonnet | 65.78 | 72.00 | 69.67 | 55.67 | +11.4 |
GPT-4o | 61.22 | 66.00 | 62.00 | 55.67 | +15.6 |
Qwen-2.5-VL-72B | 60.22 | 69.33 | 61.00 | 50.33 | +9.7 |
Gemini 1.5 Pro | 53.89 | 59.00 | 53.33 | 49.33 | +8.7 |
Aria | 50.78 | 65.67 | 46.67 | 40.00 | +3.2 |
Gemini 1.5 Flash | 49.78 | 57.33 | 49.00 | 43.00 | -3.3 |
LLaVA-Video-72B | 49.67 | 59.67 | 46.00 | 43.33 | +7.1 |
LLaVA-OneVision-72B | 48.33 | 59.67 | 42.33 | 43.00 | +6.6 |
Qwen-2.5-VL-7B | 47.44 | 58.33 | 44.33 | 39.67 | +2.2 |
mPLUG-Owl3-7B | 42.00 | 49.33 | 38.67 | 38.00 | +7.5 |
MAmmoTH-VL-8B | 41.78 | 51.67 | 40.00 | 33.67 | +1.5 |
InternVL2-8B | 37.44 | 47.33 | 33.33 | 31.67 | -8.5 |
LLaVA-Video-7B | 36.11 | 41.67 | 33.33 | 33.33 | -5.3 |
VILA1.5-40B | 34.00 | 38.67 | 30.67 | 32.67 | +9.4 |
LLaVA-OneVision-7B | 33.89 | 40.00 | 31.00 | 30.67 | -5.6 |
Llama-3.2-11B | 30.00 | 35.67 | 32.33 | 22.00 | - |
LongVA-7B | 23.98 | 24.00 | 24.33 | 23.67 | -7.0 |
VILA1.5-8B | 20.89 | 20.33 | 17.33 | 25.00 | +5.9 |
@article{hu2025videommmu,
  title={Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos},
  author={Kairui Hu and Penghao Wu and Fanyi Pu and Wang Xiao and Yuanhan Zhang and Xiang Yue and Bo Li and Ziwei Liu},
  journal={arXiv preprint arXiv:2501.13826},
  year={2025},
  url={https://arxiv.org/abs/2501.13826}
}