Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration

🔎 Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration 🚀

Xuyang Liu¹, Ziming Wang², Yuhang Han³, Yingyao Wang², Jiale Yuan², Jun Song²✉, Bo Zheng²,
Linfeng Zhang⁴, Siteng Huang⁵, Honggang Chen¹✉

¹Sichuan University, ²Taobao & Tmall Group of Alibaba,
³Northeast Forestry University, ⁴Shanghai Jiao Tong University, ⁵Zhejiang University

🔥 News

  • 2025.01.10 🤗🤗 We release our latest work GlobalCom2, a "global-to-local" approach for training-free acceleration of high-resolution MLLMs. Code is available!
  • 2024.11.17 🤗🤗 We release our work FiCoCo, which proposes a unified paradigm that demystifies popular existing methods and guides future designs of training-free token reduction for MLLMs. Code is available!

✨ Overview

TLDR: We present GlobalCom2, a novel token compression method for high-resolution MLLMs that uses thumbnail tokens to guide the compression of local crops. Evaluations on 10 benchmarks show that GlobalCom2 achieves a superior efficiency-performance trade-off with LLaVA-NeXT models.

💥 Core Code

The two key functions in llava/model/llava_arch.py implement our global-guided local compression: (a) generate_scale_for_crop_features for allocating optimal retention ratios based on each crop's global importance, and (b) interpolate_and_split_cls_attn_scores for performing token compression with importance from the global perspective.
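The two steps described above can be illustrated with a minimal, framework-free sketch. This is not the code in llava/model/llava_arch.py: the helper names allocate_retention_ratios and select_tokens, the softmax-based allocation rule, and the temperature parameter are all assumptions made for illustration. The sketch only shows the general idea of giving globally important crops a larger retention ratio and then keeping the highest-scoring tokens in each crop.

```python
import math


def allocate_retention_ratios(crop_scores, base_ratio=0.25, temperature=1.0):
    """Hypothetical helper: spread a base retention ratio across crops.

    crop_scores: one global-importance score per crop (assumed to be derived
    from the thumbnail, e.g. summed attention over the crop's region).
    """
    if not crop_scores:
        return []
    # Softmax over crop scores, shifted by the max for numerical stability.
    peak = max(crop_scores)
    exps = [math.exp((s - peak) / temperature) for s in crop_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    n = len(crop_scores)
    # Before clipping, the mean of the ratios equals base_ratio.
    # Clipping at 1.0 can lower that mean when one crop dominates.
    return [min(1.0, base_ratio * n * w) for w in weights]


def select_tokens(token_scores, ratio):
    """Hypothetical helper: keep the top-scoring tokens of one crop.

    Returns the kept token indices in their original order.
    """
    n = len(token_scores)
    k = max(1, int(n * ratio))  # always keep at least one token
    top = sorted(range(n), key=lambda i: token_scores[i], reverse=True)[:k]
    return sorted(top)


# Illustrative usage with made-up scores.
ratios = allocate_retention_ratios([2.0, 1.0], base_ratio=0.25)
kept = select_tokens([0.1, 0.9, 0.3, 0.7], ratio=0.5)
print(ratios, kept)
```

In this sketch the crop with the higher global score receives a ratio above the base value and the other crop a ratio below it, so the overall token budget stays roughly fixed while being redistributed.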

🛠 Preparation

  1. Clone this repository.

     git clone https://github.com/xuyang-liu16/GlobalCom2.git
     cd GlobalCom2

  2. Set up the environment.

     conda create -n GlobalCom2 python=3.10 -y
     conda activate GlobalCom2
     pip install -e .

  3. Download the multimodal benchmarks. Please follow the detailed instructions in LLaVA-Evaluation.

  4. Download LLaVA-NeXT-7B and LLaVA-NeXT-13B and put them under ./liuhaotian/llava-next-7b and ./liuhaotian/llava-next-13b, respectively.
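Before running any evaluation, it can help to confirm that the checkpoints sit where the steps above expect them. The following is a small optional sketch, not part of the repository: the helper name missing_checkpoints is hypothetical, and it only checks that the two directories named above exist, not that their contents are complete.

```python
from pathlib import Path

# Directory names taken from the preparation steps above.
EXPECTED_CHECKPOINTS = ("llava-next-7b", "llava-next-13b")


def missing_checkpoints(root="./liuhaotian"):
    """Hypothetical helper: list expected checkpoint directories absent under root."""
    base = Path(root)
    return [
        str(base / name)
        for name in EXPECTED_CHECKPOINTS
        if not (base / name).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint directories:", ", ".join(missing))
    else:
        print("All expected checkpoint directories found.")
```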

🚀 Evaluation

👉 The only hyper-parameter is the retention ratio, which is set at line 101 of llava/model/llava_arch.py. Setting different retention ratios yields different levels of acceleration (the default retention ratio is 0.25).
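To get a feel for what a given retention ratio means in practice, the rough arithmetic below estimates how many visual tokens survive compression. It is a back-of-the-envelope sketch under stated assumptions: each view (the thumbnail and every crop) is assumed to produce 576 tokens (a 24×24 patch grid), the ratio is assumed to apply to the total on average, and any extra separator or padding-related tokens are ignored. The function name retained_visual_tokens is hypothetical.

```python
def retained_visual_tokens(num_crops, retention_ratio=0.25, tokens_per_view=576):
    """Estimate visual tokens kept after compression (illustrative only).

    tokens_per_view=576 is an assumption (24 x 24 patches per view).
    """
    if not 0.0 < retention_ratio <= 1.0:
        raise ValueError("retention_ratio must be in (0, 1]")
    # One thumbnail view plus the local crops.
    total = (num_crops + 1) * tokens_per_view
    kept = int(total * retention_ratio)
    return total, kept


total, kept = retained_visual_tokens(num_crops=4, retention_ratio=0.25)
print(f"{kept} of {total} visual tokens kept")  # prints "720 of 2880 visual tokens kept"
```

Under these assumptions, an image split into four crops drops from 2880 to 720 visual tokens at the default ratio of 0.25.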

Example for evaluating TextVQA results:

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh

Example for evaluating MME results:

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh

To calculate theoretical computational efficiency, we recommend the methodology presented in LLM-Viewer. We deeply appreciate their outstanding contribution to this field.

🩻 Visualization

Across all presented cases, our GlobalCom2 demonstrates adaptive redundancy removal across local crops while effectively preserving regions that are significant both locally and globally, enabling LLMs to capture essential information through the retained visual signals.

📌 Citation

Please consider citing our paper in your publications if our findings help your research.

@article{Liu2025:GlobalCom,
    title={Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration}, 
    author={Xuyang Liu and Ziming Wang and Yuhang Han and Yingyao Wang and Jiale Yuan and Jun Song and Bo Zheng and Linfeng Zhang and Siteng Huang and Honggang Chen},
    year={2025},
    eprint={2501.05179},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

💻 Related Works

  • Awesome Token Reduction for Model Compression: An open-source repository that curates a collection of recent awesome papers on token reduction for model compression.
  • FiCoCo: A systematic study that proposes a unified "filter-correlate-compress" paradigm for training-free token reduction in MLLMs, achieving up to 82.4% FLOPs reduction while maintaining model performance.

👍 Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA and LLM-Viewer.

📩 Contact

For any question about our paper or code, please email [email protected].
