SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Tongtian Yue^1,3*, Jie Cheng^2,3*, Longteng Guo^1,3*, Xingyuan Dai^2,3, Zijia Zhao^1,3, Xingjian He^1,3, Gang Xiong^2,3, Yisheng Lv^2,3, Jing Liu^1,3†
¹Laboratory of Cognition and Decision Intelligence for Complex Systems, CASIA
²State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA
³School of Artificial Intelligence, University of Chinese Academy of Sciences

CVPR, 2024

Requirements

Installation

Create a conda environment and install dependencies:

conda create -n sc_tune python=3.10
conda activate sc_tune
pip install -r requirements.txt

Data

Download the Qwen-VL-Chat checkpoint (10 *.bin files in total) into Qwen-VL-Chat/, and download the Object365 images.
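A quick way to confirm that the checkpoint download finished is to count the *.bin files in the target directory. The following is a minimal sketch, not part of this repo: the helper names count_bin_files and checkpoint_is_complete are hypothetical, and it assumes the 10 *.bin files sit directly inside Qwen-VL-Chat/.

```python
from pathlib import Path

# The README states that the checkpoint consists of 10 *.bin files.
EXPECTED_SHARDS = 10


def count_bin_files(checkpoint_dir):
    """Return the number of *.bin files directly inside checkpoint_dir."""
    return len(list(Path(checkpoint_dir).glob("*.bin")))


def checkpoint_is_complete(checkpoint_dir, expected=EXPECTED_SHARDS):
    """Return True if the directory holds exactly `expected` *.bin files."""
    return count_bin_files(checkpoint_dir) == expected


if __name__ == "__main__":
    # Assumed location, taken from the Data section of this README.
    path = "Qwen-VL-Chat"
    if Path(path).is_dir():
        print(path, "complete:", checkpoint_is_complete(path))
    else:
        print(path, "not found")
```

Run it from the repository root. The check only counts files, so it would not detect a truncated or corrupted shard.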

Note

We have modified the code in Qwen-VL-Chat/visual.py. If necessary, replace the original file with the one in this repo.
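To decide whether the replacement is necessary, you can compare the visual.py that came with your checkpoint download against the one in this repo. The sketch below does this by hashing both files. It is an illustration under assumptions: sha256_of and files_match are hypothetical helpers, and the path to your downloaded copy is whatever you used locally.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, files_match("Qwen-VL-Chat/visual.py", "/path/to/downloaded/visual.py") returning False would indicate that the two copies differ; the second path is a placeholder.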

Get Started

Configs

Set the path to the Object365 images in scripts/finetune_ds.sh. The other hyperparameters are also set in this file.

Running

sh scripts/finetune_ds.sh

Main code

The main code implementing the SC-Tune method is in transformers/trainer.py and transformers/trainer_utils.py.

Acknowledgement

This repo benefits from Qwen-VL, TRL, and MOSS. Thanks for their wonderful work.

Citation

@article{yue2024sc,
  title={SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models},
  author={Yue, Tongtian and Cheng, Jie and Guo, Longteng and Dai, Xingyuan and Zhao, Zijia and He, Xingjian and Xiong, Gang and Lv, Yisheng and Liu, Jing},
  journal={arXiv preprint arXiv:2403.13263},
  year={2024}
}
