Improving Long-Text Alignment for Text-to-Image Diffusion Models

This repo is the official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)

by Luping Liu¹,², Chao Du², Tianyu Pang², Zehan Wang²,⁴, Chongxuan Li³, Dong Xu¹.

¹The University of Hong Kong; ²Sea AI Lab, Singapore; ³Renmin University of China; ⁴Zhejiang University

What does this work do?

To improve long-text alignment for text-to-image (T2I) diffusion models, we propose LongAlign, which combines a segment-level encoding method for processing long texts with a decomposed preference optimization method for effective alignment training. For decomposed preference optimization, we find that the scores of preference models can be decomposed into two components: a text-relevant part and a text-irrelevant part. We propose a reweighting strategy that assigns different weights to these two components, reducing overfitting and enhancing alignment.
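The idea behind segment-level encoding can be illustrated with a minimal sketch. It assumes the long prompt is split at sentence boundaries, each segment is encoded on its own, and the per-token embeddings are concatenated. The names `split_into_segments`, `encode_long_text`, and `toy_encoder` are hypothetical and not taken from this repository, and the toy encoder only stands in for a real text encoder with a token limit. How the actual code treats special tokens and padding is not covered here.

```python
import re
from typing import Callable, List

Embedding = List[List[float]]  # one vector per token


def split_into_segments(text: str) -> List[str]:
    """Split a long prompt into sentence-level segments (naive punctuation rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def encode_long_text(text: str, encode_segment: Callable[[str], Embedding]) -> Embedding:
    """Encode every segment separately and concatenate the token embeddings."""
    merged: Embedding = []
    for segment in split_into_segments(text):
        merged.extend(encode_segment(segment))
    return merged


def toy_encoder(segment: str, max_tokens: int = 8) -> Embedding:
    """Hypothetical stand-in for a text encoder with a hard token limit."""
    tokens = segment.split()[:max_tokens]
    return [[float(len(tok)), float(i)] for i, tok in enumerate(tokens)]
```

Because each segment is encoded independently, no single call to the encoder has to exceed its token limit, even when the full prompt does.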

CLIP-based preference decomposition

  • (a) Schematic results for text embeddings. (b) Statistics of the projection scalar $\eta$ for three CLIP-based preference models. (c) The relationship between the original preference score and the two scores after decomposition.
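A rough sketch of how such a decomposition and reweighting might look, under the assumption that the preference score is a dot product between a text embedding and an image embedding, and that the text embedding is projected onto a shared direction with projection scalar η. The function names, the `common_dir` argument, and the `irrelevant_weight` parameter are illustrative assumptions, not the repository's API.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def decompose_text_embedding(text_emb: Vector, common_dir: Vector) -> Tuple[float, Vector]:
    """Return the projection scalar eta and the residual orthogonal to common_dir."""
    eta = dot(text_emb, common_dir) / dot(common_dir, common_dir)
    residual = [t - eta * c for t, c in zip(text_emb, common_dir)]
    return eta, residual


def reweighted_score(text_emb: Vector, image_emb: Vector,
                     common_dir: Vector, irrelevant_weight: float) -> float:
    """Score = text-relevant part + irrelevant_weight * text-irrelevant part."""
    eta, residual = decompose_text_embedding(text_emb, common_dir)
    text_irrelevant = eta * dot(common_dir, image_emb)  # depends on the text only via eta
    text_relevant = dot(residual, image_emb)            # carries the prompt-specific signal
    return text_relevant + irrelevant_weight * text_irrelevant
```

With `irrelevant_weight=1.0` the two parts sum back to the original dot-product score. Smaller values reduce the influence of the text-irrelevant part.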

Generation result

  • Generation results using our LongAlign and baselines. We highlight three key facts for each prompt and provide the evaluation results at the end.

  • Generation results using different preference models, with and without our reweighting strategy.

How to run the code?

Prepare environment

pip install -r requirements.txt
# if you encounter an error with LoRA, please run `pip uninstall peft`

Prepare dataset and checkpoint

Train original Stable Diffusion

# support long-text inputs
bash run_unet.sh align ct5f
# then move {args.output_dir}/s{global_step_}_lora_vis.pt to {args.output_dir}/lora_vis.pt, and likewise for the other saved checkpoint files

# preference optimization for long-text alignment
bash run_unet.sh reward test
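The checkpoint move mentioned in the comments above can be scripted. The helper below is a sketch that assumes every file saved at a given step shares the `s{step}_` prefix shown for `lora_vis.pt` and ends in `.pt`. The function name `promote_checkpoint` is hypothetical, so check the actual contents of your output directory before relying on it.

```python
import shutil
from pathlib import Path
from typing import List


def promote_checkpoint(output_dir: str, global_step: int) -> List[Path]:
    """Move s{global_step}_*.pt files to the same names without the step prefix."""
    out = Path(output_dir)
    prefix = f"s{global_step}_"
    moved: List[Path] = []
    for src in sorted(out.glob(f"{prefix}*.pt")):
        dst = out / src.name[len(prefix):]  # e.g. s1000_lora_vis.pt -> lora_vis.pt
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

For example, `promote_checkpoint("path/to/output_dir", 1000)` would move `s1000_lora_vis.pt` to `lora_vis.pt` and leave files from other steps untouched. Note that it overwrites an existing file of the same name.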

Train LCM-version Stable Diffusion

# support LCM sampling
bash run_unet.sh lcm ct5f

# preference optimization for long-text alignment
bash run_unet.sh reward_lcm test

References

If you find this work useful for your research, please consider citing:

@article{liu2024improving,
      title={Improving Long-Text Alignment for Text-to-Image Diffusion Models}, 
      author={Luping Liu and Chao Du and Tianyu Pang and Zehan Wang and Chongxuan Li and Dong Xu},
      year={2024},
      journal={arXiv preprint arXiv:2410.11817},
}

This code is mainly built upon the diffusers and LaVi-Bridge repositories, which you might also find interesting.