Official implementation of "Mixture of LoRA Experts"
Xun Wu, Shaohan Huang+, Furu Wei
Accepted at the International Conference on Learning Representations (ICLR), 2024
LoRA has emerged as a pivotal technique for fine-tuning large pre-trained models, renowned for its efficacy across diverse tasks. Its modular design has spurred investigations into composing multiple trained LoRAs to enhance task performance. Nevertheless, the effective composition of these LoRAs remains a formidable challenge: (1) linear arithmetic composition may lose the generative capabilities inherent in the original pre-trained model or the distinctive attributes of the trained LoRAs, resulting in suboptimal outcomes; (2) reference tuning-based composition lacks the adaptability needed to compose multiple LoRAs effectively and incurs significant costs because it retrains a sizable model. In response to these challenges, we propose Mixture of LoRA Experts (MoLE). MoLE treats each layer of the trained LoRAs as a distinct expert and implements hierarchical weight control by integrating a learnable gating function within each layer to learn composition weights tailored to a specific domain objective. MoLE not only surpasses linear arithmetic composition in LoRA composition performance but also preserves the flexibility required for effectively composing trained LoRAs, with minimal computational overhead. Extensive experimental evaluations in both Natural Language Processing (NLP) and Vision & Language (V&L) domains validate the efficacy of MoLE.
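To make the layer-wise gating concrete, here is a minimal PyTorch sketch of a single linear layer that composes several frozen LoRA experts through a learnable gate. It is illustrative only (class names, ranks, and dimensions are made up) and is not the implementation in this repository.

# Minimal sketch of layer-wise LoRA-expert gating (illustrative, not the repository code).
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One trained LoRA: a frozen low-rank update (B @ A) scaled by alpha / rank."""
    def __init__(self, in_dim, out_dim, rank=4, alpha=4.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01, requires_grad=False)
        self.B = nn.Parameter(torch.randn(out_dim, rank) * 0.01, requires_grad=False)
        self.scale = alpha / rank

    def forward(self, x):
        return (x @ self.A.T @ self.B.T) * self.scale

class MoLELinear(nn.Module):
    """A frozen base linear layer plus a learnable gate over trained LoRA experts."""
    def __init__(self, base, experts):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.experts = nn.ModuleList(experts)
        self.gate = nn.Linear(base.in_features, len(experts))  # only the gate is trained

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                    # (..., num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (..., out_dim, num_experts)
        return self.base(x) + (expert_out * weights.unsqueeze(-2)).sum(dim=-1)

# Usage: compose three trained LoRAs inside one layer.
layer = MoLELinear(nn.Linear(768, 768), [LoRAExpert(768, 768) for _ in range(3)])
y = layer(torch.randn(2, 16, 768))   # (batch, sequence, hidden)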
|-- dreambooth
|   |-- loss.py                      # CLIP loss implementation
|   |-- dataset.py                   # dataset class for LoRA fine-tuning
|   |-- finetune_with_lora.py        # main code for fine-tuning the LoRA candidates
|   |-- id.log                       # correspondence between ids and LoRA candidates
|   |-- train_mixture_of_experts.py  # main code for training MoLE
|   |-- train_mixture_of_experts.sh  # script for running train_mixture_of_experts.py
|   |-- run.sh                       # a training example
|   |-- inference.py                 # generate images with MoLE
|-- tools                            # tool code
|   |-- diffusers_mole               # modified diffusers supporting block-wise MoLE training
|   |-- peft                         # modified peft supporting block-wise MoLE training
|   |-- transformers_mole            # modified transformers supporting block-wise MoLE training
conda create -n MoLE python=3.8 -y
conda activate MoLE
cd dreambooth
bash setup.sh
For the NLP task, we use the LoRA candidates provided by LoRAHub, available on the LoRAHub Hugging Face page. If you use these candidates, please consider citing their paper, LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition.
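For reference, below is a hedged sketch of attaching one such candidate to its base model with the standard peft API. The adapter id is a placeholder (replace it with an actual LoRAHub candidate), and we assume the candidates are FLAN-T5-large LoRAs as described by LoRAHub.

# Sketch: load a single LoRAHub LoRA candidate with the standard peft API.
# "lorahub/<candidate-adapter-id>" is a placeholder, not a real repository name.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base, "lorahub/<candidate-adapter-id>")

inputs = tokenizer("Answer the question: what does LoRA stand for?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))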
We provide text-to-image LoRA candidates trained on the DreamBooth dataset:
- Candidates Link: Google Drive.
- DreamBooth Dataset Link: https://github.com/google/dreambooth
Or you can download our candidates directly from Hugging Face.
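If you take the Hugging Face route, one way to fetch the files is with huggingface_hub, as sketched below; the repository id is a placeholder for wherever the candidates are hosted.

# Sketch: download the text-to-image LoRA candidates with huggingface_hub.
# "<mole-candidates-repo>" is a placeholder; use the candidate repository linked above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<mole-candidates-repo>",
    local_dir="dreambooth/lora_candidates",
)
print("LoRA candidates downloaded to", local_dir)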
cd dreambooth
bash run.sh # An example
Or you can modify run.sh as:
bash train_mixture_of_experts.sh <YOUR_EXP_TAG> <YOUR_PROMPT> 1e-5 0 0.1 0 0.5
If our work is useful for you, please consider citing our paper:
@inproceedings{wu2024mole,
  title={Mixture of LoRA Experts},
  author={Wu, Xun and Huang, Shaohan and Wei, Furu},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}