
lora weight #10

Open
tanshuai0219 opened this issue Apr 16, 2024 · 9 comments

@tanshuai0219

When I run the code:
```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
# inject_trainable_lora_extended is imported from this repo's LoRA utilities

TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}
tokenizer = LlamaTokenizer.from_pretrained(llama2_dir)
text_encoder = LlamaForCausalLM.from_pretrained(llama2_dir, torch_dtype=torch.float16).to(device)
tokenizer.pad_token = '[PAD]'
text_encoder.eval()

tokenizer.model_max_length = 256

text_encoder_lora_params, _ = inject_trainable_lora_extended(
    text_encoder,
    r=32,
    target_replace_module=TEXT_ENCODER_REPLACE_MODULES,
    # loras=None,  # path to lora .pt
)
```

then I print text_encoder_lora_params and get "[]", an empty list.
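
(For reference, one hypothetical way to check which attention class names actually exist in the loaded model, and therefore whether "LlamaAttention" can ever match, is a snippet like this; it only assumes the text_encoder loaded above:)

```python
# Hypothetical debugging snippet: list the attention-related class names in the
# loaded text_encoder to see whether "LlamaAttention" is present at all.
module_class_names = {type(m).__name__ for m in text_encoder.modules()}
print(sorted(n for n in module_class_names if "Attention" in n))
# On newer transformers versions this may show e.g. ['LlamaSdpaAttention']
# rather than ['LlamaAttention'], which would explain the empty result.
```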

@ShihaoZhaoZSH
Owner

Sorry, I cannot reproduce your issue, but I suggest checking the versions of the Python packages in your environment. More importantly, you can check whether the function _find_modules in inject_trainable_lora_extended has successfully found the linear or conv layers, and debug from there.
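
(A rough sketch of that kind of check, only to illustrate the idea of locating Linear/Conv layers inside the target blocks; this is not the repo's actual _find_modules implementation:)

```python
import torch.nn as nn

# Rough sketch (assumption, not the repo's _find_modules): walk the model and,
# for each module whose class name is in the target set, collect the Linear /
# Conv2d children that LoRA could be injected into.
def find_injectable_layers(model, target_class_names):
    hits = []
    for name, module in model.named_modules():
        if type(module).__name__ in target_class_names:
            for child_name, child in module.named_modules():
                if isinstance(child, (nn.Linear, nn.Conv2d)):
                    hits.append(f"{name}.{child_name}")
    return hits

# An empty list here reproduces the reported issue.
print(find_injectable_layers(text_encoder, {"LlamaAttention"}))
```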

@tanshuai0219
Author

> Sorry, I cannot reproduce your issue, but I suggest checking the versions of the Python packages in your environment. More importantly, you can check whether the function _find_modules in inject_trainable_lora_extended has successfully found the linear or conv layers, and debug from there.

I aligned my environment with your provided environment.yaml but still get the same issue. Then I changed
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}
to
TEXT_ENCODER_REPLACE_MODULES = {"LlamaSdpaAttention", "LlamaDecoderLayer"}
and I get a valid text_encoder_lora_params. I checked the params via:
```python
import itertools

text_encoder_lora_params2 = itertools.chain(*text_encoder_lora_params)

total_parameters2 = sum(p.numel() for p in text_encoder_lora_params2)
print("Total parameters:", total_parameters2)  # 79953920
```
and it prints "79953920".
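
(For reference, that number is consistent with rank-32 LoRA A/B pairs on every linear layer inside the 32 decoder layers, assuming Llama-2-7B dimensions of hidden size 4096 and intermediate size 11008; this is an assumption about the model, not taken from the repo:)

```python
# Sanity check of the printed count under the Llama-2-7B assumption above.
hidden, inter, n_layers, r = 4096, 11008, 32, 32
attn_params = 4 * r * (hidden + hidden)                       # q/k/v/o projections
mlp_params = 2 * r * (hidden + inter) + r * (inter + hidden)  # gate/up/down projections
print(n_layers * (attn_params + mlp_params))                  # 79953920
```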

Is this OK with your other training code?

@ShihaoZhaoZSH
Owner

If using "LlamaSdpaAttention" works, it indicates that the issue is related to the version of the transformers library. To work with "LlamaAttention", you can try downgrading transformers to a lower version, such as 4.34. However, using "LlamaSdpaAttention" is also fine: the essence of inserting LoRA is to find the appropriate layers within the provided block where it can be inserted.
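
(One hedged workaround, an assumption rather than something the repo ships, is to list both class names so the same config matches whichever attention class the installed transformers version defines:)

```python
# Hypothetical version-tolerant target set: only the name that exists in the
# installed transformers version will actually match.
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention", "LlamaSdpaAttention"}
```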

@tanshuai0219
Author

> If using "LlamaSdpaAttention" works, it indicates that the issue is related to the version of the transformers library. To work with "LlamaAttention", you can try downgrading transformers to a lower version, such as 4.34. However, using "LlamaSdpaAttention" is also fine: the essence of inserting LoRA is to find the appropriate layers within the provided block where it can be inserted.

Thanks for your reply~ Following your suggestion, I downgraded my transformers from 4.38.0 (as indicated in your environment.yaml) to 4.34. It finally works with TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}!! I think it would be better to update your environment.yaml.

Moreover, some code in my project only works with transformers>=4.38.0, so I wonder whether "LlamaSdpaAttention" alone is "the appropriate layer" you mentioned, or whether {"LlamaSdpaAttention", "LlamaDecoderLayer"} is. I assume "LlamaDecoderLayer" is also an important layer.

@tanshuai0219
Author


Besides, transformers 4.34 cannot work with huggingface_hub.utils, which is necessary to run the training code...

@ShihaoZhaoZSH
Owner

Thank you for your reminder! We have updated the environment.yaml, including transformers and huggingface-hub. Additionally, you can try adding LoRA to different layers, such as both LlamaAttention and LlamaDecoderLayer. However, note that the weights we released only include LoRA added to the LlamaAttention layers. Investigating the effects of adding LoRA to different layers is indeed worthwhile.
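
(A minimal sketch of the two options, assuming Llama-2-7B dimensions for the rough parameter counts; the released weights correspond to the first line:)

```python
# Matches the released weights: LoRA only inside the attention blocks
# (roughly 33.5M trainable params at r=32 under the Llama-2-7B assumption).
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}

# Hypothetical broader experiment: also cover the MLP projections by targeting
# the whole decoder layer (roughly 80M trainable params at r=32).
# TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention", "LlamaDecoderLayer"}
```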

@tanshuai0219
Author

> Thank you for your reminder! We have updated the environment.yaml, including transformers and huggingface-hub. Additionally, you can try adding LoRA to different layers, such as both LlamaAttention and LlamaDecoderLayer. However, note that the weights we released only include LoRA added to the LlamaAttention layers. Investigating the effects of adding LoRA to different layers is indeed worthwhile.

I had another try: replacing "LlamaAttention" with "LlamaSdpaAttention" in transformers 4.38.2 works, and I checked the trainable parameters; the count equals that of "LlamaAttention" in transformers 4.34~

@ShihaoZhaoZSH
Owner

That is reasonable, as different versions of transformers may have variations in the way attention classes are defined and named. You can refer to the source code of transformers to discover these differences.
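
(One way to confirm this against the installed version's source, assuming transformers 4.38.x where both classes are defined in modeling_llama:)

```python
# In transformers 4.38.x, LlamaSdpaAttention subclasses LlamaAttention (an
# observation about that version's source, worth re-checking locally), so it
# wraps the same q/k/v/o projection layers.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaSdpaAttention

print(issubclass(LlamaSdpaAttention, LlamaAttention))  # expected: True
```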

@tanshuai0219
Author

> That is reasonable, as different versions of transformers may have variations in the way attention classes are defined and named. You can refer to the source code of transformers to discover these differences.

Thanks for your reply and the awesome work. I have benefited a lot from it~
