
lora weight #10

Open
tanshuai0219 opened this issue Apr 16, 2024 · 9 comments

@tanshuai0219

When I run the code:
```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
# inject_trainable_lora_extended is imported from this repo's LoRA utilities

TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}
tokenizer = LlamaTokenizer.from_pretrained(llama2_dir)
text_encoder = LlamaForCausalLM.from_pretrained(llama2_dir, torch_dtype=torch.float16).to(device)
tokenizer.pad_token = '[PAD]'
text_encoder.eval()

tokenizer.model_max_length = 256

text_encoder_lora_params, _ = inject_trainable_lora_extended(
    text_encoder,
    r=32,
    target_replace_module=TEXT_ENCODER_REPLACE_MODULES,
    # loras=None,  # path to lora .pt
)
```

then I print text_encoder_lora_params and get "[]", an empty list.
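
(For reference, one hypothetical way to check which attention class names actually exist in the loaded model, and therefore whether "LlamaAttention" can ever match, is a snippet like this; it only assumes the text_encoder loaded above:)

```python
# Hypothetical debugging snippet: list the attention-related class names in the
# loaded text_encoder to see whether "LlamaAttention" is present at all.
module_class_names = {type(m).__name__ for m in text_encoder.modules()}
print(sorted(n for n in module_class_names if "Attention" in n))
# On newer transformers versions this may show e.g. ['LlamaSdpaAttention']
# rather than ['LlamaAttention'], which would explain the empty result.
```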

@ShihaoZhaoZSH
Owner

Sorry, I cannot reproduce your issue, but I suggest checking the versions of the Python packages in your environment. More importantly, you can check whether the function _find_modules in inject_trainable_lora_extended has successfully found the linear or conv layers, and debug from there.
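
(A rough sketch of that kind of check, only to illustrate the idea of locating Linear/Conv layers inside the target blocks; this is not the repo's actual _find_modules implementation:)

```python
import torch.nn as nn

# Rough sketch (assumption, not the repo's _find_modules): walk the model and,
# for each module whose class name is in the target set, collect the Linear /
# Conv2d children that LoRA could be injected into.
def find_injectable_layers(model, target_class_names):
    hits = []
    for name, module in model.named_modules():
        if type(module).__name__ in target_class_names:
            for child_name, child in module.named_modules():
                if isinstance(child, (nn.Linear, nn.Conv2d)):
                    hits.append(f"{name}.{child_name}")
    return hits

# An empty list here reproduces the reported issue.
print(find_injectable_layers(text_encoder, {"LlamaAttention"}))
```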

@tanshuai0219
Author

> Sorry, I cannot reproduce your issue, but I suggest checking the versions of the Python packages in your environment. More importantly, you can check whether the function _find_modules in inject_trainable_lora_extended has successfully found the linear or conv layers, and debug from there.

I aligned my environment with your provided environment.yaml but still get the same issue. Then I changed
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}
to
TEXT_ENCODER_REPLACE_MODULES = {"LlamaSdpaAttention", "LlamaDecoderLayer"}
and I get a valid text_encoder_lora_params. I checked the params via:
```python
import itertools

text_encoder_lora_params2 = itertools.chain(*text_encoder_lora_params)

total_parameters2 = sum(p.numel() for p in text_encoder_lora_params2)
print("Total parameters:", total_parameters2)  # 79953920
```
and it prints "79953920".
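
(For reference, that number is consistent with rank-32 LoRA A/B pairs on every linear layer inside the 32 decoder layers, assuming Llama-2-7B dimensions of hidden size 4096 and intermediate size 11008; this is an assumption about the model, not taken from the repo:)

```python
# Sanity check of the printed count under the Llama-2-7B assumption above.
hidden, inter, n_layers, r = 4096, 11008, 32, 32
attn_params = 4 * r * (hidden + hidden)                       # q/k/v/o projections
mlp_params = 2 * r * (hidden + inter) + r * (inter + hidden)  # gate/up/down projections
print(n_layers * (attn_params + mlp_params))                  # 79953920
```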

Is this OK with your other training code?

@ShihaoZhaoZSH
Owner

If using "LlamaSdpaAttention" works, it indicates that the issue is related to the version of the transformers library. To work with "LlamaAttention", you can try downgrading transformers to a lower version, such as 4.34. However, using "LlamaSdpaAttention" is also fine: the essence of inserting LoRA is to find the appropriate layers within the provided block where it can be inserted.
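
(One hedged workaround, an assumption rather than something the repo ships, is to list both class names so the same config matches whichever attention class the installed transformers version defines:)

```python
# Hypothetical version-tolerant target set: only the name that exists in the
# installed transformers version will actually match.
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention", "LlamaSdpaAttention"}
```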

@tanshuai0219
Author

> If using "LlamaSdpaAttention" works, it indicates that the issue is related to the version of the transformers library. To work with "LlamaAttention", you can try downgrading transformers to a lower version, such as 4.34. However, using "LlamaSdpaAttention" is also fine: the essence of inserting LoRA is to find the appropriate layers within the provided block where it can be inserted.

Thanks for your reply~ Following your suggestion, I downgraded my transformers from 4.38.0 (as indicated in your environment.yaml) to 4.34. It finally works with TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}!! I think it would be better to update your environment.yaml.

Moreover, some code in my project only works with transformers>=4.38.0, so I wonder whether "LlamaSdpaAttention" alone is "the appropriate layer" you mentioned, or whether {"LlamaSdpaAttention", "LlamaDecoderLayer"} is. I assume "LlamaDecoderLayer" is also an important layer.

@tanshuai0219
Author


Besides, transformers 4.34 cannot work with huggingface_hub.utils, which is necessary to run the training code...

@ShihaoZhaoZSH
Owner

Thank you for your reminder! We have updated the environment.yaml, including transformers and huggingface-hub. Additionally, you can try adding LoRA to different layers, such as both LlamaAttention and LlamaDecoderLayer. However, note that the weights we released only include LoRA added to the LlamaAttention layers. Investigating the effects of adding LoRA to different layers is indeed worthwhile.
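
(A minimal sketch of the two options, assuming Llama-2-7B dimensions for the rough parameter counts; the released weights correspond to the first line:)

```python
# Matches the released weights: LoRA only inside the attention blocks
# (roughly 33.5M trainable params at r=32 under the Llama-2-7B assumption).
TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention"}

# Hypothetical broader experiment: also cover the MLP projections by targeting
# the whole decoder layer (roughly 80M trainable params at r=32).
# TEXT_ENCODER_REPLACE_MODULES = {"LlamaAttention", "LlamaDecoderLayer"}
```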

@tanshuai0219
Author

> Thank you for your reminder! We have updated the environment.yaml, including transformers and huggingface-hub. Additionally, you can try adding LoRA to different layers, such as both LlamaAttention and LlamaDecoderLayer. However, note that the weights we released only include LoRA added to the LlamaAttention layers. Investigating the effects of adding LoRA to different layers is indeed worthwhile.

I had another try: replacing "LlamaAttention" with "LlamaSdpaAttention" in transformers 4.38.2 works, and I checked the trainable parameters; the count equals that of "LlamaAttention" in transformers 4.34~

@ShihaoZhaoZSH
Owner

That is reasonable, as different versions of transformers may have variations in the way attention classes are defined and named. You can refer to the source code of transformers to discover these differences.
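
(One way to confirm this against the installed version's source, assuming transformers 4.38.x where both classes are defined in modeling_llama:)

```python
# In transformers 4.38.x, LlamaSdpaAttention subclasses LlamaAttention (an
# observation about that version's source, worth re-checking locally), so it
# wraps the same q/k/v/o projection layers.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaSdpaAttention

print(issubclass(LlamaSdpaAttention, LlamaAttention))  # expected: True
```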

@tanshuai0219
Author

> That is reasonable, as different versions of transformers may have variations in the way attention classes are defined and named. You can refer to the source code of transformers to discover these differences.

Thanks for your reply and the awesome work. I have benefited a lot from it~
