Why is loading LoRA weights so slow? #8953

I used diffusers to load LoRA weights, but it takes very long to finish.
diffusers version: 0.29.2
I tested another version, diffusers 0.23.0 without PEFT installed, and the loading time was decent. But with a lower version of diffusers, much of my code would need to be modified, which is a lot of work. Any help would be appreciated.

Comments
Hmm, you seem to be right. I observed similar 3-5x differences in loading speed between the two setups.
cc @sayakpaul
Cc: @BenjaminBossan
Could you please provide a full code example so that I can try to reproduce these timings? Also, what version of PEFT do you have installed?
@BenjaminBossan The code is just plain code to load the model and run inference; I load the LCM LoRA weights as a test, the same way as with other LoRAs. Code snippet below:
diffusers==0.25.1 peft==0.12.0
Thanks for providing more context. Since you use a private adapter, I changed the code a little bit to use one from the Hub:

```python
import time

import torch
from diffusers import StableDiffusionXLPipeline

# Report whether PEFT is available, since that decides which LoRA backend is used.
try:
    import peft
    print("peft is installed")
    print(peft.__version__)
    print(peft.__path__)
except ImportError:
    print("peft is not installed")

device = 0
model_path = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_path, torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to(device)

# Time only the LoRA loading step.
t1 = time.time()
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", weight_name="pytorch_lora_weights.safetensors")
print(f"load lcm lora weights time: {time.time() - t1:.2f}s")
```

With this, I can confirm these numbers (averaged over 5 runs):
I investigated further why PEFT is slower, and what I found is that with PEFT, when the LoRA layer is created (before the actual checkpoint is loaded), PEFT instantiates the actual LoRA weights. Without PEFT, empty weights are created instead (on the meta device). Next I tried toggling the use of empty weights to check whether that really makes the difference; however, the relevant argument was not being passed through correctly (more on that below). Long story short, after forcing empty weights, I get very similar load times of ~1.7s. So in a sense this is a regression, as the PEFT backend in diffusers does not load empty weights. I'm not quite sure if this is an oversight, a bug, or intended for some reason. @sayakpaul do you think this could be added for PEFT LoRA loading?
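For illustration, here is a minimal sketch (not the actual diffusers/PEFT internals) of why meta-device creation is cheaper: the layer is built with shapes and dtypes only, no storage is allocated and no init kernels run, and the real values arrive only when the checkpoint's state dict is loaded.

```python
import torch
import torch.nn as nn

# Eager creation: storage is allocated and initialization runs immediately.
lora_down = nn.Linear(1024, 16, bias=False)
print(lora_down.weight.device)  # cpu

# "Empty" creation: on the meta device only shapes/dtypes are tracked,
# so constructing the layer is essentially free.
with torch.device("meta"):
    lora_down_empty = nn.Linear(1024, 16, bias=False)
print(lora_down_empty.weight.device)  # meta

# Later, real storage is attached and the checkpoint values are copied in.
materialized = lora_down_empty.to_empty(device="cpu")
materialized.load_state_dict({"weight": torch.randn(16, 1024)})
```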
We had brought it up in the past a couple of times, but as you can see the difference is not really that significant, so I am not sure about the value-add here, given the number of lines of code that would have to be added to PEFT.
I also don’t understand which value was being incorrectly passed down.
Well, it's up to you and the other diffusers maintainers whether you think this is worth fixing or not. On the PEFT side, I think it could be a feature to allow passing some argument to opt into empty (meta-device) weight initialization.
The value is not passed on in this line: https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/loaders.py#L3234 It is set correctly earlier in the call chain, but dropped at that point.
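Schematically, the pattern described here looks like the following; the function and argument names are hypothetical stand-ins, since the exact identifier did not survive in the quoted comment:

```python
# Hypothetical illustration: the flag is accepted (and defaulted) at the top
# level, but never forwarded to the helper that actually needs it.
_LOW_CPU_MEM_USAGE_DEFAULT = True


def load_lora_weights(state_dict, low_cpu_mem_usage=_LOW_CPU_MEM_USAGE_DEFAULT):
    # low_cpu_mem_usage has the right value here ...
    _load_lora_into_unet(state_dict)  # ... but is not passed on,


def _load_lora_into_unet(state_dict, low_cpu_mem_usage=False):
    # ... so the helper silently falls back to its own default of False.
    ...
```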
Ah, you are right. But we have moved on from this already, so I think we can ignore it for now. On second thought, though, I think faster loading support would indeed be nice, especially for larger adapters. Please let us know when this has shipped in PEFT and we can make the necessary changes in diffusers.
I did some quick and dirty experiments and created a draft PR based on them. There is indeed a speed-up for Llama 3 8B in this test, although loading the base model is still the overall bottleneck (even with a hot cache and an M.2 SSD). Overall, I thus think the impact of this feature will not be huge, especially since it is a one-time cost, but implementing it is not trivial. Therefore, I'll leave this in the draft PR stage for the time being while working on higher-priority items.
This is what I had expected too. Thanks for quickly checking!
Just to be clear, the test that I mentioned is in a draft PR, so it is not available in PEFT yet. And even if you installed PEFT from my branch, it would still not work, because diffusers would need to make some changes too to opt into that feature. I will get back to that PR when I have a bit of extra time.
These timings were based on another adapter.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
not stale, huggingface/peft#1961 is being worked on
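For reference, a sketch of what the PEFT-side API from that PR looks like once it ships (this assumes a peft version with `low_cpu_mem_usage` support; the adapter repo name below is a placeholder):

```python
import time

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

t0 = time.time()
# With low_cpu_mem_usage=True, the LoRA layers are created on the meta device
# and materialized straight from the checkpoint, skipping the throwaway init.
# "some-user/llama3-lora" is a placeholder adapter repo.
model = PeftModel.from_pretrained(base, "some-user/llama3-lora", low_cpu_mem_usage=True)
print(f"adapter load time: {time.time() - t0:.2f}s")
```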
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
With #9510 and the latest versions of peft and diffusers, faster LoRA loading is now supported.
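On the diffusers side, usage looks roughly like this (a sketch, assuming the opt-in `low_cpu_mem_usage` argument that #9510 adds to `load_lora_weights`, plus a sufficiently recent peft):

```python
import time

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to(0)

t0 = time.time()
# Opt in to meta-device initialization of the LoRA layers while loading.
pipe.load_lora_weights(
    "latent-consistency/lcm-lora-sdxl",
    weight_name="pytorch_lora_weights.safetensors",
    low_cpu_mem_usage=True,
)
print(f"load lcm lora weights time: {time.time() - t0:.2f}s")
```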