FEAT: Add `safe_merge` option in merge #1001
Conversation
This looks like a useful feature to have, thanks for the addition. For my understanding, the NaN could also be caused by the addition of the delta weights, even if those don't contain any NaN themselves, and that's why you perform the check on the merged weights, not on the delta weights, right? The main reason I'm asking is because if we only do the check on the delta weights, we wouldn't need to create a copy of the original weights. Btw, IIRC the test that was failing was not the flaky one, so there might be some fixing needed. Could be caused by the changed line.
Indeed @BenjaminBossan, I believe the overflow (and the resulting NaN) can appear only once the delta weights are added to the base weights, which is why the check is performed on the merged weights rather than on the deltas.
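For reference, a minimal sketch of the kind of check being discussed here (the function and parameter names are illustrative, not the actual PEFT layer internals):

```python
import torch

def merge_safely(base_weight: torch.Tensor, delta_weight: torch.Tensor) -> torch.Tensor:
    # Merge on a copy so the original weights stay untouched if the check fails.
    merged = base_weight.clone() + delta_weight
    # The NaN/inf may only appear after the addition, even when neither operand
    # contains one, hence the check on the merged result (isfinite also catches inf).
    if not torch.isfinite(merged).all():
        raise ValueError("NaNs detected in the merged weights; the adapter cannot be merged safely.")
    return merged
```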
Only some small comments, the rest looks good, thanks for adding this useful check.
Co-authored-by: Benjamin Bossan <[email protected]>
Thanks for addressing the comments. I found two more type annotations that need fixing which I missed the first time around, otherwise LGTM.
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>
Thanks very much for all the reviews @BenjaminBossan!
Thanks a lot, this now looks ready to me (once CI is green).
We may want to add the option for safe merging to the other adapters too. Maybe we can create an issue so that it's not forgotten?
I think that we should add it in this PR to make things consistent, let me work on that!
Cool!
Thanks a lot for adding the safe merging feature to the other methods. It is still missing for LoRA bnb layers, which support merging. It would be fine with me if they are added in a separate PR though.
I noticed some more things only now, sorry for not noticing earlier:
- The error message "NaNs detected in the merged weights. The Lora adapter {active_adapter} seems to be broken and should be removed" is a bit confusing IMO. The issue I see is with suggesting to remove it, because (if I'm not mistaken) it is totally possible for the adapter layer to work in forward when applied separately from the original weights and only encounter NaNs after merging, since the mathematical operation is not identical. E.g. two weight parameters could overflow when added, but when they are both first multiplied by the activation and only then added, they might not overflow anymore. Therefore, I wouldn't ask to remove the adapter, as it may work. Instead, I would change the message to just say that this adapter cannot be merged safely, without any further suggestion. WDYT?
- The `torch.isnan` check: the next issue I only noticed just now is that I think we should not check with `torch.isnan` because it doesn't include `torch.inf`. Instead, `torch.isfinite(x).all()` should work for both `torch.inf` and `torch.nan`. WDYT? If you agree, maybe the test could be extended to also include `module.data[0] = torch.inf` (see the sketch after this list).
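A small illustration of both points, assuming float16 weights for the overflow example (the values are made up):

```python
import torch

# Two finite float16 weights whose sum overflows (float16 max is ~65504).
a = torch.tensor([60000.0], dtype=torch.float16)
b = torch.tensor([60000.0], dtype=torch.float16)

merged = a + b                        # inf after merging
print(torch.isnan(merged).any())      # tensor(False) -> isnan misses the overflow
print(torch.isfinite(merged).all())   # tensor(False) -> isfinite flags inf and nan

# In forward, the same weights scaled by a small activation stay finite,
# which is why the un-merged adapter can still work.
x = torch.tensor([1e-3], dtype=torch.float16)
print(torch.isfinite(x * a + x * b))  # tensor([True])
```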
lora bnb layers
use torch.isfinite(x).all() instead
@@ -287,7 +287,7 @@ def _prepare_adapter_config(self, peft_config, model_config):
             ]
         return peft_config

-    def merge_and_unload(self):
+    def merge_and_unload(self, safe_merge: bool = False):
Please extend the docstring.
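A rough sketch of what the extended docstring could look like (the wording below is illustrative, not the exact text that ended up in the PR):

```python
def merge_and_unload(self, safe_merge: bool = False):
    r"""
    Merge the adapter weights into the base model weights and return the base model.

    Args:
        safe_merge (`bool`, defaults to `False`):
            If `True`, the merge is first performed on a copy of the weights and
            checked for NaN/inf values before the original weights are replaced;
            a `ValueError` is raised and the model is left untouched if the check fails.
    """
    ...
```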
@BenjaminBossan all the proposed suggestions sound great to me! Will work on that.
I have adapted the changes accordingly and added a test case with inf, let me know what you think!
Looks great. Thanks for addressing the remaining issues. From my point of view, it can be merged once CI is green.
What does this PR do?
Analogous PR to huggingface/diffusers#5316
Some users of diffusion models can face strange issues when merging the adapter weights into the base model. This PR is the PEFT equivalent of @patrickvonplaten's PR on diffusers.
I would advocate defaulting `safe_merge` to `False` as this PR adds an overhead. In fact, it is preferable to first copy the merged tensor, check if there is any `nan` value there, and raise a proper `ValueError` in case there is one. Otherwise the potential `nan` would already be propagated to the merged weights before raising the error.

Also, this PR is perfectly backward compatible as it preserves all the previous behaviour.

Also added some tests.
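For illustration, a hedged sketch of how a user could opt into the check (the adapter id below is a placeholder, not a real checkpoint):

```python
from peft import AutoPeftModelForCausalLM

# Placeholder adapter id, only for illustration.
model = AutoPeftModelForCausalLM.from_pretrained("some-user/some-lora-adapter")

try:
    base_model = model.merge_and_unload(safe_merge=True)
except ValueError as exc:
    # NaN/inf values were detected during merging; the original weights stay untouched.
    print(f"Adapter could not be merged safely: {exc}")
```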
cc @pacman100 @BenjaminBossan @sayakpaul @patrickvonplaten FYI --> on huggingface/diffusers#5151 we would just pass `safe_merge=safe_merge` in `module.merge`.
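A hedged sketch of what forwarding the flag on the diffusers side might look like (the helper name and module traversal are assumptions, not the actual diffusers code):

```python
import torch.nn as nn

def fuse_lora_safely(component: nn.Module, safe_merge: bool = False) -> None:
    # Walk the submodules and forward the flag to every layer that exposes merge().
    for module in component.modules():
        if hasattr(module, "merge"):
            module.merge(safe_merge=safe_merge)
```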