FEAT: Add safe_merge option in merge #1001

Merged: 19 commits merged into huggingface:main on Oct 9, 2023

Conversation

@younesbelkada (Contributor) commented on Oct 6, 2023

What does this PR do?

Analogous PR to huggingface/diffusers#5316

Some users of diffusion models can face strange issues when merging the adapter weights into the base model. This PR is the PEFT equivalent of @patrickvonplaten's PR on diffusers.

I would advocate defaulting safe_merge to False, as the check adds overhead: it first copies the merged tensor, checks whether it contains any NaN values, and raises a proper ValueError if it does (sketched below). Without the copy, the NaNs would already have propagated into the merged weights before the error is raised.

This PR is also fully backward compatible, as it preserves all previous behaviour.

I also added tests.

cc @pacman100 @BenjaminBossan @sayakpaul @patrickvonplaten FYI: on huggingface/diffusers#5151 we would just pass safe_merge=safe_merge in module.merge.
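
To make the behaviour concrete, here is a minimal, self-contained sketch of the safe_merge logic described above. The function name and signature below are illustrative assumptions, not the actual PEFT layer code, which performs this check inside each adapter layer's merge method.

```python
import torch


def safe_merge_weights(
    base_weight: torch.Tensor,
    delta: torch.Tensor,
    adapter_name: str,
    safe_merge: bool = False,
) -> torch.Tensor:
    """Merge an adapter delta into the base weight, optionally checking the result first."""
    if safe_merge:
        # Merge into a copy first, so a bad merge never touches the real weights.
        merged = base_weight.clone() + delta
        if torch.isnan(merged).any():
            raise ValueError(
                f"NaNs detected in the merged weights. The adapter {adapter_name} "
                "seems to be broken."
            )
        return merged
    # Default fast path: merge in place, no extra copy.
    base_weight += delta
    return base_weight


# Hypothetical usage with made-up tensors:
base = torch.randn(4, 4, dtype=torch.float16)
delta = torch.randn(4, 4, dtype=torch.float16)
merged = safe_merge_weights(base, delta, adapter_name="default", safe_merge=True)
```

In the PR itself the same idea is wired into the layers' merge methods and exposed through merge_and_unload(safe_merge=...), as shown in the diff further down.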

@HuggingFaceDocBuilderDev commented on Oct 6, 2023

The documentation is not available anymore as the PR was closed or merged.

@BenjaminBossan (Member) commented:

This looks like a useful feature to have, thanks for the addition.

For my understanding, the NaN could also be caused by the addition of the delta weights, even if those don't contain any NaN themselves, and that's why you perform the check on the merged weights, not on the delta weights, right? The main reason I'm asking is because if we only do the check on the delta weights, we wouldn't need to create a copy of the original weights.

Btw, IIRC the test that was failing was not the flaky one, so some fixing might be needed. It could be caused by the changed line `if active_adapter not in self._active_adapter:`.

@younesbelkada (Contributor, Author) commented on Oct 6, 2023

Indeed @BenjaminBossan, I believe the overflow (NaN) is purely caused by the sum of the adapter weights and the base model weights, and I think this usually happens in the float16 regime (see the small example below).
It could potentially be caused by NaNs already being present in the delta weights, but it is also likely that the sum itself causes the overflow afterwards, so I think it is safer to perform the check this way. Let me know what you think.
Regarding your second point, that is correct as well!
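
A small self-contained example of why the check has to run on the merged result rather than on the delta weights alone (the values are made up for illustration):

```python
import torch

# Both tensors are finite on their own...
base = torch.tensor([60000.0], dtype=torch.float16)   # float16 max is ~65504
delta = torch.tensor([20000.0], dtype=torch.float16)

# ...but their sum overflows in float16, so the problem only shows up after merging.
merged = base + delta
print(torch.isfinite(delta).all())   # tensor(True)
print(torch.isfinite(merged).all())  # tensor(False) -> the merged value is inf
```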

@BenjaminBossan (Member) left a review comment:

Only some small comments, the rest looks good, thanks for adding this useful check.

Resolved review threads: src/peft/tuners/lora/layer.py (x4), tests/testing_common.py (x1)
@BenjaminBossan (Member) left a review comment:

Thanks for addressing the comments. I found two more type annotations that need fixing which I missed the first time around, otherwise LGTM.

Resolved review threads: src/peft/tuners/lora/layer.py (x2)
@younesbelkada (Contributor, Author) commented:

Thanks very much for all the reviews @BenjaminBossan !

@BenjaminBossan (Member) left a review comment:

Thanks a lot, this now looks ready to me (once CI is green).

We may want to add the option for safe merging to the other adapters too. Maybe we can create an issue so that it's not forgotten?

@younesbelkada (Contributor, Author) commented:

I think we should add it in this PR to keep things consistent; let me work on that!

@patrickvonplaten (Contributor) commented:

Cool!

@BenjaminBossan (Member) left a review comment:

Thanks a lot for adding the safe merging feature to the other methods. It is still missing for LoRA bnb layers, which support merging. It would be fine with me if they are added in a separate PR though.

I noticed some more things only now, sorry for not noticing earlier:

  1. The error message

NaNs detected in the merged weights. The Lora adapter {active_adapter} seems to be broken and should be removed

is a bit confusing IMO. The issue I see is with suggesting to remove it, because (if I'm not mistaken) it is totally possible for the adapter layer to be working in forward when applied separately from the original weights, and only encountering NaNs after merging, since the mathematical operation is not identical. E.g. two weight parameters could be overflowing when added, but when they are both first multiplied by the activation and only then added, they might not overflow anymore.

Therefore, I wouldn't ask to remove the adapter, as it may work. Instead, I would change the message to just say that this adapter cannot be merged safely, without any further suggestion. WDYT?

  2. The torch.isnan check

The second issue I only noticed just now is that I think we should not check with torch.isnan, because it does not catch torch.inf. Instead, torch.isfinite(x).all() should cover both torch.inf and torch.nan. WDYT? If you agree, maybe the test could also be extended to include module.data[0] = torch.inf (see the sketch below).
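
A small sketch of the difference between the two checks; the commented-out merge check and test line are illustrative, not the exact PEFT code:

```python
import torch

merged = torch.tensor([1.0, float("inf"), 2.0])

print(torch.isnan(merged).any())     # tensor(False) -- inf slips past an isnan-only check
print(torch.isfinite(merged).all())  # tensor(False) -- isfinite catches inf as well as NaN

# Inside the (assumed) merge code, the check would then look roughly like:
#   if safe_merge and not torch.isfinite(merged).all():
#       raise ValueError(
#           f"Adapter {active_adapter} cannot be merged safely: NaN/inf values were "
#           "detected in the merged weights."  # wording follows the suggestion in point 1
#       )

# And the test could also corrupt a weight with inf, mirroring the suggestion above:
#   module.data[0] = torch.inf
```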

Commits pushed: "lora bnb layers", "use torch.isfinite(x).all() instead"

```diff
@@ -287,7 +287,7 @@ def _prepare_adapter_config(self, peft_config, model_config):
         ]
         return peft_config

-    def merge_and_unload(self):
+    def merge_and_unload(self, safe_merge: bool = False):
```
A Member review comment on this diff:

Please extend the docstring.
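
A possible docstring extension for the new parameter; the wording below is illustrative, not the final text in the PR:

```python
def merge_and_unload(self, safe_merge: bool = False):
    r"""
    Merge the adapter layers into the base model and return the base model as a
    standalone model.

    Args:
        safe_merge (`bool`, defaults to `False`):
            If `True`, the merge is first performed on a copy of the weights and
            checked for NaN/inf values before the base weights are overwritten.
            This adds some memory and compute overhead, but prevents a broken
            adapter from silently corrupting the merged model.
    """
    ...
```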

@younesbelkada (Contributor, Author) commented:

@BenjaminBossan all the proposed suggestions sound great to me! I will work on that.

@younesbelkada (Contributor, Author) commented:

I have adapted the changes accordingly and added a test case with inf; let me know what you think!

@BenjaminBossan (Member) left a review comment:

Looks great. Thanks for addressing the remaining issues. From my point of view, it can be merged once CI is green.

@younesbelkada merged commit c2c544d into huggingface:main on Oct 9, 2023 (11 checks passed).
@younesbelkada deleted the safe-merge branch on Oct 9, 2023.