ENH: Enable OFT adapter for mixed adapter models #1204

Merged

Conversation

BenjaminBossan
Member

This PR makes it possible to use the newly added OFT adapter in mixed adapter type models, similar to LoRA, LoHa, etc.
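For illustration, here is a minimal sketch of what this enables, assuming the same mixed-model entry points already used for LoRA/LoHa (`get_peft_model(..., mixed=True)` plus `add_adapter`); the model name and config values are just placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, OFTConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# start the mixed adapter model with a LoRA adapter ...
lora_config = LoraConfig(r=8, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, lora_config, adapter_name="lora", mixed=True)

# ... then stack the new OFT adapter on top and activate both
oft_config = OFTConfig(r=8, target_modules=["q_proj", "v_proj"])
peft_model.add_adapter("oft", oft_config)
peft_model.set_adapter(["lora", "oft"])
```

Note that because OFT is not commutative (see below), the order in which the adapters are activated matters for the output.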

Notes

Adding the integration was pretty straightforward, which is a good sign.

The difficult part was actually the tests. This stems from the fact that OFT is (if my understanding is correct) never commutative. What I mean is that even if the adapters are applied to the last layer of a model, it makes a difference whether we apply, say, first LoRA, then OFT, vs. first OFT, then LoRA.

This is different for the other adapters that were added so far for mixed models, as they basically do:

  • X_a = X + dX_a
  • X_ab = X_a + dX_b = X + dX_a + dX_b = X + dX_b + dX_a = X_b + dX_a = X_ba

IIUC, this is not true for OFT, so when OFT is used, I had to ensure that no test was applied that (implicitly) assumes commutativity; see the toy sketch below. Ping @okotaku and @lukaskuhn-lku: is my understanding correct?
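To make the non-commutativity concrete, a toy sketch (not the actual PEFT implementation): additive updates commute, but OFT applies an orthogonal transformation multiplicatively, and matrix products generally do not commute:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4)

# LoRA/LoHa-style additive updates: order does not matter
dxa, dxb = torch.randn(4), torch.randn(4)
print(torch.allclose((x + dxa) + dxb, (x + dxb) + dxa))  # True

# OFT-style: each adapter rotates the output by an orthogonal matrix,
# and matrix products generally do not commute
Qa, _ = torch.linalg.qr(torch.randn(4, 4))
Qb, _ = torch.linalg.qr(torch.randn(4, 4))
print(torch.allclose(Qb @ (Qa @ x), Qa @ (Qb @ x)))  # False in general
```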

Furthermore, I had to increase the model size, see this comment:

#1160 (comment)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

No point in including "Config".
On CPU, apparently higher tolerance is required than on GPU.
@lukaskuhn-lku
Contributor

As far as my understanding of OFT goes, your assumption about commutativity is correct.

@okotaku
Contributor

okotaku commented Dec 2, 2023

@BenjaminBossan It's correct.

@younesbelkada
Contributor

Thanks @BenjaminBossan!

@BenjaminBossan
Member Author

Thanks @lukaskuhn-lku and @okotaku for your input, and @younesbelkada for the review.

@BenjaminBossan merged commit e05b267 into huggingface:main Dec 4, 2023
14 checks passed
@BenjaminBossan deleted the enh-add-oft-to-mixed-model branch December 4, 2023 11:18
TaoSunVoyage pushed a commit to TaoSunVoyage/peft that referenced this pull request Dec 14, 2023