ENH: Enable OFT adapter for mixed adapter models #1204
Conversation
This PR makes it possible to use the newly added OFT adapter in mixed adapter type models, similar to LoRA, LoHa, etc.

Notes

Adding the integration was pretty straightforward, which is a good sign.

The difficult part was actually about the tests. This stems from the fact that OFT is (if my understanding is correct) never commutative. What I mean is that even if the adapters are applied to the last layer of a model, it makes a difference whether we apply, say, first LoRA, then OFT vs first OFT, then LoRA (see the sketch below).

This is different for the other adapters that were added so far for mixed models, as they basically do:

- Xa = X + dXa
- Xab = Xa + dXb = X + dXa + dXb = X + dXb + dXa = Xb + dXa = Xba

IIUC, this is not true for OFT, so when OFT is used, I had to ensure that no test was applied that (implicitly) assumes commutativity. Ping @okotaku and @lukaskuhn-lku: is my understanding correct?

Furthermore, I had to increase the model size, see this comment: huggingface#1160 (comment)
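For illustration, here is a minimal sketch (not code from this PR) of why additive adapter updates commute while an OFT-style orthogonal rotation does not. The matrices are random stand-ins, not actual PEFT adapter weights:

```python
import torch

torch.manual_seed(0)

W = torch.randn(4, 4)          # base weight
dA = torch.randn(4, 4) * 0.1   # additive update, e.g. a LoRA delta B @ A
dB = torch.randn(4, 4) * 0.1   # a second additive update
Q, _ = torch.linalg.qr(torch.randn(4, 4))  # orthogonal matrix, OFT-style rotation

# Additive updates commute: the order of application does not matter.
print(torch.allclose((W + dA) + dB, (W + dB) + dA))  # True

# Mixing an additive update with an orthogonal rotation does not commute:
# "LoRA then OFT" rotates the LoRA delta as well, "OFT then LoRA" does not.
lora_then_oft = Q @ (W + dA)
oft_then_lora = (Q @ W) + dA
print(torch.allclose(lora_then_oft, oft_then_lora))  # False in general
```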
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
No point in including "Config".
On CPU, apparently higher tolerance is required than on GPU.
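As a rough sketch of what that means in a test (this is not the PR's actual test code, and the tolerance values are placeholders), the comparison on CPU gets a looser atol/rtol than on GPU:

```python
import torch

def assert_close(actual, expected, device):
    # Looser tolerances on CPU, where larger numerical deviations were observed.
    atol, rtol = (1e-3, 1e-3) if device == "cpu" else (1e-5, 1e-5)
    assert torch.allclose(actual, expected, atol=atol, rtol=rtol)
```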
As far as my understanding of OFT goes, your assumption about commutativity is correct.
@BenjaminBossan It's correct.
Thanks @BenjaminBossan!
Thanks @lukaskuhn-lku and @okotaku for your input, and @younesbelkada for the review.
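For reference, a rough usage sketch of what this PR enables, assuming the PEFT mixed-model API (`get_peft_model(..., mixed=True)`); the toy model, target module names, and config values below are placeholders chosen for illustration:

```python
import torch
import torch.nn as nn
from peft import get_peft_model, LoraConfig, OFTConfig

# Toy base model; "lin0"/"lin1" and the layer sizes are placeholders.
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(32, 32)
        self.lin1 = nn.Linear(32, 2)

    def forward(self, x):
        return self.lin1(torch.relu(self.lin0(x)))

base_model = MLP()
lora_config = LoraConfig(r=8, target_modules=["lin0"])
oft_config = OFTConfig(r=8, target_modules=["lin0"])

# mixed=True returns a PeftMixedModel that can hold adapters of different types.
model = get_peft_model(base_model, lora_config, adapter_name="lora", mixed=True)
model.add_adapter("oft", oft_config)

# Activate both adapters; since OFT is not commutative, the order in which the
# adapters are applied affects the output.
model.set_adapter(["lora", "oft"])
```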