LoRA and DoRA PEFT support for Fine-Tuning TimesFM #104
Conversation
Why? This commit adds a generic fine-tuning pipeline with LoRA and DoRA support.
Thanks @tanmayshishodia. These are very welcome contributions. Since the CL is pretty big, it will take us some time to review and merge.
Sure.
Feel free to ask me any other questions you may have.
SGTM to all. For 4, pytest sounds fine. You can add some tests to this PR itself; something that covers the call function of the layers should be fine. Let us know if you have any praxis or paxml related questions.
I believe there should be a better way to load the checkpoint along with the initialized lora and dora weights. I did try replacing the layers, instantiating and then loading the checkpoint but could not do so successfully. Let me know your thoughts on this.
I believe this whole process can be optimized further. Let me know if you have any ideas on how that could be done.
Hi @tanmayshishodia, regarding "Replace the attention and linear layers in the stacked transformer block of the model with LoRA/DoRA layers defined in adapters/lora_layers.py, adapters/dora_layers.py": afaik LoRA adds a low-rank adapter additively to the original weights, which are held fixed.
Hi @rajatsen91, yes, apologies for framing it incorrectly. The LoRA/DoRA layers defined in those files inherit from the original linear and attention layers. During the forward pass we take the original weight matrix, which stays fixed, and add the LoRA delta (A and B, which multiply to form delta W) [ref]. Also sharing a diff between the weights before training LoRA/DoRA and after one training epoch: https://www.diffchecker.com/SOotQLkx/.
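The additive update described above can be sketched as follows. This is a minimal NumPy illustration; the shapes, names, and the `alpha` scaling are assumptions for clarity, not the PR's actual praxis layer API:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA-style forward pass: y = x @ (W + alpha * B @ A).T.

    W (d_out x d_in) is the frozen pretrained weight; A (r x d_in) and
    B (d_out x r) are the trainable low-rank factors forming delta W.
    """
    delta_W = alpha * (B @ A)   # rank-r update to the fixed weight
    return x @ (W + delta_W).T

# With B initialized to zeros (the usual LoRA init), the adapted layer
# reproduces the base model's output exactly at the start of training.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))
x = rng.normal(size=(3, d_in))
base = x @ W.T
adapted = lora_forward(x, W, A, B)
```

This also shows why the weight diff linked above is expected: after training, only the contribution of `B @ A` changes, while `W` itself stays identical.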
@rajatsen91 @siriuz42 The PR is ready for a first round of review. I have added a simple test case which tests inference. I will add more; let me know what scenarios need to be covered.
@tanmayshishodia We, at Nutanix, greatly appreciate your effort as an intern in fine-tuning TimesFM. As your mentor, I am super proud to see this. @rajatsen91 We, at Nutanix, are looking forward to contributing to your TimesFM project. We believe this will have significant impact across industries. Thanks!
Thanks again for the PR. Our team is traveling this week; will look into this next week.
peft/fft.sh (Outdated)
@@ -0,0 +1,22 @@
#!/bin/bash
I feel like different .sh scripts are not needed; one script with command-line options is good enough. Alternatively, we could avoid checking in these scripts and instead show example usage in the header comment of finetune.py.
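A minimal sketch of what such a single launcher could look like. The `--method` flag and the `finetune.py` dispatch below are hypothetical, not part of the PR:

```shell
# Hypothetical single entry point replacing the per-method .sh files.
run_finetune() {
  local method="fft"   # default: full fine-tuning
  for arg in "$@"; do
    case "$arg" in
      --method=*) method="${arg#--method=}" ;;
    esac
  done
  echo "launching finetune.py with peft method: $method"
  # Real dispatch would look something like:
  # python finetune.py --peft-method="$method" --datetime-col="date" ...
}
```

Invoked as `run_finetune --method=lora`, the same script would cover fft, LoRA, and DoRA runs.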
LGTM overall. Left a minor comment about the shell scripts.
The numbers in the table "Performance Comparison of LoRA/DoRA with Linear Probing" do not match the numbers I get from the finetuning.ipynb notebook that I had. Can you please clarify what the differences are? In particular, for the ETTm1 test split I get MAE: 0.351 for the base model.
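For reference, the MAE metric being compared here can be computed as below. This is a minimal sketch; the array names are illustrative and not tied to the notebook's variables:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over the forecast horizon."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))
```

Because MAE averages over the evaluation window, changing the test boundaries changes which points are included and thus the reported number, which is relevant to the discrepancy discussed below.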
peft/linear_probing.sh (Outdated)
--adam-clip-threshold=1e2 \
--early-stop-patience=10 \
--datetime-col="date" \
--boundaries=1000 46080 57600 \
Why is this 1000 here?
My bad, thank you for pointing it out. I will remove this param in the .sh file; the script automatically takes the split as 60-20-20.
Hi Rajat, all the experiments were run with boundaries at 60-20-20% of the given dataset. I just noticed that even though the boundaries in the notebook are spaced in the same proportions, they aren't using the whole dataset. For ettm1 the number of data points is 69680, but the test boundary in the notebook is 57600, hence the mismatch. Can you try running the notebook with the corrected boundaries? I will add them in the PR description.
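For a 60-20-20 split over the full dataset, the boundary indices can be derived as below. A minimal sketch; the helper name is made up, and the authoritative values are whatever lands in the PR description:

```python
def split_boundaries(n, train_pct=60, val_pct=20):
    """Boundary indices for a train/val/test split over n data points.

    Integer arithmetic avoids float rounding; the test boundary is n
    itself, so the whole dataset is used.
    """
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return train_end, val_end, n

# For ETTm1's 69680 rows, a 60-20-20 split over the full dataset:
boundaries = split_boundaries(69680)
```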
Ok sounds great. I think from my side it is ready to merge, great work. It would be great if you can add a README.md in the peft folder along with the results in your table. Thanks again for the great contribution. |
LGTM from my side. Will wait to hear from @siriuz42 and try to merge by EOD. |
Thank you for this great project and providing open source inference code.
What does this PR do?
Why is this PR needed?
Performance Comparison of LoRA/DoRA with Linear Probing
Experiments were performed with a 60-20-20 train, val, and test split. Black denotes best, blue denotes second best.
Benchmarking was done with `context_len=128` and `horizon_len=96`, and fine-tuning was done with `context_len=128` and `horizon_len=128`.

Caveats
Functional Testing
Ref PRs