
LoRA and DoRA PEFT support for Fine-Tuning TimesFM #104

Merged

merged 24 commits into google-research:master on Aug 6, 2024

Conversation

tanmayshishodia
Contributor

@tanmayshishodia tanmayshishodia commented Jul 16, 2024

Thank you for this great project and for providing open-source inference code.

What does this PR do?

  • A generic fine-tuning pipeline that supports four fine-tuning strategies:
    • Full Fine-Tuning
    • Linear Probing [fine-tunes only the residual blocks and the embedding layer]
    • LoRA [fine-tunes only a small number of parameters by decomposing the weight updates into low-rank matrices, making it efficient in terms of memory and compute]
    • DoRA [an extension of LoRA that decomposes the pre-trained weight into magnitude and direction components and uses LoRA for directional adaptation, improving learning capacity and stability without additional inference overhead; accepted at ICML 2024; see the sketch after this list]
  • Add testing framework [pytest]
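
A minimal NumPy sketch of the two parameterizations for reference (illustrative only; shapes, scaling, and parameter names are placeholders, not the actual implementation in adapters/lora_layers.py or adapters/dora_layers.py):

```python
import numpy as np

def lora_forward(x, w, lora_a, lora_b, scale=1.0):
    """LoRA: y = x @ W + scale * x @ (A @ B); W stays frozen, only A and B are trained."""
    return x @ w + scale * (x @ lora_a) @ lora_b

def dora_forward(x, w, lora_a, lora_b, dora_m, scale=1.0):
    """DoRA: merge the LoRA delta, normalize each column (direction),
    then rescale by the learned magnitude vector m."""
    w_merged = w + scale * lora_a @ lora_b                       # W + delta_W
    col_norm = np.linalg.norm(w_merged, axis=0, keepdims=True)   # ||W + delta_W||_c
    return x @ (dora_m * w_merged / col_norm)                    # m * direction

# Illustrative shapes: x (batch, d_in), w (d_in, d_out),
# lora_a (d_in, r), lora_b (r, d_out), dora_m (d_out,).
```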

Why is this PR needed?

  • Primary motivation for this PR is to leverage LoRA to enable efficient training of multiple adapter weights on various tasks, domains, and time-series datasets while maintaining the same base model.
  • Implementation of PEFT techniques are largely unexplored on time series foundational models. Help the research community analyze PEFT on these models.

Performance Comparison of LoRA/DoRA with Linear Probing

[Image: results table comparing LoRA and DoRA fine-tuning with linear probing and the base model]

Experiments were performed with a 60-20-20 train/val/test split. Black denotes the best result, blue the second best.
Benchmarking was done with context_len=128 and horizon_len=96; fine-tuning used context_len=128 and horizon_len=128.

Caveats

  • The loading of adapter weights is currently under-optimized.
  • I am new to the PaxML framework. Please assist with optimization wherever possible.

Functional Testing

  • Test that all params get updated in full fine-tuning.
  • Test that only the residual block params get updated in linear probing.
  • Test that only the LoRA params get updated in LoRA fine-tuning. Vary rank r.
  • Test that the LoRA params and the DoRA magnitude vector are updated in DoRA fine-tuning. Vary rank r. (A rough sketch of such a check follows this list.)
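
A rough sketch of what such a check could look like (the flat parameter dict and the name prefixes are assumptions, not the actual test code):

```python
import numpy as np

def assert_only_adapter_params_updated(params_before, params_after,
                                        adapter_prefixes=("lora_", "dora_")):
    """Adapter params must change after a training step; everything else must stay frozen.
    Both arguments are flat {param_name: np.ndarray} dicts."""
    for name, before in params_before.items():
        after = params_after[name]
        if any(prefix in name for prefix in adapter_prefixes):
            assert not np.allclose(before, after), f"adapter param {name} was not updated"
        else:
            assert np.allclose(before, after), f"frozen param {name} changed unexpectedly"
```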

Ref PRs

@rajatsen91
Collaborator

rajatsen91 commented Jul 16, 2024

Thanks @tanmayshishodia. These are very welcome contributions. Since the CL is pretty big, it will take us some time to review and merge:

  1. We added a notebook that can do linear probing (as well as full fine-tuning) under notebooks/finetuning.ipynb. This really needs just a one-line change; the rest of the notebook is just writing the training loop and dealing with paxml models. It might make sense to add LoRA and DoRA examples to that notebook.

  2. A general style nit: we are trying not to import layers individually but rather the whole module. For instance, instead of from praxis.layers.linears import Linear we would prefer from praxis.layers import linears and then linears.Linear (see the snippet after this list).

  3. Please let us know when your changes are ready to review and @siriuz42 and I can take a shot at reviewing.
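
For illustration, the preferred convention from point 2 would look like this (the subclass name is just an example):

```python
# Preferred: import the whole module, then reference members through it.
from praxis.layers import linears

class LoraLinear(linears.Linear):  # illustrative name for an adapter layer
    ...

# Avoid: importing individual layers directly.
# from praxis.layers.linears import Linear
```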

@rajatsen91 rajatsen91 self-assigned this Jul 16, 2024
@rajatsen91
Collaborator

rajatsen91 commented Jul 16, 2024

  4. Would you be able to set up testing using your favorite framework?

@tanmayshishodia
Contributor Author

Sure.

  1. This script is largely adapted from the example notebook in notebooks/finetuning.ipynb. I can add a LoRA/DoRA example there. The script is still useful since it lets you run multiple experiments simultaneously with different configurations. I will also add .sh scripts for common configurations so they can be run quickly.
  2. Sure, will make changes to address this.
  3. Sure. I will change the PR status to ready for review. There are some issues/bottlenecks I am aware of; I will mention them so you can help address them.
  4. I can set it up with pytest, if that sounds good. Shall I keep it in a separate PR or in this one?

Feel free to ask me any other questions you may have.

@rajatsen91
Collaborator

SGTM to all. For 4, pytest sounds fine.

You can add some tests to this PR itself. Something that covers the call function of the layers should be fine. Let us know if you have any praxis- or paxml-related questions.

@tanmayshishodia
Contributor Author

  1. The current workflow for loading the model for adapter fine-tuning is:
    • Load the base model checkpoint, which loads train_state and jit_decodes the model [tfm].
    • Create an instance of the fine-tune model [model].
    • Replace the attention and linear layers in the stacked transformer block of the model with the LoRA/DoRA layers defined in adapters/lora_layers.py and adapters/dora_layers.py. [ref]
    • Since tfm was loaded from the base model checkpoint, I have to manually add the LoRA/DoRA adapter weights for each layer [ref] (a rough sketch of this step follows below). The setup method defined in the layer files doesn't help with this initialization.

I believe there should be a better way to load the checkpoint along with the initialized LoRA and DoRA weights. I did try replacing the layers, instantiating, and then loading the checkpoint, but could not do so successfully. Let me know your thoughts on this.
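
For illustration, the manual step above amounts to something like the following, treating a layer's variables as a dict of arrays (all names, shapes, and the init scheme are assumptions, not the actual code):

```python
import numpy as np

def add_lora_params(layer_vars, rank):
    """Insert freshly initialized LoRA params next to a frozen pre-trained weight."""
    w = layer_vars["w"]                              # pre-trained weight, stays frozen
    d_in, d_out = w.shape
    layer_vars["lora_a"] = np.random.normal(0.0, 0.02, size=(d_in, rank)).astype(w.dtype)
    layer_vars["lora_b"] = np.zeros((rank, d_out), dtype=w.dtype)  # B = 0 so delta_W starts at 0
    return layer_vars
```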

  2. After fine-tuning, only the adapter weights are saved, which I manually extract using [ref]. The loading of the fine-tuned model is as follows:
    • Load the base model ckpt and jit_decode.
    • To load the adapter ckpt, I first create the config for an adapter model, extract the necessary var_weight_hparams to load the adapter ckpt, merge them using the same forward-pass logic, and then jit_decode the model again.

I believe this whole process can be further optimized (a sketch of the extract-and-merge step is below). Let me know if you have any ideas on how that can be done.
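
For illustration, the extract-and-merge step could look roughly like this (the flat parameter dict and name prefixes are assumptions):

```python
import numpy as np

def extract_adapter_params(flat_vars, prefixes=("lora_", "dora_")):
    """Keep only adapter params from a flat {name: array} dict for checkpointing."""
    return {k: v for k, v in flat_vars.items() if any(p in k for p in prefixes)}

def merge_lora_into_weight(w, lora_a, lora_b, scale=1.0):
    """Fold the adapter back into the frozen base weight: W' = W + scale * (A @ B)."""
    return w + scale * (lora_a @ lora_b)
```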

  3. Although the adapter weights are not currently initialized via the setup method, the dora_m param is initialized with the column norm of the pre-trained weight. How can this be done with WeightInit in the setup method so that it can be used later?
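
For reference, the intended initialization is simply the column-wise L2 norm of the frozen weight (NumPy sketch; the axis convention assumes a (d_in, d_out) weight):

```python
import numpy as np

def init_dora_magnitude(w):
    """dora_m starts as the per-column L2 norm of the pre-trained weight w of shape (d_in, d_out)."""
    return np.linalg.norm(w, axis=0)   # shape: (d_out,)
```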

@rajatsen91
Collaborator

Hi @tanmayshishodia,

Regarding "Replace the attention and linear layers in the stacked transformer block of the model with LoRA/DoRA layers defined in adapters/lora_layers.py, adapters/dora_layers.py"

AFAIK LoRA adds a low-rank adapter additively to the original weights, which are held fixed, as in $Wx + \Delta W x$. However, here you are replacing the original attention weights. Maybe this is just a terminology issue and you are just adding $\Delta W$ rather than removing $W$?

@tanmayshishodia
Contributor Author

tanmayshishodia commented Jul 17, 2024

Hi @rajatsen91

Yes, apologies for framing it incorrectly. The LoRA/DoRA layers defined in those files inherit from the original linear and attention layers. During the forward pass we take the original weight matrix, which stays fixed, and add the LoRA delta (A and B, which multiply to form $\Delta W$). [ref] Roughly, the pattern is as in the sketch below.
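
Illustrative sketch of that subclassing pattern (BaseLinear and the attribute names are placeholders, not the actual praxis layers):

```python
import numpy as np

class BaseLinear:
    """Stand-in for the original (frozen) linear layer."""
    def __init__(self, w):
        self.w = w
    def __call__(self, x):
        return x @ self.w

class LoRALinear(BaseLinear):
    """Adds the trainable low-rank delta on top of the frozen parent output."""
    def __init__(self, w, lora_a, lora_b, scale=1.0):
        super().__init__(w)
        self.lora_a, self.lora_b, self.scale = lora_a, lora_b, scale
    def __call__(self, x):
        y = super().__call__(x)                                   # frozen W x
        return y + self.scale * (x @ self.lora_a) @ self.lora_b   # + delta_W x, delta_W = A @ B
```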

Also sharing a diff between the weights before LoRA/DoRA training and after one training epoch: https://www.diffchecker.com/SOotQLkx/.

@tanmayshishodia tanmayshishodia marked this pull request as ready for review July 18, 2024 03:24
@tanmayshishodia tanmayshishodia changed the title LoRA/DoRA support for Fine-Tuning TimesFM LoRA and DoRA PEFT support for Fine-Tuning TimesFM Jul 19, 2024
@tanmayshishodia
Contributor Author

@rajatsen91 @siriuz42 PR is ready for first round of review. I have added a simple test case which tests inference. I will be adding more, let me know what scenarios need to be covered.

@rghosh08

@tanmayshishodia We, at Nutanix, greatly appreciate your effort as an intern in fine-tuning TimesFM. As your mentor, I am super proud to see this.

@rajatsen91 We, at Nutanix, are looking forward to contributing to your TimesFM project. We believe this will have a significant impact across industries. Thanks!

@rajatsen91
Collaborator

Thanks again for the PR. Our team is traveling this week; we will look into it next week.

peft/fft.sh Outdated
@@ -0,0 +1,22 @@
#!/bin/bash

Collaborator

I feel like different .sh scripts are not needed; one script with command-line options is good enough. Alternatively, we could avoid checking in these scripts and include example usage in the header comment of finetune.py.

Contributor Author

Okay, I will keep only one of them.

Running python3 finetune.py --help gives the following output, which is self-explanatory. Perhaps I can add it to the README?
[Screenshot: finetune.py --help output]

Collaborator

@rajatsen91 rajatsen91 left a comment

LGTM overall. Left a minor comment about the shell scripts.

@rajatsen91
Copy link
Collaborator

The numbers in the table "Performance Comparison of LoRA/DoRA with Linear Probing" do not match the numbers I get from the finetuning.ipynb notebook that I had. Can you please clarify what the differences are?

In particular, for the ETTm1 test split I get MAE 0.351 for the base model.

--adam-clip-threshold=1e2 \
--early-stop-patience=10 \
--datetime-col="date" \
--boundaries=1000 46080 57600 \
Collaborator

why is this 1000 here?

Contributor Author

My bad, thank you for pointing it out. I will remove this param from the .sh file. The script automatically uses a 60-20-20 split.

@tanmayshishodia
Contributor Author

tanmayshishodia commented Aug 3, 2024

The numbers in the table "Performance Comparison of LoRA/DoRA with Linear Probing" do not match the numbers I get from the finetuning.ipynb notebook that I had. Can you please clarify what the differences are?

In particular, for the ETTm1 test split I get MAE 0.351 for the base model.

Hi Rajat, all the experiments were run with boundaries at 60-20-20 % of the given dataset. I just noticed that even though the boundaries in the notebook are spaced in the same proportions, they don't cover the whole dataset: for ettm1 the number of data points is 69680, but the test boundary in the notebook is 57600, hence the mismatch. If you rerun the notebook with boundaries [41808, 55744, 69680] for ettm1, you should get the same result.
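
For reference, the boundaries follow directly from the 60-20-20 split:

```python
n = 69680                                     # total number of data points in ettm1
boundaries = [int(n * 0.6), int(n * 0.8), n]  # -> [41808, 55744, 69680]
```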

I will add this in the PR description.

@rajatsen91
Collaborator

OK, sounds great. I think from my side it is ready to merge; great work. It would be great if you could add a README.md in the peft folder along with the results from your table. Thanks again for the great contribution.

@rajatsen91
Collaborator

LGTM from my side. Will wait to hear from @siriuz42 and try to merge by EOD.

@rajatsen91 rajatsen91 merged commit 577e4e8 into google-research:master Aug 6, 2024
1 of 2 checks passed