feat: add lora fine tuning for llama 3.2 #958
base: main
Conversation
### 3. Compile a hybrid FHE model for the LORA adapted PyTorch model

Compile the hybrid FHE model to convert the selected outsourced layers to use FHE, while the rest will run on the client side. Note that the exchange of encrypted activations and gradients may require significant bandwidth.
Before training in FHE, we need to compile the model. Compilation calibrates and converts the outsourced linear layers to their FHE equivalents. The compile method uses representative data for this step.
I suggest:
Before training in FHE, the model must first be compiled. This process calibrates and converts the outsourced linear layers into their FHE equivalents. The compilation step needs representative data to ensure accurate calibration.
I will use the passive voice, yes, good point.
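To make the compilation step discussed above concrete, here is a minimal sketch using a toy model instead of LLaMA. It assumes Concrete ML's `HybridFHEModel` with its `compile_model` method; the exact arguments may differ.

```python
# Minimal sketch of compiling a hybrid FHE model (a toy two-layer network
# stands in for the LoRA-adapted LLaMA; exact API arguments may differ).
import torch
from torch import nn
from concrete.ml.torch.hybrid_model import HybridFHEModel

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(16, 32)
        self.linear2 = nn.Linear(32, 2)

    def forward(self, x):
        return self.linear2(torch.relu(self.linear1(x)))

model = TinyModel()

# Outsource the selected linear layers to FHE; the rest of the model keeps
# running in the clear on the client side.
hybrid_model = HybridFHEModel(model, module_names=["linear1", "linear2"])

# Representative inputs calibrate the quantization of the outsourced layers
# before they are converted to their FHE equivalents.
calibration_data = torch.randn(8, 16)
hybrid_model.compile_model(calibration_data, n_bits=8)
```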
<!--pytest-codeblocks:skip-->

```python
hybrid_model.model.inference_model(x)
peft_model(x)
```
Much better.
You may want to specify the default mode of inference here.
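As a hedged illustration of that point, the mode could be made explicit before calling the model. The `set_fhe_mode` call and the mode strings below are assumptions about the hybrid model API, and `hybrid_model`, `peft_model`, and `x` are the objects from the snippet above.

```python
# Hypothetical sketch: make the inference mode explicit before calling the
# LoRA-adapted model. set_fhe_mode and the mode strings are assumptions.
hybrid_model.set_fhe_mode("disable")   # plain PyTorch execution, no FHE
logits_clear = peft_model(x)

hybrid_model.set_fhe_mode("simulate")  # simulate FHE without encryption
logits_sim = peft_model(x)

hybrid_model.set_fhe_mode("execute")   # run the outsourced layers in FHE
logits_fhe = peft_model(x)
```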
use_case_examples/lora_finetuning/data_finetune/raw_cml_1.7.0_examples.txt
force-pushed from d2a25cf to 372307f
" loss_fn=nn.CrossEntropyLoss(),\n", | ||
" training_args={\"gradient_accumulation_steps\": 1},\n", | ||
"# Set up LoRA training\n", | ||
"lora_trainer = LoraTrainer(\n", |
Maybe add a comment here to say that the LoraTrainer uses the hybrid approach.
At this point everything is perfectly encapsulated: the user doesn't see a hybrid model, but the title mentions one, so a comment would help.
We should probably not mention the hybrid model at this point. It introduces complex topics. I don't think there is any mention of the hybrid model?
The title mentions the hybrid model: "Setup FHE fine-tuning with LoraTraining and HybridFHEModel".
Ah yes, that's a miss. I will update it to mention LoraTrainer instead.
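Following up on the suggestion above, a hedged sketch of what the LoraTrainer setup with such a comment could look like. The constructor arguments mirror the notebook lines quoted above; `peft_model`, the optimizer, and the exact LoraTrainer signature are assumptions.

```python
# Hypothetical sketch of the LoraTrainer setup with the suggested comment.
# peft_model is assumed to be the LoRA-adapted model; the exact LoraTrainer
# signature may differ.
import torch
from torch import nn
from concrete.ml.torch.lora import LoraTrainer

optimizer = torch.optim.AdamW(peft_model.parameters(), lr=2e-4)

# Set up LoRA training. Under the hood, LoraTrainer uses the hybrid approach:
# the outsourced linear layers run encrypted on the server, while the LoRA
# adapters are trained on the client side.
lora_trainer = LoraTrainer(
    peft_model,
    optimizer=optimizer,
    loss_fn=nn.CrossEntropyLoss(),
    training_args={"gradient_accumulation_steps": 1},
)
```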
n_layers_to_skip (int): Number of layers to skip.
model (nn.Module): The model to replace layers in.
n_layers_to_skip_for_backprop (int): Number of initial linear layers to keep as standard
    layers. Since the first layer doesn't need backpropagation (no previous layer to
Maybe you should change the signature of the function, since you mentioned it defaults to 1:
n_layers_to_skip_for_backprop: int = 1
I suggest:
n_layers_to_skip_for_backprop (int): Determines how many of the first linear layers are excluded from backpropagation. This is typically set to 1 because the first layer only transforms the input data and does not depend on previous layers for gradient updates. By skipping this layer, we save unnecessary computations. Defaults to 1.
Maybe explain why we replace them with custom linear layers
(attach the forward_module and backward_module...)
I will remove the default of 1 here. It defaults to 1 in the LoraTrainer.
We will have to update the documentation. The definition of this variable is already quite complex.
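For reference, a hedged sketch of the signature and docstring wording under discussion. The function name below is a placeholder, not necessarily the real one in the codebase.

```python
from torch import nn

def replace_linear_layers(  # placeholder name for the function under review
    model: nn.Module,
    n_layers_to_skip_for_backprop: int = 1,  # the default suggested above
) -> None:
    """Replace linear layers with custom forward/backward modules.

    Args:
        model (nn.Module): The model to replace layers in.
        n_layers_to_skip_for_backprop (int): Determines how many of the first
            linear layers are excluded from backpropagation. This is typically
            set to 1 because the first layer only transforms the input data and
            does not depend on previous layers for gradient updates. Defaults to 1.
    """
    # Implementation omitted; only the signature/docstring shape is illustrated.
```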
Thanks for your PR.
Some comments:
- If we want to go for "LoRA", maybe we should add it to the forbidden list; I stopped spamming you with my LoRA comments lol
- The new LoRA API is very cool
- The GPT2 and LLaMA notebooks follow the same logic and share the same utility functions, maybe we can create a utils file for them.
- In the GPT2 notebook, I think you don't use the full potential of the new LoRA API, or maybe you wanted to highlight what's happening behind the scenes and I did not get it.
- In the 3 notebooks, I think it's not clear to the reader whether we are using FHE only for the inference or for the adapters as well; maybe you should explicitly specify it in the conclusion or the introduction.

I think they already share a few functions through the utils file. GPT2 uses the previous API version without the LoraTrainer, so it's a bit more complicated but more flexible as well.
Yes, I kept GPT2 without LoraTrainer to show that one could use their own training method, but it implies defining the hybrid model / remote layers and so on (see the sketch below).
I will add a sentence at the beginning to make sure what we do here is clear.
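To give an idea of what "defining the hybrid model / remote layers" means in the GPT2 notebook's lower-level approach, a hedged sketch follows. `LoraTraining`, `get_remote_names`, and the constructor arguments are assumptions about the older helpers mentioned here, and `peft_model` is assumed to exist already.

```python
# Hedged sketch of the lower-level approach the GPT2 notebook keeps: wrap the
# PEFT model for LoRA training and declare the remote (outsourced) layers
# yourself. Names and signatures may differ from the actual helpers.
from concrete.ml.torch.lora import LoraTraining, get_remote_names
from concrete.ml.torch.hybrid_model import HybridFHEModel

# Wrap the LoRA/PEFT model so its forward pass also drives fine-tuning
# (peft_model is assumed to be defined earlier in the notebook).
lora_training = LoraTraining(peft_model)

# List the linear sub-modules that should run remotely in FHE, then build the
# hybrid model around them; everything else stays on the client.
remote_names = get_remote_names(lora_training)
hybrid_model = HybridFHEModel(lora_training, module_names=remote_names)
```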
force-pushed from 8d227cc to 333c46d
Thanks for the changes.
It would be nice to specify whether the weights are encrypted too.
### LLaMA Results

TBD
@jfrery, I think you forgot that part.
No, we don't have the results yet. It's a WIP.
Coverage passed ✅
No description provided.