
feat: add lora fine tuning for llama 3.2 #958

Open · wants to merge 9 commits into base: main

Conversation

jfrery
Collaborator

@jfrery jfrery commented Dec 10, 2024

No description provided.

@cla-bot cla-bot bot added the cla-signed label Dec 10, 2024
@jfrery jfrery marked this pull request as ready for review December 11, 2024 11:13
@jfrery jfrery requested a review from a team as a code owner December 11, 2024 11:13

### 3. Compile a hybrid FHE model for the LORA adapted PyTorch model

Compile the hybrid FHE model to convert the selected outsourced layers to use FHE, while the rest will run on the client side. Note that the exchange of encrypted activations and gradients may require significant bandwidth.
Before training in FHE, we need to compile the model. Compilation calibrates and converts the outsourced linear layers to their FHE equivalents. The compile method uses representative data for this step.
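
For context, a minimal sketch of what this compilation call could look like, assuming the hybrid model exposes a `compile_model` method that takes representative inputs (the variable names and the `n_bits` value are placeholders, not the exact notebook content):

```python
# Minimal sketch (assumed API): calibrate and convert the outsourced linear
# layers to their FHE equivalents using representative data.
inputset = next(iter(train_loader))["input_ids"]  # placeholder: representative tokenized samples

hybrid_model.compile_model(
    inputset,  # representative data used for calibration
    n_bits=8,  # illustrative quantization bit-width for the outsourced layers
)
```
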
Collaborator

I suggest:

Before training in FHE, the model must first be compiled. This process calibrates and converts the outsourced linear layers into their FHE equivalents. The compilation step needs representative data to ensure accurate calibration.

Collaborator Author

Yes, good point. I will use the passive voice.


<!--pytest-codeblocks:skip-->

```python
hybrid_model.model.inference_model(x)
peft_model(x)
```

Collaborator

Much better.

You might want to specify the default mode of inference here.
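
For example, a hedged sketch of how the execution mode could be made explicit next to that comparison (the `set_fhe_mode` helper and the mode strings are assumptions based on other Concrete ML hybrid-model examples, not necessarily this notebook's API):

```python
# Assumed helper: choose how the outsourced layers are executed.
# "disable" runs them in clear for a quick sanity check, while an FHE mode
# (e.g. "execute") would run them encrypted.
hybrid_model.set_fhe_mode("disable")

logits_hybrid = hybrid_model.model.inference_model(x)  # hybrid model inference
logits_peft = peft_model(x)                            # reference PEFT model inference
```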

" loss_fn=nn.CrossEntropyLoss(),\n",
" training_args={\"gradient_accumulation_steps\": 1},\n",
"# Set up LoRA training\n",
"lora_trainer = LoraTrainer(\n",
Collaborator

Maybe add a comment here to say that the LoraTrainer uses the hybrid approach.

At this point everything is perfectly encapsulated: the user doesn't see a hybrid model, but the title does. So maybe add a comment here.
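
For illustration, such a comment could look like this in the notebook cell (a sketch reusing the names visible in the excerpt above; `peft_model`, `optimizer`, and the import path are placeholders/assumptions):

```python
import torch.nn as nn

from concrete.ml.torch.lora import LoraTrainer  # import path assumed

# Set up LoRA training.
# Under the hood, LoraTrainer builds a hybrid FHE model: the outsourced linear
# layers run encrypted on the server, while the small LoRA adapters are trained
# on the client.
lora_trainer = LoraTrainer(
    peft_model,           # placeholder: the LoRA-adapted model
    optimizer=optimizer,  # placeholder: its optimizer
    loss_fn=nn.CrossEntropyLoss(),
    training_args={"gradient_accumulation_steps": 1},
)
```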

Collaborator Author

We should probably not mention the hybrid model at this point; it introduces complex topics. I don't think there is any mention of the hybrid model?

Collaborator

The title mentioned the hybrid model: Setup FHE fine-tuning with LoraTraining and HybridFHEModel

Collaborator Author

Ah yes, that's a miss. I will update it to mention LoraTrainer instead.

n_layers_to_skip (int): Number of layers to skip.
model (nn.Module): The model to replace layers in.
n_layers_to_skip_for_backprop (int): Number of initial linear layers to keep as standard
layers. Since the first layer doesn't need backpropagation (no previous layer to
Collaborator

Maybe you should change the function signature, since you mention it defaults to 1:

n_layers_to_skip_for_backprop: int = 1

Collaborator

I suggest:

n_layers_to_skip_for_backprop (int): Determines how many of the first linear layers are excluded from backpropagation. This is typically set to 1 because the first layer only transforms the input data and does not depend on previous layers for gradient updates. By skipping this layer, we save unnecessary computations. Defaults to 1.
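
Putting the two suggestions together, the signature and docstring could read as follows (the function name is hypothetical; only the parameter and its description come from the review):

```python
import torch.nn as nn


def replace_layers(model: nn.Module, n_layers_to_skip_for_backprop: int = 1) -> None:  # hypothetical name
    """Replace the model's linear layers with custom forward/backward modules.

    Args:
        model (nn.Module): The model to replace layers in.
        n_layers_to_skip_for_backprop (int): Determines how many of the first linear layers
            are excluded from backpropagation. This is typically set to 1 because the first
            layer only transforms the input data and does not depend on previous layers for
            gradient updates. By skipping this layer, we save unnecessary computations.
            Defaults to 1.
    """
```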

Collaborator

Maybe explain why we replace them with custom linear layers (attaching the forward_module and backward_module...).

Collaborator Author

I will remove the default of 1 here. It defaults to 1 in the LoraTrainer.

Collaborator Author

We will have to update the documentation. The definition of this variable is already quite complex.

Collaborator

@kcelia kcelia left a comment

Thanks for your PR.

Some comments:

  • If we want to go with "LoRA", maybe we should add it to the forbidden list; I stopped spamming you with my LoRA comments lol
  • The new LoRA API is very cool
  • The GPT2 and LLaMA notebooks follow the same logic and share the same utility functions, so maybe we can create a utils file for them.
  • In the GPT2 notebook, I don't think you use the full potential of the new LoRA API, or maybe you wanted to highlight what's happening behind the scenes and I did not get it.
  • In the 3 notebooks, I think it's not clear to the reader whether we are using FHE only for inference or for the adapters as well; maybe you should explicitly state it in the introduction or the conclusion.

@jfrery
Collaborator Author

jfrery commented Dec 16, 2024

> The GPT2 and LLaMA notebooks follow the same logic and share the same utility functions, so maybe we can create a utils file for them.

I think they already share a few functions through the utils file. GPT2 uses the previous API version without the LoraTrainer, so it is a bit more complicated but more flexible as well.

> In the GPT2 notebook, I don't think you use the full potential of the new LoRA API, or maybe you wanted to highlight what's happening behind the scenes and I did not get it.

Yes, I kept GPT2 without the LoraTrainer to show that one could use their own training method, but that implies defining the hybrid model / remote layers and so on.
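
For reference, a hedged sketch of that more manual route (import path, module names, and parameters are illustrative, not the exact GPT2 notebook content):

```python
from concrete.ml.torch.hybrid_model import HybridFHEModel  # import path assumed

# Choose which linear layers are outsourced and run under FHE; the module
# names below are placeholders for GPT2 attention projections.
remote_names = ["transformer.h.0.attn.c_attn", "transformer.h.0.attn.c_proj"]

hybrid_model = HybridFHEModel(peft_model, module_names=remote_names)
hybrid_model.compile_model(inputset, n_bits=8)  # calibrate with representative data
# ...then drive a custom training loop around hybrid_model instead of using LoraTrainer.
```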

> In the 3 notebooks, I think it's not clear to the reader whether we are using FHE only for inference or for the adapters as well; maybe you should explicitly state it in the introduction or the conclusion.

I will add a sentence at the beginning to make sure what we do here is clear.

Collaborator

@kcelia kcelia left a comment

Thanks for the changes.

It would be nice to specify whether the weights are encrypted too.


### LLaMA Results

TBD
Collaborator

@jfrery, I think you forgot that part.

Collaborator Author

No, we don't have the results yet. It's a WIP.


⚠️ Known flaky tests have been rerun ⚠️

One or several tests initially failed but were identified as known flaky tests. Therefore, they have been rerun and passed. See below for more details.

Failed tests details

Known flaky tests that initially failed:

  • tests/torch/test_compile_torch.py::test_compile_torch_or_onnx_conv_networks[True-True-CNN_conv1d-relu]
  • tests/torch/test_compile_torch.py::test_compile_torch_or_onnx_conv_networks[False-True-CNN_grouped-relu]


Coverage passed ✅

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL    8482      0   100%

63 files skipped due to complete coverage.
