
Checkpoints are the full base_model and not just the lora model #353

Closed

winglian opened this issue Apr 21, 2023 · 17 comments

Comments

@winglian
Contributor

Started happening sometime in the last week.

@Rallio67

Rallio67 commented May 8, 2023

I am also seeing this same issue when generating checkpoints with LoRA. All my checkpoints contain what appear to be the full model weights (I assume it is the merged LoRA + full model weights) and no configuration files to actually run the checkpoint.

 15K May  8 00:01 rng_state_0.pth
7.7K May  8 00:01 trainer_state.json
193M May  8 00:01 optimizer.pt
 627 May  8 00:01 scheduler.pt
3.6K May  8 00:01 training_args.bin
 37G May  8 00:01 pytorch_model.bin
 15K May  8 00:00 rng_state_1.pth
 15K May  8 00:00 rng_state_4.pth
 15K May  8 00:00 rng_state_6.pth
 15K May  8 00:00 rng_state_7.pth
 15K May  8 00:00 rng_state_2.pth
 15K May  8 00:00 rng_state_3.pth
 15K May  8 00:00 rng_state_5.pth

This is what is generated in the checkpoint directories. I tried directly using the "full model weights" in pytorch_model.bin and it does not work. How do we extract the LoRA adapter from this file, or get the checkpoint to be saved as a LoRA-configured adapter that can be run for inference?

Did you find any solution @winglian ?

@0x000011b
Contributor

0x000011b commented May 8, 2023

Edit: There's good reason to believe that the code below does not work as expected - I'm leaving it for context, but I recommend trying the Trainer callback approach instead as a workaround for this.


I'm not sure whether this is intended behavior or not, but personally I do something like this to grab the adapter from a Trainer checkpoint:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model, set_peft_model_state_dict

BASE_MODEL = "/data/your-base-model-path-here"
OUTPUT_DIR = "/data/your-peft-adapter-will-go-here"
STATE_DICT = "/data/your-checkpoint-folder-here/pytorch_model.bin"

# Rebuild the same module that was trained: the base model wrapped as a PEFT model.
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# This needs to match your training configuration _exactly_.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_peft_model(model, peft_config)

# Load the full Trainer checkpoint and push its weights into the PEFT-wrapped model.
full_state_dict = torch.load(STATE_DICT, map_location="cpu")
set_peft_model_state_dict(model, full_state_dict)

# Writes only adapter_model.bin + adapter_config.json to OUTPUT_DIR.
model.save_pretrained(OUTPUT_DIR)

Works with LLaMA trained with DeepSpeed ZeRO 1. If you're doing model sharding (FSDP, ZeRO 3) you might need to make some changes, but the general gist is: get the PyTorch module (the model) to be the same as the one used for training, load the state dict from the Trainer checkpoint onto it, then use the usual PEFT machinery (.save_pretrained) to spit out the adapter.

@NanoCode012

NanoCode012 commented May 8, 2023

Hello,

@winglian, I found this post by a collaborator suggesting the use of Trainer callbacks to save the model: #286 (comment)
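For context, here is a minimal sketch of what such a callback might look like (an illustration rather than the exact code from that comment; it assumes the transformers TrainerCallback API, and the SavePeftModelCallback name and adapter_model subfolder are illustrative):

import os

from transformers import TrainerCallback


class SavePeftModelCallback(TrainerCallback):
    """Write only the PEFT adapter next to each Trainer checkpoint."""

    def on_save(self, args, state, control, **kwargs):
        # Trainer names its checkpoint folders "checkpoint-<global_step>".
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        adapter_dir = os.path.join(checkpoint_dir, "adapter_model")
        # kwargs["model"] is the PeftModel being trained; save_pretrained writes
        # adapter_model.bin + adapter_config.json instead of the full base model.
        kwargs["model"].save_pretrained(adapter_dir)
        return control


# usage: Trainer(..., callbacks=[SavePeftModelCallback()])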

@0x000011b, does the LoRA work well for you? I tried it (made sure the config was the same) and it ran successfully, but my results were quite bad. I'm not sure whether that's due to my training or to this method.

I checked whether the keys match:

get_peft_model_state_dict(model).keys() 
# keys start with _orig_mod
# dict_keys(['_orig_mod.base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight', '_orig_mod.base_model.model.model.layers.0.self_attn.q_proj..

full_state_dict.keys()
# odict_keys(['base_model.model.model.embed_tokens.weight', 'base_model.model.model.layers.0.self_attn.q_proj...

Looking at this, what if the LoRA weights are not properly loaded due to the key mismatch?
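A quick way to inspect the mismatch directly (a sketch reusing model and full_state_dict from the extraction snippet above):

# Compare the keys the PEFT-wrapped model expects against the keys in the checkpoint.
model_keys = set(model.state_dict().keys())
ckpt_keys = set(full_state_dict.keys())
print("expected by model, missing from checkpoint:", sorted(model_keys - ckpt_keys)[:3])
print("in checkpoint, unexpected by model:", sorted(ckpt_keys - model_keys)[:3])
# If the model keys only differ by an "_orig_mod." prefix, the mismatch is in the
# wrapper around the module rather than in the weights themselves.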

@0x000011b
Contributor

@NanoCode012 Regarding whether or not the keys match, an easy way to find out would be replacing:

set_peft_model_state_dict(model, full_state_dict)

with:

model.load_state_dict(full_state_dict)

This will skip a bunch of PEFT internals (which are probably there for a reason, hence I don't quite recommend it), but it's a useful test because it should output something like:

<All keys matched successfully>

Which should let you know that everything is OK.

However, I've been getting some subpar results while attempting to test a LoRA that I extracted via this method. I assumed this was down to poor training data or hparams - but @Rallio67 has also reported similar behavior, so perhaps there is indeed something wrong with this approach. Does the callback method work well for you?

@NanoCode012

NanoCode012 commented May 8, 2023

@0x000011b, I get

model.load_state_dict(full_state_dict)

RuntimeError: Error(s) in loading state_dict for OptimizedModule:
        Missing key(s) in state_dict: "_orig_mod.base_model.model.model.embed_tokens.weight"..
        Unexpected key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.self_attn.q_proj.weight"

I remember the original tloen repo also had a similar error that was said to be OK to ignore. I'm not sure if it's the same thing.

I will test the callback method.

@0x000011b
Contributor

@NanoCode012 I've heard a lot of complaints about bugs and weird behaviors in the tloen repo recently so I'm not sure how much I'd trust that comment - if model weights are failing to load because of mismatched key names, I think something is indeed going wrong and it's not safe to ignore.

If the training code you're using does more to the model (the _orig_mod. prefix in your keys suggests the module is wrapped with torch.compile, for example), you can try replicating that in the code you use to extract the adapter, but indeed I'd just give the callback approach a shot if you can. It's what I'm doing right now.
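As an example (a sketch, assuming the prefix really does come from torch.compile; it reuses model, full_state_dict, and OUTPUT_DIR from the extraction snippet above):

# If the in-memory model was wrapped by torch.compile, its state-dict keys gain an
# "_orig_mod." prefix that the Trainer checkpoint's keys do not have. torch.compile
# keeps the original module under ._orig_mod, so load the checkpoint into that instead.
unwrapped = getattr(model, "_orig_mod", model)
set_peft_model_state_dict(unwrapped, full_state_dict)
unwrapped.save_pretrained(OUTPUT_DIR)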

@NanoCode012

NanoCode012 commented May 8, 2023

@0x000011b, I've tested using resume_from_checkpoint and compared the resulting adapter_model.bin against the extracted one:

# Extract lora
08745c9d7cb8f38aebe64c538cd5dfe2cc22f5edcd333afc4c25efb875eee954  adapter_model.bin

# Resume then save_pretrained
8671810c23f7310fe1c1933cbb227dc405873476eb241ae99e5e7fa210efcff2  adapter_model.bin

The hashes differ. Note: if there is a bug with resume, or if the Trainer modifies the weights slightly, that would invalidate this comparison.
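For reference, the "resume then save_pretrained" route above is roughly the following (a sketch; it assumes trainer wraps the same PEFT model and config used for training, and CHECKPOINT_DIR / OUTPUT_DIR are placeholders):

# Let the Trainer restore the model (and optimizer) state from the checkpoint folder,
# i.e. the one containing pytorch_model.bin. If training already reached max_steps,
# it should finish without performing further optimizer steps.
trainer.train(resume_from_checkpoint=CHECKPOINT_DIR)

# Then write out only the adapter (adapter_model.bin + adapter_config.json).
trainer.model.save_pretrained(OUTPUT_DIR)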

I want to try the callback, but I'm not sure how I can "force" an on_save for the callback since my training is complete. I could load an earlier weight and train, but that takes a while, so I'll only try it if this new LoRA fails.

Edit: Loading the resumed LoRA gives me trainable params: 0 || all params: 6742609920 || trainable%: 0.0, which does not seem right.

@Rallio67

Rallio67 commented May 9, 2023

I'm not sure whether this is intended behavior or not, but personally I do something like this to grab the adapter from a Trainer checkpoint: [...]

Thank you for your help looking into this. The code does work to generate an adapter that PEFT can accept without errors; however, the adapter is corrupted in some way, since the output when using it is no different from the untrained model. I am testing with t5-xl-lm: I compared the converted checkpoint at step 350 against the final model saved at the completion of the training script (352 steps in my case), and only the final model produces good output.

@NanoCode012

NanoCode012 commented May 9, 2023

@0x000011b , I have fine-tuned a simple model using callbacks (code here: https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/callbacks.py)

I have no idea if it's an implementation issue or a training issue, but all of the adapter_model.bin files in my checkpoint folders (checkpoints 1.6k-1.8k) are identical according to sha256sum.

The final one in the output folder (saved after training finished) is different. How have your results fared?
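As an aside, a tensor-level comparison can double-check what sha256sum suggests (a sketch with placeholder paths):

import torch

# Load two saved adapters and compare them tensor by tensor (paths are placeholders).
a = torch.load("checkpoint-1600/adapter_model/adapter_model.bin", map_location="cpu")
b = torch.load("checkpoint-1800/adapter_model/adapter_model.bin", map_location="cpu")

identical = a.keys() == b.keys() and all(torch.equal(a[k], b[k]) for k in a)
print("adapters identical:", identical)  # True would mean the LoRA weights never changed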

The code does work to generate an adapter that PEFT can accept without errors; however, the adapter is corrupted in some way, since the output when using it is no different from the untrained model.

@Rallio67, yes, I suspect something along those lines happened as well. Did you mean that the final adapter works, but not the extracted one?

@Rallio67

Rallio67 commented May 9, 2023

@NanoCode012 if you get to the end of the Trainer training loop using LoRA PEFT, the final saved model (not any of the checkpoints) does work and gives the expected good performance. I have not figured out a way to make any of the checkpoints work.

@NanoCode012

@0x000011b @Rallio67, I checked the source code, and this method of "extracting" the LoRA seems to be exactly the same as the one used to load adapter_model.bin. It does not touch pytorch_model.bin.

peft/src/peft/peft_model.py, lines 372 to 376 at b1059b7:

adapters_weights = torch.load(
    filename, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu")
)
# load the weights into the model
set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)

In fact, this makes me question whether the call below actually loads the weights. I cannot find any code within this repo interfering with the saving of checkpoints, so in theory it should load all the weights properly, but what if the PEFT weights aren't being loaded?

trainer.train(resume_from_checkpoint=resume_from_checkpoint) # folder with pytorch_model.bin
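One way to sanity-check that suspicion (a sketch; it assumes trainer wraps a PeftModel and that training is already complete, so the resume itself performs no further optimizer steps):

import torch
from peft import get_peft_model_state_dict

# Snapshot the freshly initialized LoRA weights before resuming.
before = {k: v.detach().cpu().clone()
          for k, v in get_peft_model_state_dict(trainer.model).items()}

trainer.train(resume_from_checkpoint=resume_from_checkpoint)

after = get_peft_model_state_dict(trainer.model)
unchanged = [k for k in before if torch.equal(before[k], after[k].detach().cpu())]
print(f"{len(unchanged)} of {len(before)} LoRA tensors are identical to the fresh init")
# If (nearly) all of them are unchanged, the checkpoint's LoRA weights were never loaded.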

@NanoCode012

NanoCode012 commented May 9, 2023

@0x000011b, I have fine-tuned a simple model using callbacks (code here: https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/callbacks.py)

I have no idea if it's an implementation issue or a training issue, but all of the adapter_model.bin files in my checkpoint folders (checkpoints 1.6k-1.8k) are identical according to sha256sum.

I redid the training for this (I had an issue with the optimizer due to some code changes). I believe the callback does work. The results seem reasonably OK for what it was given (a small dataset).

@0x000011b
Contributor

@NanoCode012 I can confirm that on my end the callback does indeed seem to work as expected:

checkpoint-720/adapter_model/adapter_model.bin A63CEAAD
checkpoint-780/adapter_model/adapter_model.bin 7D67E129

Different files for each checkpoint, plus when loaded with from_pretrained the model is coherent and seems to be learning from the training data. Here are the versions I'm using of all the relevant packages, just in case:

accelerate 565152183334f709ac955204ef663023d1f63b7a
transformers 3d3204c025b6b5de013e07dd364208e28b4d9589
peft 382b178911edff38c1ff619bbac2ba556bd2276b
deepspeed 0.8.3 (regular pip install)

@NanoCode012

NanoCode012 commented May 9, 2023

@0x000011b, I was wondering if you have tried to "extract" the LoRA from your last checkpoint and compare it against the one saved by the callback? Are they the same?

My machine is a bit busy, so I was not able to test this.

@NanoCode012

I found another repo which loads from pytorch_model.bin and then sets the weights on the model. It follows the same principle as the LoRA extraction above: https://github.com/Facico/Chinese-Vicuna/blob/cd04b2d8c3ed07c921b03b4f9fc1e56969a997a1/finetune.py#L89-L113

@annahung31

@NanoCode012 I can confirm that on my end the callback does indeed seem to work as expected: [...]

Hi @0x000011b, may I ask how you use the callback to correctly save and load the adapter weights? Thanks a lot!

@younesbelkada
Contributor

Hi everyone,
The issues related to saving PEFT models should have been resolved by the recent PRs on the HF Trainer: huggingface/transformers#24073 / huggingface/transformers#24103 / huggingface/transformers#24274

If you install the latest version of transformers, or install it from source, everything should work.

I am temporarily closing this issue; feel free to re-open it or open a new ticket.
