
[Question] Error during LoRA fine-tuning #72

Open
wickedvalley opened this issue Jun 20, 2023 · 3 comments
Labels
question Further information is requested

Comments

@wickedvalley

Required prerequisites

Questions

Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_sft.py", line 97, in <module>
    main()
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_sft.py", line 69, in main
    train_result = trainer.train()
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1987, in inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1893, in clip_grad_norm_
    self.unscale_gradients()
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1856, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().

Checklist

  • I have provided all relevant and necessary information above.
  • I have chosen a suitable title for this issue.
wickedvalley added the question (Further information is requested) label on Jun 20, 2023

jiacheo commented Jun 21, 2023

The same error also occurs with train_pt.py:

Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_pt.py", line 81, in <module>
    main()
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_pt.py", line 53, in main
    train_result = trainer.train()
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/transformers/trainer.py", line 1987, in inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1893, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1856, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
3%|███▎ | 1/30 [00:07<03:27, 7.17s/it]


jiacheo commented Jun 21, 2023

See huggingface/transformers#24245; it looks like a bug in a particular transformers version. Installing the build suggested in the comments there,

!pip install git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9

fixed it.
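
For reference, the rule behind the error is PyTorch's own: GradScaler.unscale_() may be called at most once per optimizer between two update() calls, and the affected transformers version appears to end up triggering it a second time around gradient clipping. A minimal, hypothetical sketch (not this repo's code; needs a CUDA device) that reproduces the same RuntimeError:

import torch

model = torch.nn.Linear(4, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

# Scale the loss and backpropagate as usual under AMP.
loss = model(torch.randn(2, 4, device="cuda")).sum()
scaler.scale(loss).backward()

scaler.unscale_(optimizer)  # first unscale (e.g. before gradient clipping): fine
scaler.unscale_(optimizer)  # second unscale before scaler.update():
# RuntimeError: unscale_() has already been called on this optimizer since the last update().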

@wickedvalley
Author

Switching to the specified transformers==4.29.1 fixed it.
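
For reference, pinning that release is just a standard pip install (assuming the same pip-managed environment as in the tracebacks above):

pip install transformers==4.29.1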
