Sometime after the first epoch, I run into the following error:
{'loss': 2.2014, 'learning_rate': 0.0002538461538461538, 'epoch': 0.99}
{'loss': 2.24, 'learning_rate': 0.0002492307692307692, 'epoch': 1.0}
{'loss': 2.2383, 'learning_rate': 0.0002446153846153846, 'epoch': 1.01}
███████████████████████████████████▏ | 112/333 [42:21<1:21:32, 22.14s/it]
Traceback (most recent call last):
File "/home/kunal/ml/train.py", line 234, in <module>
trainer.train(resume_from_checkpoint = False)
File "/home/kunal/miniconda3/envs/lora/lib/python3.10/site-packages/transformers/trainer.py", line 1530, in train
return inner_training_loop(
File "/home/kunal/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/utils/memory.py", line 132, in decorator
return function(batch_size, *args, **kwargs)
File "/home/kunal/miniconda3/envs/lora/lib/python3.10/site-packages/transformers/trainer.py", line 1843, in _inner_training_loop
self.accelerator.clip_grad_norm_(
File "/home/kunal/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1913, in clip_grad_norm_
self.unscale_gradients()
File "/home/kunal/miniconda3/envs/lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1876, in unscale_gradients
self.scaler.unscale_(opt)
File "/home/kunal/miniconda3/envs/lora/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
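For context, here is a minimal sketch of the bookkeeping that produces this message. It is a simplified, pure-Python stand-in for the per-optimizer state tracking in torch's `GradScaler`, not the actual implementation; the names `ToyGradScaler` and `OptState` are illustrative assumptions. It only shows that unscaling the same optimizer twice without an intervening `update()` raises the error above. Whether that is what happens inside the Trainer loop here is not established by the traceback alone.

```python
from enum import Enum


class OptState(Enum):
    READY = 0      # no unscale_() or step() since the last update()
    UNSCALED = 1   # unscale_() has been called
    STEPPED = 2    # step() has been called


class ToyGradScaler:
    """Hypothetical, simplified model of GradScaler's per-optimizer state."""

    def __init__(self):
        # Maps id(optimizer) to its stage since the last update().
        self._stage = {}

    def unscale_(self, optimizer):
        stage = self._stage.get(id(optimizer), OptState.READY)
        if stage is OptState.UNSCALED:
            raise RuntimeError(
                "unscale_() has already been called on this optimizer "
                "since the last update()."
            )
        if stage is OptState.STEPPED:
            raise RuntimeError("unscale_() is being called after step().")
        self._stage[id(optimizer)] = OptState.UNSCALED

    def step(self, optimizer):
        self._stage[id(optimizer)] = OptState.STEPPED

    def update(self):
        # Resets every optimizer back to READY for the next iteration.
        self._stage.clear()


if __name__ == "__main__":
    scaler = ToyGradScaler()
    optimizer = object()  # placeholder; a real run would pass a torch optimizer

    scaler.unscale_(optimizer)      # first call, e.g. before gradient clipping
    try:
        scaler.unscale_(optimizer)  # second call without update() in between
    except RuntimeError as err:
        print(f"RuntimeError: {err}")
```

In this model, calling `step()` followed by `update()` between the two `unscale_()` calls resets the state and the second call succeeds.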
This training works fine on transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9.
Expected behavior
Training should succeed.
System Info
transformers version: 4.31.0.dev0
accelerate version: 0.21.0.dev0
peft version: 0.4.0.dev0
Who can help?
No response