System Info

Information

Tasks
- One of the scripts in the examples/ folder of Accelerate, or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- My own task or dataset (give details below)
Reproduction
Following this script https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py and training with DeepSpeed ZeRO-2: if we do not set the gradient accumulation steps (GAS) in the DeepSpeed config but do set GAS on the Accelerator, a strange behavior occurs. The training time decreases linearly with GAS. For example, with GAS=2 the run takes half the original time, and the time keeps going down as GAS increases. This only happens when using DeepSpeed without setting GAS in the DeepSpeed config. If this is fine and is how it is supposed to work, please add a note or hint on the usage of GAS with the Accelerator, because it can silently produce a wrong training loop. A minimal sketch of the loop pattern in question follows.
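To make the setup concrete, here is a minimal sketch of the pattern (the tiny model and data below are placeholders, not the GPT-2 setup from the linked script): GAS is declared only on the Accelerator, while the DeepSpeed ZeRO-2 config supplied at launch deliberately leaves gradient_accumulation_steps unset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# GAS is set only on the Accelerator; the DeepSpeed ZeRO-2 config used at
# launch (via `accelerate launch`) leaves gradient_accumulation_steps unset,
# which is what triggers the reported behavior.
accelerator = Accelerator(gradient_accumulation_steps=2)

# Tiny placeholder model/data standing in for the setup in run_clm_no_trainer.py.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    # Under plain DDP this accumulates gradients over 2 micro-batches per
    # optimizer step; under DeepSpeed without GAS in its config, wall-clock
    # time instead drops as if micro-batches were being skipped.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```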
Expected behavior
Setting GAS on the Accelerator should be passed through to the DeepSpeed config and work correctly. Instead, the time decreases linearly with GAS, which means batches are being skipped during training.
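As a sketch of a workaround (assuming the DeepSpeedPlugin API in accelerate.utils), declaring GAS on the DeepSpeed plugin rather than only on the Accelerator keeps the two in sync:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Declaring GAS on the DeepSpeed plugin writes it into the DeepSpeed config
# itself, so DeepSpeed and the Accelerator agree on the accumulation schedule.
ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=2)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```

Setting gradient_accumulation_steps explicitly in the DeepSpeed config JSON should avoid the mismatch in the same way, matching the observation above that the problem only appears when the value is absent from that config.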