Gradient accumulation with DeepSpeed misbehaves if not set during configuration #3369

Open
khalil-Hennara opened this issue Jan 27, 2025 · 1 comment


System Info

- `Accelerate` version: 1.2.1
- Platform: Linux-6.8.0-45-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /opt/conda/bin/accelerate
- Python version: 3.11.10
- Numpy version: 2.1.2
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 503.46 GB
- GPU type: NVIDIA A100-SXM4-80GB
- `Accelerate` default config:
	- compute_environment: LOCAL_MACHINE
	- distributed_type: DEEPSPEED
	- mixed_precision: no
	- use_cpu: False
	- debug: False
	- num_processes: 2
	- machine_rank: 0
	- num_machines: 1
	- rdzv_backend: static
	- same_network: True
	- main_training_function: main
	- enable_cpu_affinity: False
	- deepspeed_config: {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
	- downcast_bf16: no
	- tpu_use_cluster: False
	- tpu_use_sudo: False
	- tpu_env: []

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Following this script https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py and training with DeepSpeed ZeRO-2: if we don't set the gradient accumulation steps in the DeepSpeed config but do set GAS on the Accelerator, a strange behavior happens. The training time decreases linearly with GAS; for example, with GAS=2 the run takes half the original time, and the time keeps dropping as GAS increases. This behavior only happens when using DeepSpeed without setting GAS in the DeepSpeed config. If this is fine and is how it is supposed to work, please add a note or hint on the usage of GAS with the Accelerator, because it might cause a wrong training loop.
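A minimal sketch of the setup I mean (not the exact run_clm_no_trainer.py script; the toy model and data below are only for illustration). GAS is passed to the Accelerator only, while the DeepSpeed config generated by `accelerate config` (shown above) still has `gradient_accumulation_steps: 1`:

```python
# Toy stand-in for the real training script, for illustration only.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# GAS is set here on the Accelerator only; the DeepSpeed config above still has
# gradient_accumulation_steps: 1.
accelerator = Accelerator(gradient_accumulation_steps=2)

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```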

Expected behavior

Setting the GAS on the Accelerator should be passed through to the DeepSpeed config and work correctly, but that does not happen: the training time decreases linearly with GAS, which means batches are being skipped during training.
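For comparison, a sketch of the workaround I would expect not to be necessary: declaring the same GAS directly on the `DeepSpeedPlugin` so the Accelerator and the DeepSpeed engine cannot disagree. The plugin arguments below are based on `accelerate.utils.DeepSpeedPlugin` and are meant as a sketch, not a confirmed fix:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Declare GAS on the DeepSpeed plugin itself so the generated DeepSpeed config
# and the Accelerator use the same value.
ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=2)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```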


khalil-Hennara commented Jan 27, 2025

I have a question also for this script, https://github.com/huggingface/accelerate/blob/main/examples/by_feature/gradient_accumulation_for_autoregressive_models.py if I am using DeepSpeed should I update my code like this or deepspeed will mange that,
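To help narrow this down, here is a small check I can run after `accelerator.prepare(...)` to see which GAS each side actually ends up with. The attribute and method names below are assumptions based on the public `Accelerator` and DeepSpeed engine APIs:

```python
# After accelerator.prepare(...), `model` is a DeepSpeed engine when DeepSpeed is enabled.
# Compare the GAS the Accelerator uses with the one DeepSpeed was configured with
# (the engine method name is an assumption, guarded with hasattr just in case).
print("Accelerator GAS:", accelerator.gradient_accumulation_steps)
if hasattr(model, "gradient_accumulation_steps"):
    print("DeepSpeed engine GAS:", model.gradient_accumulation_steps())
```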
