
Error with checking args.eval_accumulation_steps to gather tensors #25819

Merged — 2 commits merged on Aug 29, 2023

Conversation

chaumng (Contributor) commented on Aug 29, 2023

The error is in trainer.py.

While the deprecated (legacy) code has the correct check (line 3772):

if args.eval_accumulation_steps is not None and (step + 1) % args.eval_accumulation_steps == 0:

The current code does not (line 3196):

if args.eval_accumulation_steps is not None and self.accelerator.sync_gradients:

We need to check (step + 1) % args.eval_accumulation_steps == 0 because otherwise, even with args.eval_accumulation_steps = 10, the code may still gather tensors on every step.

Hence, line 3196 should be modified to:

if args.eval_accumulation_steps is not None and (step + 1) % args.eval_accumulation_steps == 0 and self.accelerator.sync_gradients:
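The difference between the two conditions can be seen in a small standalone sketch (this is not the actual Trainer code; it assumes sync_gradients is True on every evaluation step, which is what makes the current check degenerate to gathering every step):

```python
def gather_steps(num_steps, eval_accumulation_steps, sync_gradients=True):
    """Return the step indices at which tensors would be gathered,
    under the current (buggy) check and under the proposed check."""
    current_check = []   # gathers whenever sync_gradients is True
    proposed_check = []  # gathers only every eval_accumulation_steps steps
    for step in range(num_steps):
        if eval_accumulation_steps is not None and sync_gradients:
            current_check.append(step)
        if (
            eval_accumulation_steps is not None
            and (step + 1) % eval_accumulation_steps == 0
            and sync_gradients
        ):
            proposed_check.append(step)
    return current_check, proposed_check

current, proposed = gather_steps(num_steps=30, eval_accumulation_steps=10)
print(current)   # every step: [0, 1, 2, ..., 29]
print(proposed)  # only steps 9, 19, 29
```

With eval_accumulation_steps = 10, the current condition fires on all 30 steps (defeating the purpose of accumulation), while the proposed condition fires only on steps 9, 19, and 29.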

@amyeroberts (Collaborator) commented:

cc @muellerzr

@muellerzr (Contributor) left a comment:

Thanks! Can you run make style; make quality? This will fix the failing test.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

chaumng (Contributor, Author) commented on Aug 29, 2023

@muellerzr Thank you. I have run the make commands. All tests passed.

@amyeroberts (Collaborator) left a comment:

Thanks for the fix!

@amyeroberts amyeroberts merged commit 483861d into huggingface:main Aug 29, 2023
@chaumng chaumng deleted the patch-1 branch August 29, 2023 14:17
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023