Error with checking args.eval_accumulation_steps to gather tensors #25819

chaumng · 2023-08-29T10:02:42Z

The error is in trainer.py.

The deprecated (legacy) code has the correct check (line 3772):

if args.eval_accumulation_steps is not None and (step + 1) % args.eval_accumulation_steps == 0:

The current code does not (line 3196):

if args.eval_accumulation_steps is not None and self.accelerator.sync_gradients:

The check (step + 1) % args.eval_accumulation_steps == 0 is needed: without it, setting args.eval_accumulation_steps = 10 may still gather tensors at every step instead of every 10 steps.

Hence, line 3196 should be changed to:

if args.eval_accumulation_steps is not None and (step + 1) % args.eval_accumulation_steps == 0 and self.accelerator.sync_gradients:
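To illustrate the difference between the two conditions, here is a minimal standalone sketch. It does not use trainer.py; gather_steps is a hypothetical helper, and it assumes sync_gradients stays True on every evaluation step, which is an assumption of this sketch rather than something taken from the Trainer code.

```python
from typing import List, Optional


def gather_steps(
    num_steps: int,
    eval_accumulation_steps: Optional[int],
    sync_gradients: bool = True,  # assumption: True on every evaluation step
    check_step: bool = True,
) -> List[int]:
    """Return the 0-based step indices at which tensors would be gathered.

    check_step=False mimics the condition without the modulo check;
    check_step=True mimics the proposed condition.
    """
    steps = []
    for step in range(num_steps):
        should_gather = eval_accumulation_steps is not None and sync_gradients
        if check_step:
            # Short-circuits when eval_accumulation_steps is None,
            # so the modulo is never evaluated on None.
            should_gather = should_gather and (step + 1) % eval_accumulation_steps == 0
        if should_gather:
            steps.append(step)
    return steps


without_check = gather_steps(25, 10, check_step=False)
with_check = gather_steps(25, 10, check_step=True)
print(len(without_check), with_check)
```

Under these assumptions, the condition without the modulo check gathers on all 25 steps, while the proposed condition gathers only at steps 9 and 19 (0-based), i.e. after every 10th step.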


amyeroberts · 2023-08-29T10:09:52Z

cc @muellerzr

muellerzr

Thanks! Can you run make style; make quality? This will fix the failing test.

HuggingFaceDocBuilderDev · 2023-08-29T11:57:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

chaumng · 2023-08-29T13:50:13Z

@muellerzr Thank you. I have run the make commands. All tests passed.

amyeroberts

Thanks for the fix!


muellerzr approved these changes Aug 29, 2023


Fix error with checking args.eval_accumulation_steps to gather tensors

6ea17ca

amyeroberts approved these changes Aug 29, 2023


amyeroberts merged commit 483861d into huggingface:main Aug 29, 2023

chaumng deleted the patch-1 branch August 29, 2023 14:17

VeryLazyBoy mentioned this pull request Sep 8, 2023

bug: eval_accumulation_steps can lead to incorrect metrics #24734

Closed
