fix the grad_acc issue at epoch boundaries #24415
Conversation
Co-Authored-By: Zach Mueller <[email protected]>
The documentation is not available anymore as the PR was closed or merged.
Co-authored-by: sumpster
I believe this PR also fixes #24245
Thanks!
Thanks for fixing!
Happy with the changes. My only concern is things breaking, as this relies on a feature that's just been merged into accelerate's main branch. Is the next release of accelerate going to be before the next release of transformers?
@pacman100 @muellerzr You've been handling the switch to using accelerate in Trainer really well, and I can see our CI runs install accelerate from source, so I'm happy if this follows the pattern you've been using.
@amyeroberts we coordinate Accelerate releases to be a day or two before (though @pacman100 we should do the version check like we've done before with these fixes 😬)
Thanks for the work on this! There need to be a couple of protections before this is merged (which we can remove at the next release of Accelerate, when we move the pinned version up).
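The "protections" discussed above amount to gating the new behavior on the installed accelerate version. A minimal sketch of such a guard is below; the function name and the minimum version are illustrative assumptions, not the actual transformers helpers or the real version cutoff.

```python
# Hedged sketch: gate a feature on the installed accelerate version.
# The helper name and the "0.21.0" cutoff are hypothetical, chosen only
# to illustrate the pattern of pinning a fix to a future release.
def _parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split(".")[:3])

def accelerate_supports_epoch_boundary_fix(installed: str,
                                           minimum: str = "0.21.0") -> bool:
    """Return True if the installed accelerate is at least the
    (hypothetical) minimum version that ships the required feature."""
    return _parse_version(installed) >= _parse_version(minimum)

# Older release lacks the feature; the pinned minimum has it.
print(accelerate_supports_epoch_boundary_fix("0.20.3"))  # False
print(accelerate_supports_epoch_boundary_fix("0.21.0"))  # True
```

Once the pinned minimum version of accelerate is raised in transformers' dependencies, a guard like this can simply be deleted.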
Hello, how do I install this? I expected that a PR means some kind of transformers update; in that case there should be an install link such as git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9
Hello @Oxi84, you can install this once it gets merged via
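The install command itself was elided from the scraped comment. Installing transformers from its main branch is typically done with pip's git support, along the lines of:

```shell
# Install transformers from the main branch (picks up merged PRs
# ahead of the next PyPI release).
pip install git+https://github.com/huggingface/transformers
```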
Just completed a training run with this PR and can confirm that the issue didn't occur. Thanks for the fix!
What does this PR do?
Should solve gradient accumulation at epoch boundaries when using Accelerate (seen in #23935 (comment)). Requires huggingface/accelerate#1624
Fixes # (issue)
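The epoch-boundary problem can be illustrated with a toy loop: when the number of batches per epoch is not divisible by the accumulation steps, a naive "step every N batches" loop leaves accumulated gradients unapplied at the end of the epoch. This sketch is illustrative only, not the actual Trainer/Accelerate code.

```python
# Hedged sketch of gradient accumulation at an epoch boundary.
# Counts optimizer steps and leftover (unapplied) micro-batches.
def optimizer_steps(num_batches: int, accum_steps: int,
                    sync_on_epoch_end: bool) -> tuple:
    steps = 0    # optimizer.step() calls performed
    pending = 0  # micro-batches accumulated since the last step
    for i in range(num_batches):
        pending += 1
        is_last_batch = (i == num_batches - 1)
        # The fix: also step on the final batch of the epoch, even if
        # fewer than accum_steps gradients have been accumulated.
        if pending == accum_steps or (sync_on_epoch_end and is_last_batch):
            steps += 1
            pending = 0
    return steps, pending  # pending > 0 means stranded gradients

# 10 batches with accum_steps=3: the naive loop strands 1 micro-batch,
# while forcing a sync at the epoch boundary applies everything.
print(optimizer_steps(10, 3, sync_on_epoch_end=False))  # (3, 1)
print(optimizer_steps(10, 3, sync_on_epoch_end=True))   # (4, 0)
```

This mirrors why the PR depends on the referenced accelerate change: the training loop needs a way to force gradient synchronization on the last batch of an epoch.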