Correct zero division error in inverse sqrt scheduler #28982
Hi @DavidAfonsoValente, thanks for opening this PR! Could you give some more context about the issue this resolves, ideally with a reproducible snippet? Just looking at the PR, it implies that …
I corrected the issue link in the description. After testing, it seems that both the timescale and (current_step + shift) can be zero, which leads to the zero division error.
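A minimal, standalone sketch of how the error can arise. It assumes the lr lambda computes 1 / sqrt((current_step + shift) / timescale) with shift = timescale - num_warmup_steps, and that timescale falls back to num_warmup_steps when it is not given. The function names are hypothetical, not the transformers API.

```python
import math


def inverse_sqrt_lr_lambda(current_step: int, num_warmup_steps: int, timescale: int) -> float:
    # Linear warmup phase (max(1, ...) guards the warmup division itself)
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    # Inverse square-root decay, shifted so the factor is 1.0 right after warmup
    shift = timescale - num_warmup_steps
    return 1.0 / math.sqrt((current_step + shift) / timescale)


def old_default_timescale(num_warmup_steps: int, timescale=None) -> int:
    # Assumed pre-fix behaviour: timescale silently falls back to num_warmup_steps
    return num_warmup_steps if timescale is None else timescale


def triggers_zero_division(num_warmup_steps: int, timescale=None, current_step: int = 0) -> bool:
    # Returns True if evaluating the schedule raises ZeroDivisionError
    try:
        inverse_sqrt_lr_lambda(
            current_step, num_warmup_steps, old_default_timescale(num_warmup_steps, timescale)
        )
    except ZeroDivisionError:
        return True
    return False


# num_warmup_steps=0 -> timescale=0 -> (0 + 0) / 0 raises on the very first step
print(triggers_zero_division(num_warmup_steps=0))
```

With a non-zero number of warmup steps the same code path never divides by zero, which is why the bug only surfaces for num_warmup_steps = 0.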
Hi @DavidAfonsoValente, thanks for linking to the relevant issue. I still have an outstanding question about how this can occur, i.e. when is … cc @muellerzr, who can provide some more context on the behaviour.
@DavidAfonsoValente, reading #21495 (correct me if I'm wrong), the whole scheduler works off a timescale that is equal to the number of warmup steps. At least in my understanding, this means we can't have a timescale of 0 and thus should raise a …
cc @Sangh0
Yes, this problem occurs when num_warmup_steps is 0. The existing check only ensures that num_warmup_steps is not None, so a value of 0 passes through. However, most of the training examples provided with get_scheduler initialize the scheduler with num_warmup_steps = 0. Another possible fix would be to default the timescale to 10_000, as is done in: …
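A sketch of the 10_000 fallback suggested in this comment, assuming timescale resolves to num_warmup_steps when that is non-zero and to 10_000 otherwise. resolve_timescale and DEFAULT_TIMESCALE are hypothetical names used only for illustration; the lr lambda repeats the assumed formula 1 / sqrt((current_step + shift) / timescale).

```python
import math

DEFAULT_TIMESCALE = 10_000  # fallback value suggested in the thread


def resolve_timescale(num_warmup_steps: int, timescale=None) -> int:
    # Hypothetical helper: use num_warmup_steps, but never let timescale become 0
    if timescale is None:
        timescale = num_warmup_steps or DEFAULT_TIMESCALE
    return timescale


def inverse_sqrt_lr_lambda(current_step: int, num_warmup_steps: int, timescale: int) -> float:
    # Linear warmup, then inverse square-root decay starting at 1.0
    if current_step < num_warmup_steps:
        return float(current_step) / float(max(1, num_warmup_steps))
    shift = timescale - num_warmup_steps
    return 1.0 / math.sqrt((current_step + shift) / timescale)


ts = resolve_timescale(num_warmup_steps=0)
for step in (0, 10_000, 30_000):
    print(step, inverse_sqrt_lr_lambda(step, 0, ts))
```

With num_warmup_steps = 0 the multiplier starts at 1.0 and decays as 1 / sqrt(1 + step / 10_000), reaching 0.5 at step 30_000. An explicitly passed timescale is still respected.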
@muellerzr Thank you! I'm glad this issue has been resolved well.
Should I default the timescale to 10_000 instead of using the current solution?
@muellerzr is off atm, so we'll have to wait for him to confirm. From what I understand, yes, let's use a better default for …
Agree with Amy here!
@DavidAfonsoValente, can you rebase from main and push with …
Thanks!
Thanks for fixing and iterating!
Great, thank you!
What does this PR do?
Fixes #28835
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.