Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct zero division error in inverse sqrt scheduler #28982

Merged

Conversation

DavidAfonsoValente
Copy link
Contributor

@DavidAfonsoValente DavidAfonsoValente commented Feb 12, 2024

What does this PR do?

Fixes #28835

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts
Copy link
Collaborator

Hi @DavidAfonsoValente, thanks for opening this PR!

Could you give some more context about the issue this resolves, ideally with a reproducible snippet?

Just looking at the PR, it implies that timescale is 0, which I don't think should ever be the case.

@DavidAfonsoValente
Copy link
Contributor Author

DavidAfonsoValente commented Feb 12, 2024

I corrected the description link to the issue, after testing it seems like both the timescale and (current_step + shift) can possibly be zero, leading to the zero division error

@amyeroberts
Copy link
Collaborator

Hi @DavidAfonsoValente, thanks for linking to the relevant issue. I still have an outstanding question about how this can occur i.e. when is timescale 0?

cc @muellerzr who can provide some more context over the behaviour

@muellerzr
Copy link
Contributor

@DavidAfonsoValente reading #21495, correct me if I'm wrong but the whole scheduler works off a timescale which is equal to the number of warmup steps. Which, at least in my understanding, means that we can't have a timescale of 0 and thus should raise a NotImplementedError directing the user to ensure they have a number of warmup steps set in the scheduler, no?

@muellerzr
Copy link
Contributor

cc @Sangh0

@DavidAfonsoValente
Copy link
Contributor Author

DavidAfonsoValente commented Feb 13, 2024

Yes, this problem occurs when num_warmup_steps is 0, the check that is made is to ensure that num_warmup_steps is not None so it goes through. However most of the training examples provided with get_scheduler initiate the scheduler with num_warmup_steps = 0. One other possible correction could be defaulting the timescale to 10_000 as it is done in :
https://github.com/google-research/big_vision/blob/fd2d3bd2efc9d89ea959f16cd2f58ae8a495cd44/big_vision/configs/proj/clippo/train_clippo.py#L144
https://github.com/google-research/big_vision/blob/6ff6d080d62c1f47e2e4eeb8b6474deb38dfe406/big_vision/configs/proj/scaling_laws/train_vit_g.py#L79
I believe maybe the current implementation came from interpreting the original implementation as having timescale==num_warmup_steps however a more accurate implementation could be one where these both default to 10_000, what do you think?

@Sangh0
Copy link

Sangh0 commented Feb 16, 2024

@muellerzr Thank you! I'm glad this issue has been resolved well.

@DavidAfonsoValente
Copy link
Contributor Author

Should I default timescale to 10_000 instead of the current solution?

@amyeroberts
Copy link
Collaborator

amyeroberts commented Feb 19, 2024

@muellerzr is off atm, so we'll have to wait from him to confirm. From what I understand, yes, let's a better default for timescale. For backwards compatibility, I'd suggest default to 10_000 it num_warmup_step is 0 and timescale is not set.

@muellerzr
Copy link
Contributor

Agree with Amy here!

@muellerzr
Copy link
Contributor

@DavidAfonsoValente can you rebase from main and push with -f (to not have the git commit history/diff all crazy) so we can double check tests are green? 🤗 Thanks!

Copy link
Contributor

@muellerzr muellerzr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@muellerzr muellerzr requested a review from amyeroberts March 1, 2024 13:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing and iterating!

@amyeroberts amyeroberts merged commit 831bc25 into huggingface:main Mar 1, 2024
21 checks passed
@DavidAfonsoValente
Copy link
Contributor Author

Great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ZeroDivisonError in inverse_sqrt scheduler
5 participants