
AdaBelief bias correction #1963

Merged: 5 commits into FluxML:master from cossio:bias on May 10, 2022

Conversation

@cossio (Contributor) commented May 10, 2022

@cossio cossio mentioned this pull request May 10, 2022
@cossio (Contributor, Author) commented May 10, 2022

The use of epsilon is not completely clear to me; the authors seem to have moved it around in different versions of the paper.

In any case, it seems reasonable to have some epsilon inside the square root, since the variance estimate can become negative due to numerical error. I have added a comment about this.

I think this is also consistent with what's done in Optax (https://optax.readthedocs.io/en/latest/api.html?highlight=belief#optax.scale_by_belief) and with the latest "official" implementation by the authors (https://github.com/juntang-zhuang/Adabelief-Optimizer).
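
To make the placement concrete, here is a minimal sketch of one AdaBelief step in Julia. It is illustrative only (the function name, defaults, and in-place style are my own, not the merged Flux code): epsilon appears inside the square root, so a bias-corrected variance estimate that dips slightly below zero from floating-point error cannot produce a NaN, and again outside, guarding the division, matching Optax's scale_by_belief placement.

```julia
# Illustrative sketch of one AdaBelief step (not the merged Flux diff).
function adabelief_step!(θ, g, m, s, t; η = 1e-3, β1 = 0.9, β2 = 0.999, ϵ = 1e-16)
    @. m = β1 * m + (1 - β1) * g              # first moment: EMA of gradients
    @. s = β2 * s + (1 - β2) * (g - m)^2      # "belief": EMA of (g - m)^2
    @. θ -= η * (m / (1 - β1^t)) /            # bias-corrected numerator
            (sqrt(s / (1 - β2^t) + ϵ) + ϵ)    # ϵ inside and outside the sqrt
    return θ
end

# Usage: moment buffers start at zero; t is the 1-based step count.
θ, g = randn(4), randn(4)
m, s = zero(θ), zero(θ)
adabelief_step!(θ, g, m, s, 1)
```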

@codecov-commenter commented May 10, 2022

Codecov Report

Merging #1963 (6403805) into master (25457f5) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1963      +/-   ##
==========================================
+ Coverage   87.92%   87.94%   +0.02%     
==========================================
  Files          19       19              
  Lines        1482     1485       +3     
==========================================
+ Hits         1303     1306       +3     
  Misses        179      179              
Impacted Files              | Coverage Δ
----------------------------|--------------------------------
src/optimise/optimisers.jl  | 93.75% <100.00%> (+0.10%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@darsnack (Member) left a comment

One minor tweak

(Review comment on src/optimise/optimisers.jl; outdated, resolved)
@cossio (Contributor, Author) commented May 10, 2022

CI passes. Are we merging this?

Also, can we tag a release with this? I could bump the version here if needed.

@mcabbott (Member)

Does Optimisers.jl need an update to match?

@darsnack (Member)

@CarloLucibello @ToucheSir @mcabbott I've been too busy to pay close attention this month to see if we've merged breaking changes. Can you weigh in?

Personally, I am okay releasing with every non-breaking PR (including this one), since it is much easier to keep track of changes.

Looks like Optimisers.jl also needs an update for correctness.

@cossio (Contributor, Author) commented May 10, 2022

I can do a PR to Optimisers.

Also, +1 to releasing for each non-breaking PR. Contributors are happy to see their patches quickly available for use.

@mcabbott (Member)

Not aware of any breaking release; I'd hope that would be well discussed and have a milestone. I have not paid close attention lately, though.

+1 to frequent releases.

# old state layout: two moment arrays, keyed by the parameter array
mt, st = get!(() -> (zero(x), zero(x)), o.state, x)::NTuple{2,typeof(x)}

# new state layout: also carries the running β powers for bias correction
mt, st, βp = get!(o.state, x) do
    (zero(x), zero(x), Float64[β[1], β[2]])
end :: Tuple{typeof(x), typeof(x), Vector{Float64}}
A Member left a comment:

🤮

I know this is what ADAM does, but it really is non-optimal. Not a blocker for this PR, but more impetus to move to Optimisers.jl ASAP.
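
For contrast, a rough sketch of the explicit-state style in Optimisers.jl (assumed API usage, with placeholder model and gradient values): `setup` builds the state tree once and `update` threads it through each step, so nothing is keyed on the parameter array's identity the way `get!(o.state, x)` is above.

```julia
using Optimisers  # assumes Optimisers.jl with its AdaBelief rule available

model = (W = rand(2, 2), b = rand(2))                    # toy model as a NamedTuple
state = Optimisers.setup(Optimisers.AdaBelief(), model)  # explicit per-leaf state

grad = (W = rand(2, 2), b = rand(2))                     # stand-in for a gradient
state, model = Optimisers.update(state, model, grad)     # returns new state + model
```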

@darsnack (Member)

@cossio Want to version bump then so we can make a release?

@cossio (Contributor, Author) commented May 10, 2022

@darsnack Done

@ToucheSir (Member) commented May 10, 2022

On release cadence: we'd be well into the hundreds on v0.12 under a 1 PR == 1 release model. Perhaps that's an argument for pushing 1.0, but to my knowledge that is (for legitimate reasons) not currently on the table. As a compromise, perhaps we could release on a somewhat regular schedule (e.g. weekly when there are new PRs, after every X PRs, or only after bugfixes), and exclude non-functional changes such as docs (e.g. typo fixes) or refactoring?

@cossio (Contributor, Author) commented May 10, 2022

> we'd be well into the hundreds on v0.12 under a 1 PR == 1 release model

Is there something wrong with that? There is a nice blog post by Lyndon White arguing in favor of quick releases: https://www.oxinabox.net/2019/09/28/Continuous-Delivery-For-Julia-Packages.html

On the other hand, +1 for having a v1.0. But I think we can treat this as an independent issue.

@cossio (Contributor, Author) commented May 10, 2022

The FastAI failure looks unrelated to this. All other downstream tasks pass.

@cossio (Contributor, Author) commented May 10, 2022

Out of curiosity, where are these downstream tests defined? I would like to know whether any of them use AdaBelief.

@ToucheSir ToucheSir merged commit eaa7ee8 into FluxML:master May 10, 2022
@ToucheSir (Member)

> Is there something wrong with that? There is a nice blog post by @oxinabox arguing in favor of quick releases: https://www.oxinabox.net/2019/09/28/Continuous-Delivery-For-Julia-Packages.html

I agree with the idea of frequent releases, but much like 1.0 vs pre-1.0, there's a perception issue if a repo is running 0.13.399. Too-frequent updates also mean more re-precompilation, which is less of a big deal locally but can add up on CI.

> Out of curiosity, where are these downstream tests defined?

https://github.com/FluxML/Flux.jl/blob/master/.github/workflows/Downstream.yml

@cossio cossio deleted the bias branch May 10, 2022 22:29
Merging this pull request may close: AdaBelief issues