
AdaBelief bias correction #1963

Merged: 5 commits into FluxML:master from cossio:bias on May 10, 2022

Conversation

@cossio (Contributor) commented May 10, 2022

@cossio cossio mentioned this pull request May 10, 2022
@cossio (Contributor, Author) commented May 10, 2022

The use of epsilon is not completely clear to me; the authors seem to have moved it around in different versions of the paper.

In any case, it seems reasonable to have some epsilon inside the square root, since the variance estimate can become negative due to numerical error. I have added a comment about this.

I think this is also consistent with what's done in Optax (https://optax.readthedocs.io/en/latest/api.html?highlight=belief#optax.scale_by_belief) and with the latest "official" implementation by the authors (https://github.com/juntang-zhuang/Adabelief-Optimizer).
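
To make the placement concrete, here is a minimal sketch of one AdaBelief step in Julia. It is illustrative only (the function name, defaults, and in-place style are my own, not the merged Flux code): epsilon appears inside the square root, so a bias-corrected variance estimate that dips slightly below zero from floating-point error cannot produce a NaN, and again outside, guarding the division, matching Optax's scale_by_belief placement.

```julia
# Illustrative sketch of one AdaBelief step (not the merged Flux diff).
function adabelief_step!(θ, g, m, s, t; η = 1e-3, β1 = 0.9, β2 = 0.999, ϵ = 1e-16)
    @. m = β1 * m + (1 - β1) * g              # first moment: EMA of gradients
    @. s = β2 * s + (1 - β2) * (g - m)^2      # "belief": EMA of (g - m)^2
    @. θ -= η * (m / (1 - β1^t)) /            # bias-corrected numerator
            (sqrt(s / (1 - β2^t) + ϵ) + ϵ)    # ϵ inside and outside the sqrt
    return θ
end

# Usage: moment buffers start at zero; t is the 1-based step count.
θ, g = randn(4), randn(4)
m, s = zero(θ), zero(θ)
adabelief_step!(θ, g, m, s, 1)
```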

@codecov-commenter commented May 10, 2022

Codecov Report

Merging #1963 (6403805) into master (25457f5) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1963      +/-   ##
==========================================
+ Coverage   87.92%   87.94%   +0.02%     
==========================================
  Files          19       19              
  Lines        1482     1485       +3     
==========================================
+ Hits         1303     1306       +3     
  Misses        179      179              
Impacted Files              | Coverage Δ
----------------------------|--------------------------------
src/optimise/optimisers.jl  | 93.75% <100.00%> (+0.10%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@darsnack (Member) left a comment

One minor tweak

(Review comment on src/optimise/optimisers.jl; outdated, resolved)
@cossio (Contributor, Author) commented May 10, 2022

CI passes. Are we merging this?

Also, can we tag a release with this? I could bump the version here if needed.

@mcabbott (Member)

Does Optimisers.jl need an update to match?

@darsnack (Member)

@CarloLucibello @ToucheSir @mcabbott I've been too busy to pay close attention this month to see if we've merged breaking changes. Can you weigh in?

Personally, I am okay releasing with every non-breaking PR (including this one), since it is much easier to keep track of changes.

Looks like Optimisers.jl also needs an update for correctness.

@cossio (Contributor, Author) commented May 10, 2022

I can do a PR to Optimisers.

Also, +1 to releasing for each non-breaking PR. Contributors are happy to see their patches quickly available for use.

@mcabbott (Member)

Not aware of any breaking release; I'd hope that would be well discussed and have a milestone. I have not paid close attention lately, though.

+1 to frequent releases.

# old state layout: two moment arrays, keyed by the parameter array
mt, st = get!(() -> (zero(x), zero(x)), o.state, x)::NTuple{2,typeof(x)}

# new state layout: also carries the running β powers for bias correction
mt, st, βp = get!(o.state, x) do
    (zero(x), zero(x), Float64[β[1], β[2]])
end :: Tuple{typeof(x), typeof(x), Vector{Float64}}
A Member left a comment:

🤮

I know this is what ADAM does, but it really is non-optimal. Not a blocker for this PR, but more impetus to move to Optimisers.jl ASAP.
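
For contrast, a rough sketch of the explicit-state style in Optimisers.jl (assumed API usage, with placeholder model and gradient values): `setup` builds the state tree once and `update` threads it through each step, so nothing is keyed on the parameter array's identity the way `get!(o.state, x)` is above.

```julia
using Optimisers  # assumes Optimisers.jl with its AdaBelief rule available

model = (W = rand(2, 2), b = rand(2))                    # toy model as a NamedTuple
state = Optimisers.setup(Optimisers.AdaBelief(), model)  # explicit per-leaf state

grad = (W = rand(2, 2), b = rand(2))                     # stand-in for a gradient
state, model = Optimisers.update(state, model, grad)     # returns new state + model
```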

@darsnack (Member)

@cossio Want to version bump then so we can make a release?

@cossio (Contributor, Author) commented May 10, 2022

@darsnack Done

@ToucheSir (Member) commented May 10, 2022

On release cadence: we'd be well into the hundreds on v0.12 under a 1 PR == 1 release model. Perhaps that's an argument for pushing 1.0, but to my knowledge that is (for legitimate reasons) not currently on the table. As a compromise, perhaps we could release on a somewhat regular schedule (e.g. weekly when there are new PRs, after every X PRs, or only after bugfixes), and exclude non-functional changes such as docs (e.g. typo fixes) or refactoring?

@cossio (Contributor, Author) commented May 10, 2022

> we'd be well into the hundreds on v0.12 under a 1 PR == 1 release model

Is there something wrong with that? There is a nice blog post by Lyndon White arguing in favor of quick releases: https://www.oxinabox.net/2019/09/28/Continuous-Delivery-For-Julia-Packages.html

On the other hand, +1 for having a v1.0. But I think we can treat this as an independent issue.

@cossio (Contributor, Author) commented May 10, 2022

The FastAI failure looks unrelated to this. All other downstream tasks pass.

@cossio (Contributor, Author) commented May 10, 2022

Out of curiosity, where are these downstream tests defined? I would like to know whether any of them use AdaBelief.

@ToucheSir ToucheSir merged commit eaa7ee8 into FluxML:master May 10, 2022
@ToucheSir (Member)

> Is there something wrong with that? There is a nice blog post by @oxinabox arguing in favor of quick releases: https://www.oxinabox.net/2019/09/28/Continuous-Delivery-For-Julia-Packages.html

I agree with the idea of frequent releases, but much like 1.0 vs pre-1.0, there's a perception issue if a repo is running 0.13.399. Too-frequent updates also mean more re-precompilation, which is less of a big deal locally but can add up on CI.

> Out of curiosity, where are these downstream tests defined?

https://github.com/FluxML/Flux.jl/blob/master/.github/workflows/Downstream.yml

@cossio cossio deleted the bias branch May 10, 2022 22:29
Merging this pull request may close: AdaBelief issues