-
-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AdaBelief bias correction #1963
Conversation
The use of epsilon is not completely clear to me, they seem to have moved it around in different versions of the paper. In any case, it seems reasonable to have some epsilon inside the square root, since the variance estimate can get negative by numerical error. I have added a comment about this. I think this is also consistent with what's done in Optax (https://optax.readthedocs.io/en/latest/api.html?highlight=belief#optax.scale_by_belief), and the latest "official" implementation by the authors. (https://github.com/juntang-zhuang/Adabelief-Optimizer) |
Codecov Report
@@ Coverage Diff @@
## master #1963 +/- ##
==========================================
+ Coverage 87.92% 87.94% +0.02%
==========================================
Files 19 19
Lines 1482 1485 +3
==========================================
+ Hits 1303 1306 +3
Misses 179 179
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor tweak
CI passes. Are merging this? Also, can we tag a release with this? I could bump the version here if needed. |
Does Optimisers.jl need an update to match? |
@CarloLucibello @ToucheSir @mcabbott I've been too busy to pay close attention this month to see if we've merged breaking changes. Can you weigh in? Personally, I am okay releasing with every non-breaking PR (including this one), since it is much easier to keep track of changes. Looks like Optimisers.jl also needs an update for correctness. |
I can do a PR to Optimisers. Also, +1 to releasing for each non-breaking PR. Contributors are happy to see their patches quickly available for use. |
Not aware of any breaking release, I'd hope that would be well-discussed, and have a milestone. Have not paid close attention lately though. +1 to frequent releases. |
mt, st = get!(() -> (zero(x), zero(x)), o.state, x)::NTuple{2,typeof(x)} | ||
|
||
mt, st, βp = get!(o.state, x) do | ||
(zero(x), zero(x), Float64[β[1], β[2]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤮
I know this is what ADAM does, but it really is non-optimal. Not a blocker for this PR, but more impetus to move to Optimisers.jl ASAP.
@cossio Want to version bump then so we can make a release? |
@darsnack Done |
On release cadence, we'd be well into the hundreds on v0.12 under a 1 PR == 1 review model. Perhaps that's an argument for pushing 1.0, but to my knowledge that is (for legitimate reasons) not currently on the table. As a compromise, perhaps we could release at a somewhat regular schedule (e.g. weekly when there are new PRs, after every X PRs, after every bugfix only), and exclude non-functional changes such as docs (e.g. typo fixes) or refactoring? |
Is there something wrong with that? There is a nice blog post by Lyndon White arguing in favor of quick releases: https://www.oxinabox.net/2019/09/28/Continuous-Delivery-For-Julia-Packages.html On the other hand, +1 for having a v1.0. But I think we can treat this as an independent issue. |
The FastAI failure looks unrelated to this. All other downstream tasks pass. |
Out of curiosity, where are these downstream tests defined? I would like to know if any of them is using AdaBelief. |
I agree with the idea of frequent releases, but much like 1.0 vs non 1.0 there's a perception issue if a repo is running 0.13.399. Too frequent updates also means more re-precompilation, which is less of a big deal locally but can add up on CI.
https://github.com/FluxML/Flux.jl/blob/master/.github/workflows/Downstream.yml |
Fix #1962
CC @darsnack @willtebbutt