-
-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use faster activation functions #1837
Conversation
It's a pity to pollute the code a bit but I guess the performance increase is worth it. I wonder what could be a more aesthetically pleasing alternative. Pointing Just for reference, the reason why this PR is switching activation in the forward instead of construction time is here
|
I wouldn't claim this as an aesthetic improvement. But it is a performance one. I'm against less explicit names like |
Ok. Should we wait for a final tag of the v0.12 before merging (#1838)? I wouldn't say this PR is breaking though |
Not so breaking as to demand a release, but if there's one nearby perhaps that's the safe time. So after last 0.12.x tag, before v0.13, as you say. |
Buildkite is failing with weird errors again 😬 |
bors try |
tryBuild succeeded: |
Co-authored-by: Brian Chen <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #1837 +/- ##
==========================================
+ Coverage 73.85% 73.94% +0.09%
==========================================
Files 28 28
Lines 1683 1689 +6
==========================================
+ Hits 1243 1249 +6
Misses 440 440
Continue to review full report at Codecov.
|
Co-authored-by: Brian Chen <[email protected]>
c′ = @. sigmoid_fast(forget) * c + sigmoid_fast(input) * tanh_fast(cell) | ||
h′ = @. sigmoid_fast(output) * tanh_fast(c′) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ought these to get the fast_act
treatment too? I'm ok with revisiting too, RNN cell inflexibility is a bit of a long standing issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right. If we decide to disable fast_tanh
on CuArrays, that will be ignored here. But perhaps revisit if & when... it's a bit more clutter to squeeze that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be a blessing in disguise, as currently plain tanh.(...)
will hit the likely obsolete https://github.com/FluxML/NNlibCUDA.jl/blob/master/src/cudnn/activations.jl.
This substitutes
tanh_fast
where possible, sincetanh
often dominates the forward pass of small networks. Builds on #1761, but in the months that sat waiting, I have mislaid the benchmarks I ran. Best case 2x better forwards, worst case no improvement, modulo noise.Intersects with #1832, which would also do this to conv layers, but not to RNNs.
Closes #1272; this version is significantly faster, and this PR applies it to more cases.