inline exp functions #38726
Conversation
If anyone has
can this be tagged performance, math, and latency?
I personally don't think these should be inlined by default since I think a very small number of people will use
and then people can at least opt in to the inline version (with the caveat that maybe it is considered internals 🤷 ).
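For concreteness, a minimal sketch of that opt-in pattern (the names here are hypothetical, not what Base actually defines):

```julia
# Hypothetical names for illustration only: the public entry point stays
# non-inlined by default, while an explicitly @inline variant is available
# for callers who want it (accepting that it may be considered internals).
@inline exp_inline(x::Float64) = exp(x)   # stand-in body for the real exp kernel
exp_default(x::Float64) = exp_inline(x)   # what the public `exp` would keep calling

# Opting in at a hot call site:
# acc = 0.0
# for x in xs
#     acc += exp_inline(x)
# end
```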
Float32 doesn't require ivdep. And Float64 shouldn't. The only reason it currently does is that LLVM for some reason isn't able to tell that the table of constants doesn't alias the output.
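As a rough illustration of the kind of loop being discussed (assuming the exp kernel is inlined), this is what the ivdep requirement looks like in practice:

```julia
# Sketch only: with an inlined exp, a Float32 loop like this can vectorize with
# plain @simd, while Float64 currently needs ivdep because LLVM can't prove
# that the table of constants used by exp doesn't alias `out`.
function exp_all!(out::Vector{Float64}, x::Vector{Float64})
    @inbounds @simd ivdep for i in eachindex(out, x)
        out[i] = exp(x[i])
    end
    return out
end

x = randn(1000); out = similar(x)
exp_all!(out, x)
```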
I'm also of the opinion that these shouldn't be inlined by default. What's the criterion for a function getting inlined by default? It seems dangerous without having the compiler detect and do this automagically in cases where it makes sense and doesn't blow up the code; otherwise we can run into issues like that in #24117 again.
Does @nanosoldier still work? If so, that might be useful to see what the effects are.
@nanosoldier
How long does @nanosoldier usually take? 8 hours seems like a lot.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG
That is far from enlightening. Hyperbolic trig functions look like they might have slightly regressed, but it mainly looks like noise given how many completely unrelated things changed.
For an
Another thing to benchmark is a small NN with tanh activation layers.
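Something along these lines, with made-up sizes, is roughly what that benchmark might look like:

```julia
# Rough benchmark sketch (layer width and batch size are arbitrary choices):
# a single dense layer with a tanh activation, which is where an inlined or
# vectorized exp/tanh would show up.
using BenchmarkTools

W, b = randn(Float32, 32, 32), randn(Float32, 32)
x = randn(Float32, 32, 128)               # batch of 128 inputs

dense_tanh(W, b, x) = tanh.(W * x .+ b)

@btime dense_tanh($W, $b, $x);
```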
tanh in Julia doesn't use exp (it does, however, use expm1, so maybe inline it?). In general, for at least sin, tanh and more, you may have a fast path for even just using an identity function. Does it help to split up some functions in two, so that the fast path can get inlined? [I checked for tanh, and see that it has a Taylor expansion that seems preferable to exp, but I guess something more advanced than Taylor is used anyway (by expm1), but should it be?]
As of https://github.com/JuliaLang/julia/pull/38382/files, tanh uses exp for big numbers and a minimax polynomial for small numbers. The reason this helps is that our current expm1 is pretty slow, so it's worth avoiding where easy.
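To make the "split into two paths" idea concrete, here is a minimal sketch, not Base's actual implementation: the helper names are hypothetical, a truncated Taylor polynomial stands in for a fitted minimax one, and the threshold is chosen only for illustration.

```julia
# Slow path via exp, deliberately kept out of line.
@noinline _tanh_via_exp(x) = copysign(1 - 2 / (exp(2 * abs(x)) + 1), x)

# Fast path: cheap polynomial for small |x|, inlinable; otherwise fall back.
@inline function tanh_sketch(x::Float64)
    ax = abs(x)
    if ax < 0.25
        # Taylor terms of tanh, standing in for a proper minimax polynomial
        return x * evalpoly(x * x, (1.0, -1/3, 2/15, -17/315))
    else
        return _tanh_via_exp(x)
    end
end
```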
[Off-topic (for exp, helpful for Julia's tanh?)] K-TANH: EFFICIENT TANH FOR DEEP LEARNING
I don't think this is especially useful. This paper is about efficient approximations with ~1% error. We are targeting 1.5 ULPs precision, which is ~10^-7 error for Float32.
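For reference, assuming Float32 is indeed the type meant, the arithmetic behind that figure is roughly:

```julia
# 1.5 ULPs near 1.0 is about 1.5 * eps(Float32) ≈ 1.8e-7, i.e. on the order of
# 1e-7 relative error, versus the ~1e-2 error the paper targets.
1.5 * eps(Float32)
```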
[Not about exp, only (deep-learning) tanh] For some reason Float64 is not mentioned in that paper, while Float32 is (though note the error they show is for BFloat16). Yes, it's an approximation scheme, so it certainly belongs in a non-Base library. I'm not sure whether the error can then be improved (I recall Newton-Raphson doubling the number of correct bits, but I'm getting rusty and maybe it doesn't apply here?). At least I found the paper intriguing, and I also commented as a warning in case you end up benchmarking under the assumption that Julia's tanh (and exp indirectly) is being called, when some library may have substituted its own tanh (very likely, at least on GPUs?). I was thinking of this comment: "Another thing to benchmark is a small NN with tanh activation layers".
Pushing a new version that makes 2 separate versions of the
Now that the bulk of my changes to exp have been merged, I want to know if we should consider adding `@inline` to the code. The upside is faster performance and possible vectorization (although vectorizing requires `@simd ivdep` for `Float64`). The disadvantage is probable increased compile times, and possible performance hits in some situations due to spilling code caches. This `@inline` was removed from the original PR as it added questions to a PR that was already hard to merge due to its scope, but I think this is probably worth it.
Are these really unsafe, or are they more akin to the
Good point. They are more fastmathy. The two differences are lack of overflow checks (i.e. it returns garbage if the result should be
Okay, then I think maybe they should not be called
I don't feel I can fully comment on the code rearrangement here. I would recommend doing the approach KristofferC mentioned, with
I think this PR is basically subsumed by the PR I wrote with actual fastmath versions of the exp functions.