exp2(i::Int64) is slower than 2.0^(i::Int64) #17412
Comments
Because I was having trouble describing the bitmasking operation, I ended up implementing it, to be satisfied I understood. The final result is from some back and forth with @ScottPJones
The more extensive testing is at: https://gist.github.com/oxinabox/6e72640222357fa10ca94bdd8c2140a6 This is just kinda cool: a significant speedup, about 6x. Shall I make a PR?
Thanks, this would be great! However before you do, how does it compare to Can I suggest though that to make it a bit clearer, you use the (non-exported) functions
I've added a few more comparisons to the gist. Metric is:
The earlier, higher comparative performance of the power-based method (relative to exp2) was, I think, related to the particular range tested and/or to it flushing subnormals to 0. @simonbyrne I will take a look into those functions. Thanks
If you are interested, you can also produce a
@eschnett I was planning to, but it is not on the list of allowed transformations in fastmath.jl, so I'm not sure now. @simonbyrne I put in the nicer way with I updated the gist
@oxinabox Indeed, LLVM's fastmath flag doesn't allow flushing subnormals to zero. Apparently this is not a compile-time but rather a run-time setting, similar to a rounding mode. I think that this is an optimization well worth considering, but as you say, it's not in scope here. Well, you can still avoid handling inf.
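To illustrate the run-time nature of that setting, here is a minimal sketch using Julia's `set_zero_subnormals`. The helper name `halve_with_ftz` is made up for this example and is not from the gist. Note that with the mode enabled, operations are only permitted, not required, to flush subnormals, so the result depends on the hardware.

```julia
# Hypothetical helper (not from the gist): divide by 2 with
# flush-to-zero requested for the current thread.
function halve_with_ftz(x::Float64)
    # Returns false if the hardware cannot zero subnormals.
    supported = set_zero_subnormals(true)
    y = x / 2
    # Restore standard IEEE handling of subnormals.
    set_zero_subnormals(false)
    return supported, y
end

x = floatmin(Float64)               # smallest normal Float64, 2^-1022
supported, y = halve_with_ftz(x)    # x / 2 would be subnormal under IEEE rules
println("flush-to-zero supported: ", supported)
println("x / 2 = ", y)              # either 0.0 or a subnormal value
```

Because the mode is switched at run time and applies per thread, it is not something a `@fastmath` annotation on an expression could turn on by itself.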
I noticed today that `exp2(i)` is slower than `2.0^(i)` for `i` an Int64. (`exp2(f)` for `f` a Float64 is still faster.)
Results are:
@time t1()
0.280422 seconds (4 allocations: 160 bytes)
@time t2()
0.016330 seconds (4 allocations: 160 bytes)

The integer case is the easy case for finding the value, since it can be accomplished by some bit masking on the binary representation.
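A minimal sketch of that bit-masking idea follows, assuming the standard IEEE 754 binary64 layout (1 sign bit, 11 exponent bits with bias 1023, 52 mantissa bits). The function name `exp2_int` is hypothetical; this is not the implementation from the gist or from Base, just one way the idea could look.

```julia
# Hypothetical helper: compute 2.0^n for an Int64 n by writing the
# exponent field of a Float64 directly.
function exp2_int(n::Int64)
    if n > 1023
        return Inf                     # 2^1024 and above overflow
    elseif n >= -1022
        # Normal range: biased exponent in bits 52..62, mantissa all zero.
        return reinterpret(Float64, UInt64(n + 1023) << 52)
    elseif n >= -1074
        # Subnormal range: exponent field is zero, one mantissa bit is set.
        return reinterpret(Float64, UInt64(1) << (n + 1074))
    else
        return 0.0                     # below the smallest subnormal
    end
end

println(exp2_int(10))
println(exp2_int(-1))
```

Every branch returns a `Float64`, so the function is type-stable, and no floating-point arithmetic is needed at all, only an integer add and a shift.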
For comparison, the same test cycling `ii` through `-1_000.0:1.0:1_000.0`:
@time t1()
0.362359 seconds (4 allocations: 160 bytes)
@time t2()
3.474701 seconds (4 allocations: 160 bytes)

This doesn't change much whether or not I enable `@fastmath`.