ComplexF64 division: combine four if-statements into two if-elseif-statements #29042
By the way, the current implementation seems to be a strict port of LAPACK's implementation of the algorithm; probably following the initial suggestion in #5072. Right now, the checked version is ~7-9× slower than the naive, unchecked version. With the above change it is still ~5-7× slower.
You can always use
Right, I realized that rather late today; but thanks!
Of course, but (correct me if I am wrong) I felt there was a hint of questioning regarding if the performance penalty was really worth it to gain this extra precision. I just wanted to point out that there is already a way for you to make that choice by using the
You're not wrong at all: I even asked on Slack earlier (and had it explained to me) - it's good to be less ignorant now than I was earlier in the day :). It's great to know that the If I'm being entirely honest, my main interest in
We definitely want this to be as fast as possible. @simonbyrne and @stevengj may find this of interest and/or have useful feedback.
In general, it looks good. Would be a good case to add to https://github.com/JuliaCI/BaseBenchmarks.jl. Two more things that might be worth trying:
if ab >= floatmax(Float64)/(2*bs)
    a/=2; b/=2; s*=2 # scale down a,b
else
    a*=bs; b*=bs; s/=bs # scale up a,b
end
base/complex.jl
Outdated
@@ -360,8 +360,8 @@ inv(z::Complex{<:Union{Float16,Float32}}) =
# c + i*d
function /(z::ComplexF64, w::ComplexF64)
    a, b = reim(z); c, d = reim(w)
    ab = max(abs(a), abs(b))
    cd = max(abs(c), abs(d))
    @fastmath ab = max(abs(a), abs(b))
Note `@fastmath` isn't really valid here: it implies that we can assume `a` or `b` are never `Inf`, `NaN` or subnormal.
Ah, you're quick. I thought I could already put my lessons above to use.
Regardless, I'll swap it to your initial suggestion instead, thanks.
`@fastmath` is probably okay in this case, but occasionally it can cause some problems when the compiler gets carried away. Probably better to be explicit.
Thanks @simonbyrne! Regarding your suggestions:
That's a pity, but thanks for trying. Pipelining & branch prediction make it difficult to figure out these sorts of micro-optimisations.
Looks great, thanks!
From a purely stylistic point of view, you could probably also get rid of the `half` and `two` values: the compiler is able to optimise simple things like `x/2` and `x*2` into the optimal form. But that's not a big deal.
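A small sketch of why those constants are safe to drop, assuming IEEE 754 `Float64` arithmetic: scaling by a power of two gives the same correctly rounded result whether it is written as a division, a multiplication, or an addition, so the spellings are interchangeable. The function names are hypothetical.

```julia
# Hypothetical helpers comparing the two spellings of halving and doubling.
halve_div(x::Float64)  = x / 2
halve_mul(x::Float64)  = 0.5 * x
double_mul(x::Float64) = x * 2
double_add(x::Float64) = x + x

samples = (1.0, -7.5, 3.0e-310, nextfloat(0.0), floatmax(Float64))

# `===` compares bit patterns for Float64, so this checks exact agreement.
all(halve_div(x) === halve_mul(x) for x in samples)    # true
all(double_mul(x) === double_add(x) for x in samples)  # true
```

To see what the compiler actually emits, `@code_llvm halve_div(1.0)` (from `InteractiveUtils`, loaded by default in the REPL) can be used.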
Good point; I had just kept the I just realized that, in principle, the tricks above (+
A separate PR might be easier (@ me on it).
The Travis failure seems unrelated. Does this need further review?
@simonbyrne please merge if this looks ok to you
Thanks!
This commit combines what was previously four `if`-statements into two `if`-`elseif`-statements in the implementation of over/underflow-proof complex division, i.e. of `/(z::ComplexF64, w::ComplexF64)`. This should allow branching to terminate sooner: in practice, testing with a pair of random numbers, I get a 1.22× speedup with this change.

The rewrite from four `if`-statements to two `if`-`elseif`-statements is allowable since `floatmax(Float64)/2 > floatmin(Float64)*2/eps(Float64)`, so the previous `ab`-pairs of `if`-statements cannot be reached simultaneously; similarly for the `cd`-pairs.

There's also some minor restructuring, just to make the function a simpler read.
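The threshold argument above can be checked numerically. The following is a minimal sketch of the branch selection only, not of the scaling itself or of the code in `base/complex.jl`. The helper names `overflow_threshold`, `underflow_threshold` and `scaling_branch` are hypothetical; the threshold expressions are the ones quoted in the description.

```julia
# Hypothetical helper names; the expressions mirror the thresholds quoted above.
overflow_threshold()  = floatmax(Float64) / 2                 # at or above: scale down
underflow_threshold() = floatmin(Float64) * 2 / eps(Float64)  # at or below: scale up

# Combined if-elseif form: at most one branch is taken for any magnitude m.
function scaling_branch(m::Float64)
    if m >= overflow_threshold()
        return :scale_down
    elseif m <= underflow_threshold()
        return :scale_up
    else
        return :no_scaling
    end
end

overflow_threshold() > underflow_threshold()   # true
scaling_branch(floatmax(Float64))              # :scale_down
scaling_branch(floatmin(Float64))              # :scale_up
scaling_branch(1.0)                            # :no_scaling
```

Because the overflow threshold lies far above the underflow threshold, no magnitude can satisfy both conditions, which is what makes the `elseif` equivalent to two independent `if`-statements.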
Overall, complex division still seems surprisingly slow - certainly, the slowdown is greater than what was quoted in the algorithm's paper. Most of that might be due to additional branching though.
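For context on what the checked version buys in exchange for that slowdown, here is a small sketch comparing it with the textbook formula. `naive_div` is a hypothetical name for an unchecked implementation, not a function in Base. The inputs are chosen so that the intermediate products overflow.

```julia
# Textbook formula, with no protection against overflow or underflow.
# `naive_div` is a hypothetical name, not a function in Base.
function naive_div(z::ComplexF64, w::ComplexF64)
    a, b = reim(z); c, d = reim(w)
    den = c*c + d*d
    return ComplexF64((a*c + b*d) / den, (b*c - a*d) / den)
end

z = ComplexF64(2.0^600, 2.0^600)
w = ComplexF64(2.0^600, 2.0^600)

naive_div(z, w)   # intermediate products overflow to Inf, giving NaN parts
z / w             # the checked division returns a value close to 1.0 + 0.0im
```

For inputs of moderate magnitude the two agree to rounding error, so the extra branches only matter near the extremes of the `Float64` range.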