implement faster floating-point isless

Previously `isless` relied on the C intrinsic `fpislt` in `src/runtime_intrinsics.c`, while the new implementation in Julia arguably generates better code, namely: 1. The NaN-check compiles to a single instruction + branch amenable for branch prediction in arguably most usecases (i.e. comparing non-NaN floats), thus speeding up execution. 2. The compiler now often manages to remove NaN-computation if the embedding code has already proven the arguments to be non-NaN. 3. The actual operation compares both arguments as sign-magnitude integers instead of case analysis based on the sign of one argument. This symmetric treatment may generate vectorized instructions for the sign-magnitude conversion depending on how the arguments are layed out. The actual behaviour of `isless` did not change and apart from the Julia-specific NaN-handling (which may be up for debate) the resulting total order corresponds to the IEEE-754 specified `totalOrder`. While the new implementation no longer generates fully branchless code I did not manage to construct a usecase where this was detrimental: the saved work seems to outweight the potential cost of a branch misprediction in all of my tests with various NaN-polluted data. Also auto-vectorization was not effective on the previous `fpislt` either. Quick benchmarks (AMD A10-7860K) on `sort`, avoiding the specialized algorithm: ```julia a = rand(1000); @Btime sort($a, lt=(a,b)->isless(a,b)); # before: 56.030 μs (1 allocation: 7.94 KiB) # after: 40.853 μs (1 allocation: 7.94 KiB) a = rand(1000000); @Btime sort($a, lt=(a,b)->isless(a,b)); # before: 159.499 ms (2 allocations: 7.63 MiB) # after: 120.536 ms (2 allocations: 7.63 MiB) a = [rand((rand(), NaN)) for _ in 1:1000000]; @Btime sort($a, lt=(a,b)->isless(a,b)); # before: 111.925 ms (2 allocations: 7.63 MiB) # after: 77.669 ms (2 allocations: 7.63 MiB) ```
JuliaLang · Jan 6, 2021 · 569ca03 · 569ca03
1 parent 1a333a9
commit 569ca03
Showing 1 changed file with 13 additions and 3 deletions.
diff --git a/base/float.jl b/base/float.jl
@@ -400,9 +400,19 @@ end
 isequal(x::Float16, y::Float16) = fpiseq(x, y)
 isequal(x::Float32, y::Float32) = fpiseq(x, y)
 isequal(x::Float64, y::Float64) = fpiseq(x, y)
-isless( x::Float16, y::Float16) = fpislt(x, y)
-isless( x::Float32, y::Float32) = fpislt(x, y)
-isless( x::Float64, y::Float64) = fpislt(x, y)
+
+# interpret as sign-magnitude integer
+@inline function _fpint(x)
+    IntT = signed(uinttype(typeof(x)))
+    ix = reinterpret(IntT, x)
+    return ifelse(ix < zero(IntT), ix ⊻ typemax(IntT), ix)
+end
+
+@inline function isless(a::T, b::T) where T<:IEEEFloat
+    (isnan(a) || isnan(b)) && return !isnan(a)
+
+    return _fpint(a) < _fpint(b)
+end
 
 # Exact Float (Tf) vs Integer (Ti) comparisons
 # Assumes: