implement faster floating-point isless
Previously `isless` relied on the C intrinsic `fpislt` in
`src/runtime_intrinsics.c`, while the new implementation in Julia
arguably generates better code, namely:

 1. The NaN check compiles to a single instruction plus a branch that is
    amenable to branch prediction in arguably most use cases (i.e.
    comparing non-NaN floats), thus speeding up execution.
 2. The compiler now often manages to remove the NaN check if the
    embedding code has already proven the arguments to be non-NaN.
 3. The actual operation compares both arguments as sign-magnitude
    integers instead of doing a case analysis based on the sign of one
    argument. This symmetric treatment may generate vectorized
    instructions for the sign-magnitude conversion depending on how the
    arguments are laid out.
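
As a rough illustration of point 3, the sketch below maps a `Float64` to an
integer key whose ordering matches the float ordering for non-NaN inputs.
The name `signmag_key` is hypothetical and fixed to `Float64` for
simplicity; it is not part of this commit.

```julia
# Hypothetical helper, for illustration only (not part of Base).
# Maps a non-NaN Float64 to an Int64 whose integer order matches the
# float order, including -0.0 < 0.0.
function signmag_key(x::Float64)
    ix = reinterpret(Int64, x)
    # Negative floats have the sign bit set, so `ix` is negative. Flipping
    # the lower 63 bits reverses their order: larger magnitudes then map
    # to smaller integers.
    return ifelse(ix < 0, ix ⊻ typemax(Int64), ix)
end

zero_order = signmag_key(-0.0) < signmag_key(0.0)    # true
neg_order  = signmag_key(-2.0) < signmag_key(-1.0)   # true
inf_order  = signmag_key(1.0)  < signmag_key(Inf)    # true
```

Both arguments go through the same branch-free `ifelse`, which is what
makes the treatment symmetric.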

The actual behaviour of `isless` did not change and apart from the
Julia-specific NaN-handling (which may be up for debate) the resulting
total order corresponds to the IEEE-754 specified `totalOrder`.
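
A few calls make the ordering concrete. In this sketch, `isless` places
`-0.0` before `0.0` and treats every NaN as greater than all other values,
whereas IEEE-754 `totalOrder` additionally distinguishes NaNs by sign and
payload:

```julia
# Signed zeros are ordered, unlike with `<`.
neg_zero_first = isless(-0.0, 0.0)   # true
lt_zero        = -0.0 < 0.0          # false

# NaN sorts after everything else, including Inf.
num_before_nan = isless(1.0, NaN)    # true
inf_before_nan = isless(Inf, NaN)    # true
nan_before_num = isless(NaN, 1.0)    # false
nan_before_nan = isless(NaN, NaN)    # false

# The default sort order is based on `isless`.
sorted = sort([NaN, 1.0, -0.0, 0.0, -Inf])
# sorted is [-Inf, -0.0, 0.0, 1.0, NaN]
```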

While the new implementation no longer generates fully branchless code, I
did not manage to construct a use case where this was detrimental: the
saved work seems to outweigh the potential cost of a branch
misprediction in all of my tests with various NaN-polluted data. Also,
auto-vectorization was not effective with the previous `fpislt` either.

Quick benchmarks (AMD A10-7860K) on `sort`, avoiding the specialized
algorithm:

```julia
using BenchmarkTools

# 1000 random floats, no NaNs
a = rand(1000);
@btime sort($a, lt=(a,b)->isless(a,b));
    # before: 56.030 μs (1 allocation: 7.94 KiB)
    #  after: 40.853 μs (1 allocation: 7.94 KiB)
# 1000000 random floats, no NaNs
a = rand(1000000);
@btime sort($a, lt=(a,b)->isless(a,b));
    # before: 159.499 ms (2 allocations: 7.63 MiB)
    #  after: 120.536 ms (2 allocations: 7.63 MiB)
# 1000000 elements, each either a random float or NaN
a = [rand((rand(), NaN)) for _ in 1:1000000];
@btime sort($a, lt=(a,b)->isless(a,b));
    # before: 111.925 ms (2 allocations: 7.63 MiB)
    #  after:  77.669 ms (2 allocations: 7.63 MiB)
```
stev47 committed Jan 6, 2021
1 parent 1a333a9 commit 569ca03
Showing 1 changed file with 13 additions and 3 deletions.
16 changes: 13 additions & 3 deletions base/float.jl
@@ -400,9 +400,19 @@ end
isequal(x::Float16, y::Float16) = fpiseq(x, y)
isequal(x::Float32, y::Float32) = fpiseq(x, y)
isequal(x::Float64, y::Float64) = fpiseq(x, y)
isless( x::Float16, y::Float16) = fpislt(x, y)
isless( x::Float32, y::Float32) = fpislt(x, y)
isless( x::Float64, y::Float64) = fpislt(x, y)

# interpret as sign-magnitude integer
@inline function _fpint(x)
    IntT = signed(uinttype(typeof(x)))
    ix = reinterpret(IntT, x)
    return ifelse(ix < zero(IntT), ix ⊻ typemax(IntT), ix)
end

@inline function isless(a::T, b::T) where T<:IEEEFloat
    (isnan(a) || isnan(b)) && return !isnan(a)

    return _fpint(a) < _fpint(b)
end

# Exact Float (Tf) vs Integer (Ti) comparisons
# Assumes: