Reduce overflow via common denominators #24
Conversation
When implementing `r1 + r2`, check whether they have the same denominator, and if so skip the generic multiplication-of-denominators. This is not as comprehensive as using `gcd`, but it is *much* faster, and that is a key design goal for this package.

For consumers, one can exploit this by ensuring that all coefficients for a single axis have the same denominator. For example, if you wanted to construct a weighted average of `a`, `b`, and `c`, rather than

```julia
cedge, cmid = SimpleRatio(1, 4), SimpleRatio(1, 2)
cedge*a + cmid*b + cedge*c
```

it would be better to use

```julia
cedge, cmid = SimpleRatio(1, 4), SimpleRatio(2, 4)
```

Fixes JuliaMath/Interpolations.jl#457
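To see why matching denominators matter, here is a minimal Python sketch of the same fast-path logic (the `SimpleRatio` class below mirrors the package's type for illustration only; it is not the package's code):

```python
from dataclasses import dataclass

@dataclass
class SimpleRatio:
    # Minimal stand-in for Ratios.jl's SimpleRatio: no gcd reduction on construction.
    num: int
    den: int

    def __add__(self, other):
        # Fast path: identical denominators -> add numerators, keep the denominator.
        if self.den == other.den:
            return SimpleRatio(self.num + other.num, self.den)
        # Generic path: cross-multiply, so denominators grow multiplicatively.
        return SimpleRatio(self.num * other.den + self.den * other.num,
                           self.den * other.den)

# Same-denominator coefficients: the denominator stays at 4.
fast = SimpleRatio(1, 4) + SimpleRatio(2, 4) + SimpleRatio(1, 4)
# Mixed denominators: the denominator grows to 4*2*4 = 32.
slow = SimpleRatio(1, 4) + SimpleRatio(1, 2) + SimpleRatio(1, 4)
print(fast)  # SimpleRatio(num=4, den=4)
print(slow)  # SimpleRatio(num=32, den=32)
```

Both sums equal 1, but the mixed-denominator version inflates the denominator eightfold, which is exactly how fixed-width integer rationals overflow.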
Codecov Report
```
@@           Coverage Diff            @@
##           master      #24    +/-  ##
=======================================
+ Coverage   89.13%   90.38%   +1.25%
=======================================
  Files           1        1
  Lines          46       52       +6
=======================================
+ Hits           41       47       +6
  Misses          5        5
```
[I found this to work well for users of FastRationals, with no downside.] Consider providing a function that accepts, say, 2..4 Ratios or an NTuple of Ratios, and tries to put the given Ratios over a common denominator. This lets clients benefit from the shared-denominator speedup more frequently, since getting values commonalized is no longer entirely up to the client's internal logic. The test for available commonality does not need to try very hard: returning the values unchanged is perfectly acceptable, and nothing is lost, particularly when the function uses simple bit operations and coarse logic. Whenever the function does succeed in commonalizing denominators, subsequent calculation is likely to benefit: a non-zero-sum operation.
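As a rough illustration of the proposal, here is a hedged Python sketch of such a helper (the name `commonalize` and the tuple representation are invented for this example, not part of either package); it only attempts the cheap case where every denominator divides the largest one, and otherwise returns its inputs unchanged:

```python
def commonalize(*ratios):
    """Try cheaply to put (num, den) pairs over a shared denominator.

    Each ratio is a (num, den) tuple. If every denominator divides the
    largest one, rescale each numerator; otherwise return the inputs
    unchanged -- the caller loses nothing when no cheap commonality exists.
    """
    target = max(den for _, den in ratios)
    if any(target % den != 0 for _, den in ratios):
        return ratios  # no cheap common denominator; give back the originals
    return tuple((num * (target // den), target) for num, den in ratios)

print(commonalize((1, 4), (1, 2)))  # ((1, 4), (2, 4))
print(commonalize((1, 3), (1, 2)))  # unchanged: ((1, 3), (1, 2))
```

The divisibility test is a single modulo per ratio, so the helper costs little even when it fails.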
What's the referenced work?
The referenced work is available here: FastRationals.jl.
Thanks for the link. I decided to add
This looks really promising. Since we now have
@mkitt Ratios may well have different performance sensitivities than FastRationals has.
That's a lot of complexity for something that I think will probably never happen. Can you generate a real test case using Interpolations where this is necessary? (One where you're doing the obvious things to avoid overflow already, like creating the coefficients with the same denominator.) Otherwise it's a ton of extra armor plating that will never be used and which only makes things heavy and slow. In case you don't know: a branch in your code prevents vectorization. So
@timholy The research and empiricism behind FastRationals included consideration of the probable patterns that accompany reducing on (or, better, before) overflow. While this is largely irrelevant to how values combine within Interpolations, as Tim's intuition found, others may find this good to know: this is a scenario best considered a "perhaps" event; the specifics of the operational flow govern.
Yep, I'm really glad FastRationals exists. It frees Ratios to do what's best for Interpolations, and people who don't like that can use FastRationals. |
```julia
+(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num + y.num, x.den) :
                                    SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den)
-(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num - y.num, x.den) :
                                    SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den)
```
The one other question I had with this implementation is whether using `ifelse` might be advantageous over the ternary operator in order to avoid branching logic. However, I have not found a good test case to evaluate this. Obviously at a small scale, doing only one calculation is better. The question is whether any vectorization occurs before this change, and whether that vectorization is preserved when this branch is introduced.
Current:

```julia
+(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num + y.num, x.den) :
                                    SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den)
-(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num - y.num, x.den) :
                                    SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den)
```

Suggested change:

```julia
+(x::SimpleRatio, y::SimpleRatio) = ifelse(x.den == y.den, SimpleRatio(x.num + y.num, x.den),
                                           SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den))
-(x::SimpleRatio, y::SimpleRatio) = ifelse(x.den == y.den, SimpleRatio(x.num - y.num, x.den),
                                           SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den))
```
Agreed in principle, but
```julia
julia> f(x, y) = x == y ? 1 : 1.0
f (generic function with 1 method)

julia> @code_llvm f(2, 3)
;  @ REPL[4]:1 within `f'
define { {}*, i8 } @julia_f_413([8 x i8]* noalias nocapture align 8 dereferenceable(8) %0, i64 signext %1, i64 signext %2) {
top:
; ┌ @ promotion.jl:410 within `=='
   %.not = icmp eq i64 %1, %2
; └
  %spec.select = select i1 %.not, { {}*, i8 } { {}* inttoptr (i64 140004302721120 to {}*), i8 -126 }, { {}*, i8 } { {}* inttoptr (i64 140004314812816 to {}*), i8 -127 }
  ret { {}*, i8 } %spec.select
}
```

`select` already avoids a branch and is SIMD-friendly.
Curiously, the LLVM for `ifelse` looks much worse:
```julia
julia> fifelse(x, y) = ifelse(x == y, 1, 1.0)
fifelse (generic function with 1 method)

julia> @code_llvm fifelse(2, 3)
;  @ REPL[6]:1 within `fifelse'
define { {}*, i8 } @julia_fifelse_435([8 x i8]* noalias nocapture align 8 dereferenceable(8) %0, i64 signext %1, i64 signext %2) {
top:
; ┌ @ promotion.jl:410 within `=='
   %.not = icmp eq i64 %1, %2
; └
  %3 = select i1 %.not, {}* inttoptr (i64 140004302721120 to {}*), {}* inttoptr (i64 140003489954752 to {}*)
  %4 = select i1 %.not, i1 icmp eq ({}* inttoptr (i64 140004302721120 to {}*), {}* null), i1 icmp eq ({}* inttoptr (i64 140003489954752 to {}*), {}* null)
  br i1 %4, label %postnull, label %nonnull

nonnull:                                          ; preds = %top
  %5 = bitcast {}* %3 to i64*
  %6 = getelementptr inbounds i64, i64* %5, i64 -1
  %7 = load atomic i64, i64* %6 unordered, align 8
  %8 = and i64 %7, -16
  %9 = inttoptr i64 %8 to {}*
  br label %postnull

postnull:                                         ; preds = %nonnull, %top
  %10 = phi {}* [ null, %top ], [ %9, %nonnull ]
  %11 = icmp eq {}* %10, inttoptr (i64 140004383587200 to {}*)
  %12 = zext i1 %11 to i8
  %13 = icmp eq {}* %10, inttoptr (i64 140004380990320 to {}*)
  %.op = or i8 %12, -128
  %14 = select i1 %13, i8 -126, i8 %.op
  %15 = insertvalue { {}*, i8 } undef, {}* %3, 0
  %16 = insertvalue { {}*, i8 } %15, i8 %14, 1
  ret { {}*, i8 } %16
}
```
I am puzzled, but it seems clear which one we want.
I think it has to do with type stability.
```julia
julia> f(x, y) = x == y ? 1.0 : 2.0
f (generic function with 1 method)

julia> @code_llvm f(2, 3)
;  @ REPL[23]:1 within `f`
; Function Attrs: uwtable
define double @julia_f_542(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ promotion.jl:424 within `==`
   %.not = icmp eq i64 %0, %1
; └
  %spec.select = select i1 %.not, double 1.000000e+00, double 2.000000e+00
  ret double %spec.select
}

julia> fifelse(x, y) = ifelse(x == y, 1.0, 2.0)
fifelse (generic function with 1 method)

julia> @code_llvm fifelse(2, 3)
;  @ REPL[25]:1 within `fifelse`
; Function Attrs: uwtable
define double @julia_fifelse_544(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ promotion.jl:424 within `==`
   %.not = icmp eq i64 %0, %1
; └
  %2 = select i1 %.not, double 1.000000e+00, double 2.000000e+00
  ret double %2
}
```
The ternary operator seems to generate better code in either case.
Given the benchmarks in JuliaMath/Interpolations.jl#457 (comment), this seems to be a strict improvement. I don't see any downside to implementing this. We might be able to optimize the common-denominator handling further for certain cases. For example, it might be a common and reasonable case to check whether the largest denominator is actually the common denominator. This can be addressed later though. Let's merge!
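That later optimization could be as cheap as a single divisibility test. A hedged Python sketch of the idea (the function name and tuple representation are illustrative, not code from the package):

```python
def add_with_divisibility_check(x, y):
    """Add two (num, den) pairs, trying the cheap common-denominator cases.

    Beyond the equal-denominator fast path, check whether the larger
    denominator is already a multiple of the smaller one; if so, it serves
    as the common denominator and the denominator product is avoided.
    """
    xn, xd = x
    yn, yd = y
    if xd == yd:                      # existing fast path
        return (xn + yn, xd)
    lo, hi = (x, y) if xd <= yd else (y, x)
    if hi[1] % lo[1] == 0:            # larger den is already the common den
        scale = hi[1] // lo[1]
        return (lo[0] * scale + hi[0], hi[1])
    # generic path: denominators multiply
    return (xn * yd + xd * yn, xd * yd)

print(add_with_divisibility_check((1, 4), (1, 8)))  # (3, 8), not (12, 32)
```

When denominators are powers of two, as interpolation coefficients often are, the modulo test could even be replaced by bit operations, but the version above keeps the logic plain.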