
Reduce overflow via common denominators #24

Merged (3 commits) Sep 16, 2021
Conversation


@timholy (Owner) commented Sep 15, 2021

When implementing `r1 + r2`, check whether they have the same
denominator, and if so skip the generic multiplication-of-denominators.
This is not as comprehensive as using `gcd`, but it is *much* faster,
and that is a key design goal for this package.

For consumers, one can exploit this by ensuring that all coefficients
for a single axis have the same denominator.  For example, if you
wanted to construct a weighted average of `a`, `b`, and `c`, rather
than

    cedge, cmid = SimpleRatio(1, 4), SimpleRatio(1, 2)
    cedge*a + cmid*b + cedge*c

it would be better to use

    cedge, cmid = SimpleRatio(1, 4), SimpleRatio(2, 4)

Fixes JuliaMath/Interpolations.jl#457
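The same-denominator shortcut and the shared-denominator trick above can be sketched outside Julia, too. Below is a hedged Python illustration (not the package's Julia source; `SimpleRatio` here is a stand-in class):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimpleRatio:
    """Unreduced rational number: a stand-in for Ratios.jl's SimpleRatio."""
    num: int
    den: int

    def __add__(self, other):
        if self.den == other.den:
            # Fast path: equal denominators, no product of denominators,
            # so intermediate values stay small and overflow is less likely.
            return SimpleRatio(self.num + other.num, self.den)
        # Generic path: cross-multiply, which grows the denominator.
        return SimpleRatio(self.num * other.den + self.den * other.num,
                           self.den * other.den)

# Constructing the weights over a shared denominator keeps every
# addition on the fast path:
cedge, cmid = SimpleRatio(1, 4), SimpleRatio(2, 4)
total = cedge + cmid + cedge  # denominator stays 4 throughout
```

With `SimpleRatio(1, 2)` instead of `SimpleRatio(2, 4)`, the same sum takes the generic path and the denominator grows at each step.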
codecov bot commented Sep 15, 2021

Codecov Report

Merging #24 (9746a1c) into master (5a7fa1a) will increase coverage by 1.25%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #24      +/-   ##
==========================================
+ Coverage   89.13%   90.38%   +1.25%     
==========================================
  Files           1        1              
  Lines          46       52       +6     
==========================================
+ Hits           41       47       +6     
  Misses          5        5              
Impacted Files    Coverage Δ
src/Ratios.jl     90.38% <100.00%> (+1.25%) ⬆️



JeffreySarnoff commented Sep 15, 2021

[I found this to work well for users of FastRationals, at little cost.]

Consider providing a function that accepts, say, 2..4 Ratios or an NTuple of Ratios, and tries to common-denominator-ize the Ratios given. This allows clients to benefit from the shared-denominator speedup more frequently, since getting commonized is no longer entirely up to the client's internal logic.

The test for available commonality does not need to try very hard: returning the given values unchanged is copacetic, as nothing is lost, particularly when the function uses simple bit operations and coarse logic. Whenever the function succeeds in commonalizing denominators, it is likely to speed up the calculation: a non-zero-sum operation.
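One cheap realization of this idea, in the spirit of "not trying very hard": succeed only when the largest denominator is a multiple of all the others, and otherwise hand back the inputs unchanged. A Python sketch with a hypothetical name (`commonalize` is not part of Ratios.jl or FastRationals):

```python
def commonalize(*ratios):
    """Try to rewrite (num, den) pairs over a shared denominator.

    Hypothetical helper, not part of Ratios.jl: the commonality test is
    deliberately coarse, and on failure the inputs come back unchanged.
    """
    common = max(den for _, den in ratios)
    if any(common % den for _, den in ratios):
        return ratios  # nothing lost: values returned as given
    return tuple((num * (common // den), common) for num, den in ratios)

commonalize((1, 4), (1, 2), (3, 4))  # rewritten over denominator 4
commonalize((1, 3), (1, 2))          # unchanged: 3 is not a multiple of 2
```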


timholy commented Sep 15, 2021

What's FasterRationals? JuliaHub doesn't seem to know about it.

@JeffreySarnoff

FasterRationals.jl was the original name, before the package was registered.

The referenced work is available here: FastRationals.jl.


timholy commented Sep 15, 2021

Thanks for the link. I decided to add common_denominator, and I greatly expanded the README and docstrings. See https://github.com/timholy/Ratios.jl/tree/teh/less_overflow for a preview of the README. I linked both SaferIntegers and FastRationals.


mkitti commented Sep 16, 2021

This looks really promising.

Since we now have `common_denominator`, would it make sense to catch the `OverflowError` and try to apply `common_denominator` when thrown?
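The suggested fallback could look something like the following Python sketch (hypothetical names, not the Ratios.jl implementation; Python integers do not overflow, so a checked multiply stands in for fixed-width arithmetic, and only the products are checked):

```python
LIMIT = 2**63 - 1  # stand-in for typemax(Int64)

def checked_mul(a, b):
    """Multiply, raising OverflowError outside the 64-bit signed range."""
    result = a * b
    if not -LIMIT - 1 <= result <= LIMIT:
        raise OverflowError
    return result

def add(r1, r2):
    """Add two (num, den) pairs, retrying over a common denominator on overflow."""
    (n1, d1), (n2, d2) = r1, r2
    try:
        return (checked_mul(n1, d2) + checked_mul(d1, n2), checked_mul(d1, d2))
    except OverflowError:
        # Retry: if the larger denominator is a common denominator,
        # the overflowing products above are never formed.
        d = max(d1, d2)
        if d % d1 == 0 and d % d2 == 0:
            return (n1 * (d // d1) + n2 * (d // d2), d)
        raise
```

The catch-and-retry adds a second branch to the hot path, which is the complexity cost weighed below.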

@JeffreySarnoff

@mkitti
Klaus Crusius (@KlausC) and I implemented several approaches, including the one you suggest, and benchmarked them exhaustively and carefully. We went a different way: https://github.com/JeffreySarnoff/FastRationals.jl.

Ratios may well have different performance sensitivities than FastRationals has.
The most sensible evaluation would supplement the current benchmarking with our timing code: https://github.com/JeffreySarnoff/FastRationals.jl/tree/master/benchmarks


timholy commented Sep 16, 2021

> Since we now have common_denominator, would it make sense to catch the OverflowError, and try to apply common_denominator when thrown?

That's a lot of complexity for something that I think will probably never happen. Can you generate a real test case using Interpolations where this is necessary? (One where you're doing the obvious things to already avoid overflow, like creating the coefficients with the same denominator.) Otherwise it's a ton of extra armor plating that will never be used and which only makes things heavy and slow.

In case you don't know: a branch in your code prevents vectorization. So @simd, which can make everything faster by ~10x on modern processors, goes "poof" if you have branches in something that runs in an inner loop. With interpolations, last I checked there were no branches in itp(x, y) for itp = interpolate(A, mode) (extrapolate is a different story, and not SIMD-able, but people who care can write their code in a way that focuses on "interior" and "potentially exterior" to optimize performance).

JeffreySarnoff commented Sep 16, 2021

@timholy The research and empiricism that scaffold FastRationals included consideration of the probable patterns that accompany reducing on (or, better, before) overflow. While this is largely irrelevant to how values combine within Interpolations, as Tim's intuition found, others may find this good to know:

- this provides a scenario that is best considered a "perhaps" event; specifics of the operational flow govern
- there are situations where one may run an extended computation repeatedly without overflow, and there are others
- and there is this strong influence too: the narrower the underlying integer type, the more taxing the mixture probabilities

    # presuming ordinary care is taken with one's source values,
    #    e.g. none are exceedingly large or small
    # using these symbolic values

    qx = xn // xd
    qy = yn // yd
    # qx != qy /\ xd != yd

    qa = op1(qx, qy)
    # qa == an // ad
    qb = op1(qx, qy)
    # qb == bn // bd
While pretending to run multistep arithmetic algorithms (we did; however, your work may be of a different nature in its intermingling flow of provided and computed Rationals):

Usually, the first occasion, where `qa` is to be determined from `qx, qy`, resolves within the domain.

Often, the second occasion (where `qa` and `qb` are propagated into an additive `op`) may resolve a common denominator.

With the third (and additional) revisit, the attempt to resolve a common denominator often fails, overflowing.


timholy commented Sep 16, 2021

Yep, I'm really glad FastRationals exists. It frees Ratios to do what's best for Interpolations, and people who don't like that can use FastRationals.

Comment on lines +70 to +73
    +(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num + y.num, x.den) :
                                                         SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den)
    -(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num - y.num, x.den) :
                                                         SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den)

The one other question I had with this implementation is whether using `ifelse` might be advantageous over the ternary operator, to avoid branching logic. However, I have not found a good test case to evaluate this. Obviously, on a small scale, doing only one calculation is better. The question is whether any vectorization occurs before this change and whether that vectorization is preserved after introducing this branch.

Suggested change

    - +(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num + y.num, x.den) :
    -                                     SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den)
    - -(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num - y.num, x.den) :
    -                                     SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den)
    + +(x::SimpleRatio, y::SimpleRatio) = ifelse(x.den == y.den, SimpleRatio(x.num + y.num, x.den),
    +                                            SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den))
    + -(x::SimpleRatio, y::SimpleRatio) = ifelse(x.den == y.den, SimpleRatio(x.num - y.num, x.den),
    +                                            SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den))

timholy (Owner Author)

Agreed in principle, but

julia> f(x, y) = x == y ? 1 : 1.0
f (generic function with 1 method)

julia> @code_llvm f(2, 3)
;  @ REPL[4]:1 within `f'
define { {}*, i8 } @julia_f_413([8 x i8]* noalias nocapture align 8 dereferenceable(8) %0, i64 signext %1, i64 signext %2) {
top:
; ┌ @ promotion.jl:410 within `=='
   %.not = icmp eq i64 %1, %2
; └
  %spec.select = select i1 %.not, { {}*, i8 } { {}* inttoptr (i64 140004302721120 to {}*), i8 -126 }, { {}*, i8 } { {}* inttoptr (i64 140004314812816 to {}*), i8 -127 }
  ret { {}*, i8 } %spec.select
}

`select` already avoids a branch and is SIMD-friendly.

Curiously, the LLVM for ifelse looks much worse:

julia> fifelse(x, y) = ifelse(x == y, 1, 1.0)
fifelse (generic function with 1 method)

julia> @code_llvm fifelse(2, 3)
;  @ REPL[6]:1 within `fifelse'
define { {}*, i8 } @julia_fifelse_435([8 x i8]* noalias nocapture align 8 dereferenceable(8) %0, i64 signext %1, i64 signext %2) {
top:
; ┌ @ promotion.jl:410 within `=='
   %.not = icmp eq i64 %1, %2
; └
  %3 = select i1 %.not, {}* inttoptr (i64 140004302721120 to {}*), {}* inttoptr (i64 140003489954752 to {}*)
  %4 = select i1 %.not, i1 icmp eq ({}* inttoptr (i64 140004302721120 to {}*), {}* null), i1 icmp eq ({}* inttoptr (i64 140003489954752 to {}*), {}* null)
  br i1 %4, label %postnull, label %nonnull

nonnull:                                          ; preds = %top
  %5 = bitcast {}* %3 to i64*
  %6 = getelementptr inbounds i64, i64* %5, i64 -1
  %7 = load atomic i64, i64* %6 unordered, align 8
  %8 = and i64 %7, -16
  %9 = inttoptr i64 %8 to {}*
  br label %postnull

postnull:                                         ; preds = %nonnull, %top
  %10 = phi {}* [ null, %top ], [ %9, %nonnull ]
  %11 = icmp eq {}* %10, inttoptr (i64 140004383587200 to {}*)
  %12 = zext i1 %11 to i8
  %13 = icmp eq {}* %10, inttoptr (i64 140004380990320 to {}*)
  %.op = or i8 %12, -128
  %14 = select i1 %13, i8 -126, i8 %.op
  %15 = insertvalue { {}*, i8 } undef, {}* %3, 0
  %16 = insertvalue { {}*, i8 } %15, i8 %14, 1
  ret { {}*, i8 } %16
}

I am puzzled, but it seems clear which one we want.


I think it has to do with type stability.

julia> f(x, y) = x == y ? 1.0 : 2.0
f (generic function with 1 method)

julia> @code_llvm f(2,3)
;  @ REPL[23]:1 within `f`
; Function Attrs: uwtable
define double @julia_f_542(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ promotion.jl:424 within `==`
   %.not = icmp eq i64 %0, %1
; └
  %spec.select = select i1 %.not, double 1.000000e+00, double 2.000000e+00
  ret double %spec.select
}

julia> fifelse(x, y) = ifelse(x == y, 1.0, 2.0)
fifelse (generic function with 1 method)

julia> @code_llvm fifelse(2,3)
;  @ REPL[25]:1 within `fifelse`
; Function Attrs: uwtable
define double @julia_fifelse_544(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ promotion.jl:424 within `==`
   %.not = icmp eq i64 %0, %1
; └
  %2 = select i1 %.not, double 1.000000e+00, double 2.000000e+00
  ret double %2
}

The ternary operator seems to generate better code in either case.


mkitti commented Sep 16, 2021

Given the benchmarks in JuliaMath/Interpolations.jl#457 (comment), this seems to be a strict improvement. I don't see any downside to implementing this.

We might be able to optimize `common_denominator` further for certain cases. For example, it might be a common and reasonable case that the largest denominator is actually the common denominator. This can be addressed later, though.
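That fast path might look like this Python sketch (hypothetical, not the actual `common_denominator` in Ratios.jl): check whether the largest denominator is a multiple of every other one, and fall back to a gcd-based least common multiple only when that check fails:

```python
from math import gcd

def common_denominator(*ratios):
    """Rewrite (num, den) pairs over one denominator.

    Hypothetical sketch: fast path when the largest denominator is a
    multiple of every other; lcm-based general path otherwise.
    """
    dens = [den for _, den in ratios]
    common = max(dens)
    if any(common % den for den in dens):
        # General path: least common multiple of all denominators.
        common = 1
        for den in dens:
            common = common * den // gcd(common, den)
    return tuple((num * (common // den), common) for num, den in ratios)

common_denominator((1, 4), (1, 2))  # fast path: denominator 4
common_denominator((1, 4), (1, 6))  # general path: denominator 12
```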

Let's merge!

@timholy timholy merged commit 9631c4a into master Sep 16, 2021
@timholy timholy deleted the teh/less_overflow branch September 16, 2021 21:01

Successfully merging this pull request may close these issues.

Integer overflow in innocent looking linear interpolation. Should some warning be issued?
3 participants