Reduce overflow via common denominators #24
Conversation
When implementing `r1 + r2`, check whether they have the same denominator, and if so skip the generic multiplication-of-denominators. This is not as comprehensive as using `gcd`, but it is *much* faster, and that is a key design goal for this package.

For consumers, one can exploit this by ensuring that all coefficients for a single axis have the same denominator. For example, if you wanted to construct a weighted average of `a`, `b`, and `c`, rather than

```julia
cedge, cmid = SimpleRatio(1, 4), SimpleRatio(1, 2)
cedge*a + cmid*b + cedge*c
```

it would be better to use

```julia
cedge, cmid = SimpleRatio(1, 4), SimpleRatio(2, 4)
```

Fixes JuliaMath/Interpolations.jl#457
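To see why matching denominators matter, here is a minimal Python sketch of the same fast-path logic (the `SimpleRatio` class below mirrors the package's type for illustration only; it is not the package's code):

```python
from dataclasses import dataclass

@dataclass
class SimpleRatio:
    # Minimal stand-in for Ratios.jl's SimpleRatio: no gcd reduction on construction.
    num: int
    den: int

    def __add__(self, other):
        # Fast path: identical denominators -> add numerators, keep the denominator.
        if self.den == other.den:
            return SimpleRatio(self.num + other.num, self.den)
        # Generic path: cross-multiply, so denominators grow multiplicatively.
        return SimpleRatio(self.num * other.den + self.den * other.num,
                           self.den * other.den)

# Same-denominator coefficients: the denominator stays at 4.
fast = SimpleRatio(1, 4) + SimpleRatio(2, 4) + SimpleRatio(1, 4)
# Mixed denominators: the denominator grows to 4*2*4 = 32.
slow = SimpleRatio(1, 4) + SimpleRatio(1, 2) + SimpleRatio(1, 4)
print(fast)  # SimpleRatio(num=4, den=4)
print(slow)  # SimpleRatio(num=32, den=32)
```

Both sums equal 1, but the mixed-denominator version inflates the denominator eightfold, which is exactly how fixed-width integer rationals overflow.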
Codecov Report
```
@@           Coverage Diff            @@
##           master      #24    +/-  ##
=======================================
+ Coverage   89.13%   90.38%   +1.25%
=======================================
  Files           1        1
  Lines          46       52       +6
=======================================
+ Hits           41       47       +6
  Misses          5        5
```
[I found this to work well for users of FastRationals, with no downside.] Consider providing a function that accepts, say, 2..4 Ratios or an NTuple of Ratios, and tries to put the given Ratios over a common denominator. This lets clients benefit from the shared-denominator speedup more frequently, since getting values commonalized is no longer entirely up to the client's internal logic. The test for available commonality does not need to try very hard: returning the values unchanged is perfectly acceptable, and nothing is lost, particularly when the function uses simple bit operations and coarse logic. Whenever the function does succeed in commonalizing denominators, subsequent calculation is likely to benefit: a non-zero-sum operation.
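As a rough illustration of the proposal, here is a hedged Python sketch of such a helper (the name `commonalize` and the tuple representation are invented for this example, not part of either package); it only attempts the cheap case where every denominator divides the largest one, and otherwise returns its inputs unchanged:

```python
def commonalize(*ratios):
    """Try cheaply to put (num, den) pairs over a shared denominator.

    Each ratio is a (num, den) tuple. If every denominator divides the
    largest one, rescale each numerator; otherwise return the inputs
    unchanged -- the caller loses nothing when no cheap commonality exists.
    """
    target = max(den for _, den in ratios)
    if any(target % den != 0 for _, den in ratios):
        return ratios  # no cheap common denominator; give back the originals
    return tuple((num * (target // den), target) for num, den in ratios)

print(commonalize((1, 4), (1, 2)))  # ((1, 4), (2, 4))
print(commonalize((1, 3), (1, 2)))  # unchanged: ((1, 3), (1, 2))
```

The divisibility test is a single modulo per ratio, so the helper costs little even when it fails.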
What's the referenced work?
The referenced work is available here: FastRationals.jl.
Thanks for the link. I decided to add
This looks really promising. Since we now have
@mkitt Ratios may well have different performance sensitivities than FastRationals has.
That's a lot of complexity for something that I think will probably never happen. Can you generate a real test case using Interpolations where this is necessary? (One where you're doing the obvious things to avoid overflow already, like creating the coefficients with the same denominator.) Otherwise it's a ton of extra armor plating that will never be used and which only makes things heavy and slow. In case you don't know: a branch in your code prevents vectorization. So
@timholy The research and empiricism behind FastRationals included consideration of the probable patterns that accompany reducing on (or, better, before) overflow. While this is largely irrelevant to how values combine within Interpolations, as Tim's intuition found, others may find this good to know: this is a scenario best considered a "perhaps" event; the specifics of the operational flow govern.
Yep, I'm really glad FastRationals exists. It frees Ratios to do what's best for Interpolations, and people who don't like that can use FastRationals. |
```julia
+(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num + y.num, x.den) :
                                    SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den)
-(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num - y.num, x.den) :
                                    SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den)
```
The one other question I had with this implementation is whether using `ifelse` might be advantageous over the ternary operator in order to avoid branching logic. However, I have not found a good test case to evaluate this. Obviously at a small scale, doing only one calculation is better. The question is whether any vectorization occurs before this change, and whether that vectorization is preserved when this branch is introduced.
Current:

```julia
+(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num + y.num, x.den) :
                                    SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den)
-(x::SimpleRatio, y::SimpleRatio) = x.den == y.den ? SimpleRatio(x.num - y.num, x.den) :
                                    SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den)
```

Suggested change:

```julia
+(x::SimpleRatio, y::SimpleRatio) = ifelse(x.den == y.den, SimpleRatio(x.num + y.num, x.den),
                                           SimpleRatio(x.num*y.den + x.den*y.num, x.den*y.den))
-(x::SimpleRatio, y::SimpleRatio) = ifelse(x.den == y.den, SimpleRatio(x.num - y.num, x.den),
                                           SimpleRatio(x.num*y.den - x.den*y.num, x.den*y.den))
```
Agreed in principle, but
```julia
julia> f(x, y) = x == y ? 1 : 1.0
f (generic function with 1 method)

julia> @code_llvm f(2, 3)
;  @ REPL[4]:1 within `f'
define { {}*, i8 } @julia_f_413([8 x i8]* noalias nocapture align 8 dereferenceable(8) %0, i64 signext %1, i64 signext %2) {
top:
; ┌ @ promotion.jl:410 within `=='
   %.not = icmp eq i64 %1, %2
; └
  %spec.select = select i1 %.not, { {}*, i8 } { {}* inttoptr (i64 140004302721120 to {}*), i8 -126 }, { {}*, i8 } { {}* inttoptr (i64 140004314812816 to {}*), i8 -127 }
  ret { {}*, i8 } %spec.select
}
```

`select` already avoids a branch and is SIMD-friendly.
Curiously, the LLVM for `ifelse` looks much worse:
```julia
julia> fifelse(x, y) = ifelse(x == y, 1, 1.0)
fifelse (generic function with 1 method)

julia> @code_llvm fifelse(2, 3)
;  @ REPL[6]:1 within `fifelse'
define { {}*, i8 } @julia_fifelse_435([8 x i8]* noalias nocapture align 8 dereferenceable(8) %0, i64 signext %1, i64 signext %2) {
top:
; ┌ @ promotion.jl:410 within `=='
   %.not = icmp eq i64 %1, %2
; └
  %3 = select i1 %.not, {}* inttoptr (i64 140004302721120 to {}*), {}* inttoptr (i64 140003489954752 to {}*)
  %4 = select i1 %.not, i1 icmp eq ({}* inttoptr (i64 140004302721120 to {}*), {}* null), i1 icmp eq ({}* inttoptr (i64 140003489954752 to {}*), {}* null)
  br i1 %4, label %postnull, label %nonnull

nonnull:                                          ; preds = %top
  %5 = bitcast {}* %3 to i64*
  %6 = getelementptr inbounds i64, i64* %5, i64 -1
  %7 = load atomic i64, i64* %6 unordered, align 8
  %8 = and i64 %7, -16
  %9 = inttoptr i64 %8 to {}*
  br label %postnull

postnull:                                         ; preds = %nonnull, %top
  %10 = phi {}* [ null, %top ], [ %9, %nonnull ]
  %11 = icmp eq {}* %10, inttoptr (i64 140004383587200 to {}*)
  %12 = zext i1 %11 to i8
  %13 = icmp eq {}* %10, inttoptr (i64 140004380990320 to {}*)
  %.op = or i8 %12, -128
  %14 = select i1 %13, i8 -126, i8 %.op
  %15 = insertvalue { {}*, i8 } undef, {}* %3, 0
  %16 = insertvalue { {}*, i8 } %15, i8 %14, 1
  ret { {}*, i8 } %16
}
```
I am puzzled, but it seems clear which one we want.
I think it has to do with type stability.
```julia
julia> f(x, y) = x == y ? 1.0 : 2.0
f (generic function with 1 method)

julia> @code_llvm f(2, 3)
;  @ REPL[23]:1 within `f`
; Function Attrs: uwtable
define double @julia_f_542(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ promotion.jl:424 within `==`
   %.not = icmp eq i64 %0, %1
; └
  %spec.select = select i1 %.not, double 1.000000e+00, double 2.000000e+00
  ret double %spec.select
}

julia> fifelse(x, y) = ifelse(x == y, 1.0, 2.0)
fifelse (generic function with 1 method)

julia> @code_llvm fifelse(2, 3)
;  @ REPL[25]:1 within `fifelse`
; Function Attrs: uwtable
define double @julia_fifelse_544(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ promotion.jl:424 within `==`
   %.not = icmp eq i64 %0, %1
; └
  %2 = select i1 %.not, double 1.000000e+00, double 2.000000e+00
  ret double %2
}
```
The ternary operator seems to generate better code in either case.
Given the benchmarks in JuliaMath/Interpolations.jl#457 (comment), this seems to be a strict improvement. I don't see any downside to implementing this. We might be able to optimize the common-denominator handling further for certain cases. For example, it might be a common and reasonable case to check whether the largest denominator is actually the common denominator. This can be addressed later though. Let's merge!
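That later optimization could be as cheap as a single divisibility test. A hedged Python sketch of the idea (the function name and tuple representation are illustrative, not code from the package):

```python
def add_with_divisibility_check(x, y):
    """Add two (num, den) pairs, trying the cheap common-denominator cases.

    Beyond the equal-denominator fast path, check whether the larger
    denominator is already a multiple of the smaller one; if so, it serves
    as the common denominator and the denominator product is avoided.
    """
    xn, xd = x
    yn, yd = y
    if xd == yd:                      # existing fast path
        return (xn + yn, xd)
    lo, hi = (x, y) if xd <= yd else (y, x)
    if hi[1] % lo[1] == 0:            # larger den is already the common den
        scale = hi[1] // lo[1]
        return (lo[0] * scale + hi[0], hi[1])
    # generic path: denominators multiply
    return (xn * yd + xd * yn, xd * yd)

print(add_with_divisibility_check((1, 4), (1, 8)))  # (3, 8), not (12, 32)
```

When denominators are powers of two, as interpolation coefficients often are, the modulo test could even be replaced by bit operations, but the version above keeps the logic plain.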