Use rmul over rdiv #1496
fb1588d to c8c1ff7
bors try
try
Build failed:
Dividing by 2 is exactly equivalent to multiplying by one half, so the compiler should already optimize this into a multiplication:
Note that this isn't true for divisors that aren't powers of two, e.g.
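A minimal Julia sketch of the point above. The helper names `same_for_half` and `same_for_49`, the random sample values, and the choice of 49 as a counterexample divisor are illustrative assumptions, not taken from the PR. Dividing by 2 and multiplying by 0.5 both round the same exact real value, so they are bit-identical; 1/49 is not exactly representable, so multiplying by the rounded reciprocal rounds twice and can disagree with true division.

```julia
using Random

# Hypothetical helper: both sides compute the exact real number x/2 and
# round it once, so the results are bit-identical for every Float64 input.
same_for_half(x::Float64) = (x / 2) === (x * 0.5)

# Hypothetical helper: 1/49 is not exactly representable in Float64, so
# x * (1/49) rounds twice and can differ from the correctly rounded x / 49.
same_for_49(x::Float64) = (x / 49) === (x * (1 / 49))

rng = MersenneTwister(0)
samples = randn(rng, 10_000) .* 1e6

println(all(same_for_half, samples))   # true
println(same_for_49(49.0))             # false
println(49.0 / 49)                     # 1.0
println(49.0 * (1 / 49))               # 0.9999999999999999
```

The divide-by-2 check holds for any input, not just these samples, because IEEE 754 division and multiplication are both correctly rounded.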
I think that optimization only works when the divisor is a literal:
julia> d(x) = x/2;
julia> m(x) = x*2;
julia> d(x, y) = x/y;
julia> m(x, y) = x*y;
julia> @code_llvm debuginfo=:none d(2)
define double @julia_d_757(i64 signext %0) #0 {
top:
%1 = sitofp i64 %0 to double
%2 = fmul double %1, 5.000000e-01
ret double %2
}
julia> @code_llvm debuginfo=:none m(2)
define i64 @julia_m_759(i64 signext %0) #0 {
top:
%1 = shl i64 %0, 1
ret i64 %1
}
julia> @code_llvm debuginfo=:none d(4, 2)
define double @julia_d_761(i64 signext %0, i64 signext %1) #0 {
top:
%2 = sitofp i64 %0 to double
%3 = sitofp i64 %1 to double
%4 = fdiv double %2, %3
ret double %4
}
julia> @code_llvm debuginfo=:none m(4, 2)
define i64 @julia_m_763(i64 signext %0, i64 signext %1) #0 {
top:
%2 = mul i64 %1, %0
ret i64 %2
}
So, we don't get any of those optimizations in RecursiveApply, where the divisor is a function argument:
julia> using ClimaCore.RecursiveApply
julia> @code_llvm debuginfo=:none RecursiveApply.rmul((;a=2,b=4), 2)
define void @julia_rmul_930([2 x i64]* noalias nocapture noundef nonnull sret([2 x i64]) align 8 dereferenceable(16) %0, [2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1, i64 signext %2) #0 {
top:
%3 = getelementptr inbounds [2 x i64], [2 x i64]* %1, i64 0, i64 0
%4 = getelementptr inbounds [2 x i64], [2 x i64]* %1, i64 0, i64 1
%5 = load i64, i64* %3, align 8
%6 = mul i64 %5, %2
%7 = load i64, i64* %4, align 8
%8 = mul i64 %7, %2
%.sroa.0.0..sroa_idx = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
store i64 %6, i64* %.sroa.0.0..sroa_idx, align 8
%.sroa.2.0..sroa_idx1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
store i64 %8, i64* %.sroa.2.0..sroa_idx1, align 8
ret void
}
julia> @code_llvm debuginfo=:none RecursiveApply.rdiv((;a=2,b=4), 2)
define void @julia_rdiv_932([2 x double]* noalias nocapture noundef nonnull sret([2 x double]) align 8 dereferenceable(16) %0, [2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1, i64 signext %2) #0 {
top:
%3 = sitofp i64 %2 to double
%4 = bitcast [2 x i64]* %1 to <2 x i64>*
%5 = load <2 x i64>, <2 x i64>* %4, align 8
%6 = sitofp <2 x i64> %5 to <2 x double>
%7 = insertelement <2 x double> poison, double %3, i64 0
%8 = shufflevector <2 x double> %7, <2 x double> poison, <2 x i32> zeroinitializer
%9 = fdiv <2 x double> %6, %8
%10 = bitcast [2 x double]* %0 to <2 x double>*
store <2 x double> %9, <2 x double>* %10, align 8
ret void
}
That is where this PR has mostly replaced the divisions with multiplications.
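The REPL session above can also be checked programmatically. This is a minimal sketch using `code_llvm` from the InteractiveUtils standard library; the helper `emits` is a hypothetical name, and the exact IR text may vary between Julia and LLVM versions. It mirrors the `d(x)` and `d(x, y)` examples: a literal divisor of 2 is folded into an `fmul` by 0.5, while a divisor passed as an argument keeps the `fdiv`.

```julia
using InteractiveUtils  # provides code_llvm (auto-loaded only in the REPL)

d(x) = x / 2       # divisor is a literal
d(x, y) = x / y    # divisor is a runtime argument

# Hypothetical helper: does the optimized LLVM IR for f at the given
# argument types contain the instruction name `instr`?
function emits(instr::AbstractString, f, types::Type{<:Tuple})
    ir = sprint(io -> code_llvm(io, f, types; debuginfo = :none))
    return occursin(instr, ir)
end

println(emits("fdiv", d, Tuple{Int64}))          # false: rewritten as fmul by 0.5
println(emits("fmul", d, Tuple{Int64}))          # true
println(emits("fdiv", d, Tuple{Int64, Int64}))   # true: divisor unknown at compile time
```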
In your examples, the compiler doesn't know the denominator at compile time. In the code this PR touches, it does:
The
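A minimal sketch of why a constant denominator at the call site matters, assuming a small generic division helper gets inlined. The names `divide`, `halve`, `scale`, and `ir` are hypothetical and not ClimaCore functions. Once `divide` is inlined into `halve`, its divisor is the constant 2.0, so LLVM can apply the same `fdiv`-to-`fmul` rewrite seen for `d(x)`; when the divisor stays a runtime value, the `fdiv` remains.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical generic helper: the divisor is an ordinary argument.
@inline divide(x, y) = x / y

# Call site passing a literal: after inlining, the divisor is a
# compile-time constant, so LLVM can rewrite x / 2.0 as x * 0.5.
halve(x) = divide(x, 2.0)

# Call site passing a runtime value: the fdiv has to stay.
scale(x, y) = divide(x, y)

# Hypothetical helper returning the optimized LLVM IR as a string.
ir(f, types) = sprint(io -> code_llvm(io, f, types; debuginfo = :none))

println(occursin("fdiv", ir(halve, Tuple{Float64})))            # false
println(occursin("fdiv", ir(scale, Tuple{Float64, Float64})))   # true
```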
c8c1ff7 to 6f1980b
Ah, right, good catch! I've updated the PR to apply these changes only to the
bors try
try
Build failed:
Interesting failure?
Indeed. Is
I don't think so, but 🤷
6f1980b to 1156432
bors try
try
Build failed:
1156432 to 29384d3
bors try
try
Build failed:
29384d3 to 8de48bb
bors try
try
Build failed:
8de48bb to b04d046
bors try
try
Build failed:
c34cfda to e00d89f
Try to debug NaNs
Fix test
e00d89f to 3b9cb62
A wise man once told me, "division is much more expensive than multiplication." This PR converts some rdivs to rmuls in a bunch of places.
Checks off a box in CliMA/ClimaAtmos.jl#635
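To illustrate the kind of substitution this PR makes without depending on ClimaCore, here is a simplified, hypothetical stand-in for recursive multiplication and division over nested NamedTuples. `rmul_sketch` and `rdiv_sketch` are illustrative names, not ClimaCore.RecursiveApply's actual `rmul`/`rdiv`, which handle more types. For a power-of-two divisor such as 2, replacing a division by 2 with a multiplication by 1/2 gives bit-identical Float64 results.

```julia
# Hypothetical simplified stand-ins; not ClimaCore's implementation.
rmul_sketch(x::Number, c::Number) = x * c
rmul_sketch(x::NamedTuple, c::Number) = map(v -> rmul_sketch(v, c), x)

rdiv_sketch(x::Number, c::Number) = x / c
rdiv_sketch(x::NamedTuple, c::Number) = map(v -> rdiv_sketch(v, c), x)

state = (; a = 0.1, b = (; c = 3.7, d = -2.5e-3))

divided = rdiv_sketch(state, 2)          # one fdiv per leaf
multiplied = rmul_sketch(state, 1 / 2)   # one fmul per leaf; 0.5 is exact

println(divided === multiplied)   # true
```

For a divisor without an exact reciprocal, such as 49, the two forms can differ in the last bit, so this rewrite is exact only for powers of two.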