Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rmul over rdiv #1496

Merged
merged 2 commits into from
Feb 24, 2024
Merged

Use rmul over rdiv #1496

merged 2 commits into from
Feb 24, 2024

Conversation

charleskawczynski
Copy link
Member

A wise man once told me "division is much more expensive than multiplication". This PR converts some rdivs rmuls in a bunch of places.

Checks off a box in CliMA/ClimaAtmos.jl#635

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Oct 11, 2023
@bors
Copy link
Contributor

bors bot commented Oct 11, 2023

try

Build failed:

@simonbyrne
Copy link
Member

simonbyrne commented Oct 12, 2023

Dividing by 2 is exactly equivalent to multiplying by one half, so this should be optimized into a multiplication:

julia> f(x) = x/2
f (generic function with 1 method)

julia> @code_llvm f(2.0)
;  @ REPL[1]:1 within `f`
define double @julia_f_94(double %0) #0 {
top:
; ┌ @ promotion.jl:413 within `/` @ float.jl:411
   %1 = fmul double %0, 5.000000e-01
; └
  ret double %1
}

julia> @code_llvm f(2f0)
;  @ REPL[1]:1 within `f`
define float @julia_f_116(float %0) #0 {
top:
; ┌ @ promotion.jl:413 within `/` @ float.jl:411
   %1 = fmul float %0, 5.000000e-01
; └
  ret float %1
}

Note that this isn't true for values that aren't powers of two, e.g.

julia> g(x) = x/3
g (generic function with 1 method)

julia> @code_llvm g(2.0)
;  @ REPL[4]:1 within `g`
define double @julia_g_135(double %0) #0 {
top:
; ┌ @ promotion.jl:413 within `/` @ float.jl:411
   %1 = fdiv double %0, 3.000000e+00
; └
  ret double %1
}

@charleskawczynski
Copy link
Member Author

Dividing by 2 is exactly equivalent to multiplying by one half, so this should be optimized into a multiplication:

I think that optimization only works for literals:

julia> d(x) = x/2;

julia> m(x) = x*2;

julia> d(x, y) = x/y;

julia> m(x, y) = x*y;

julia> @code_llvm debuginfo=:none d(2)
define double @julia_d_757(i64 signext %0) #0 {
top:
  %1 = sitofp i64 %0 to double
  %2 = fmul double %1, 5.000000e-01
  ret double %2
}

julia> @code_llvm debuginfo=:none m(2)
define i64 @julia_m_759(i64 signext %0) #0 {
top:
  %1 = shl i64 %0, 1
  ret i64 %1
}

julia> @code_llvm debuginfo=:none d(4, 2)
define double @julia_d_761(i64 signext %0, i64 signext %1) #0 {
top:
  %2 = sitofp i64 %0 to double
  %3 = sitofp i64 %1 to double
  %4 = fdiv double %2, %3
  ret double %4
}

julia> @code_llvm debuginfo=:none m(4, 2)
define i64 @julia_m_763(i64 signext %0, i64 signext %1) #0 {
top:
  %2 = mul i64 %1, %0
  ret i64 %2
}

So, we don't get any of those optimizations in RecursiveApply.rdiv:

julia> using ClimaCore.RecursiveApply

julia> @code_llvm debuginfo=:none RecursiveApply.rmul((;a=2,b=4), 2)
define void @julia_rmul_930([2 x i64]* noalias nocapture noundef nonnull sret([2 x i64]) align 8 dereferenceable(16) %0, [2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1, i64 signext %2) #0 {
top:
  %3 = getelementptr inbounds [2 x i64], [2 x i64]* %1, i64 0, i64 0
  %4 = getelementptr inbounds [2 x i64], [2 x i64]* %1, i64 0, i64 1
  %5 = load i64, i64* %3, align 8
  %6 = mul i64 %5, %2
  %7 = load i64, i64* %4, align 8
  %8 = mul i64 %7, %2
  %.sroa.0.0..sroa_idx = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
  store i64 %6, i64* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
  store i64 %8, i64* %.sroa.2.0..sroa_idx1, align 8
  ret void
}

julia> @code_llvm debuginfo=:none RecursiveApply.rdiv((;a=2,b=4), 2)
define void @julia_rdiv_932([2 x double]* noalias nocapture noundef nonnull sret([2 x double]) align 8 dereferenceable(16) %0, [2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1, i64 signext %2) #0 {
top:
  %3 = sitofp i64 %2 to double
  %4 = bitcast [2 x i64]* %1 to <2 x i64>*
  %5 = load <2 x i64>, <2 x i64>* %4, align 8
  %6 = sitofp <2 x i64> %5 to <2 x double>
  %7 = insertelement <2 x double> poison, double %3, i64 0
  %8 = shufflevector <2 x double> %7, <2 x double> poison, <2 x i32> zeroinitializer
  %9 = fdiv <2 x double> %6, %8
  %10 = bitcast [2 x double]* %0 to <2 x double>*
  store <2 x double> %9, <2 x double>* %10, align 8
  ret void
}

which is where this PR's mostly replaced the divisions by multiplications.

@simonbyrne
Copy link
Member

In your case it doesn't know the denominator at compile time. In the code it does:

julia> using ClimaCore.RecursiveApply

julia> f(x) = RecursiveApply.rdiv(x,2)
f (generic function with 1 method)

julia> @code_llvm debuginfo=:none f((a=1,b=2))
define void @julia_f_549([2 x double]* noalias nocapture noundef nonnull sret([2 x double]) align 8 dereferenceable(16) %0, [2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1) #0 {
top:
  %2 = bitcast [2 x i64]* %1 to <2 x i64>*
  %3 = load <2 x i64>, <2 x i64>* %2, align 8
  %4 = sitofp <2 x i64> %3 to <2 x double>
  %5 = fmul <2 x double> %4, <double 5.000000e-01, double 5.000000e-01>
  %6 = bitcast [2 x double]* %0 to <2 x double>*
  store <2 x double> %5, <2 x double>* %6, align 8
  ret void
}

@simonbyrne
Copy link
Member

The invJ ones are worth doing though.

@charleskawczynski
Copy link
Member Author

In your case it doesn't know the denominator at compile time. In the code it does:

Ah, right, good catch!

I've updated to apply these changes only to the invJ cases.

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Oct 13, 2023
@bors
Copy link
Contributor

bors bot commented Oct 13, 2023

try

Build failed:

@simonbyrne
Copy link
Member

Interesting failure?

@charleskawczynski
Copy link
Member Author

Interesting failure?

Indeed. Is J or invJ ever zero?

@simonbyrne
Copy link
Member

I don't think so, but 🤷

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Oct 20, 2023
@bors
Copy link
Contributor

bors bot commented Oct 20, 2023

try

Build failed:

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Oct 20, 2023
@bors
Copy link
Contributor

bors bot commented Oct 20, 2023

try

Build failed:

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Oct 21, 2023
@bors
Copy link
Contributor

bors bot commented Oct 21, 2023

try

Build failed:

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Oct 22, 2023
@bors
Copy link
Contributor

bors bot commented Oct 22, 2023

try

Build failed:

@charleskawczynski charleskawczynski force-pushed the ck/mul branch 3 times, most recently from c34cfda to e00d89f Compare February 23, 2024 18:59
@charleskawczynski charleskawczynski merged commit 692dadf into main Feb 24, 2024
13 of 14 checks passed
@charleskawczynski charleskawczynski deleted the ck/mul branch February 24, 2024 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants