Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix type broadcasting plus axis tensors #1636

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Mar 4, 2024

This PR is an extension to #1628. Supersedes #1628. Closes #1602 (checked locally).

This does in fact close #1602.

I would add a unit test, but it would require quite a bit of boiling things down (it uses both CloudMicrophysics and Thermodynamics, and the inlining heuristic makes things even more complicated). The example in #1602 does reproduce the issue.

@charleskawczynski charleskawczynski force-pushed the ck/fix_type_broadcasting_with_axis_tensors branch from 4034bce to ce2b7f2 Compare March 4, 2024 14:24
Update benchmarks compat

Update dependencies

Improve type broadcasting for axistensors
@charleskawczynski charleskawczynski force-pushed the ck/fix_type_broadcasting_with_axis_tensors branch from ce2b7f2 to c07ad0f Compare March 5, 2024 15:23
@charleskawczynski charleskawczynski marked this pull request as ready for review March 5, 2024 15:23
@charleskawczynski
Copy link
Member Author

This seems to work fine. Going to move forward with this.

@charleskawczynski
Copy link
Member Author

Ok, here's an update:

# this branch without inlining
julia> include("../perf/climacore_type_bc_inf.jl")
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Precompiling JET [c3a54625-cd67-489e-a8e7-0a5a0ff4e31b]
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 534 samples with 1 evaluation.
 Range (min  max):  8.960 ms   13.182 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     9.157 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.357 ms ± 493.608 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▅▅▃▁▃▁  ▃▄▄▄▂▁▁   ▁                                       
  ██████████████████▆▅███▇▅▅██▆▄▇▅▅▁▄▅▆▆▁▁▁▁▅▄▄▄▁▁▄▄▁▁▁▄▄▄▁▄▄ █
  8.96 ms      Histogram: log(frequency) by time      11.3 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

# main branch without inlining
julia> include("../perf/climacore_type_bc_inf.jl")
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 842 samples with 1 evaluation.
 Range (min  max):  4.900 ms  21.248 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     5.550 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.924 ms ±  1.421 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆ ▄▃▄▃▁                                                    
  ██▇██████▆█▄▄▅▄▃▃▃▃▂▃▃▂▃▃▃▃▂▂▃▃▁▂▃▂▂▂▂▂▂▂▂▁▂▁▁▂▃▁▁▂▂▂▂▂▁▁▂ ▃
  4.9 ms         Histogram: frequency by time        11.2 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

# main branch + inlining
julia> include("../perf/climacore_type_bc_inf.jl")
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 125 samples with 1 evaluation.
 Range (min  max):  35.453 ms  48.682 ms  ┊ GC (min  max):  0.00%  16.30%
 Time  (median):     41.151 ms              ┊ GC (median):    13.45%
 Time  (mean ± σ):   39.998 ms ±  3.252 ms  ┊ GC (mean ± σ):   7.96% ±  7.13%

   ▂▆ ▃  ▅ ▃                 ▃  ▂ ▂▅█▂▂                        
  ███▇██▇███▅▄▁▄▁▁▁▁▄▁▁▁▅▄▁▁▇█▇▅███████▄▅▅▁▁▁▅▁▁▁▁▄▁▅▁▁▁▄▁▁▁▄ ▄
  35.5 ms         Histogram: frequency by time        47.8 ms <

 Memory estimate: 69.58 MiB, allocs estimate: 720000.

# this branch + inlining
julia> include("../perf/climacore_type_bc_inf.jl")
Precompiling ClimaCore
  1 dependency successfully precompiled in 7 seconds. 140 already precompiled.
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 1136 samples with 1 evaluation.
 Range (min  max):  4.303 ms   5.111 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.390 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.402 ms ± 99.612 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂█▅▃▁      ▁▂ ▁                                             
  ██████▅▄▄▃▅████▆▄▃▄▃▃▃▃▂▃▂▂▂▄▄▄▄▄▄▄▃▃▂▂▁▁▂▂▂▁▁▁▁▂▂▂▂▁▁▂▂▂▂ ▃
  4.3 ms         Histogram: frequency by time        4.77 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

So, we'll temporarily see a performance hit, but this will allow us to inline PhasePartition in thermodynamics to recover the performance.

@charleskawczynski
Copy link
Member Author

charleskawczynski commented Mar 5, 2024

I think merging is the right move. This seems like a more robust solution.

@charleskawczynski
Copy link
Member Author

This actually already passed CI here, so I'm going to disable the auto-merge and just manually merge.

@charleskawczynski charleskawczynski merged commit d179da0 into main Mar 5, 2024
14 of 15 checks passed
@charleskawczynski charleskawczynski deleted the ck/fix_type_broadcasting_with_axis_tensors branch March 5, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inference failure when inlining
1 participant