-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix type broadcasting plus axis tensors #1636
Merged
charleskawczynski
merged 1 commit into
main
from
ck/fix_type_broadcasting_with_axis_tensors
Mar 5, 2024
Merged
Fix type broadcasting plus axis tensors #1636
charleskawczynski
merged 1 commit into
main
from
ck/fix_type_broadcasting_with_axis_tensors
Mar 5, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
charleskawczynski
force-pushed
the
ck/fix_type_broadcasting_with_axis_tensors
branch
from
March 4, 2024 14:24
4034bce
to
ce2b7f2
Compare
Update benchmarks compat Update dependencies Improve type broadcasting for axistensors
charleskawczynski
force-pushed
the
ck/fix_type_broadcasting_with_axis_tensors
branch
from
March 5, 2024 15:23
ce2b7f2
to
c07ad0f
Compare
This seems to work fine. Going to move forward with this. |
Ok, here's an update: # this branch without inlining
julia> include("../perf/climacore_type_bc_inf.jl")
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Precompiling JET [c3a54625-cd67-489e-a8e7-0a5a0ff4e31b]
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 534 samples with 1 evaluation.
Range (min … max): 8.960 ms … 13.182 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.157 ms ┊ GC (median): 0.00%
Time (mean ± σ): 9.357 ms ± 493.608 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▅▅▅▃▁▃▁ ▃▄▄▄▂▁▁ ▁
██████████████████▆▅███▇▅▅██▆▄▇▅▅▁▄▅▆▆▁▁▁▁▅▄▄▄▁▁▄▄▁▁▁▄▄▄▁▄▄ █
8.96 ms Histogram: log(frequency) by time 11.3 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
# main branch without inlining
julia> include("../perf/climacore_type_bc_inf.jl")
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 842 samples with 1 evaluation.
Range (min … max): 4.900 ms … 21.248 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.550 ms ┊ GC (median): 0.00%
Time (mean ± σ): 5.924 ms ± 1.421 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
█▆ ▄▃▄▃▁
██▇██████▆█▄▄▅▄▃▃▃▃▂▃▃▂▃▃▃▃▂▂▃▃▁▂▃▂▂▂▂▂▂▂▂▁▂▁▁▂▃▁▁▂▂▂▂▂▁▁▂ ▃
4.9 ms Histogram: frequency by time 11.2 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
# main branch + inlining
julia> include("../perf/climacore_type_bc_inf.jl")
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 125 samples with 1 evaluation.
Range (min … max): 35.453 ms … 48.682 ms ┊ GC (min … max): 0.00% … 16.30%
Time (median): 41.151 ms ┊ GC (median): 13.45%
Time (mean ± σ): 39.998 ms ± 3.252 ms ┊ GC (mean ± σ): 7.96% ± 7.13%
▂▆ ▃ ▅ ▃ ▃ ▂ ▂▅█▂▂
███▇██▇███▅▄▁▄▁▁▁▁▄▁▁▁▅▄▁▁▇█▇▅███████▄▅▅▁▁▁▅▁▁▁▁▄▁▅▁▁▁▄▁▁▁▄ ▄
35.5 ms Histogram: frequency by time 47.8 ms <
Memory estimate: 69.58 MiB, allocs estimate: 720000.
# this branch + inlining
julia> include("../perf/climacore_type_bc_inf.jl")
Precompiling ClimaCore
1 dependency successfully precompiled in 7 seconds. 140 already precompiled.
ClimaComms.device(cspace) = ClimaComms.CPUSingleThreaded()
[ Info: Compiling first...
[ Info: Collecting profile...
[ Info: Generating html...
[ Info: Benchmarking...
BenchmarkTools.Trial: 1136 samples with 1 evaluation.
Range (min … max): 4.303 ms … 5.111 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.390 ms ┊ GC (median): 0.00%
Time (mean ± σ): 4.402 ms ± 99.612 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂█▅▃▁ ▁▂ ▁
██████▅▄▄▃▅████▆▄▃▄▃▃▃▃▂▃▂▂▂▄▄▄▄▄▄▄▃▃▂▂▁▁▂▂▂▁▁▁▁▂▂▂▂▁▁▂▂▂▂ ▃
4.3 ms Histogram: frequency by time 4.77 ms <
Memory estimate: 0 bytes, allocs estimate: 0. So, we'll temporarily see a performance hit, but this will allow us to inline |
I think merging is the right move. This seems like a more robust solution. |
This actually already passed CI here, so I'm going to disable the auto-merge and just manually merge. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is an extension to #1628. Supersedes #1628. Closes #1602 (checked locally).
This does in fact close #1602.
I would add a unit test, but it would require quite a bit of boiling things down (it uses both CloudMicrophysics and Thermodynamics, and the inlining heuristic makes things even more complicated). The example in #1602 does reproduce the issue.