Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean more caches #2287

Merged
merged 15 commits into from
Oct 30, 2023
Merged

Clean more caches #2287

merged 15 commits into from
Oct 30, 2023

Conversation

Sbozzolo
Copy link
Member

@Sbozzolo Sbozzolo commented Oct 25, 2023

Clean up cache for:

  • precipitation
  • edmfx_nh_pressure
  • subsidence
  • EDMFCoriolis
  • forcing_type
  • non_orographic_gravity_wave
  • orographic_gravity_wave
  • hyperdiffiusion
  • radiation
  • limiters

Add surface_setup to AtmosModel and removes dt from simulation

A quick test seem to indicate that this PR reduces compile time by 5-10 % (or more in some cases)

@Sbozzolo
Copy link
Member Author

To be merged after #2281

@Sbozzolo Sbozzolo force-pushed the gb/cache5 branch 9 times, most recently from c9d886a to 0c97bf3 Compare October 26, 2023 18:56
@Sbozzolo Sbozzolo mentioned this pull request Oct 26, 2023
@Sbozzolo Sbozzolo requested a review from szy21 October 26, 2023 19:18
@Sbozzolo Sbozzolo marked this pull request as ready for review October 26, 2023 19:19
@Sbozzolo
Copy link
Member Author

I take back the compile-time benefits. They are not uniform, and there's a regression for some EDMF cases (not all).

I am investigating this.

examples/hybrid/driver.jl Outdated Show resolved Hide resolved
@Sbozzolo Sbozzolo force-pushed the gb/cache5 branch 3 times, most recently from e5e89c6 to 2dc050c Compare October 26, 2023 21:39
@szy21
Copy link
Member

szy21 commented Oct 26, 2023

@dennisYatunin may want to take a look at the changes to radiation cache.

@Sbozzolo
Copy link
Member Author

@dennisYatunin may want to take a look at the changes to radiation cache.

We agreed to remove one level of nesting to the RRTMPGI cache and rename radiation_mode to radiation_model in a future PR.

Copy link
Member

@szy21 szy21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Sbozzolo
Copy link
Member Author

Sbozzolo commented Oct 26, 2023

I thought I identified and reverted the issue for the slowdown, but that's not the case.

While on my computer, the PR leads to consistent performance/latency improvements, this is not the case on the cluster. Quite the opposite, this PR introduces some noticeable regressions.

I would like to still merge this (and the other, #2296) PR. They clean up and improve the cache, and they are a lot of work to maintain because pretty much every change touches the cache, leading to conflicts that are time consuming to solve.

Debugging this issue is particularly annoying because seemingly there's no correlation between my changes and the outcome (+ I need to run stuff on the cluster to clearly see the effects). I suspect that some of the newly introduce (and non identified) problems will be automatically solved by the mutable spaces.

What do you think?
@szy21 , @charleskawczynski , @dennisYatunin

@szy21
Copy link
Member

szy21 commented Oct 26, 2023

How much regression did you get on the cluster? If it's not too bad (e.g. ci can still finish in a reasonable time), then I think it's ok to merge it in.

@Sbozzolo
Copy link
Member Author

You can compare

https://buildkite.com/clima/climaatmos-ci/builds/14418#_ (which would be the "final" PR)

With master:

https://buildkite.com/clima/climaatmos-ci/builds/14401#_

Some jobs take 10 more minutes:

https://buildkite.com/clima/climaatmos-ci/builds/14418#018b6e11-f634-4293-b119-d2b0d8b59bac

vs

https://buildkite.com/clima/climaatmos-ci/builds/14401#018b6d70-d498-49e9-a8cf-8a3eab19e45c

I think that this is all in compiling solve_atmos!. If you look at everything else (including SYPD), the new PR is better.

@szy21
Copy link
Member

szy21 commented Oct 26, 2023

hmm, indeed a lot of jobs are slower. @charleskawczynski What do you think?

@@ -96,12 +96,12 @@ function compute_rld!(
FT = eltype(state)
if isnothing(out)
return RRTMGPI.array2field(
FT.(cache.radiation_model.face_lw_flux_dn),
FT.(cache.atmos.radiation.radiation_model.face_lw_flux_dn),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cache.atmos.radiation the same as cache.radiation? (I was wondering why this line is different from e.g. line23)

On a side note, we didn't test all the diagnostics in ci, so it would be hard to catch things like this.

@szy21
Copy link
Member

szy21 commented Oct 28, 2023

It seems the compile time is back to normal on the cluster now. Feel free to merge it once others' comments are addressed.

@Sbozzolo
Copy link
Member Author

Sbozzolo commented Oct 30, 2023

In #2244 I add one additional commit to this PR to complete the refactor of the cache by moving all the precomputed quantities in p.precomputed. Overall compile time is nearly identical to the master branch (https://buildkite.com/clima/climaatmos-ci/builds/14519). I would like to merge that and close the chapter of "cache refactoring".

@szy21 do you want to have a look at that commit too? If it the change is okay, I will move it over to this PR and merge it.

@szy21
Copy link
Member

szy21 commented Oct 30, 2023

The commit looks good to me, thanks! I need to think a bit more about what should be in core and what should be in precomputed, but we don't need to wait for that to merge this PR.

@Sbozzolo
Copy link
Member Author

bors r+

bors bot added a commit that referenced this pull request Oct 30, 2023
2287: Clean more caches r=Sbozzolo a=Sbozzolo

Clean up cache for:
- precipitation
- edmfx_nh_pressure
- subsidence
- EDMFCoriolis
- forcing_type
- non_orographic_gravity_wave
- orographic_gravity_wave
- hyperdiffiusion
- radiation
- limiters

Add surface_setup to AtmosModel and removes dt from simulation

A quick test seem to indicate that this PR reduces compile time by 5-10 % (or more in some cases)

Co-authored-by: Gabriele Bozzola <[email protected]>
@bors
Copy link
Contributor

bors bot commented Oct 30, 2023

Build failed:

@Sbozzolo
Copy link
Member Author

bors r+

@bors
Copy link
Contributor

bors bot commented Oct 30, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 098e92c into main Oct 30, 2023
6 checks passed
@bors bors bot deleted the gb/cache5 branch October 30, 2023 15:37
@Sbozzolo Sbozzolo mentioned this pull request Oct 30, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants