Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make swap_spaces not allocating and more GPU compatible #249

Merged
merged 1 commit into from
Feb 23, 2023

Conversation

valeriabarra
Copy link
Member

@valeriabarra valeriabarra commented Feb 6, 2023

Purpose

The purspose of this PR is to improve swap_space so that it is not allocating and to make it GPU compatible.

Closes #248

To-do

  • Remove internal call to ClimaCore parent This cannot be done internally in the function (see error: ERROR: LoadError: syntax: Global method definition around .... needs to be placed at the top level, or use "eval". because of the global method table). So it needs to be worked around wherever it's used.
  • Make it @inline?

Content

We remove the internal usage of ClimaCore parent by using ClimaCore's Fields.allow_mismatched_diagonalized_spaces() function as a temporary workaround, in the absence of a better ClimaCore's interface

Review checklist

I have:

In the Content, I have included

  • relevant unit tests, and integration tests,

  • appropriate docstrings on all functions, structs, and modules, and included relevant documentation.

  • I have read and checked the items on the review checklist.

@valeriabarra valeriabarra changed the title Make swap_spaces not allocating Use swap_spaces as not allocating Feb 6, 2023
@valeriabarra valeriabarra self-assigned this Feb 6, 2023
@valeriabarra valeriabarra marked this pull request as draft February 6, 2023 21:56
@valeriabarra valeriabarra force-pushed the valeria/swap_spaces branch 5 times, most recently from ab0213d to 528ec61 Compare February 8, 2023 00:39
@valeriabarra valeriabarra changed the title Use swap_spaces as not allocating Make swap_spaces not allocating and GPU compatible Feb 9, 2023
@valeriabarra valeriabarra force-pushed the valeria/swap_spaces branch 3 times, most recently from 8e488c8 to 213bd2a Compare February 13, 2023 21:50
@valeriabarra valeriabarra marked this pull request as ready for review February 16, 2023 01:52
@valeriabarra valeriabarra changed the title Make swap_spaces not allocating and GPU compatible Make swap_spaces not allocating and more GPU compatible Feb 16, 2023
@valeriabarra valeriabarra removed the request for review from simonbyrne February 16, 2023 02:38
@charleskawczynski
Copy link
Member

Hi @valeriabarra,

To clarify, parent on ClimaCore Fields is not allocating, parent returns a reference to the underlying array. Ideally, we would still like to avoid using parent so that users do not write code that depends on the underlying data layout, because doing so may make such code incompatible if they were to change the underlying Space.

Calling allow_mismatched_diagonalized_spaces is really a kluge, and we shouldn't use it like this here (overloading the method table at runtime like this is generally not a good idea since it will trigger recompilation). If we can't simply remove the call to parent, then I suggest we revert the parent-related changes from the PR (the SDI says that parent related changes are potential future improvements anyway). We already have an issue in ClimaCore for this problem: CliMA/ClimaCore.jl#1120, so no need to open one. I assume that simply applying these changes didn't work, in which case, let's revert the parent-related changes and apply the allocation-related changes needed.

swap_space! is allocating because of this line:

field_out = zeros(new_space)
, which allocates a field. It turns out that this is actually calling zeros defined on a ClimaCore space, so it may already be (or can be made on the ClimaCore-side) GPU-compatible-- but it's still allocating. I suggest you try to make swap_space! a 3-argument function, and pass field_out as an argument (and then make required changes to use this new 3-argument function). Doing so will allow us to push field_out outside of any loops and avoid repeated allocations.

@valeriabarra
Copy link
Member Author

Thank you so much for the thorough suggestion. It was previously suggested to me to use this allow_mismatched_diagonalized_spaces work-around and eliminate the internal call to parent, but yes, I recognize that this was a clumsy work-around and we should proceed the way you suggested, which is much better. I will apply your suggestions as soon as possible.

@valeriabarra valeriabarra force-pushed the valeria/swap_spaces branch 2 times, most recently from 02acd0d to b052662 Compare February 21, 2023 00:38
@valeriabarra
Copy link
Member Author

Hey @charleskawczynski I applied the suggestions. I made swap_space a 2-argument function (I really think 3 were not needed, as the space was passed just to create the zero field inside). This way, we just move the allocation from inside the swap_space to outside of it. There is no allocation gain, and only marginal flame graphs improvement. There are not explicit for loops where swap_space is called, so that the output field can be allocated only once out of the loop. But it is implicitly called by the coupling time stepping. Unless we want to pass the field_out all the way from the time-stepping routine, I am not sure how the proposed changes would improve things.

Can you please confirm that what I implemented aligns with what you had in mind? Or clarify otherwise? Thanks!

Copy link
Collaborator

@LenkaNovak LenkaNovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @valeriabarra . I left a comment on passing a boundary_field like we do with boundary_space right now. Hopefully that'll help with allocations. It would be good to demonstrate on a minimal working example (MWE) the approaches you tried to tackle this problem (as far as I know there are 1. the flag turning off the instance error, 2. creating a field dynamically as the swap_space is being called, and 3. creating a dummy field similar to the boundary_space as mentioned above). Since we are kind of blocked by this long-standing issue of mismatching ClimaCore space instances, we may need to defer this issue (hopefully it will be solved soon, now that it's tracked by CliMA/ClimaCore.jl#1120). That said, it would be good to justify this by summarising your findings using the MWE.

the temperature of the atmosphere at the lowest level, and the heigh
of the lowest level.
"""
function set_ρ_sfc!(ρ_sfc, T_S, integrator)
ts = integrator.p.ᶜts
thermo_params = CAP.thermodynamics_params(integrator.p.params)
ts_int = Spaces.level(ts, 1)
parent(ρ_sfc) .= parent(ρ_sfc_at_point.(thermo_params, ts_int, swap_space!(T_S, axes(ts_int))))
parent(ρ_sfc) .= parent(ρ_sfc_at_point.(thermo_params, ts_int, swap_space!(zeros(axes(ts_int)), T_S)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new field here, can we create it as part of the CouplerSimulation struct? Similar to how we treat boundary_space. @charleskawczynski is this what you meant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Here is the MWE gist where I compare all the different approaches. Creating it as part of the CoupledSimulation struct is fine, but we need to be careful to create a temp copy, wherever we have repeated usages, such as in these conservation checks. I can proceed with the proposed changes (Case 4 in the MWE gist) if you agree that's the best option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually seems like Case 2 in the gist is the best option for our purposes, so I will leave the current version of the code. I am going to squash/rebase and you're good to merge. Thanks for the review!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output of the gist at first run:

[ Info: Case 1:
alloc_u_ocn = 19493429
u_ocn = Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
alloc_u_ice = 0
u_ice = Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
[ Info: Case 2:
alloc_u_ocn = 12800
u_ocn = Float64-valued Field:
  [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0    1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
alloc_u_ice = 12800
u_ice = Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
[ Info: Case 3:
alloc_u_ocn = 33705
u_ocn = Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
alloc_u_ice = 1008
u_ice = Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
[ Info: Case 4:
alloc_u_ocn = 465631
u_ocn = Float64-valued Field:
  [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0    1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
alloc_u_ice = 1008
u_ice = Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
Float64-valued Field:
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0    2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

@valeriabarra
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Feb 23, 2023
@LenkaNovak LenkaNovak self-requested a review February 23, 2023 01:11
Copy link
Collaborator

@LenkaNovak LenkaNovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Valeria. After adjusting the alloc limit, feel free to merge.

@valeriabarra
Copy link
Member Author

Thanks, Valeria. After adjusting the alloc limit, feel free to merge.

Thank you! Just to clarify, the alloc limit adjustment is not due to this PR but to upstream changes. Thanks, will do.

@valeriabarra
Copy link
Member Author

bors r+

bors bot added a commit that referenced this pull request Feb 23, 2023
249: Make `swap_spaces` not allocating and more GPU compatible r=valeriabarra a=valeriabarra

## Purpose 
The purspose of this PR is to improve `swap_space` so that it is not allocating and to make it GPU compatible.  

Closes #248 

## To-do
- [x] ~~Remove internal call to ClimaCore `parent`~~ This cannot be done internally in the function (see error: `ERROR: LoadError: syntax: Global method definition around .... needs to be placed at the top level, or use "eval".` because of the global method table). So it needs to be worked around wherever it's used.
- [x] Make it ``@inline`?`

## Content
We remove the internal usage of ClimaCore `parent` by using ClimaCore's `Fields.allow_mismatched_diagonalized_spaces()` function as a temporary workaround, in the absence of a better ClimaCore's interface 

Review checklist

I have:
- followed the codebase contribution guide: https://clima.github.io/ClimateMachine.jl/latest/Contributing/
- followed the style guide: https://clima.github.io/ClimateMachine.jl/latest/DevDocs/CodeStyle/
- followed the documentation policy: https://github.com/CliMA/policies/wiki/Documentation-Policy
- checked that this PR does not duplicate an open PR.

In the Content, I have included 
- relevant unit tests, and integration tests, 
- appropriate docstrings on all functions, structs, and modules, and included relevant documentation.


- [x] I have read and checked the items on the review checklist.


Co-authored-by: Valeria Barra <[email protected]>
@bors
Copy link
Contributor

bors bot commented Feb 23, 2023

try

Build failed:

@bors
Copy link
Contributor

bors bot commented Feb 23, 2023

Build failed:

@valeriabarra
Copy link
Member Author

bors r+

bors bot added a commit that referenced this pull request Feb 23, 2023
249: Make `swap_spaces` not allocating and more GPU compatible r=valeriabarra a=valeriabarra

## Purpose 
The purspose of this PR is to improve `swap_space` so that it is not allocating and to make it GPU compatible.  

Closes #248 

## To-do
- [x] ~~Remove internal call to ClimaCore `parent`~~ This cannot be done internally in the function (see error: `ERROR: LoadError: syntax: Global method definition around .... needs to be placed at the top level, or use "eval".` because of the global method table). So it needs to be worked around wherever it's used.
- [x] Make it ``@inline`?`

## Content
We remove the internal usage of ClimaCore `parent` by using ClimaCore's `Fields.allow_mismatched_diagonalized_spaces()` function as a temporary workaround, in the absence of a better ClimaCore's interface 

Review checklist

I have:
- followed the codebase contribution guide: https://clima.github.io/ClimateMachine.jl/latest/Contributing/
- followed the style guide: https://clima.github.io/ClimateMachine.jl/latest/DevDocs/CodeStyle/
- followed the documentation policy: https://github.com/CliMA/policies/wiki/Documentation-Policy
- checked that this PR does not duplicate an open PR.

In the Content, I have included 
- relevant unit tests, and integration tests, 
- appropriate docstrings on all functions, structs, and modules, and included relevant documentation.


- [x] I have read and checked the items on the review checklist.


Co-authored-by: Valeria Barra <[email protected]>
@bors
Copy link
Contributor

bors bot commented Feb 23, 2023

Build failed:

@valeriabarra
Copy link
Member Author

bors r+

@bors
Copy link
Contributor

bors bot commented Feb 23, 2023

@bors bors bot merged commit 5b7dfcf into main Feb 23, 2023
@bors bors bot deleted the valeria/swap_spaces branch February 23, 2023 05:42
@valeriabarra valeriabarra mentioned this pull request Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make swap_space as not allocating and more GPU compatible
3 participants