Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable GPU exection of atm_divergence_damping_3d via OpenACC #1237

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

abishekg7
Copy link
Collaborator

This PR enables the GPU execution of atm_divergence_damping_3d subroutine.

Tested with a real and idealized test cases.

@mgduda mgduda requested review from mgduda and gdicker1 October 19, 2024 00:26
@mgduda mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels Oct 19, 2024
Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tested this today for the merged branch (develop+atmosphere/port_divergence_damping_3d) versus the base branch (develop), the results matched!

However, I have some comments to address:

@@ -2599,6 +2599,13 @@ subroutine atm_divergence_damping_3d( state, diag, mesh, configs, dts, edgeStart
rdts = 1.0_RKIND / dts
coef_divdamp = 2.0_RKIND * smdiv * config_len_disp * rdts

MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]')
!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, specZoneMaskEdge, &
!$acc theta_m, cellsOnEdge)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specZoneMaskEdge and cellsOnEdge are invariant fields (the won't change after init). I believe your PR should make sure these variables are handled in atm_dynamics_init and shouldn't copyin or delete these fields in atm_divergence_damping_3d.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the convention we've been following so far? I assumed the routine to be ported has localized data movement clauses. I might also need to rework my other PRs if this is the case. Thanks for pointing it out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to follow up.. Talked to @mgduda, and got this cleared up. Will make changes to these invariant fields. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the convention we've been following so far?

I've been following that pattern, it was set in #1176.

I've been doing things similarly to what may happen here - do all the data movement that is needed locally for the function being ported, and then separate out those invariant fields and handle them in the atm_dynamics_{init,finalize} routines. Typically these invariant fields are part of the mesh var_struct as described in the src/core_atmosphere/Registry.xml.

!$acc theta_m, cellsOnEdge)
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]')

!$acc parallel async
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async could be dangerous here, I don't believe acc exit data copyout(ru_p) will wait for this kernel to finish before beginning to copy.

I think it would be best to remove this async since we don't need it (yet) and it could create issues for the kernels that come next and depend on ru_p

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove the async. thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this async, but wanted to note that there's another one in atm_advance_scalars_work. Is that fine for now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this async, but wanted to note that there's another one in atm_advance_scalars_work. Is that fine for now?

I think we're good to leave that async clause for now, it has a matching wait directive/clause later on.

@abishekg7
Copy link
Collaborator Author

I've addressed the review comments, and also checked the output restart file for bit-for-bit reproducibility.

Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the fixes and my testing1, this PR has my approval.

Footnotes

  1. Using the regional test case for a few timesteps and comparing the restart, diag, and history output files. The baseline and "feature" branch tests matched!


MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]')
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m)
!$acc end data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this !$acc end data clause needed? (That's an actual, not a rhetorical, question!)

!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, theta_m)
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]')

!$acc parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the !$acc data present directive above on line 2611, it might be simpler and more robust to just add a default(present) clause to this parallel directive.

@gdicker1 Does this align with your understanding of best practices?

!$acc end parallel

MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]')
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of clarity, I would suggest either having separate directives for copyout and delete, or at least splitting the copyout and delete clauses onto separate lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere OpenACC Work related to OpenACC acceleration of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants