-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU exection of atm_divergence_damping_3d via OpenACC #1237
base: develop
Are you sure you want to change the base?
Enable GPU exection of atm_divergence_damping_3d via OpenACC #1237
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested this today for the merged branch (develop+atmosphere/port_divergence_damping_3d) versus the base branch (develop), the results matched!
However, I have some comments to address:
@@ -2599,6 +2599,13 @@ subroutine atm_divergence_damping_3d( state, diag, mesh, configs, dts, edgeStart | |||
rdts = 1.0_RKIND / dts | |||
coef_divdamp = 2.0_RKIND * smdiv * config_len_disp * rdts | |||
|
|||
MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]') | |||
!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, specZoneMaskEdge, & | |||
!$acc theta_m, cellsOnEdge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specZoneMaskEdge
and cellsOnEdge
are invariant fields (the won't change after init). I believe your PR should make sure these variables are handled in atm_dynamics_init
and shouldn't copyin or delete these fields in atm_divergence_damping_3d
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the convention we've been following so far? I assumed the routine to be ported has localized data movement clauses. I might also need to rework my other PRs if this is the case. Thanks for pointing it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to follow up.. Talked to @mgduda, and got this cleared up. Will make changes to these invariant fields. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the convention we've been following so far?
I've been following that pattern, it was set in #1176.
I've been doing things similarly to what may happen here - do all the data movement that is needed locally for the function being ported, and then separate out those invariant fields and handle them in the atm_dynamics_{init,finalize}
routines. Typically these invariant fields are part of the mesh
var_struct as described in the src/core_atmosphere/Registry.xml.
!$acc theta_m, cellsOnEdge) | ||
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]') | ||
|
||
!$acc parallel async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async
could be dangerous here, I don't believe acc exit data copyout(ru_p)
will wait for this kernel to finish before beginning to copy.
I think it would be best to remove this async
since we don't need it (yet) and it could create issues for the kernels that come next and depend on ru_p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove the async
. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this async, but wanted to note that there's another one in atm_advance_scalars_work
. Is that fine for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this async, but wanted to note that there's another one in atm_advance_scalars_work. Is that fine for now?
I think we're good to leave that async
clause for now, it has a matching wait
directive/clause later on.
I've addressed the review comments, and also checked the output restart file for bit-for-bit reproducibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]') | ||
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m) | ||
!$acc end data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this !$acc end data
clause needed? (That's an actual, not a rhetorical, question!)
!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, theta_m) | ||
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]') | ||
|
||
!$acc parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than the !$acc data present
directive above on line 2611, it might be simpler and more robust to just add a default(present)
clause to this parallel
directive.
@gdicker1 Does this align with your understanding of best practices?
!$acc end parallel | ||
|
||
MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]') | ||
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of clarity, I would suggest either having separate directives for copyout
and delete
, or at least splitting the copyout
and delete
clauses onto separate lines.
This PR enables the GPU execution of atm_divergence_damping_3d subroutine.
Tested with a real and idealized test cases.