ufs-release/public-v2: bit-for-bit reproducibility for regional runs when changing MPI decomposition #68
Conversation
… warnings in model/dyn_core.F90
…GFDL_atmos_cubed_sphere into regional_decomp_b4b
Reviewed. It looks good to me.
@@ -749,6 +749,9 @@ subroutine tracer_2d_nested(q, dp1, mfx, mfy, cx, cy, gridstruct, bd, domain, np
                                     reg_bc_update_time, &
                                     iq )
        enddo
+       call timing_on('COMM_TOTAL')
This is OK but could likely be avoided by having the regional boundary update also fill boundary rows that lie in the haloes: i.e., if this is a west edge, then have the BCs fill in (-2:0) x (jsd:jed) instead of just (-2:0) x (jsc:jec).
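A minimal sketch of that suggestion, assuming a 3-point-wide halo; the subroutine name, argument list, and bounds here are placeholders for illustration, not the actual fv_regional_bc.F90 interface:

```fortran
! Hypothetical illustration only; names and bounds are assumptions.
! For a west edge, the boundary columns are (-2:0).  Filling them over the
! full data-domain extent jsd:jed (instead of only the compute rows jsc:jec)
! would leave the halo corners consistent on every rank, so the extra
! mpp_update_domains call would no longer be needed.
subroutine fill_west_bc(q, bc, isd, jsd, jed)
  implicit none
  integer, intent(in)    :: isd, jsd, jed
  real,    intent(inout) :: q (isd:, jsd:)   ! prognostic field, including halo
  real,    intent(in)    :: bc(isd:, jsd:)   ! regional BC data, including halo rows
  integer :: i, j
  do j = jsd, jed        ! suggested extent; the current code stops at jsc:jec
     do i = isd, 0       ! west-edge boundary columns, e.g. (-2:0)
        q(i,j) = bc(i,j)
     enddo
  enddo
end subroutine fill_west_bc
```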
I think the nesting code already does this.
@lharris4 Can you please explain what you mean? I tried to skip this one update call, and the results were different. I did that with every single change in this PR, actually: I took them out one by one, left the others in place, and reran the tests. Every time, the tests failed, so I think all of the changes are required (unless changes are made to the regional BC update code, as you mentioned).
Hello. That is exactly what I mean: I think that if the regional BC update code is changed, this mpp_update_domains call is unnecessary. Lucas
Got it, thank you. Do you think it is ok to get the changes into the public release code as they are now, or should we try to follow your suggestion? The reason I am asking is that the SRW App is on a tight release schedule: the code freeze was last week, and we were hoping to get the b4b issues fixed and merged as an "emergency bugfix".
We should definitely follow your suggestions for the authoritative/development branches and not try to bring over the public release code changes. Doing so would probably allow us to remove several of the already existing calls to mpp_update_domain, if I understand this correctly, but it will require a bit of time for the development to happen.
Hi, Dom. Sure, this is fine.
Lucas
Thanks, Lucas. I'll post another note when all the regression tests have passed and we are ready to merge.
@lharris4 this is ready to merge whenever it is convenient for you - thanks!
With @lharris4 okaying this PR, I'll merge it now.
Hi, Jun. We will merge these changes into master and release them alongside
our updates for the regional nesting and some other bugfixes.
Thanks,
Lucas
…On Tue, May 25, 2021 at 10:53 AM Jun Wang wrote:
@lharris4 @bensonr I would like to follow up on this issue: do we have the boundary fill updates in the master and dev/emc branches? We still have a decomposition reproducibility issue with the regional test and would like to see if we can try the fix you mentioned.
That's great, thanks for letting me know!
…erturbations (NOAA-GFDL#239)
* Update .gitmodules and submodule pointers for ccpp-framework and ccpp-physics for gsl/develop branch
* RUC ice for gsl/develop (replaces NOAA-GFDL#54 and NOAA-GFDL#56) (NOAA-GFDL#60): implementation of RUC LSM ice model in CCPP
* Fix bug in gfsphysics/GFS_layer/GFS_typedefs.F90 from merge
* Remove lsm_ruc_sfc_sice from suite FV3_GSD_v0_unified_ugwp_suite and update submodule pointer for ccpp-physics
* Remove sfc_sice from ccpp/suites/suite_FV3_GSD_v0_unified_ugwp_suite.xml
* Update gsl/develop from develop 2020/12/08 (NOAA-GFDL#61)
* Fix for updating stochastic physics on separate time-step (NOAA-GFDL#199). This bug fix allows the random patterns in the stochastic physics to persist for a period of time (defined as SKEBINT, SPPTINT, etc.) before calculating new patterns. The fix is to move the allocation of the saved variables into the init section of stochastic_physics_wrapper and remove the deallocates in the run section.
* Bug fixes in (1) running with frac_grid=T and GFDL MP and (2) restarting with frac_grid=T (NOAA-GFDL#204)
  -- Pointing to Moorthi's modifications in ccpp/physics, which fixed the crash when running GFDL MP with frac_grid=T;
  -- Not setting fice to zero in order to leave lake ice untouched;
  -- Restart in the coupled model with the default physics is reproducible, if bad water temperature is only filtered at initial time.
  Co-authored-with: Shrinivas Moorthi <[email protected]>
  Co-authored-with: Denise Worthen <[email protected]>
* Revert change to .gitmodules and update submodule pointer for ccpp-physics
* Update submodule pointer for ccpp-physics - MYNN surface layer updates and bugfixes (NOAA-GFDL#63)
* Land stochastic perturbations (wrapper PR for NOAA-GFDL#65) (NOAA-GFDL#68)
* Move initialization of stochastic physics after the physics initialization in CCPP.
* Add albedo variables to land perturbations with lndp_type=2 option. Change to accommodate soil perturbations with RUC LSM.
* Max/min soil moisture variables are introduced via GFS_Control_type variables instead of through the use of namelist_soilveg*. This is a more flexible way for different LSMs.
* Added pores and resid variables for max/min soil moisture to GFS_typedefs.f90.
* Remove tracer_sanitizer from all suites and from CCPP prebuild config
* Add namelist option to apply land surface perturbations at every time step, clean up stochastic_physics/stochastic_physics_wrapper.F90
* Stochastic land perturbations: add roughness length over land to the perturbed variables (NOAA-GFDL#70)
* Added roughness length over land to the perturbed variables.
* Bugfix in gfsphysics/GFS_layer/GFS_typedefs.F90: remove Diag%cldcov, in particular the reset call, because the variable is not allocated
* Update .gitmodules and submodule pointer for GFDL_atmos_cubed_sphere for code review and testing
* Revert change to .gitmodules for ccpp-physics, update submodule pointer for ccpp-physics
* Revert change to .gitmodules and update submodule pointer for GFDL_atmos_cubed_sphere
Co-authored-by: DomHeinzeller <[email protected]>
Co-authored-by: Phil Pegion <[email protected]>
Co-authored-by: shansun6 <[email protected]>
Co-authored-by: tanyasmirnova <[email protected]>
This reverts commit 32b44d9.
* Revert "revise external_ic.F90 and fv_nudge.F90 (#68)". This reverts commit 32b44d9.
* Revert "fixing call to pmaxmin to no longer get a compilation error when compiling with GNU." This reverts commit 2aa049c.
* Revert "Update external_ic.F90 and fv_nudge.F90 to use allocatable arrays (#65)". This reverts commit 44211c0.
* Revert "update external_ic.F90 and fv_nudge.F90 (#63)". This reverts commit 81b9be0.
* Revert "Fix OVERLOAD_R4 ifdef block as suggested by @junwang-noaa (#60)". This reverts commit 63a4603.
* Revert "Update code to use 'constantsR4_mod' module (#59)". This reverts commit 7b8ee4c.
This PR fixes b4b mismatches when changing the MPI decomposition for regional runs (see ufs-community/ufs-weather-model#409 and ufs-community/ufs-weather-model#288 for more details).
The changes involve:
- mpp_update_domain for q in model/fv_tracer2d.F90 (see the sketch after this list)
- #endif statements in model/dyn_core.F90 and model/sw_core.F90
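As a rough illustration of the first item, a timed tracer halo update with FMS typically looks like the block below; this is a sketch assuming the standard mpp_domains and fv_timing interfaces (mpp_update_domains, timing_on/timing_off), and the exact arguments added in model/fv_tracer2d.F90 may differ:

```fortran
! Sketch only: assumes the standard FMS interface; the actual arguments
! in model/fv_tracer2d.F90 may differ.
call timing_on('COMM_TOTAL')
call mpp_update_domains(q, domain)   ! refresh the halo of q after the regional BC fill
call timing_off('COMM_TOTAL')
```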
Note 1. Each of the added/modified calls to update the domain or halos is required. After achieving b4b identical results, I removed each of the changes shown here one at a time, left the rest in place, and reran the tests; every time, the results were no longer b4b.
Note 2. The perhaps most confusing change is moving the call
call start_group_halo_update(i_pack(8), u, v, domain, gridtype=DGRID_NE)
further down in the code in model/sw_core.F90, because there is a call to a regional boundary update for u and v after the original location (see the sketch below).

Note 3. I am not an expert in the FV3 dycore, and it is possible that there are alternative and easier ways to achieve the same result.
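A minimal sketch of the reordering described in Note 2; the regional boundary update call shown is a placeholder name, not the actual model/sw_core.F90 code:

```fortran
! Before (original ordering): the asynchronous halo exchange for u and v was
! started before the regional boundary values were written, so ranks could
! exchange stale halo data and the answer depended on the decomposition.
!    call start_group_halo_update(i_pack(8), u, v, domain, gridtype=DGRID_NE)
!    call regional_boundary_update_uv(u, v)     ! placeholder name
!
! After (this PR): apply the regional BCs first, then start the halo exchange,
! so every rank sends up-to-date boundary values regardless of decomposition.
call regional_boundary_update_uv(u, v)          ! placeholder name
call start_group_halo_update(i_pack(8), u, v, domain, gridtype=DGRID_NE)
```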
Associated PRs:
#68
NOAA-EMC/fv3atm#245
ufs-community/ufs-weather-model#409
For regression testing, see ufs-community/ufs-weather-model#409.