
Add split-explicit AB2 time stepping capability to the ocean component #5989

Merged (12 commits) on Nov 21, 2023

Conversation

@hyungyukang (Contributor) commented Oct 12, 2023

The second-order Adams-Bashforth (AB2) time stepping method is applied to the baroclinic system of MPAS-Ocean to enable faster computation of the baroclinic system. The AB2 method, one of the multistep methods, computes the time stepping procedure (Stages 1-3) once per time step, while the default predictor-corrector scheme computes it twice per time step. In practice, the AB2 method can provide a speedup of 1.5x to 1.8x. Because the layer thickness equation is highly sensitive to the time step size, the predictor-corrector scheme is still applied to that equation.

All subroutines and the barotropic system advance (Stage 2) in the AB2 code are the same as used in the split-explicit code (mpas_ocn_time_integration_split.F).

To enable this option, users need to set config_time_integrator = 'split_explicit_ab2'.

Results of the G-case (100 years) and WCYCL1850 (30 years) simulations, along with more details of the method, can be found here: https://acme-climate.atlassian.net/wiki/spaces/ImPACTS/pages/3826385232/AB2+time+stepping+for+the+baroclinic+system
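
The cost difference can be sketched with a toy integrator (plain Python for illustration; this is not the MPAS-Ocean code, and f stands in for the baroclinic tendency): AB2 advances du/dt = f(u) from the current and previous tendencies, so each step needs only one new tendency evaluation, whereas a predictor-corrector evaluates the tendency twice per step.

```python
def ab2_integrate(f, u0, dt, nsteps):
    """Integrate du/dt = f(u) with second-order Adams-Bashforth (AB2).

    The first step is bootstrapped with forward Euler because AB2 needs a
    previous tendency; after that, each step reuses the tendency from the
    step before, so only ONE new tendency evaluation is done per step.
    """
    u = u0
    f_prev = f(u)            # tendency at the bootstrap step
    u = u + dt * f_prev      # forward-Euler first step
    for _ in range(nsteps - 1):
        f_curr = f(u)                               # the single new evaluation
        u = u + dt * (1.5 * f_curr - 0.5 * f_prev)  # AB2 update
        f_prev = f_curr
    return u
```

For du/dt = -u with dt = 1e-3 over 1000 steps, the result agrees with exp(-1) to well within 1e-4, consistent with second-order accuracy.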

[NML]
[non-BFB]

@hyungyukang added labels on Oct 12, 2023: mpas-ocean; BFB (PR leaves answers BFB); Stealth (PR has feature which, if turned on, could change climate; fka FCC)

@sarats (Member) commented Oct 16, 2023

Tangentially, we need to discuss whether we wish to include this PR in our planned high-res GPU simulations:
https://acme-climate.atlassian.net/wiki/spaces/ImPACTS/pages/3902799937/High-res+Ocean+GPU+simulations

@hyungyukang (Contributor, Author) commented:

> Tangentially, we need to discuss if we wish to include this PR in our planned high res GPU simulations
> https://acme-climate.atlassian.net/wiki/spaces/ImPACTS/pages/3902799937/High-res+Ocean+GPU+simulations

@sarats, I will run this code on GPU and report back here. Thanks!

@hyungyukang (Contributor, Author) commented:

> Tangentially, we need to discuss if we wish to include this PR in our planned high res GPU simulations
> https://acme-climate.atlassian.net/wiki/spaces/ImPACTS/pages/3902799937/High-res+Ocean+GPU+simulations

@sarats, after modifying some OpenACC directives, this code runs well on GPUs, as tested on PM-GPU for both the standalone MPAS-O and the ocean-ice coupled (GMPAS-IAF) cases.

This PR does not address the issue of running simulations on PM-GPU. I've referenced @grnydawn's fix for PM-GPU here: https://github.com/E3SM-Project/E3SM/tree/ykim/mpas/gcase.

@mark-petersen (Contributor) commented:

@hyungyukang, can you comment here on the exact tests you ran for performance and quality? Do you have scaling plots for particular resolutions and machines? If the information is in slide format, you can just upload the slides.

@hyungyukang (Contributor, Author) commented Oct 24, 2023

@mark-petersen, I've run standalone MPAS-O on the RRS18to6v3 mesh with a real-world configuration using atmospheric forcing to check performance and quality.

  • Computational performance (runtime was measured from the 'se timestep' timer)
    • Tested on the AF machine at OLCF, HPC11 (Miller), which has a nearly identical architecture to NERSC's Perlmutter.

[figure: runtime comparison]

  • Kinetic energy field (k=1) comparison
    • AB2 uses 75% of the default del4 diffusion coefficient

[figure: kinetic energy (k=1) comparison]

@mark-petersen (Contributor) commented:

@hyungyukang, thank you for these results. They look fantastic!

I tested this branch with both optimized and debug, using gnu on perlmutter and intel on chrysalis. Using the default flags (including 'split-explicit' timestepping) everything passes and compares bfb against the master branch point, as expected. Using the new AB2 method (config_time_integrator = 'split_explicit_ab2') in debug, both gnu and intel produce the following results.

nightly suite output:

00:05 PASS ocean_baroclinic_channel_10km_default
00:03 PASS ocean_baroclinic_channel_10km_threads_test
00:03 PASS ocean_baroclinic_channel_10km_decomp_test
00:03 PASS ocean_baroclinic_channel_10km_restart_test
01:42 PASS ocean_global_ocean_QU240_mesh
00:47 PASS ocean_global_ocean_QU240_WOA23_init
00:35 PASS ocean_global_ocean_QU240_WOA23_performance_test
01:04 FAIL ocean_global_ocean_QU240_WOA23_restart_test
01:04 PASS ocean_global_ocean_QU240_WOA23_decomp_test
01:04 PASS ocean_global_ocean_QU240_WOA23_threads_test
00:37 PASS ocean_global_ocean_QU240_WOA23_analysis_test
00:35 PASS ocean_global_ocean_QU240_WOA23_RK4_performance_test
01:05 PASS ocean_global_ocean_QU240_WOA23_RK4_restart_test
01:04 PASS ocean_global_ocean_QU240_WOA23_RK4_decomp_test
01:04 PASS ocean_global_ocean_QU240_WOA23_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_WOA23_init
01:11 PASS ocean_global_ocean_QUwISC240_WOA23_performance_test
00:12 PASS ocean_ice_shelf_2d_5km_z-star_restart_test
00:13 PASS ocean_ice_shelf_2d_5km_z-level_restart_test
00:06 PASS ocean_ziso_20km_default
00:05 PASS ocean_ziso_20km_with_frazil

Note that the restart test fails with AB2. Do we need to include the previous two time step states in the restart file for AB2, rather than just one? A bit-for-bit match between fields when comparing runs with and without a restart is required.
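
On the general restart question: a two-step scheme like AB2 reproduces a continuous run bit-for-bit across a restart only if the previous step's tendency (or an equivalent extra state) is carried in the restart file; re-bootstrapping from the current state alone perturbs the trajectory. A toy sketch (illustrative names only, not MPAS variables):

```python
def ab2_step(u, f_prev, f, dt):
    """One AB2 step; returns the new state and the tendency just computed."""
    f_curr = f(u)
    return u + dt * (1.5 * f_curr - 0.5 * f_prev), f_curr

def run(f, u0, dt, nsteps, f_prev=None):
    """Integrate nsteps of du/dt = f(u); bootstrap with Euler if no f_prev."""
    u = u0
    if f_prev is None:           # fresh start: forward-Euler first step
        f_prev = f(u)
        u = u + dt * f_prev
        nsteps -= 1
    for _ in range(nsteps):
        u, f_prev = ab2_step(u, f_prev, f, dt)
    return u, f_prev

f = lambda u: -u
dt = 0.1
u_ref, _ = run(f, 1.0, dt, 4)                   # 4 steps, no restart
u_mid, f_mid = run(f, 1.0, dt, 2)               # stop after 2 steps
u_good, _ = run(f, u_mid, dt, 2, f_prev=f_mid)  # restart WITH stored tendency
u_bad, _ = run(f, u_mid, dt, 2)                 # restart, re-bootstrapped

assert u_good == u_ref   # bit-for-bit match
assert u_bad != u_ref    # trajectory differs
```

Whether the stored extra state is strictly needed depends on how the first step after restart is formulated, which is what the discussion below settles.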

@hyungyukang (Contributor, Author) commented Oct 25, 2023

@mark-petersen, thanks for testing! I don't think we need to include the previous two steps, but I'm going to look into it today. For the QU240 restart tests, AB2 passes QU240_PHC_restart_test but fails QU240_WOA23_restart_test. What's the difference between those two QU240 tests?

@xylar (Contributor) commented Oct 25, 2023

> For QU240 restart tests, AB2 passes QU240_PHC_restart_test but failed QU240_WOA23_restart_test. What's the difference between two QU240 tests?

That sounds very odd because the difference between the two is just the initial condition (i.e. the T and S fields being interpolated from). Are you sure you ran the PHC restart test with config_time_integrator = 'split_explicit_ab2'?

@hyungyukang (Contributor, Author) commented Oct 25, 2023

> For QU240 restart tests, AB2 passes QU240_PHC_restart_test but failed QU240_WOA23_restart_test. What's the difference between two QU240 tests?
>
> That sounds very odd because the difference between the two is just the initial condition (i.e. the T and S fields being interpolated from). Are you sure you ran the PHC restart test with config_time_integrator = 'split_explicit_ab2'?

@xylar, another odd thing is that AB2 passes the baroclinic channel restart tests on both Compy and Perlmutter...
Anyway, I left my Compy RSA authenticator at home, so I can't check my configuration on Compy now. But I will set up the same tests on Perlmutter and look into the results and code again. Please let me know if you have any suggestions on the testing side!

@xylar (Contributor) commented Oct 25, 2023

The method obviously doesn't require storing multiple time levels in restart files, or no restart tests would be passing. This is clearly a different AB2 method from the one I used as a grad student (involving times n and n-1).

@mark-petersen (Contributor) commented:

The restart tests work and are BFB for active ocean cells. For some reason, with AB2 the deeper land edges get the dummy value of -1e34, but only on restart runs, not on the very first forward run. I'm looking into it now.

cd ocean/global_ocean/QU240/WOA23/restart_test
ncdump -v normalVelocity restart_run/output.nc |tail -n3
  0, 0, 0, 0, -1e+34, -1e+34, -1e+34, -1e+34, -1e+34, -1e+34, -1e+34, -1e+34,
    -1e+34, -1e+34, -1e+34, -1e+34 ;
}

@hyungyukang force-pushed the hkang/ocean/AB2-timestep branch from d7d2896 to 54a3220 on October 26, 2023 12:57
@hyungyukang (Contributor, Author) commented Oct 26, 2023

@mark-petersen and @xylar, thanks for reviewing and for your suggestions! Based on your findings, I added a fix to subroutine ocn_time_integrated_split_ab2_init to initialize normalVelocity = 0 on inactive cells when restarting. This also fixes the G-case restart issue we discussed yesterday.

code change
      if ( config_do_restart ) then
         block => domain % blocklist
         call mpas_pool_get_subpool(block%structs, 'state', statePool)
         call mpas_pool_get_subpool(statePool, 'tracers', tracersPool)
         call mpas_pool_get_subpool(block%structs, 'mesh', meshPool)

         call mpas_pool_get_dimension(block % dimensions, 'nVertLevels', nVertLevels)
         call mpas_pool_get_dimension(block % dimensions, 'nEdges', nEdges)

         call mpas_pool_get_array(statePool, 'normalVelocity', &
                                              normalVelocity, 1)

         call mpas_pool_get_array(meshPool, 'minLevelEdgeBot', minLevelEdgeBot)
         call mpas_pool_get_array(meshPool, 'maxLevelEdgeTop', maxLevelEdgeTop)

         ! normalVelocity=0 on inactive cells
         do iEdge = 1, nEdges
            ! at top
            do k = 1, minLevelEdgeBot(iEdge)-1
               normalVelocity(k,iEdge) = 0.0_RKIND
            enddo

            ! at bottom
            do k = maxLevelEdgeTop(iEdge)+1, nVertLevels
               normalVelocity(k,iEdge) = 0.0_RKIND
            enddo
         enddo ! edge loop

      end if
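
A hypothetical Python mirror of the fix above (illustrative only; the names echo but are not the MPAS variables): zero the velocity at the inactive levels above minLevelEdgeBot and below maxLevelEdgeTop on each edge, so restart fill values such as -1e34 cannot leak into the time stepping.

```python
def zero_inactive_levels(normal_velocity, min_level_edge_bot, max_level_edge_top):
    """Zero inactive vertical levels of normal_velocity[k][edge].

    Level indices in the min/max arrays are 1-based, as in the Fortran:
    levels above min_level_edge_bot and below max_level_edge_top are inactive.
    """
    n_vert_levels = len(normal_velocity)
    for e in range(len(min_level_edge_bot)):
        for k in range(n_vert_levels):
            level = k + 1  # convert 0-based list index to 1-based level
            if level < min_level_edge_bot[e] or level > max_level_edge_top[e]:
                normal_velocity[k][e] = 0.0
    return normal_velocity

# One edge, four levels, fill values at the inactive top and bottom levels:
velocity = [[-1.0e34], [0.3], [0.1], [-1.0e34]]
zero_inactive_levels(velocity, [2], [3])   # active range is levels 2..3
```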
nightly suite output

00:15 PASS ocean_baroclinic_channel_10km_default
00:15 PASS ocean_baroclinic_channel_10km_threads_test
00:09 PASS ocean_baroclinic_channel_10km_decomp_test
00:09 PASS ocean_baroclinic_channel_10km_restart_test
04:24 PASS ocean_global_ocean_QU240_mesh
03:48 PASS ocean_global_ocean_QU240_WOA23_init
03:46 PASS ocean_global_ocean_QU240_WOA23_performance_test
07:41 PASS ocean_global_ocean_QU240_WOA23_restart_test
06:09 PASS ocean_global_ocean_QU240_WOA23_decomp_test
05:44 PASS ocean_global_ocean_QU240_WOA23_threads_test
06:30 PASS ocean_global_ocean_QU240_WOA23_analysis_test
02:38 PASS ocean_global_ocean_QU240_WOA23_RK4_performance_test
05:50 PASS ocean_global_ocean_QU240_WOA23_RK4_restart_test
05:27 PASS ocean_global_ocean_QU240_WOA23_RK4_decomp_test
05:28 PASS ocean_global_ocean_QU240_WOA23_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_WOA23_init
05:31 PASS ocean_global_ocean_QUwISC240_WOA23_performance_test
00:48 PASS ocean_ice_shelf_2d_5km_z-star_restart_test
00:45 PASS ocean_ice_shelf_2d_5km_z-level_restart_test
00:18 PASS ocean_ziso_20km_default
00:12 PASS ocean_ziso_20km_with_frazil
Total runtime 66:37
PASS: All passed successfully!

And I double-checked that I'm using config_time_integrator = 'split_explicit_ab2' in the namelist files for the nightly suite.

@hyungyukang (Contributor, Author) commented:

Attached are slides about this AB2 implementation:
MPASO_AB2_timestep.pdf

@mark-petersen (Contributor) commented:

Thanks! I tested with the nightly suite with gnu and intel, and everything passes.

Commit notes:
- Update Registry.xml and some code to write restart variables used in AB2 time stepping
- Revised OpenACC directives to fix issues when running on GPUs
@jonbob (Contributor) commented Nov 14, 2023

Testing of this PR in conjunction with PR #6035 is being undertaken by the coupled model group. The run with these two PRs is 20231106.v3b01-AB2.piControl.chrysalis, and the corresponding case without AB2 for an apples-to-apples comparison is 20231105.v3b01.piControl.chrysalis. Both runs have gone more than 200 years.

@mark-petersen (Contributor) commented:

Thanks @jonbob for the B-case comparisons. I spent an hour carefully comparing the two simulations. Here are my conclusions; all screenshots show MPAS-Analysis for years 151-200:

  1. Dynamic variables are identical by eye, within variability: transports, MHT, SST, EKE.

(plots omitted)

  2. AMOC is low in both, but slightly higher with AB2 (~9 Sv) than in the control (~8 Sv). This difference might lie within the variability.

(plots omitted; the right plot is squashed for axis comparison)

  3. Sea ice fields look identical by eye.

(plots omitted)

  4. The only visible difference that I found is in the heat content trends. The AB2 OHC drift is smaller, which is better for an 1850 control run.

(plots omitted; the right plot is squashed for axis comparison)

  5. Despite the difference in OHC trends, the global comparisons at these depths still look identical by eye, so there are no obvious problems with a particular region or process.

(plots omitted)

For example, Antarctic bottom temperature is very warm in both, but identical by eye between the control and AB2. The same holds for salinity.

(plots omitted)

In summary, the only visible differences with AB2 are the temperature trends. The temperature drifts are smaller than in the control, so they are preferred. In addition, the spatial distributions of temperature look nearly identical by eye, so no problems stand out.

@mark-petersen self-requested a review November 15, 2023

@mark-petersen (Contributor) left a comment:

I am approving again based on the B-case simulations and results above. I feel comfortable with AB2 as the default baroclinic time stepping scheme for V3.

@xylar (Contributor) left a comment:

I focused on the differences in temperature (and heat content) in the shallower ocean. I'm seeing small differences, but they're much less than the biases compared with Argo:

AB2: [image]

SE: [image]

So I'm comfortable with the AB2 results. The performance gain is remarkable! Excellent work!

@cbegeman (Contributor) left a comment:
@hyungyukang and @mark-petersen I'm glad to hear the 200 year comparison turned out so well (similar to control). Great work!

Code comment under review:

!> \details
!> This module contains the routine for the split explicit time
!> integration scheme, where the second-order Adams Bashforth (AB2)
!> time stepping method is applied to the baroclinic system.
@vanroekel (Contributor) commented:

@hyungyukang, can we modify this comment to clarify that the AB2 stepping is for momentum only? The tracer is Euler forward.

@hyungyukang (Contributor, Author) replied:

@vanroekel, sure, I will change it. Thanks a lot!

@vanroekel (Contributor) left a comment:

@hyungyukang, this is excellent, great work! I have one tiny request: when I looked through the code, it appears AB2 is applied to momentum only, and I believe you confirmed that. Could we clarify that in the header of the AB2 file?

@hyungyukang force-pushed the hkang/ocean/AB2-timestep branch from ada4136 to b2067b7 on November 15, 2023 20:41
@vanroekel (Contributor) left a comment:

Approving based on visual inspection and testing from @hyungyukang, @mark-petersen, and @jonbob.

@rljacob rljacob modified the milestones: v3.0beta01, v3.0beta02 Nov 16, 2023
jonbob added a commit that referenced this pull request Nov 20, 2023
Add split-explicit AB2 time stepping capability to the ocean component

(commit message duplicates the PR description above)
@jonbob (Contributor) commented Nov 20, 2023

Passes sanity testing with expected DIFFs -- merged to next.

@jonbob jonbob added the NML label Nov 20, 2023
jonbob added a commit that referenced this pull request Nov 20, 2023
Update weights for ocean barotropic subcycling in split-explicit solver

This PR updates the weights config_btr_gam1_velWt1,
config_btr_gam2_SSHWt1 in the MPAS-Ocean barotropic solver based on
recent analysis of this scheme. This update applies to "split explicit"
time stepping schemes, i.e. config_time_integrator = 'split_explicit'
and the new config_time_integrator = 'split_explicit_ab2' in #5989. The
new weights allow for a barotropic time step config_btr_dt to be 25%
longer in the tests of EC30to60, thus speeding up the barotropic
subcycling. We expect that values of config_btr_dt can be increased by
25% for all meshes.

[NML]
[non-BFB]
@hyungyukang (Contributor, Author) commented:
@mark-petersen , @sarats , @xylar , @cbegeman , @vanroekel , and @jonbob , thank you so much for your review, testing, suggestions, comments, and approval of this PR!

@jonbob jonbob merged commit 317cf68 into master Nov 21, 2023
@jonbob jonbob deleted the hkang/ocean/AB2-timestep branch November 21, 2023 18:55
@jonbob (Contributor) commented Nov 21, 2023

Merged to master -- expected DIFFs will be blessed with PR #6035.

jonbob added a commit that referenced this pull request Nov 21, 2023
Update weights for ocean barotropic subcycling in split-explicit solver

(commit message duplicates the Nov 20 commit above)
xylar added a commit to xylar/compass that referenced this pull request Dec 3, 2023
This merge updates the E3SM-Project submodule from [894b5b2](https://github.com/E3SM-Project/E3SM/tree/894b5b2) to [5d5f15c](https://github.com/E3SM-Project/E3SM/tree/5d5f15c).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ]  (ocn) E3SM-Project/E3SM#5945
- [ ]  (ocn) E3SM-Project/E3SM#5946
- [ ]  (ocn) E3SM-Project/E3SM#5947
- [ ]  (ocn) E3SM-Project/E3SM#5999
- [ ]  (ocn) E3SM-Project/E3SM#6037
- [ ]  (ocn) E3SM-Project/E3SM#5989
- [ ]  (ocn) E3SM-Project/E3SM#6035
- [ ]  (ocn) E3SM-Project/E3SM#6077
@amametjanov (Member) commented:

It appears that after the merge the nightly OpenACC test

./cime/CIME/scripts/create_test SMS_Ld1.T62_oEC60to30v3.CMPASO-NYF

stopped working with either --machine ascent --compiler pgigpu or --machine pm-gpu --compiler nvidiagpu.
I took a stab at fixing the initial build errors, but there are still quite a few variables that need to be copied CPU-to-GPU, e.g.

FATAL ERROR: data in PRESENT clause was not found on device 1: name=layerthickness

@hyungyukang (Contributor, Author) commented:

> It appears that after the merge nightly OpenACC test
>
> ./cime/CIME/scripts/create_test SMS_Ld1.T62_oEC60to30v3.CMPASO-NYF
>
> stopped working with either --machine ascent --compiler pgigpu or --machine pm-gpu --compiler nvidiagpu. I took a stab at fixing initial build errors, but there's still quite a few variables that need to be copied CPU-to-GPU: e.g.
>
> FATAL ERROR: data in PRESENT clause was not found on device 1: name=layerthickness

@amametjanov, thanks for reporting this. I was able to run the AB2 code on both pm-gpu and Frontier-gpu, but I had to make some modifications for the -DMPAS_OPENACC macro, which is not activated just by using pm-gpu. I've referenced @grnydawn's fix for PM-GPU here: https://github.com/E3SM-Project/E3SM/tree/ykim/mpas/gcase, but I believe that modification has not been merged yet.

I also found that I made a mistake in mpas_ocn_diagnostics_variables.F when I committed this code for GPU runs. I will open a new PR with modifications to run the code on GPUs properly.

Labels: mpas-ocean, NML, non-BFB (PR makes roundoff changes to answers)

10 participants