
Add in-line self attraction and loading for global tidal simulation #4472

Merged

Conversation

@sbrus89 (Contributor) commented Aug 18, 2021

Self attraction and loading (SAL) forcing involves a spherical harmonic transform (SHT) of the sea surface height (SSH) (Ray, 1998). The spherical harmonic coefficients are scaled by the load Love numbers, and an inverse transform is then performed to compute the SAL field. The SAL forcing is incorporated into the pressure gradient term along with the tidal potential.

Two different approaches are implemented for performing the SHT:

  1. SSH is gathered onto a single MPI rank and interpolated to a Gaussian mesh. A very fast library, SHTns, is used to compute the forward/inverse SHT. The SAL field is then interpolated back to the unstructured mesh and scattered back to the other MPI ranks. This work was done by @knbarton.
  2. The SHT integration is performed on each subdomain and the SH coefficients are globally summed with an all-reduce.

Approach 1) performs better at low core counts, but approach 2) scales better for larger runs.
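
For orientation, the spectral step scales each degree-l SSH coefficient by 3*rhoWater/rhoEarth * (1 + k'_l - h'_l)/(2l+1) before the inverse transform (Ray, 1998). Below is a minimal Fortran sketch of just that scaling; the program structure, variable names, and the placeholder density and Love-number values are illustrative assumptions, not the PR's code.

    program sal_scaling_sketch
       ! Minimal sketch of the SAL spectral scaling step (after Ray, 1998).
       ! Placeholder values; real load Love numbers come from a table.
       implicit none
       integer, parameter :: RKIND = selected_real_kind(12)
       integer, parameter :: nOrder = 3
       real (kind=RKIND), parameter :: rhoWater = 1035.0_RKIND  ! seawater density (assumed)
       real (kind=RKIND), parameter :: rhoEarth = 5517.0_RKIND  ! mean Earth density (assumed)
       real (kind=RKIND) :: sshCoeff(0:nOrder,0:nOrder)  ! forward-SHT coefficients of SSH
       real (kind=RKIND) :: salCoeff(0:nOrder,0:nOrder)  ! scaled coefficients for the inverse SHT
       real (kind=RKIND) :: loveK(0:nOrder), loveH(0:nOrder)
       integer :: l, m

       sshCoeff = 1.0_RKIND                   ! placeholder SSH coefficients
       salCoeff = 0.0_RKIND
       loveK = 0.0_RKIND ; loveH = 0.0_RKIND  ! placeholder load Love numbers k'_l, h'_l

       do l = 0, nOrder
          do m = 0, l
             ! eta_SAL(l,m) = 3*rho_w/rho_e * (1 + k'_l - h'_l)/(2l+1) * eta(l,m)
             salCoeff(l,m) = 3.0_RKIND*rhoWater/rhoEarth &
                           * (1.0_RKIND + loveK(l) - loveH(l))/real(2*l+1, RKIND) &
                           * sshCoeff(l,m)
          end do
       end do
       print *, 'salCoeff(2,0) =', salCoeff(2,0)
    end program sal_scaling_sketch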

This PR includes an init mode option for verification testing of the forward/inverse SHT.

This work was done as a part of the tides task of the ICoM project.

[NML]
[BFB] (for all standard E3SM tests, and for tests with SAL on when config_parallel_self_attraction_loading_bfb = .true.)

@sbrus89 (Contributor, Author) commented Sep 3, 2021

Here are some verification results for both SHT algorithms:

Approximation errors with different SH orders are shown for two different analytical functions. The first is on quasi-uniform meshes with resolutions QU60 (165k cells), QU30 (659k cells), and QU15 (2.6M cells). In all cases the SH order for the serial approach is N = Nlat/2 - 1, where Nlat is the number of latitudes in the Gaussian grid.

[Figure: spiral_with_background]

The second analytical function is on "equatorially refined" meshes with resolutions ER120to60 (60k cells), ER120to30 (118k cells), and ER120to15 (286k cells). The oscillations vary with mesh resolution such that there are approximately 30 grid cells per period.

[Figure: EC120to15_with_background]

Convergence results for the QU meshes are:

[Figure: QU_convergence]
The convergence of the serial approach is slower due to interpolation errors. Convergence of the parallel approach is limited by the discretization error of the midpoint-rule integration.
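
To make the midpoint-rule limitation concrete: in the parallel approach, each rank accumulates a per-cell quadrature for each coefficient, and the partial sums are then combined with an all-reduce. The Fortran sketch below is a simplification with illustrative names (the per-cell spherical harmonic values are assumed precomputed), not the actual MPAS routine.

    subroutine local_sht_contribution(nCells, ssh, areaCell, ylm, coeffLocal)
       ! Midpoint-rule quadrature for one SH coefficient on the local subdomain:
       ! coeff ~= sum_i ssh(i) * Ylm(lat_i, lon_i) * area(i).
       ! In the model the local sums are combined across ranks with an
       ! MPI_Allreduce; the quadrature error here is what limits the
       ! convergence of the parallel approach in the plots above.
       implicit none
       integer, parameter :: RKIND = selected_real_kind(12)
       integer, intent(in) :: nCells
       real (kind=RKIND), intent(in) :: ssh(nCells), areaCell(nCells), ylm(nCells)
       real (kind=RKIND), intent(out) :: coeffLocal
       integer :: iCell

       coeffLocal = 0.0_RKIND
       do iCell = 1, nCells
          coeffLocal = coeffLocal + ssh(iCell)*ylm(iCell)*areaCell(iCell)
       end do
    end subroutine local_sht_contribution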

And convergence results for the ER meshes are:

[Figure: EC_convergence]

Comparing at equivalent accuracy levels, the performance of the two SHT algorithms for the QU meshes is:

QU60
[Figure: QU60_scaling]

QU30
[Figure: QU30_scaling]

QU15
[Figure: QU15_scaling]

The performance comparisons for the ER meshes are:

ER120to60
[Figure: EC120to60_scaling]

ER120to30
[Figure: EC120to30_scaling]

ER120to15
[Figure: EC120to15_scaling]

All runs were done on Compy with Intel compilers.

@knbarton (Contributor) commented Sep 3, 2021

I performed a harmonic analysis comparison against benchmark TPXO8 tidal data for a single-layer barotropic run on the Icosahedron 9 (16 km) uniform mesh. This shows that the serial version of SAL in this pull request gives the same tidal results as the previously built serial version. I have also attached a comparison with the scalar SAL approximation to show the improvement due to the full SAL calculation.

This was built on Cori with these settings: make gnu-nersc USE_PIO2=true OPENMP=false DEBUG=false GEN_F90=true


M2 Constituent plots:

[Figure: M2_plot]

Complex RMSE values in deep water (>1000m) between 66N and 66S, in meters:

Constituent   Serial before          Serial currently       Scalar approximation
M2            0.06399980070877255    0.06399973949098071    0.072925093765029
K1            0.02360560164062904    0.023605630922693552   0.02605653002810445
N2            0.01316337401656041    0.013163374004691027   0.014364836226592457
O1            0.014989959013047288   0.014989994509038505   0.021686930435006916
S2            0.02929243376361691    0.02929240084208756    0.034033753955931854


We do expect a slight difference in results between the previous serial SAL calculation and the one in this PR due to changing from N = Nlat - 1 to N = Nlat/2 - 1.

Currently, there is an error when using the parallel SAL. When that is resolved, I will share those harmonic analysis results as well.

@sbrus89 force-pushed the sbrus89/ocn/add-self-attraction-loading branch from 28b03d8 to f9e0e11 on September 17, 2021
@knbarton (Contributor) commented

Here are the results for the parallel SAL implementation on the Icosahedron 9 mesh using the same settings as before.

RMSE values in deep water (>1000m) between 66N and 66S, in meters:
M2 Constituent: 0.06416788614061514
K1 Constituent: 0.023685223640391814
N2 Constituent: 0.013199779825957164
O1 Constituent: 0.015113880782811768
S2 Constituent: 0.02942663706041666

M2 Constituent plots:
[Figure: M2_plot]

@mark-petersen (Contributor) commented

@sbrus89 thanks. Please check my last commit d6daa07. Stand-alone was not compiling with SHTNS=false because it referenced some shtns variables. I just put those lines within the cpp constructs, so it should be fine for the parallel version.
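
For reference, the guard pattern is the usual cpp construct, sketched below; the macro name USE_SHTNS is an assumption for illustration and may differ from the flag actually used in the build.

    program shtns_guard_sketch
       ! Sketch of guarding library-specific code with cpp constructs so the
       ! stand-alone build compiles with SHTNS=false (macro name illustrative).
       ! Note: cpp directives belong in column 1 of the source file.
       implicit none
    #ifdef USE_SHTNS
       write(*,*) 'serial SAL path: SHTns calls compiled in'
    #else
       write(*,*) 'parallel SAL path: no SHTns symbols referenced'
    #endif
    end program shtns_guard_sketch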

@@ -829,7 +829,7 @@ subroutine ocn_vel_vmix_tend_implicit_spatially_variable_mannings(meshPool, forc
       ! averaging the two bottomDepth cells.
       implicitCd = max(0.0025_RKIND, &
                    min(0.1_RKIND, &
-                       0.16_RKIND/log(250.0_RKIND*(bottomDepth(cell1)+bottomDepth(cell2)))**2 ))
+                       0.16_RKIND/(log(250.0_RKIND*(bottomDepth(cell1)+bottomDepth(cell2))))**2 ))

Note: this parenthesis correction was added to this PR because config_use_implicit_bottom_roughness is only used in the tidal simulations exercised by this PR.

call associatedLegendrePolynomials(n, m, startIdx, endIdx, l, pmnm2, pmnm1, pmn)

! Compute local integral contribution
do iCell = endIdx, startIdx, -1

@sbrus89 we typically place OMP pragmas on the outer loop, e.g.

               !$omp parallel
               !$omp do schedule(runtime)
               do iCell = 1, nCellsAll
                   ...
               end do
               !$omp end do
               !$omp end parallel

with private clauses for internal indices as needed. That said, I think the !$omp would go on the outermost loop m here, if the m iterations are independent, which they appear to be.

OpenMP is particularly important for the parallel implementation, but could also be used for the serial one now that we found that bash error.
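
Applied to the transform here, the suggestion would look roughly like the sketch below; the subroutine, bounds, and the trivial loop body are placeholders, and the real private list would need to cover all inner temporaries (e.g. the Legendre work arrays).

    subroutine scale_coefficients_omp(nOrder, coeff)
       ! Sketch: thread the outermost m loop, assuming the m iterations are
       ! independent; inner index n is private, m is private automatically.
       implicit none
       integer, parameter :: RKIND = selected_real_kind(12)
       integer, intent(in) :: nOrder
       real (kind=RKIND), intent(inout) :: coeff(0:nOrder,0:nOrder)
       integer :: m, n

       !$omp parallel
       !$omp do schedule(runtime) private(n)
       do m = 0, nOrder
          do n = m, nOrder
             coeff(n,m) = 2.0_RKIND*coeff(n,m)   ! stand-in for the real per-(n,m) work
          end do
       end do
       !$omp end do
       !$omp end parallel
    end subroutine scale_coefficients_omp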


BTW, OMP is not needed for init mode. It is much less important for init subroutines in forward mode.

@mark-petersen (Contributor) commented Sep 20, 2021

@sbrus89 and/or @knbarton: with full SAL on, for both the parallel and serial versions, could you conduct these three tests after the OpenMP commit, in optimized mode:

  1. restart test: run (say) a day, restart for an extra day, and separately run two days straight; compare the restarts at the end
  2. partition test: run an identical short test with two partitions (say 16 vs 32 cores) for a day
  3. thread test: run with OPENMP=false for a day; recompile with OPENMP=true, set OMP_NUM_THREADS=2, and run for a day

For all three, use a bfb comparison of the last restart file, with python or ncdiff. Simple ncdiff commands:

ncdiff file1.nc file2.nc diff.nc
ncwa -O -y mabs -v normalVelocity,layerThickness diff.nc diffmabs.nc

The max absolute difference should be exactly zero. You can use compass for this, but if you already have SAL runs set up, it might be easier to just copy your settings from there.

@mark-petersen (Contributor) left a review comment:

Merged into master locally and retested. Passes:

SMS_D_Ln9.T62_oQU120_ais20.MPAS_LISIO_TEST.chrysalis_gnu
SMS_D_Ln9.T62_oQU120_ais20.MPAS_LISIO_TEST.chrysalis_intel
PET_Ln3.T62_oEC60to30v3wLI.GMPAS-DIB-IAF-ISMF.chrysalis_gnu

Also compiled with gnu and intel on grizzly with SHTNS off (default). Passes the pr test suite for both, with a bfb match to the compass reference. Also compiled gnu with SHTNS=true but disabled via config flags; that also passes the pr suite with both debug and optimized gnu.

@rljacob assigned jonbob and unassigned mark-petersen on Oct 6, 2021
@sbrus89 closed this on Oct 11, 2021
@sbrus89 force-pushed the sbrus89/ocn/add-self-attraction-loading branch from 0051799 to 336b521 on October 11, 2021
@sbrus89 reopened this on Oct 11, 2021
@sbrus89 (Contributor, Author) commented Oct 13, 2021

Here are the results I got for BFB testing of the spherical harmonic transform (SHT) only (via init mode, config_init_configuration = 'test_sht'):

Compy: Intel 19.0.5, DEBUG=false

                                    serial max diff               parallel max diff
  1 - 2 nodes                       BFB                           1.55431223447522e-15
  2 - 4 nodes                       BFB                           1.77635683940025e-15
  unthreaded - 2 threads (2 nodes)  (compile error for OpenMP)    3.5527136788005e-15
  unthreaded - 4 threads (2 nodes)  (compile error for OpenMP)    3.99680288865056e-15

The non-BFB behavior for the parallel SHT is due to the non-reproducibility of the summation order in the MPI_Allreduce call and !$omp atomic statements.

For a forward mode run on the icos9 mesh, the parallel version is BFB for restarts, but not across partitions due to the differences in the SHT above. The max difference after two days between a 32 node and 64 node run was 8.83346729096957e-11 for layerThickness and 1.7656587997239e-11 for normalVelocity. The serial version is BFB for restarts and across partitions for this case.
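
The partition sensitivity comes from floating-point reduction order, and it is easy to reproduce outside the model: summing the same numbers in two different orders generally differs in the last bits. A self-contained Fortran illustration (not MPAS code):

    program sum_order_demo
       ! Floating-point addition is not associative, so a global sum whose
       ! reduction order depends on the partition (MPI_Allreduce, !$omp atomic)
       ! is not bit-for-bit reproducible.
       implicit none
       integer, parameter :: RKIND = selected_real_kind(12)
       integer, parameter :: n = 1000000
       real (kind=RKIND), allocatable :: x(:)
       real (kind=RKIND) :: fwd, bwd
       integer :: i

       allocate(x(n))
       do i = 1, n
          x(i) = 1.0_RKIND/real(i, RKIND)
       end do
       fwd = 0.0_RKIND
       do i = 1, n
          fwd = fwd + x(i)       ! sum in increasing order
       end do
       bwd = 0.0_RKIND
       do i = n, 1, -1
          bwd = bwd + x(i)       ! sum in decreasing order
       end do
       print *, 'forward - backward =', fwd - bwd   ! typically nonzero, near machine precision
    end program sum_order_demo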

@mark-petersen self-requested a review on October 13, 2021
@mark-petersen (Contributor) left a review comment:

Approving, based on the lengthy review and testing by @sbrus89, @knbarton, and myself described above. The only remaining issue is that the parallel version of the SHT is non-bfb across partitions due to the operation order of the global sums (see previous comment), but the differences are at machine precision. I think that result is sufficient for merging the PR.

@jonbob (Contributor) commented Oct 26, 2021

@philipwjones is working on implementing a reproducible sum in MPAS, which would solve the problem of non-BFB results on different processor counts.
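
One way a reproducible sum can be built (a sketch of the general idea only, not claimed to be the algorithm used in MPAS or #4700) is to accumulate in 64-bit integer fixed point: integer addition is associative, so the result is independent of reduction order.

    program fixed_point_sum_sketch
       ! Order-independent sum: scale each value to a 64-bit integer and
       ! accumulate. Real reproducible-sum implementations are more careful
       ! about dynamic range, overflow, and rounding; scaleFac is illustrative.
       implicit none
       integer, parameter :: RKIND = selected_real_kind(12)
       integer, parameter :: I8 = selected_int_kind(18)
       real (kind=RKIND), parameter :: scaleFac = 2.0_RKIND**40
       real (kind=RKIND) :: x(5)
       integer (kind=I8) :: acc
       integer :: i

       x = (/ 0.1_RKIND, 2.5_RKIND, -1.25_RKIND, 3.0e-7_RKIND, 7.0_RKIND /)
       acc = 0_I8
       do i = 1, size(x)
          acc = acc + nint(x(i)*scaleFac, kind=I8)   ! loop order is irrelevant to the result
       end do
       print *, 'reproducible sum ~', real(acc, RKIND)/scaleFac
    end program fixed_point_sum_sketch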

@jonbob requested a review from mark-petersen on November 1, 2021
@jonbob (Contributor) commented Nov 1, 2021

@mark-petersen - I re-requested your review on this PR because it would be more helpful once the reproducible sum has been added. As-is, it really isn't ready to be merged.

@sbrus89 (Contributor, Author) commented Dec 15, 2021

@jonbob and @mark-petersen: now that #4700 has been submitted, I'll work on testing this with the reproducible sum implemented by @philipwjones.

@mark-petersen self-requested a review on March 29, 2022
@mark-petersen (Contributor) left a review comment:

I retested with today's commits using gnu debug on badger, with SAL and tidal potential on and the bfb flag on. The full nightly suite passes.

@jonbob this is ready to merge when the repo is open.

@jonbob (Contributor) commented Mar 31, 2022

@philipwjones and @matthewhoffman - can you please review after the changes have been made? Thanks

@hollyhan (Contributor) commented Apr 4, 2022

This comment is to confirm that the issue I experienced in my previous comment (#4472 (comment)) has been addressed by @knbarton's commit (badff01).

Previously, I experienced a bound-check error during coupled MALI - Sea Level Model simulations, in which the interpolation subroutine adopted from components/mpas-ocean/src/shared/mpas_ocn_vel_self_attraction_loading.F is used. The error only appeared in DEBUG mode. The coupled MALI-SLM simulations terminated with the runtime error: "Fortran runtime error: Index '1627' of dimension 1 of array 'rowvalues' above upper bound of 1626". (The array size of 'rowValues', 'colValues', and 'sValues' (n_S) in the mapfile I used is 1626.)

After incorporating this fix, coupled MALI - Sea Level Model simulations compiled in DEBUG mode do not experience any errors, and the simulation results (i.e., MALI thickness data interpolated between the MALI mesh and the global Gaussian grid) are correct.
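
For context, applying a SCRIP-style weight map is a loop over the n_S sparse-matrix triplets (rowValues, colValues, sValues); the Fortran sketch below uses the array names from the comment above, but the guard logic is hypothetical, simply illustrating the kind of indexing that DEBUG-mode bounds checking enforces.

    subroutine apply_map(nS, nDst, nSrc, rowValues, colValues, sValues, src, dst)
       ! dst(row) = sum of S * src(col) over the nS sparse map entries.
       ! An out-of-range index (like '1627 > 1626' above) is exactly what a
       ! DEBUG-mode bounds check catches.
       implicit none
       integer, parameter :: RKIND = selected_real_kind(12)
       integer, intent(in) :: nS, nDst, nSrc
       integer, intent(in) :: rowValues(nS), colValues(nS)
       real (kind=RKIND), intent(in) :: sValues(nS), src(nSrc)
       real (kind=RKIND), intent(out) :: dst(nDst)
       integer :: i

       dst = 0.0_RKIND
       do i = 1, nS
          if (rowValues(i) < 1 .or. rowValues(i) > nDst .or. &
              colValues(i) < 1 .or. colValues(i) > nSrc) cycle   ! guard; real code should report an error
          dst(rowValues(i)) = dst(rowValues(i)) + sValues(i)*src(colValues(i))
       end do
    end subroutine apply_map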

Thanks again, @knbarton, @mark-petersen, @sbrus89!

@sbrus89 (Contributor, Author) commented Apr 4, 2022

Thanks for pointing this out and helping to test the fix @hollyhan!

@mark-petersen (Contributor) commented

@philipwjones can you check the acc changes I just made in 03a5551?

@philipwjones (Contributor) left a review comment:

Changes for acc look fine. Approving based on all the other discussion/testing.

@mark-petersen added the 'BFB (PR leaves answers BFB)', 'NML', and 'Ready to merge' labels on Apr 5, 2022
@jonbob (Contributor) commented Apr 7, 2022

@sbrus89 - the scripts that update the bld files to be consistent with Registry changes only came up with a missing blank line in namelist_definition_mpaso.xml. Is it OK to push to your branch? Or I can tell you where the line is if you prefer to handle it.

@sbrus89 (Contributor, Author) commented Apr 7, 2022

@jonbob, thanks for catching that. Sure, it's totally fine to push to my branch.

matthewhoffman added a commit to MALI-Dev/E3SM that referenced this pull request Apr 19, 2022
This PR adds a 1D gravitationally consistent sea-level model (Kendall et al.,
2005; Gomez et al., 2010; Han et al., 2022;
https://github.com/GomezGroup/1DSeaLevelModel_FWTW) as a submodule within MALI,
adding the capability of regional sea-level projection associated with ice
sheet changes solved by MALI. In this new addition, the sea-level model
computes changes in the height of sea surface equipotential and viscoelastic
solid Earth associated with surface (ice and ocean) loading changes, enabling
MALI to take into account gravitational, rotational and deformational effects
of the loading changes.

The implemented requirements and modified/added code in the PR are as follows:

1. MALI to optionally build SLM from the top level directory
(mpas-albany-landice):  modified `Makefile` in MALI's top level directory such
that `SLM=True` in make command builds the SLM.
2. MALI to initialize SLM at t=0 of a simulation: added wrapper subroutine
'slmodel_init' for initializing the SLM in `mpas_li_bedtopo.F`, in which
sea-level model sub-modules are called.
3. MALI to call SL solver at every coupling time interval:  added wrapper
subroutine 'slmodel_solve' for calling the SLM in `mpas_li_bedtopo.F` in which
sea-level model sub-modules are called.
4. MALI to call the SLM at every coupling time interval (i.e., SLM timestep):
1) Modified `Registry.xml` to include a namelist value of the coupling time, 2)
modified `mpas_li_core.F` to add an alarm to the land ice core clock object for
the coupling time interval, and 3) modified `mpas_li_bedtopo.F` to use
`mpas_timekeeping` and check/reset the alarm for the coupling time interval.
5. MALI needs to accommodate the SLM running on a single processor while MALI
runs on multiple processors: implemented MPI scatter, gather, and halo updates
in `mpas_li_bedtopo.F`. MALI data (ice thickness) is gathered on the head node,
after which the SLM is called.
6. Interpolation between MALI and SLM native grids should be done: Scripfiles
`mpas_to_grid.nc` and `grid_to_mpas.nc` for the regional, unstructured MALI
mesh and the global Gaussian grid are generated using MPAS-mesh tool
`from_mpas.py` and `ncremap`, respectively. Subroutines `interpolate_init` and
`interpolate` and MPI scatter/gather functions are copied from the ocean SAL
PR#4472 [E3SM-Project#4472]
7. MALI and SLM must exchange model data: on the head node, MALI provides
thickness as input to the SLM through the SLM subroutine `sl_solver`, which
calculates and outputs total bed topography; this is used to update bedrock
topography in `mpas_li_bedtopo.F`.
8. Include the SLM namelist file such that SLM parameters can be accessed and
modified in the MALI top-level directory, and the SLM does not need to be
recompiled when the parameters are modified.

* origin/hollyhan/mali/add_SLmodel: (43 commits)
  Change the variable name 'topoChange' to 'bedTopographyChange'
  Add a check for consistency in coupling interval prescribed in MALI and SLM settings
  Modify the src/Makefile such that 'make clean' removes files in the SLM submodule
  Fix the while loop in subroutine 'interpolate'
  Fix the file name for the 'gridToMpasFile'
  Update the SeaLevelModel submodule
  Change variable name `unit_num` in MALI to avoid conflict with the one in SLM
  Update the SeaLevelModel submodule
  Clean up and add an address to the SLM Github repo
  Update SLM submodule URL and hash
  Remove white trailing spaces
  deallocate MPI variables across routines
  Change a variable name to camelCase
  Add an error statement for when compile flag and namelist config have a conflict
  Update sea-level model submodule
  Change the variable name 'UpliftDiff' to 'TopoChange'
  Update sea-level model submodule
  Change 'err' to 'err_tmp' in calling routines
  Clean up and reorginize the code
  Include a wrapper subroutine that reads in the SLM namelist
  ...
@rljacob (Member) commented Apr 19, 2022

Is this ready?

@sbrus89 (Contributor, Author) commented Apr 19, 2022

As far as I know, it is.

jonbob added a commit that referenced this pull request Apr 20, 2022
Add in-line self attraction and loading for global tidal simulation

@jonbob (Contributor) commented Apr 20, 2022

Passes:

  • ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel
  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
  • SMS_D_Ld1.ne30pg2_r05_EC30to60E2r2.WCYCL1850.chrysalis_intel
  • PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel

with expected NML DIFFs.

@jonbob (Contributor) commented Apr 20, 2022

merged to next

jonbob added a commit that referenced this pull request Apr 21, 2022 (#4472):

Re-merge to pick up missed changes in mpas-ocean build-namelist file
@jonbob (Contributor) commented Apr 21, 2022

Missed changes that needed to go in the mpas-ocean build-namelist file. Retesting passes:

  • ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel
  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
  • SMS_D_Ld1.ne30pg2_r05_EC30to60E2r2.WCYCL1850.chrysalis_intel
  • PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel

Still with expected NML DIFFs. Re-merged to next

@jonbob merged commit d36c7bc into E3SM-Project:master on Apr 25, 2022
@jonbob (Contributor) commented Apr 25, 2022

merged to master and expected NML DIFFs blessed
