Add ocn_glcshelf coupling test #95

matthewhoffman
Collaborator

This PR adds a new test that exercises the ocn_glcshelf coupling. That coupling was added in PR E3SM-Project#2726, but a test was never added.

This PR creates a new
SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.mpaso-ocn_glcshelf
test that runs a G-case with the ocn_glcshelf coupling turned on.
That coupling was added in PR #2726, but a test was never added.
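For readers unfamiliar with the naming scheme, here is a minimal sketch of how the dot-separated test name above breaks down. It assumes the usual layout of test type plus modifiers, then grid, then compset, followed by optional fields; `ParsedTest` and `parse_test_name` are hypothetical helpers written for illustration, not part of CIME.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedTest:
    test_type: str        # e.g. "SMS" (smoke) or "ERS" (exact restart)
    modifiers: List[str]  # e.g. "P512" (task count), "D" (debug), "Ld5" (5 days)
    grid: str
    compset: str
    extra: List[str]      # trailing fields (machine_compiler and/or testmods)


def parse_test_name(name: str) -> ParsedTest:
    """Split a test name into its dot-separated fields (illustrative only)."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least TYPE.GRID.COMPSET, got {name!r}")
    type_and_mods = parts[0].split("_")
    return ParsedTest(
        test_type=type_and_mods[0],
        modifiers=type_and_mods[1:],
        grid=parts[1],
        compset=parts[2],
        extra=parts[3:],  # not disambiguated further in this sketch
    )


parsed = parse_test_name(
    "SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.mpaso-ocn_glcshelf"
)
print(parsed)
```

The sketch deliberately leaves the trailing fields unlabelled, because a fourth field can be either a machine_compiler pair or a testmods directory and the name alone does not always say which.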
@matthewhoffman matthewhoffman requested review from xylar and jonbob May 3, 2024 03:19
@matthewhoffman
Collaborator Author

@xylar, I ran the new test on Chrysalis with:
./create_test --walltime 0:30:00 SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf
It currently dies with an out-of-bounds array access here:

473: forrtl: severe (408): fort: (2): Subscript #1 of the array EFFECTIVEDENSITYSCRATCH has value 762 which is greater than the upper bound of 761
473:
473: Image              PC                Routine            Line        Source
473: libpnetcdf.so.3.0  000015554B93B1A2  for_emit_diagnost     Unknown  Unknown
473: e3sm.exe           0000000003A7FF3F  ocn_effective_den         185  mpas_ocn_effective_density_in_land_ice.f90
473: libiomp5.so        000015554C4053F3  __kmp_invoke_micr     Unknown  Unknown
473: libiomp5.so        000015554C389273  Unknown               Unknown  Unknown
473: libiomp5.so        000015554C38821E  Unknown               Unknown  Unknown
473: libiomp5.so        000015554C4058CC  Unknown               Unknown  Unknown
473: libpthread-2.28.s  000015554561614A  Unknown               Unknown  Unknown
473: libc-2.28.so       0000155545345DC3  clone                 Unknown  Unknown

@xylar xylar changed the base branch from master to alternate May 3, 2024 04:10
@xylar xylar changed the base branch from alternate to master May 3, 2024 04:10
@xylar
Collaborator

xylar commented May 3, 2024

Great, I'll debug that tomorrow. If it's easy to fix, I will. If it's easier to just comment out effectiveDensity, I might go down that route instead.

@xylar
Collaborator

xylar commented May 3, 2024

@matthewhoffman, before you move to E3SM, you will need to rename the branch using - instead of _. I also think that ocn-glc might be a better "topic" than cime because this doesn't directly relate to cime itself, but I'm not that well versed in what people use in these circumstances. It might be fine.

With this fix, effective density smoothing only involves *valid*
neighbors of cells.
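The general idea of smoothing over valid neighbors only can be sketched as follows. This is not the Fortran from mpas_ocn_effective_density_in_land_ice; it is a hypothetical Python illustration that assumes missing neighbors are stored as an out-of-range placeholder index, which is one plausible reading of the subscript 762 against an upper bound of 761 in the trace above.

```python
from typing import List, Sequence


def smooth_over_valid_neighbors(
    values: Sequence[float],
    cells_on_cell: Sequence[Sequence[int]],
    n_cells: int,
) -> List[float]:
    """Average each cell with its valid neighbors (0-based indices).

    Any neighbor index outside [0, n_cells) is treated as a placeholder
    for a missing neighbor and skipped instead of being used to index
    into `values`. That convention is an assumption of this sketch.
    """
    smoothed = []
    for i in range(n_cells):
        total = values[i]
        count = 1
        for j in cells_on_cell[i]:
            if 0 <= j < n_cells:  # skip invalid neighbors
                total += values[j]
                count += 1
        smoothed.append(total / count)
    return smoothed


# Three cells in a row; index 3 stands in for "no neighbor here".
values = [1.0, 2.0, 6.0]
cells_on_cell = [[1, 3], [0, 2], [1, 3]]
result = smooth_over_valid_neighbors(values, cells_on_cell, n_cells=3)
print(result)
```

Without the range check, the placeholder index would read one element past the end of the array, which is the kind of access a debug build with bounds checking reports.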
@xylar
Collaborator

xylar commented May 3, 2024

With the fix I just pushed, I'm seeing:

$ ./create_test --walltime 0:30:00 SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf --wait
Using project from config_machines.xml: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 160 cores simultaneously
Creating test directory /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf.20240503_133857_w9d2mu
RUNNING TESTS:
  SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf
Starting CREATE_NEWCASE for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished CREATE_NEWCASE for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 0.955405 seconds (PASS)
Starting XML for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished XML for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 0.325480 seconds (PASS)
Starting SETUP for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished SETUP for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 6.682981 seconds (PASS)
Starting SHAREDLIB_BUILD for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished SHAREDLIB_BUILD for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 141.214634 seconds (PASS)
Starting MODEL_BUILD for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 6 procs
Finished MODEL_BUILD for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 283.624796 seconds (PASS)
Starting RUN for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 proc on interactive node and 512 procs on compute nodes
Finished RUN for test SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 2.895139 seconds (PEND). [COMPLETED 1 of 1]
Waiting for tests to finish
PASS SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf RUN
    Case dir: /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf.20240503_133857_w9d2mu
test-scheduler took 1265.0347168445587 seconds

Not the fastest test in the world (15 minutes on 8 nodes), so we should improve that with future AIS meshes.

@xylar
Collaborator

xylar commented May 3, 2024

@matthewhoffman, should we do an exact restart test instead of a smoke test? I'm trying the following right now:

$ ./create_test --walltime 0:30:00 ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf --wait
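For context on what an exact restart test adds over a smoke test, here is a toy sketch of the idea: a run that is checkpointed and resumed should end in exactly the same state as an uninterrupted run. The model, function names, and `restore_clock` flag are all hypothetical and only illustrate the comparison, not how the ERS test is implemented.

```python
def run(state: float, first_step: int, n_steps: int) -> float:
    """Advance a toy model whose forcing depends on the step number."""
    for k in range(first_step, first_step + n_steps):
        state = 0.5 * state + float(k)
    return state


def exact_restart_ok(initial: float, n_steps: int, restart_at: int,
                     restore_clock: bool = True) -> bool:
    """Compare an uninterrupted run with a checkpoint-and-resume run."""
    reference = run(initial, 0, n_steps)
    checkpoint = run(initial, 0, restart_at)
    # A correct restart restores everything the model depends on,
    # including the clock; dropping it mimics an incomplete restart file.
    first = restart_at if restore_clock else 0
    restarted = run(checkpoint, first, n_steps - restart_at)
    return reference == restarted  # bit-for-bit comparison


print(exact_restart_ok(1.0, n_steps=5, restart_at=3))
print(exact_restart_ok(1.0, n_steps=5, restart_at=3, restore_clock=False))
```

A smoke test would pass in both cases because the model runs to completion either way; only the restart comparison catches state that is missing from the checkpoint.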

@xylar
Collaborator

xylar commented May 3, 2024

ERS also passed:

$ ./create_test --walltime 0:30:00 ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf --wait
Using project from config_machines.xml: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 160 cores simultaneously
Creating test directory /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf.20240503_140547_z7jcy3
RUNNING TESTS:
  ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf
Starting CREATE_NEWCASE for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished CREATE_NEWCASE for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 1.270281 seconds (PASS)
Starting XML for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished XML for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 0.323577 seconds (PASS)
Starting SETUP for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished SETUP for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 6.950484 seconds (PASS)
Starting SHAREDLIB_BUILD for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 procs
Finished SHAREDLIB_BUILD for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 137.674649 seconds (PASS)
Starting MODEL_BUILD for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 6 procs
Finished MODEL_BUILD for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 264.120663 seconds (PASS)
Starting RUN for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf with 1 proc on interactive node and 512 procs on compute nodes
Finished RUN for test ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf in 3.127767 seconds (PEND). [COMPLETED 1 of 1]
Waiting for tests to finish
PASS ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf RUN
    Case dir: /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/ERS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.chrysalis_intel.mpaso-ocn_glcshelf.20240503_140547_z7jcy3
test-scheduler took 1637.260846376419 seconds

Collaborator

@xylar xylar left a comment

@matthewhoffman, I'll leave it up to you whether you want to stick with SMS or switch to ERS.

@@ -249,6 +249,7 @@
"SMS_D_Ld1.T62_oQU240wLI.GMPAS-IAF-PISMF.mpaso-impl_top_drag",
"SMS_D_Ld1.T62_oQU240.GMPAS-IAF.mpaso-harmonic_mean_drag",
"SMS_D_Ld1.T62_oQU240.GMPAS-IAF.mpaso-upwind_advection",
"SMS_P512_D_Ld5.T62_oEC60to30v3wLI_ais20.MPAS_LISIO_TEST.mpaso-ocn_glcshelf",
Collaborator

@jonbob, should we leave off the P512 and let the infrastructure handle it? I think @matthewhoffman was finding that it ran on 15 Chrysalis nodes by default, which seemed to me like a lot.

Collaborator

I think we should leave off the P512 and add an entry to config_pes_tests.xml instead, if necessary. P512 is awkward if the test gets run on multiple machines, since the number of PEs per node varies from machine to machine.
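To make the concern concrete, here is a small sketch of how a fixed task count maps onto different node counts. The 64 cores per node is inferred from the "8 nodes" figure quoted earlier for the 512-task Chrysalis run; the other core counts are hypothetical examples, not real machine entries.

```python
import math


def nodes_needed(n_tasks: int, cores_per_node: int) -> int:
    """Whole nodes required to place n_tasks, one task per core."""
    return math.ceil(n_tasks / cores_per_node)


# 64 matches the 8-node Chrysalis run above; 56 and 128 are made-up
# values standing in for other machines.
for cores in (64, 56, 128):
    print(cores, nodes_needed(512, cores))
```

With 56 cores per node, 512 tasks leave the last node only partly filled, which is one reason a hard-coded task count can fit some machines poorly.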

Collaborator Author

@jonbob, what's the procedure for updating config_pes_tests.xml? Would it be better to update config_pes.xml? I've always been a little uncertain about how to specify compset/grid combinations properly in those files.

@matthewhoffman
Collaborator Author

@xylar , thanks for the quick feedback and the updates! Glad to see you got it running to completion easily.

I agree that switching to an ERS test would be more robust, even if it takes longer, but I'll wait to let @jonbob chime in before changing the PR.

In terms of timing, the GLC cost is negligible; the OCN and ICE costs dominate (as I think we expected with this mesh):

    TOT Run Time:     336.309 seconds      168.155 seconds/mday         1.41 myears/wday
    CPL Run Time:      10.223 seconds        5.112 seconds/mday        46.31 myears/wday
    ATM Run Time:       0.234 seconds        0.117 seconds/mday      2023.18 myears/wday
    LND Run Time:       0.000 seconds        0.000 seconds/mday         0.00 myears/wday
    ICE Run Time:     134.197 seconds       67.099 seconds/mday         3.53 myears/wday
    OCN Run Time:     197.678 seconds       98.839 seconds/mday         2.39 myears/wday
    ROF Run Time:       0.018 seconds        0.009 seconds/mday     26301.37 myears/wday
    GLC Run Time:       0.364 seconds        0.182 seconds/mday      1300.62 myears/wday
    WAV Run Time:       0.000 seconds        0.000 seconds/mday         0.00 myears/wday
    IAC Run Time:       0.000 seconds        0.000 seconds/mday         0.00 myears/wday
    ESP Run Time:       0.000 seconds        0.000 seconds/mday         0.00 myears/wday
    CPL COMM Time:     14.300 seconds        7.150 seconds/mday        33.11 myears/wday
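The table above can be checked with a short script. This is a sketch that copies a few rows from the table, computes each component's share of the total run time, and recomputes the throughput column; the 365-day model year is an assumption, chosen because it reproduces the reported 1.41 myears/wday.

```python
import re

# Rows copied from the timing summary above.
TIMING = """\
    TOT Run Time:     336.309 seconds      168.155 seconds/mday         1.41 myears/wday
    CPL Run Time:      10.223 seconds        5.112 seconds/mday        46.31 myears/wday
    ICE Run Time:     134.197 seconds       67.099 seconds/mday         3.53 myears/wday
    OCN Run Time:     197.678 seconds       98.839 seconds/mday         2.39 myears/wday
    GLC Run Time:       0.364 seconds        0.182 seconds/mday      1300.62 myears/wday
"""

ROW = re.compile(r"^\s*(\w+) Run Time:\s+([\d.]+) seconds")


def parse_run_times(text: str) -> dict:
    """Map component name to run time in seconds."""
    times = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            times[match.group(1)] = float(match.group(2))
    return times


def myears_per_wday(seconds_per_mday: float, days_per_year: int = 365) -> float:
    """Simulated years per wall-clock day (365-day year assumed)."""
    return 86400.0 / (seconds_per_mday * days_per_year)


times = parse_run_times(TIMING)
shares = {name: t / times["TOT"] for name, t in times.items() if name != "TOT"}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(round(myears_per_wday(168.155), 2))
```

On these numbers OCN and ICE together account for nearly all of the total run time, while GLC is a small fraction of a percent.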

My thought is to add this test with the grid currently in the PR (because we need ice-shelf cavities to exist), even if the test is expensive, and then update OCN/ICE to QU120 or QU240 when we have the time to create one (or both) of those with ISC. I will start a PR to update the GLC Antarctica meshes soon, so we can presumably move away from the AIS20km mesh at that time too. What are your thoughts on that?

We could reduce the duration from 5 days to 2 or 3 days if we wanted to make the test run faster while still covering what we want. @jonbob, do you have any advice about this?

This provides a low res ocean mesh with ice-shelf cavities to permit
faster tests of the ocn-glcshelf coupling.  Needed mapping files are
listed but the filenames are not added yet.
@jonbob
Collaborator

jonbob commented May 16, 2024

@matthewhoffman -- we will probably need a new compset to run this new resolution? The current LISIO compset is hard-wired to use CORE-II forcing:

  <alias>MPAS_LISIO_TEST</alias>
  <lname>2000_DATM%NYF_SLND_MPASSI_MPASO_DROF%NYF_MALI%SIA_SWAV</lname>

We could add an MPAS_LISIO_JRA1p5 compset to this PR:

  <alias>MPAS_LISIO_JRA1p5</alias>
  <lname>2000_DATM%JRA-1p5_SLND_MPASSI_MPASO_DROF%JRA-1p5_MALI%SIA_SWAV</lname>

I tested with the new resolution and this compset, and it ran successfully.
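To see exactly which components differ between the two long names above, here is a minimal sketch that splits each lname on underscores and compares the per-component options. `parse_lname` and `diff_compsets` are hypothetical helpers, and the split assumes no component name or option contains an underscore, which holds for these two strings.

```python
from typing import Dict, Tuple


def parse_lname(lname: str) -> Tuple[str, Dict[str, str]]:
    """Split a compset long name into (time period, {component: option})."""
    fields = lname.split("_")
    components = {}
    for field in fields[1:]:
        model, _, option = field.partition("%")
        components[model] = option  # empty string when no option is given
    return fields[0], components


def diff_compsets(a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Return the components whose options differ between two long names."""
    _, comps_a = parse_lname(a)
    _, comps_b = parse_lname(b)
    names = sorted(set(comps_a) | set(comps_b))
    return {
        name: (comps_a.get(name), comps_b.get(name))
        for name in names
        if comps_a.get(name) != comps_b.get(name)
    }


LISIO_TEST = "2000_DATM%NYF_SLND_MPASSI_MPASO_DROF%NYF_MALI%SIA_SWAV"
LISIO_JRA = "2000_DATM%JRA-1p5_SLND_MPASSI_MPASO_DROF%JRA-1p5_MALI%SIA_SWAV"
changes = diff_compsets(LISIO_TEST, LISIO_JRA)
print(changes)
```

For these two strings only the DATM and DROF options change; the ocean, sea-ice, and land-ice components are the same in both.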

@matthewhoffman
Collaborator Author

@jonbob, I actually created a 'GG' compset in my Greenland OCN->GLC PR here: a9224b8
But there are a couple of differences:

  • I used MPASO%DATMFORCED for OCN (which I copied from the JRA G case), but you have just MPASO. Do you know what the difference is, and which would be better to use? (cc: @xylar )
  • I used MALI, but you used MALI%SIA for GLC. I think we still want both flavors. The LISIO/MALI%SIA combination is useful for testing because it is supported everywhere. The version I have in my other PR is for actual science evaluation and more rigorous testing with the correct MALI dynamics. Do you agree that we should keep both? If so, then maybe we are just stuck with having an updated JRA LISIO plus a more realistic 'GG'.

@jonbob
Collaborator

jonbob commented May 17, 2024

@matthewhoffman -- thanks for catching that. I think MPASO%DATMFORCED is preferable; my first shot was just a modification of the existing LISIO compset, which apparently did not get updated when we introduced the DATMFORCED option. I think it's fine having both flavors, though I am not set on the LISIO version -- I mostly was trying to find a compset to test the new resolution and associated files with.

@matthewhoffman
Collaborator Author

Thanks, @jonbob. Maybe you, @xylar, and I can hash this out after the mpas devops meeting on Tuesday if a definitive plan is not obvious before that.

This commit makes two updates to the new test:
* It replaces the MPAS_LISIO_TEST compset with MPAS_LISIO_JRA1p5.
  The old MPAS_LISIO_TEST will no longer be supported.
* It deletes the new test that was added in a previous commit to the
  e3sm_ocnice_stealth_features suite and instead updates the existing
  SMS.T62_oQU120_ais20.MPAS_LISIO_TEST in the e3sm_developer suite in a
  number of ways:

  1. uses the updated compset
  2. has glcshelf coupling enabled
  3. switches from SMS to ERS
  4. uses TL319_oQU240wLI_ais20 mesh
@matthewhoffman
Collaborator Author

@jonbob and @xylar, I've updated the test as we discussed this morning. I ran it on Chrysalis with:

./create_test --walltime 0:30:00 ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf

and the ERS test appears to have taken just a couple of minutes to run, and it passes the comparisons. But it would be good to have someone else confirm I'm interpreting the test results correctly. Unless either of you sees anything amiss, I think this PR is ready to move to the main repo.

Collaborator

@jonbob jonbob left a comment

Approved based on visual inspection and testing

@xylar
Collaborator

xylar commented May 22, 2024

closed in favor of E3SM-Project#6437

@xylar xylar closed this May 22, 2024