Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updating Icepack to hash 6ca316f, adding wlat for FSD, updating colpkg #16

Merged

Conversation

eclare108213
Copy link
Owner

@eclare108213 eclare108213 commented Apr 14, 2023

Updates icepack to the cice-consortium/E3SM-icepack-initial-integration branch. This is the first icepack submodule update in this branch since #3.

Runs with icepack are not BFB. Updating icepack by itself failed with a segfault in the frzmlt_bottom_lateral subroutine. Adding wlat to MPAS-SI's step_therm1 call fixed the problem (so this implementation does not appear to be backwards compatible).

Colpkg ('column') tests are not BFB with the prior version. The colpkg updates here enable BFB tests in single-column mode.

@eclare108213 eclare108213 requested a review from apcraig April 14, 2023 20:53
@eclare108213 eclare108213 self-assigned this Apr 14, 2023
@apcraig
Copy link
Collaborator

apcraig commented Apr 14, 2023

Do you want me to look into the wlat issue?

@apcraig
Copy link
Collaborator

apcraig commented Apr 14, 2023

Looks like you have the correct version of Icepack here.

@apcraig
Copy link
Collaborator

apcraig commented Apr 14, 2023

This is set as a draft PR, if it's ready for review, click on the "Ready for review" and then it can be merged.

@eclare108213
Copy link
Owner Author

Do you want me to look into the wlat issue?

Yes please.

I'm not ready for this to be merged - I want to understand where the non-BFB-ness is coming from. According to the Icepack PR descriptions, I think this should be BFB in our current configuration. I've added some notes at https://acme-climate.atlassian.net/wiki/spaces/ICE/pages/3740172289/2023-04+Icepack+Merge+Hackathon+notes and https://acme-climate.atlassian.net/wiki/spaces/ICE/pages/3537993995/Icepack+Integration+Branch+Status+and+Testing.

@apcraig
Copy link
Collaborator

apcraig commented Apr 14, 2023

I haven't looked at all the changes in Icepack, but I don't think it will be bit-for-bit. What I'm checking is that the "E3SM" CICE+Icepack remains bit-for-bit with the equivalent version in CICE-Consortium. So a bit-for-bit change on both that produces identical results is noted as being bit-for-bit. If you are comparing E3SM mpassi+icepack(Dec) and E3SM mpassi+icepack(Apr), those are probably not bit-for-bit. It may depend in part what options you're using in Icepack. Let me know if you have questions or if I can help.

I'll look at the wlat issue on the Consortium side.

@eclare108213
Copy link
Owner Author

If you are comparing E3SM mpassi+icepack(Dec) and E3SM mpassi+icepack(Apr), those are probably not bit-for-bit. It may depend in part what options you're using in Icepack.

Yes, this is what I'm doing. According to the PR descriptions, I don't think the options that I'm using should produce non-BFB results. But there are some suspect lines of code, that seem to be active even when those options are off (so I'm not sure why they would have been BFB in Icepack or CICE tests).

What I'm checking is that the "E3SM" CICE+Icepack remains bit-for-bit with the equivalent version in CICE-Consortium. So a bit-for-bit change on both that produces identical results is noted as being bit-for-bit.

Is this what you mean by "BFB with Consortium main" in E3SM-Project/Icepack#21?

@apcraig
Copy link
Collaborator

apcraig commented Apr 14, 2023

Correct, when I merge into E3SM/Icepack, I compare E3SM CICE+Icepack vs Consortium CICE+Icepack to verify they are identical (bit-for-bit). There are a few cases that are not (I think modal tests in Icepack, but that's expected at this point). I have NOT been testing E3SM CICE+Icepack (prior version) vs E3SM CICE+Icepack (new version).

@eclare108213
Copy link
Owner Author

Okay, that means that the issue is probably not in Icepack itself, but there could be an issue in the calls from MPAS-SI to icepack (or some assumption in MPAS-SI that isn't consistent with Icepack).

@eclare108213
Copy link
Owner Author

eclare108213 commented Apr 15, 2023

Do you want me to look into the wlat issue?

Yes please.

I confirmed that this error is reproducible, not just anvil being flaky. Here's the error message:

261: [b538:56885:0:56885] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))

453: ==== backtrace (tid:  98716) ====
453:  0 0x0000000000b086f6 icepack_therm_vertical_mp_icepack_step_therm1_()  /lcrc/group/e3sm/ac.eclare/E3SM-Polar/D3.nset01.updateIcepack20230414.update_icepack_2.eclare108213.anvil/build/cmake-bld/core_seaice/icepack/columnphysics/icepack_therm_vertical.f90:5
79
453:  1 0x00000000007170ab seaice_icepack_mp_column_vertical_thermodynamics_()  /lcrc/group/e3sm/ac.eclare/E3SM-Polar/D3.nset01.updateIcepack20230414.update_icepack_2.eclare108213.anvil/build/cmake-bld/core_seaice/shared/mpas_seaice_icepack.f90:1805
453:  2 0x0000000000702d21 seaice_icepack_mp_seaice_icepack_predynamics_time_integration_()  /lcrc/group/e3sm/ac.eclare/E3SM-Polar/D3.nset01.updateIcepack20230414.update_icepack_2.eclare108213.anvil/build/cmake-bld/core_seaice/shared/mpas_seaice_icepack.f90:112
7

line 579 is an empty line in subroutine frzmlt_bottom_lateral, so the numbering is off...

    561       !-----------------------------------------------------------------
    562       ! Identify grid cells where ice can melt.
    563       !-----------------------------------------------------------------
    564 
    565       rside = c0
    566       fside = c0
    567       Tbot  = Tf
    568       fbot  = c0
    569       wlat  = c0
    570 
    571       if (aice > puny .and. frzmlt < c0) then ! ice can melt
    572 
    573       !-----------------------------------------------------------------
    574       ! Use boundary layer theory for fbot.
    575       ! See Maykut and McPhee (1995): JGR, 100, 24,691-24,703.
    576       !-----------------------------------------------------------------
    577 
    578          deltaT = max((sst-Tbot),c0)
    579 
    580          ! strocnx has units N/m^2 so strocnx/rho has units m^2/s^2
    581          ustar = sqrt (sqrt(strocnxT**2+strocnyT**2)/rhow)
    582          ustar = max (ustar,ustar_min)

I'm only guessing that the issue is the wlat = c0 line, but adding wlat to the MPAS-SI driver does fix it!

@apcraig
Copy link
Collaborator

apcraig commented Apr 16, 2023

I duplicated the error in CICE for wlat and am testing a fix. Should have a PR to the Consortium soon that we can merge into the E3SM Icepack.

Separately, I update the Icepack Recent Changes page, https://github.com/CICE-Consortium/Icepack/wiki/Icepack-Recent-Changes, since the last release. I think the current Icepack version in E3SM is based on the prior release in early Dec. Nothing jumps out as being error changing except for fsd and snwgrain. Let me know if you need any help trying to understand the answer changes in E3SM.

@eclare108213
Copy link
Owner Author

eclare108213 commented Apr 17, 2023

I'm at a bit of a loss as to what's changing BFB-ness. I've tried changing a few things back, including replacing the entire mpas_seaice_icepack.F with the version in PR#10, which is before your snicarad branch and my therm1 changes, and there are still differences. So it seems to be an interaction of the new changes in icepack that we're bringing in and the older implementation, which basically was just the initial setup and the radiation calculations. I guess I could try turning off dEdd...

The next thing to do would be to come at it from a different direction -- go back to one of the very early hashes in this branch (e.g. at PR#4 or 5), update to the latest icepack and see if it still changes answers. If so, then that would indicate that there's something fundamentally different (wrong?) in the basic implementation of Icepack in MPAS-SI, I think. If not, then try the same thing with newer PRs to see when it starts happening. Other ideas?

EDIT: I'm going back to the code at PR#5, will update to the latest Icepack and check whether the new results are BFB with the original.

One thing I did notice is that we changed CCSMCOUPLED to CESMCOUPLED in Icepack. MPAS-SI is still using CCSMCOUPLED, and it is in fact turned on in my tests. That mainly affects the orbital stuff (E3SM will want to use shared code instead of local icepack code); other instances are taken care of by icepack_parameter settings or are for different configurations than the one I'm running. At the moment I think it shouldn't matter for this nonBFB problem I'm looking for, but we will need to figure out how to address the orbital differences in Icepack.

@eclare108213
Copy link
Owner Author

eclare108213 commented Apr 18, 2023

PR#5 is BFB in 3-month D cases for both k000 (column) and k001 (icepack) when updated with the latest icepack from the E3SM fork. I am now stepping through the PRs since then to identify which one becomes non-BFB.

Update:
PR#9 k001 failed to run
PR#10 ran successfully and is BFB
PR#12 ran successfully and is BFB

PR#11 k001 failed to run due to the wlat problem. Copied changes from apcraig/Icepack@dd2ef5a, which allowed it to run. It is not BFB. So the issue has been narrowed down to the vertical_thermodynamics PR#11.

njeffery and others added 2 commits April 28, 2023 14:44
-changes initialization of counter to 1 in atm-boundary
nonBFB because it changes the number of interations

-Initializes airOceanDrag ratio to 1
BFB except when atmos_boundary_layer = 'constant' and only changes
restart output for airOceanDrag ratio.

-Corrects uVel -> vVel in atmos_boundary (bugfix, nonBFB)
…e-icepack-2

Adds changes for consistency with icepack and bugfix for boundary layer
@eclare108213 eclare108213 marked this pull request as ready for review May 3, 2023 15:58
@darincomeau
Copy link
Collaborator

@eclare108213 I have two 5 year D-cases in the queue on anvil at the head of this branch (2978bdf), one icepack, one column_physics, for CICE-QC testing. You mentioned a third case that was pre BFB-column package changes? I'm not sure which has exactly here you're looking for?

@eclare108213
Copy link
Owner Author

@darincomeau Don't worry about the third case, for now. The next time I update this icepack-integration branch from E3SM main would be a better time to compare with the icepack and column runs. This branch is behind E3SM main at the moment so the comparison might be more confusing than helpful.

@darincomeau
Copy link
Collaborator

icepack vs. column_package passes CICE-QC testing:

Running QC test on the following directories:
  /lcrc/group/acme/ac.dcomeau/scratch/anvil/20230525.DTEST-JRA1p5.TL319_EC30to60E2r2.anvil.icepack.intel/run
  /lcrc/group/acme/ac.dcomeau/scratch/anvil/20230525.DTEST-JRA1p5.TL319_EC30to60E2r2.anvil.column.intel/run
Number of files: 61
2 Stage Test Passed
Quadratic Skill Test Passed for Northern Hemisphere
Quadratic Skill Test Passed for Southern Hemisphere
Creating map of the data (ice_thickness_20230525.DTEST-JRA1p5.TL319_EC30to60E2r2.anvil.icepack.intel.png)
Creating map of the data (ice_thickness_20230525.DTEST-JRA1p5.TL319_EC30to60E2r2.anvil.column.intel.png)
Creating map of the data (ice_thickness_20230525.DTEST-JRA1p5.TL319_EC30to60E2r2.anvil.icepack.intel_minus_20230525.DTEST-JRA1p5.TL319_EC30to60E2r2.anvil.column.intel.png)

Quality Control Test PASSED

@darincomeau
Copy link
Collaborator

@eclare108213 eclare108213 changed the title updating Icepack to hash 6ca316f, adding wlat for FSD updating Icepack to hash 6ca316f, adding wlat for FSD, updating colpkg May 26, 2023
@eclare108213 eclare108213 merged commit 79526bd into eclare108213/seaice/icepack-integration May 26, 2023
@eclare108213 eclare108213 deleted the update_icepack_2 branch November 8, 2023 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants