Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change MPAS-Ocean high-frequency output mode to append #6434

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

xylar
Copy link
Contributor

@xylar xylar commented May 21, 2024

Without this, the first output is missing after each restart

fixes #6432

@xylar
Copy link
Contributor Author

xylar commented May 21, 2024

@zhangshixuan1987 and @wlin7, can you verify that this fixes #6432? I am happy to do some testing but I think you are in a better position to verify this.

@xylar
Copy link
Contributor Author

xylar commented May 21, 2024

Note: an alternative fix would be to change:
https://github.com/E3SM-Project/E3SM/blob/master/components/mpas-ocean/cime_config/buildnml#L862
to append instead of truncate. This may be the preferred solution, since we're not 100% sure that the fields typically included in the high frequency stream are available after a restart.

@xylar xylar requested a review from mark-petersen May 21, 2024 14:50
@rljacob rljacob added this to the v3.0.1 milestone May 22, 2024
@zhangshixuan1987
Copy link
Contributor

zhangshixuan1987 commented May 25, 2024

Note: an alternative fix would be to change: https://github.com/E3SM-Project/E3SM/blob/master/components/mpas-ocean/cime_config/buildnml#L862 to append instead of truncate. This may be the preferred solution, since we're not 100% sure that the fields typically included in the high frequency stream are available after a restart.

Hi @xylar: I conducted a set of 2-month short-term simulations on Compy with the code changes you suggested:

  • The code for this simulation is master branch on March 21 2024 (commit 2cfa5f184848e9de083d71a7c670a8af8efcfde4).
  • Sensitivity experiments were conducted without (/compyfs/zhan391/e3sm_project/E3SMv3_testings/20240521.v3.default.WCYCL1850.ne30pg2_r05_IcoswISC30E3r5.compy) and with (see /compyfs/zhan391/e3sm_project/E3SMv3_testings/20240521.v3.f6434.WCYCL1850.ne30pg2_r05_IcoswISC30E3r5.compy)) the changes in the buildnml suggested above.

The changes here indeed solved the issues:

  • Without change: ncdump -t -v xtime 20240521.v3.default.WCYCL1850.ne30pg2_r05_IcoswISC30E3r5.compy.0161-01-01_00.00.00.nc
data:

xtime =
 "0161-01-06_00:00:00",
 "0161-01-11_00:00:00",
 "0161-01-16_00:00:00",
 "0161-01-21_00:00:00",
 "0161-01-26_00:00:00",
 "0161-01-31_00:00:00" ;
  • With change: ncdump -t -v xtime 20240521.v3.f6434.WCYCL1850.ne30pg2_r05_IcoswISC30E3r5.compy.mpaso.hist.am.highFrequencyOutput.0161-01-01_00.00.00.nc
data:
xtime =
 "0161-01-01_00:00:00",
 "0161-01-06_00:00:00",
 "0161-01-11_00:00:00",
 "0161-01-16_00:00:00",
 "0161-01-21_00:00:00",
 "0161-01-26_00:00:00",
 "0161-01-31_00:00:00" ;

@xylar
Copy link
Contributor Author

xylar commented May 25, 2024

Those error messages are annoying for sure but ultimately harmless I think. Did you make both changes? compute/write on startup and append? If so, you only want append and not compute/write on startup.

@xylar
Copy link
Contributor Author

xylar commented May 25, 2024

Do the high frequency files have all the expected time entries?

@zhangshixuan1987
Copy link
Contributor

Do the high frequency files have all the expected time entries?

Hi @xylar : Yes, I think that this change can solve the issues I originally encountered. However, when I employ this solution in a new G-case simulation, it seems to me that it can potentially trigger a new potential issue for the following scenario:

  • When you restart the model to rerun a piece of simulation. For example, my simulation stopped at 0199-03-01_00.00.00.nc due to hitting the job walltime limit. So I restarted the simulation at 0199-01-01_00.00.00.nc to continue my simulation. Note that previous run already generated files at 0199-02-01_00.00.00 and 0199-03-01_00.00.00. The restart simulation are supposed to overwrite these files (as we restart the model at 0199-01-01_00.00.00). However, using append instead of truncate triggers the following error from mpaso:
ERROR: Writing to stream 'highFrequencyOutput' would overwrite record    1 in file '20240524.v3-LR.GMPAS-BC.ne30pg2_r05_icos30.anvil.mpaso.hist.am.high-frequency output.0199-02-01_00.00.00.nc',
ERROR:     but clobber_mode is set to 'append'.

I think this error is expected from the purpose of append (not allowed to overwrite) and truncate (allowed to overwrite). I believe that truncate is what we want in this scenario. Overall, my point is that truncate seems to be a better choice instead of append for most of the cases.

@zhangshixuan1987
Copy link
Contributor

Those error messages are annoying for sure but ultimately harmless I think. Did you make both changes? compute/write on startup and append? If so, you only want append and not compute/write on startup.

Hi @xylar : Actually this is not the case. For the test simulation as I mentioned above, I ONLY made changes for https://github.com/E3SM-Project/E3SM/blob/master/components/mpas-ocean/cime_config/buildnml#L862. No other changes have been made.

@zhangshixuan1987
Copy link
Contributor

zhangshixuan1987 commented May 25, 2024

@xylar: I think I overlooked the changes in this pull request. I thought that #6434 and #6432 were different. In fact, the pull request along with the changes here is indeed the correct one. To avoid confusion, I would like to clarify:

  1. changing the two flags in mpas-ocean/bld/namelist_files/namelist_defaults_mpaso.xml as suggested in Potential bug in MPAS ocean output  #6432 is the solution
  2. the alternative fix with changes in https://github.com/E3SM-Project/E3SM/blob/master/components/mpas-ocean/cime_config/buildnml#L862 is NOT a preferred solution to me.

Thank you!

@xylar
Copy link
Contributor Author

xylar commented May 25, 2024

I believe that append mode is functioning as I should, and that one simply has to ignore the disconcerting error messages if one is restarting from an earlier point and cannot overwrite the data.

@xylar
Copy link
Contributor Author

xylar commented May 25, 2024

I think the append mode change remains the preferred solution despite the annoying error messages. In general, we cannot guarantee that fields selected for output are available at startup. Fields may contain all zeros. So it is safer to keep the results written before the last run wrote its final restart than to overwrite with new values after restart.

You all may use the other solution of computing/writing at startup for the alternating B- and G-cases but that won't be what we ultimately do in E3SM.

This prevents the last entry from the previous run before a
restart from being clobbered by a new run.
@xylar xylar force-pushed the ocn/fix-high-freq-output branch from 0022636 to 1a5f91d Compare May 28, 2024 10:51
@xylar xylar changed the title Compute and write ocean high-frequency output on startup Change MPAS-Ocean high-frequency output mode to append May 28, 2024
@xylar
Copy link
Contributor Author

xylar commented May 28, 2024

@wlin7, @zhangshixuan1987 and @mark-petersen, as discussed above, I'm going with the "alternative" method of fixing this issue where the output is in "append" mode rather than "truncate". This has the annoying side effect that error messages may appear after each restart (particularly if a run timed out) saying that entries could not be overwritten because the mode is "append". This was the preferred fix in our MPAS DevOps call last week.

Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. The inconvenience of extra messages from a bad restart is preferred to having a time slice missing. Thanks for everyone's detailed review and testing above.

@xylar
Copy link
Contributor Author

xylar commented May 28, 2024

The inconvenience of extra messages from a bad restart is preferred to having a time slice missing. Thanks for everyone's detailed review and testing above.

@mark-petersen, do you also agree that there's some risk in setting things to compute/write at startup? That worked for what @zhangshixuan1987 and @wlin7 need for their purposes but I think there are enough fields that start out as all zeros until after the first time step of a restart run, so that the solution I now have in this branch is preferable. But I would be keen to make sure you agree.

@xylar xylar removed the request for review from wlin7 June 4, 2024 14:01
@xylar
Copy link
Contributor Author

xylar commented Jun 4, 2024

@wlin7, I'm taking you off as a reviewer. I would still welcome your input if you want to provide it but I don't think it is necessary before we merge this.

@rljacob
Copy link
Member

rljacob commented Jun 7, 2024

@jonbob please start merging this.

@jonbob jonbob added BFB PR leaves answers BFB and removed NML labels Jun 12, 2024
jonbob added a commit that referenced this pull request Jun 12, 2024
Change MPAS-Ocean high-frequency output mode to append

Without this, the first output is missing after each restart

Fixes #6432

[BFB]
@jonbob
Copy link
Contributor

jonbob commented Jun 12, 2024

Passes:

  • ERS_Ld5.T62_oQU120.CMPASO-NYF.chrysalis_intel
  • ERP_Ld3.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-pioroot1

merged to next

@jonbob jonbob merged commit dc9beef into E3SM-Project:master Jun 13, 2024
13 checks passed
@jonbob
Copy link
Contributor

jonbob commented Jun 13, 2024

merged to master

@xylar xylar deleted the ocn/fix-high-freq-output branch June 13, 2024 17:16
@xylar
Copy link
Contributor Author

xylar commented Jun 13, 2024

Thank you, @jonbob!

xylar added a commit to xylar/compass that referenced this pull request Jun 30, 2024
This merge updates the E3SM-Project submodule from [8939709](https://github.com/E3SM-Project/E3SM/tree/8939709) to [752f281b15](https://github.com/E3SM-Project/E3SM/tree/752f281b15).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ]  (ocn) E3SM-Project/E3SM#6412
- [ ]  (ocn) E3SM-Project/E3SM#6434
- [ ]  (ocn) E3SM-Project/E3SM#6420
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFB PR leaves answers BFB bug fix PR mpas-ocean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential bug in MPAS ocean output
5 participants