Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long run restart fix #1098

Merged
merged 7 commits into from
Nov 10, 2023
Merged

Long run restart fix #1098

merged 7 commits into from
Nov 10, 2023

Conversation

rgknox
Copy link
Contributor

@rgknox rgknox commented Oct 16, 2023

Description:

We had been calling a routine (canopy_structure()) during the restart read process. The purpose was to facilitate rebuilding the canopy. However, this had an inadvertent effect of changing results from base, because this procedure would not just rebuild, but modify the cohort compositions due to termination and promotion/demotion in the canopy. We realized that we could simply bypass this call, and add two variables to the restart file to achieve b4b restarts in tests.

This is an API change PR, and needs to be synchronized with ESCOMP/CTSM#2199

Fixes #1051

Collaborators:

@mvdebolskiy identified the locus of the problem which enabled its fix, with team contributions and troubleshooting from @mvertens , @glemieux , @rosiealice and @ckoven

Expectation of Answer Changes:

No answer changes aside from a fixed bug.

Checklist

If this is your first time contributing, please read the CONTRIBUTING document.

All checklist items must be checked to enable merging this pull request:

Contributor

  • The documentation has been assessed to determine if updates are necessary

Integrator

  • FATES PASS/FAIL regression tests were run
  • Evaluation of test results for answer changes was performed and results provided

Documentation

Test Results:

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@rgknox
Copy link
Contributor Author

rgknox commented Oct 23, 2023

Update, this set of changes does enable some tests to pass the restart compare test that did not previously (ie ERS 5month f10/f45), however longer ERS tests a la 10 months still do not pass.

@glemieux glemieux changed the title long run restart fix WIP: long run restart fix Oct 23, 2023
@glemieux glemieux added the PR status: Not Ready The author is signaling that this PR is a work in progress and not ready for integration. label Oct 23, 2023
@rosiealice
Copy link
Contributor

Oh boo. Should we push these changes anyway and start a new line of investigation for the subsequent failure?

Just checking whether it is nocomp or full fates mode that fails? Or both?

@mvdebolskiy
Copy link

mvdebolskiy commented Oct 24, 2023

Can confirm, ERS_Lm11... passes, ERS_Lm12 and 13 do not.

@rosiealice
Copy link
Contributor

Can you decide those for me @mvdebolskiy ? (Are they both nocomp?)

@mvdebolskiy
Copy link

@rosiealice both nocomp. Also spitfire mode 0.

… so it doesnt happen an extra time on the restart call (unlikely but possible)
@rgknox
Copy link
Contributor Author

rgknox commented Oct 29, 2023

All, I've made some more updates. See comments in #1051 regarding spread. I also realized that we had not added the variable that remembers a plants net carbon uptake at the leaf layer, to the restart. This finally adds it (although its a memory hog, but a necessary one right now).
I now have 13 month restart tests passing for full fates with this branch. Please give it a try and stress test it.

@rgknox rgknox removed the PR status: Not Ready The author is signaling that this PR is a work in progress and not ready for integration. label Oct 29, 2023
@rgknox rgknox closed this Oct 31, 2023
@rgknox rgknox deleted the nocomp-fix-tests branch October 31, 2023 18:10
@rgknox rgknox restored the nocomp-fix-tests branch October 31, 2023 19:41
@glemieux glemieux reopened this Oct 31, 2023
@glemieux glemieux changed the title WIP: long run restart fix Long run restart fix Nov 2, 2023
@rgknox rgknox requested a review from glemieux November 8, 2023 15:09
Copy link
Contributor

@glemieux glemieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor area of comment cleanup suggested, but we can also punt on it until later.

@@ -1124,25 +1133,35 @@ subroutine define_restart_vars(this, initialize_variables)

! Only register satellite phenology related restart variables if it is turned on!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cleanup; since we use hlm_use_sp in a bunch of places still, I think moving this up to be included in a general comment would be ideal. Or we can strike it altogether since it's not relevant in this specific place in the code anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with @glemieux will incorporate into future PR

@rgknox rgknox merged commit 44ba97a into NGEET:main Nov 10, 2023
1 check was pending
@glemieux
Copy link
Contributor

glemieux commented Nov 16, 2023

It looks like this fixed #897 and ESCOMP/CTSM#667. This can be seen via #1096 (comment), which is the next pull request to be tested after this was integrated.

@glemieux glemieux linked an issue Nov 16, 2023 that may be closed by this pull request
peterdschwartz added a commit to E3SM-Project/E3SM that referenced this pull request Jan 30, 2024
… next (PR #6018)

This pull requests updates the ed_update_site call in elmfates_interfacemod to pass a flag for when this procedure is called during restart.
This update should be coordinated with NGEET/fates#1098, which addresses the long duration exact restart issue NGEET/fates#1051.

Additionally this pull request resolves #5548 by expanding the fates regression test coverage to include
more run mode options for fates at a variety of resolutions and runtimes.

[non-BFB] for FATES
Fixes #5548
peterdschwartz added a commit to E3SM-Project/E3SM that referenced this pull request Feb 1, 2024
This pull requests updates the ed_update_site call in elmfates_interfacemod to pass a flag
for when this procedure is called during restart. This update should be coordinated with NGEET/fates#1098,
which addresses the long duration exact restart issue NGEET/fates#1051.

Additionally this pull request resolves #5548 by expanding the fates regression
test coverage to include more run mode options for fates at a variety of resolutions and runtimes.

[non-BFB] for FATES
Fixes #5548
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failure in ERS 3 month nocomp test ERS tests at least one year long failing across multiple test mods
4 participants