Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FATES branch runs #2955

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

samsrabin
Copy link
Collaborator

@samsrabin samsrabin commented Feb 6, 2025

Description of changes

In FATES branch runs, prevents attempts to re-allocate arrays that already are allocated (in the restart file).

This prevents run failures. However, it exposes a failure in the COMPARE_base_hybrid step of ERI tests, due to a bad hist_mfilt. This PR will also fix that.

Remaining tasks:

  • Fix COMPARE_base_hybrid failure.
  • Add FATES ERI tests to aux_clm and fates test suites.

Notes for testing

Baseline comparisons may fail for the following testmods:

  • Fates
  • FatesColdAllVars
  • FatesColdDryDepSatPhen
  • FatesColdHydro
  • FatesColdMeganSatPhen
  • FatesColdSatPhen
  • fire_emis
  • glcMEC_long

However, this is only because I have changed hist_mfilt for those to 1. (From 365 for the Fates testmods; from 30 for fire_emis; from 12 for glcMEC_long). This ensures that the dates in the history filenames are the same for both parts of the ERI test.

When this is next in the merge queue:

  1. Rebase onto the latest tag.
  2. Check out the commit from before changing mfilt. Run aux_clm and fates suites, comparing to the latest baseline, expecting no diffs.
  3. Check out the tip of this branch. Run aux_clm and fates suites, comparing to the latest baseline, expecting baseline comparison failures (a) only for those tests and (b) only because of missing baseline files.

Specific notes

Contributors other than yourself, if any: @ckoven

CTSM Issues Fixed:

Are answers expected to change (and if so in what way)? No, although see "Notes for testing" above.

Any User Interface Changes (namelist or namelist defaults changes)? No

Does this create a need to change or add documentation? Did you do so? No

Testing performed, if any: ERI_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates passes.

@samsrabin samsrabin added bug something is working incorrectly test: aux_clm Pass aux_clm suite before merging test: fates Pass fates test suite before merging labels Feb 6, 2025
@samsrabin samsrabin added this to the ctsm6.0.0 (code freeze) milestone Feb 6, 2025
@samsrabin samsrabin added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Feb 6, 2025
@samsrabin
Copy link
Collaborator Author

samsrabin commented Feb 6, 2025

Two files are getting produced in each run, but only the cpl.hi output file has matching dates:

ERI_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0206-090735de.cpl.hi.2002-02-20-00000.nc.base
ERI_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0206-090735de.cpl.hi.2002-02-20-00000.nc.hybrid

The clm2.h0 output file has mismatched dates:

ERI_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0206-090735de.clm2.h0.2002-01-28-00000.nc.base
ERI_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0206-090735de.clm2.h0.2002-01-02-00000.nc.hybrid

The entire ref2 run is completing, so it's not that. It just seems like either it's not writing the outputs or the archive step is messing up.

@ekluzek any idea why this might be happening?

@samsrabin
Copy link
Collaborator Author

Figured it out—mfilt is set wrong. Fixing.

@samsrabin samsrabin marked this pull request as ready for review February 10, 2025 20:04
@samsrabin samsrabin requested review from rgknox and glemieux February 10, 2025 20:04
@wwieder
Copy link
Contributor

wwieder commented Feb 13, 2025

@rgknox has new ERI tests that are coming in, but they maybe on the FATES side (not aux_clm).
@samsrabin suggesting we migrate an ERS test to an ERI test, which may be good enough for now, but this requires daily history output to avoid date conflicts.

@samsrabin samsrabin self-assigned this Feb 13, 2025
@samsrabin samsrabin added PR status: awaiting review Work on this PR is paused while waiting for review. and removed next this should get some attention in the next week or two. Normally each Thursday SE meeting. labels Feb 13, 2025
Copy link
Collaborator

@glemieux glemieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing this.

@samsrabin samsrabin added PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete and removed PR status: awaiting review Work on this PR is paused while waiting for review. labels Feb 16, 2025
@@ -738,7 +738,7 @@ subroutine initialize2(ni,nj, currtime)
deallocate(wt_nat_patch)

! Initialise the fates model state structure
if ( use_fates .and. .not.is_restart() .and. finidat == ' ') then
if ( use_fates .and. .not. (is_restart() .or. nsrest .eq. nsrBranch) .and. finidat == ' ') then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsrabin I was expecting the top part of this logical block to also exclude hybrid runs along with branch runs. Is a hybrid run equivalent to is_restart()=.true. ?
For future work I feel like we should clean some of this logic up, I cobbled a lot of this together many years ago when I did not have a good sense of restart options. But my take is that our decision making on how to restart fates (ie cold or from pre-existing vegetation structure in a restart file) really comes down to if it is a startup, branch or hybrid. The finidat string is less straight forward to use here in decide what part of the initialization sequence to use. My understanding is that the finidat should have a meaningful file path only in the latter two scenarios (branch and hybrid), so its better to use the switches that are the root of deciding these things rather variables like finidat that are "downstream" of them. Does that make sense? Again, I'd like to make an issue of this and flag it for future work and not complicate this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do know that is_restart() returns true in hybrid runs, but I don't know if they're fully equivalent. As for the rest of it, since you think that work makes sense to do somewhere else, would you mind making an issue with your thoughts and then resolving this conversation?

@samsrabin samsrabin added next this should get some attention in the next week or two. Normally each Thursday SE meeting. PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete and removed PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete next this should get some attention in the next week or two. Normally each Thursday SE meeting. labels Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is working incorrectly PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete test: aux_clm Pass aux_clm suite before merging test: fates Pass fates test suite before merging
Projects
Status: In progress - master/b4b-dev
4 participants