Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates and fixes to history zeroing and flushing #1208

Merged
merged 9 commits into from
Jun 11, 2024

Conversation

rgknox
Copy link
Contributor

@rgknox rgknox commented May 23, 2024

Description:

Some multi-dimensioned history variables were not being reported correctly because they were not being zero'd. These variables were likely incremented over a patch-cohort loop, but being incremented on top of an uninitialized number. This set of changes seeks to make the flushing and zeroing more consistent across all of the different groups of history variables.

This also reduces confusion by moving flushing and zeroing to occur inside the routine that performs the updates. These flushing and zeroing calls still happen redundantly in the host models, should be removed, but are not a liability to generating incorrect results. These calls are just superfluous.

Fixes #1207

Collaborators:

@samsrabin

Expectation of Answer Changes:

Checklist

If this is your first time contributing, please read the CONTRIBUTING document.

All checklist items must be checked to enable merging this pull request:

Contributor

  • The in-code documentation has been updated with descriptive comments
  • The documentation has been assessed to determine if updates are necessary

Integrator

  • FATES PASS/FAIL regression tests were run
  • Evaluation of test results for answer changes was performed and results provided

Documentation

Test Results:

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@samsrabin
Copy link
Contributor

Is there an addition or change we can/should make to the CTSM test list to avoid a bug like this in the future? E.g., I first noticed this with FATES_PATCHAREA_AP—is it not on any output file in the tests?

@rgknox rgknox force-pushed the history-zerofixes branch from 641e75b to c82918f Compare May 28, 2024 18:17
@rgknox
Copy link
Contributor Author

rgknox commented May 29, 2024

There are other things to do with cleaning up the history files, but they are out of scope, so I think this PR is feature complete and ready for review.
I ran a one-off test and it passsed ERS_Ld10.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars

@rgknox
Copy link
Contributor Author

rgknox commented May 29, 2024

@samsrabin we periodically need to update the FatesColdAllVars test to make sure it has all of the variables. We have this script: tools/FindInactive.py that trawls the code and updates the list, seems like we need to update it.

Copy link
Contributor

@glemieux glemieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note and clarification on possible cleanup. I'll start testing on this as soon as derecho is back up.


! ======================================================================================

subroutine flush_all_hvars(this,nc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note per discussion with Ryan this is staged for the next PR mentioned here: #1208 (comment)

main/FatesHistoryInterfaceMod.F90 Show resolved Hide resolved
@glemieux
Copy link
Contributor

glemieux commented Jun 4, 2024

regression testing underway

@glemieux
Copy link
Contributor

glemieux commented Jun 5, 2024

Regression testing on cheyenne against fates-sci.1.76.3_api.35.1.0-ctsm5.2.007 shows that most of the tests are returning B4B or the only diffs are in the fill values. That said, the following tests are return DIFFS:

FAIL ERP_Ld9.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPRT2 BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL ERS_D_Ld5.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL ERS_Lm12.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesFireLightningPopDens BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL SMS_Lm1.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdBasic BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL SMS_Lm3_D_Mmpi-serial.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF

I need to take a look at these to see if they make sense, but there two that jump out at me at first glance. The ERP_Ld9.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars and ERS_Lm12.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesFireLightningPopDens tests returns huge DIFFS in FATES_MORTALITY_PF due to the new values being huge (i.e. 1e+38). Spitfire is set to mode 1 in the AllVars test and tree damage is on as well. Spitfire is set to mode 4.

@rgknox
Copy link
Contributor Author

rgknox commented Jun 10, 2024

The dynamics time-scale complex variables were not being flushed and zero'd correctly, this was corrected, the MORTALITY_PF variable is an example of that

@glemieux
Copy link
Contributor

glemieux commented Jun 10, 2024

Per our conversation, I went back and looked and realized I misread the cprnc output. I keep thinking the forth line below is the current commit, but it's actually the baseline:

FATES_MORTALITY_PF   (lon,lat,fates_levpft,time)  t_index =      2     2
       12996    39744  (    33,     2,     1,     1) (    23,    21,    10,     1) (    33,     2,     1,     1) (    33,     2,     1,     1)
                12996   1.225934268493151E-01          2.799785209598797E-03 1.1E+38  1.225934268493151E-01 1.0E+00  1.225934268493151E-01
                12996   1.050000000000002E+38          1.050000000000002E+38          1.050000000000002E+38          1.050000000000002E+38
                39744  (    33,     2,     1,     1) (    33,     2,     1,     1)
           avg abs field values:    6.526914797498996E-02    rms diff: 1.0E+38   avg rel diff(npos):  1.0E+00
                                    1.050000000000126E+38                        avg decimal digits(ndif):  0.0 worst:  0.0
RMS FATES_MORTALITY_PF               1.0500E+38            NORMALIZED  2.0000E+00

So this is indeed corrected as you explained. I think we can integrate.

For postertity, the test location on derecho: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr1208-fates

@glemieux glemieux merged commit b8e4eee into NGEET:main Jun 11, 2024
1 check was pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

FATES_PATCHAREA_AP only has data in chronologically-first history file
3 participants