Dynlakes: fix some subtle issues #3

Merged
merged 6 commits into from
Sep 28, 2020


billsacks
Owner

Description of changes

Fix a variety of subtle issues with dynamic lakes - particularly the accounting of total water and energy.

Specific notes

This branch contains the following commits:

  • a9fa875: This avoids counting lake water in the begwb and endwb terms. That matters because these terms are used to calculate gridcell total water storage (TWS), which in turn influences the methane code. Because the methane code was tuned around old values of TWS, changing TWS would lead to unintentional (and potentially large) changes in methane terms. Eventually we'd like to remove methane's dependence on TWS, but for now this workaround is needed to avoid changing behavior too much. See "Subtract dynbal baselines from begwb and endwb", ESCOMP/CTSM#659 (comment), for more details.

  • 52105c4: This minor fix is needed for the sake of water tracers / water isotopes. It shouldn't have any impact outside of that, because the tracer_ratio of bulk water is 1.

  • de3e12c and acf0984: This one is especially subtle; it is needed for backwards compatibility with old restart files. The main changes are in de3e12c; acf0984 is just a minor tweak on top of that. The problem is that existing initial conditions files can already contain DYNBAL_BASELINE variables (for LIQUID, ICE & HEAT), but these pre-existing variables have baseline values of 0 for lake. Before this commit, when you started up from an old initial conditions file, the code would use these 0 values for lake baselines (because baselines are only reset if the user explicitly asks for a reset with a namelist flag). This commit adds some code to detect whether the initial conditions file is old, and if so, recomputes dynbal baselines for lake using the new definition. Note that some even older initial conditions files didn't have the DYNBAL_BASELINE variables at all; those would have been okay before this change. The problem is with initial conditions files that are somewhat but not very old, and so have DYNBAL_BASELINE variables that use the old definition (where lake baseline values were 0).

  • 8088c3c: Minor fix for a pre-existing issue

  • a31875d: I'm not sure if this is actually needed, but I thought it would be good to group together the lake water content and the roughly equal-but-opposite baselines, so that these can cancel to near zero before adding the smaller terms. In principle, this should help maintain precision in these smaller terms. I thought this might help resolve some of the larger-than-expected answer changes I was seeing in testing, but I don't think it actually does... but I still thought this would be good to keep in place. I have double and triple checked these changes, but it would be good to have an extra set of eyes on them to make sure I did this reordering correctly. In particular, I think there were some subtleties about when a term should accumulate on top of an existing value vs. setting the initial value of a variable.
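
As a rough illustration of the precision argument behind a31875d, here is a minimal Python sketch. It is not CTSM code: the function names and the magnitudes are made up, and the values are deliberately exaggerated so that the effect shows up in double precision.

```python
# Illustrative only: exaggerated magnitudes, hypothetical names.
# A large term and its roughly equal-but-opposite baseline are combined
# with a much smaller term in two different orders.

def small_term_added_first(lake_water, lake_baseline, small_term):
    """Add the small term to the large value before the baseline cancels it."""
    return (lake_water + small_term) + lake_baseline


def large_terms_cancelled_first(lake_water, lake_baseline, small_term):
    """Let the large terms cancel to near zero, then add the small term."""
    return (lake_water + lake_baseline) + small_term


lake_water = 1.0e16       # stand-in for a large lake water content
lake_baseline = -1.0e16   # stand-in for the equal-but-opposite baseline
small_term = 0.25         # stand-in for one of the smaller terms

print(small_term_added_first(lake_water, lake_baseline, small_term))       # 0.0
print(large_terms_cancelled_first(lake_water, lake_baseline, small_term))  # 0.25
```

In the first ordering the small term is lost entirely, because it is below the spacing of representable doubles near 1.0e16. In the second ordering it survives. Real model values are far less extreme, so any benefit there would be correspondingly smaller.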

This is needed for water tracer masses to be counted correctly

We have changed the definition of total column water for lake columns,
so the baseline values for lakes are incorrect on old initial conditions
files. This commit adds some code to check if we're using an old initial
conditions file, and if so, resets the dynbal baseline values for lakes
to use the new definition.

I went back and forth about whether we should do this, but I actually
feel that it's best if we do reset the lake baselines in a branch or
continue run, if using an older restart file. If we didn't do this, we'd
want to add some logic for writing out the issue-fixed metadata for any
further restart files written from these runs, to note that this issue
isn't actually fixed yet on these restart files.

These seem to have been missing for a while (forever?).
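
The old-file handling described in the commit notes above can be summarized as a small decision function. This is only a sketch of one reading of those notes, written in Python rather than the model's own code. The enum, the function, and all three argument names are hypothetical, and the assumption that a file without any baseline variables gets all baselines computed from scratch is an inference from the discussion, not something the commits state directly.

```python
from enum import Enum


class BaselineAction(Enum):
    """Hypothetical labels for what to do with dynbal baselines at startup."""
    USE_FILE_VALUES = "use the baselines stored on the file"
    RESET_LAKE_ONLY = "recompute lake baselines with the new definition"
    RESET_ALL = "compute all baselines from the current state"


def choose_baseline_action(has_baseline_vars, lake_issue_fixed, user_requested_reset):
    """Pick a baseline action for an initial conditions or restart file.

    has_baseline_vars:    file contains the DYNBAL_BASELINE variables
    lake_issue_fixed:     file carries metadata saying the lake definition
                          is already the new one (assumed mechanism)
    user_requested_reset: user set the namelist flag asking for a reset
    """
    if user_requested_reset or not has_baseline_vars:
        # Very old files have no baselines at all, so nothing stale can be read.
        return BaselineAction.RESET_ALL
    if not lake_issue_fixed:
        # "Somewhat old" file: baselines exist, but lake values are 0.
        return BaselineAction.RESET_LAKE_ONLY
    return BaselineAction.USE_FILE_VALUES


print(choose_baseline_action(True, False, False).value)
```

The middle branch is the case these commits target: baselines are present on the file, but the lake values predate the new definition of total column water.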
@billsacks
Owner Author

billsacks commented Sep 9, 2020

@Ivanderkelen I'd like your review of this. With the changes here, I am satisfied with the results of the full CTSM test suite (except one single-point test that I want to look into a bit more to convince myself that the level of answer changes is acceptable), so I think the dynamic lakes code (without the mksurfdata_map changes for now) is ready to come to master once you give your okay to this final set of changes. However, I have NOT run any dynamic lakes runs with these changes, and that seems important to do. I do plan to soon create a single-point dynamic lakes test and run it before and after these changes to verify for myself that I haven't broken things, but I'd like your help testing this, too.

So what I'd like from you is:

  • Can you please look over these changes (either as a batch or commit-by-commit) and give your okay or make any comments on these? See my above comment for notes on the individual commits on this branch.

  • Can you please run a bit of testing with a dynamic lakes case to verify that the dynbal adjustments still look correct with dynamic lakes? Depending on what your initial conditions were for your earlier testing, it's possible that de3e12c would lead to significant changes – due to a correction in the treatment of dynbal lake baselines on old initial conditions files. (If so, I think you could set the namelist flag reset_dynbal_baselines = .true. when running with old code in order to fix the issue in old code; I think you should not need that flag with this branch.) Other than that, I expect small but not significant answer changes from this branch, and it would be great if you could verify that.

Please let me know if you'd like any help with how to do this, or if you'd like to talk more about any of this.

@Ivanderkelen

Apologies for my late answer. I had a detailed look at your commits, and they all look fine to me.

I cannot comment on a31875d, as I don't have enough experience with the precision of accumulating large values on top of existing or initial values, but the reordering seems alright.

In addition, I performed some tests with dynamic lakes, running cases similar to my earlier testing a year ago.
Running with the https://github.com/billsacks/ctsm/tree/dynlakes_avoid_tws_changes branch led to a small increase in dynbal fluxes, although they stay within the same order of magnitude as the fluxes simulated with https://github.com/billsacks/ctsm/tree/dynlakes_master_notools. Setting reset_dynbal_baselines = .true. did not change values in either of the simulations.
If you want, I can share the simulated dynbal fluxes.

@billsacks
Owner Author

@Ivanderkelen thanks for looking at this, and now it's my turn to apologize for a delay in responding!

Based on what you described from your testing, my guess is that the finidat files in your runs did NOT have the three DYNBAL_BASELINE_* variables (DYNBAL_BASELINE_LIQ, DYNBAL_BASELINE_ICE, DYNBAL_BASELINE_HEAT). Can you confirm if that's true? If so, the case with dynlakes_master_notools would have used baseline values calculated from cold start initial conditions, which I think have lake ice fraction = 0, whereas the case with dynlakes_avoid_tws_changes would have used baseline values calculated from the spun-up initial conditions, which could have a non-zero lake ice fraction as well as different lake temperatures. I believe that the differences in the dynbal fluxes (when differencing the results of the new branch vs. the old branch) would then be almost exactly equal and opposite for liquid vs. ice dynbal fluxes. Does this seem like what you're seeing? I'm not sure what to expect for the dynbal energy/heat fluxes... my intuition is that they should actually be smaller in magnitude in the run from the new branch, but I'm not positive.
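
To make the "equal and opposite" expectation concrete, here is a small worked sketch in Python. It is a simplification, not model code: it assumes the total lake water is identical in the cold-start and spun-up states and that only the ice fraction differs. The function name and the numbers are made up for illustration.

```python
def lake_baselines(total_water, ice_fraction):
    """Split a fixed amount of lake water into (liquid, ice) baselines.

    Hypothetical helper: total_water is in arbitrary mass units and
    ice_fraction is the fraction of that water that is frozen.
    """
    ice = total_water * ice_fraction
    liquid = total_water - ice
    return liquid, ice


# Cold start: assumed ice fraction of 0. Spun-up: some non-zero ice fraction.
cold_liq, cold_ice = lake_baselines(1000.0, 0.0)
spun_liq, spun_ice = lake_baselines(1000.0, 0.25)

diff_liq = spun_liq - cold_liq
diff_ice = spun_ice - cold_ice

print(diff_liq, diff_ice)   # -250.0 250.0
print(diff_liq + diff_ice)  # 0.0
```

Because the total is unchanged, whatever moves into the ice baseline moves out of the liquid baseline, so the baseline differences (and hence the differences in fluxes computed relative to them) cancel. Differences in lake temperature would affect the heat baseline separately and are not covered by this sketch.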

If this isn't clear or doesn't sound like what you're seeing, then I'd be interested in looking at the results myself if it's easy for you to share them. I just want to be sure that the differences you're seeing make sense, and that I haven't introduced a bug. For example, I especially want to make sure that I didn't introduce any issues with a31875d.

I don't understand why setting reset_dynbal_baselines = .true. leads to the same results with the older code, though that may not be too important to figure out.

@billsacks
Owner Author

If this isn't clear or doesn't sound like what you're seeing, then I'd be interested in looking at the results myself if it's easy for you to share them.

If it's about as easy or easier for you to share your run setup – including the necessary input files – then I'd be happy to reproduce this myself. (This would also let me look at the differences that arise from the individual different commits.)

@Ivanderkelen

You are right about the finidat files in my runs: they did not include the DYNBAL_BASELINE_* variables. For reference, they used this file: /glade/p/cesmdata/cseg/inputdata/lnd/clm2/initdata_map/clmi.I2000Clm50BgcCrop.2011-01-01.1.9x2.5_gx1v7_gl4_simyr2000_c190312.nc

As I am getting very confused about the different branches and commits, and the values I get do not match my reasoning or yours, I would rather share my run setup script and namelist settings.

I use the script /glade/u/home/ivanderk/for_Bill/setup_test_dynlakes_master_notools.sh to set up and run the case. You would only need to update the SCRIPTSDIR variable to point to your clm5 repository. The script uses the namelist settings, including the input files, in /glade/u/home/ivanderk/for_Bill/nl_clm_dynlakes_master_test.sh. The case is set to run for 1968-1970, because in 1970 two large reservoirs appear (the Aswan dam and one in Russia), causing large fluxes.

Please let me know if you need any additional information, and thank you for having a close look at this!

@billsacks
Owner Author

Thanks a lot @Ivanderkelen. I have run your test case on each commit in this PR to examine the incremental diffs. Everything looks as I expected. In particular, in my testing, I did see differences in the original (dynlakes_master_notools) branch when setting reset_dynbal_baselines. And the dynbal fluxes in this PR differ only at roundoff level from the dynbal fluxes in dynlakes_master_notools when setting reset_dynbal_baselines = .true. in the latter. I also spot-checked a few grid cells for their diffs in QFLX_LIQ_DYNBAL and QFLX_ICE_DYNBAL before and after all of the changes in this PR. They look reasonable based on my expectations. (The differences arise because, before this PR, the lake baselines had 0 ice.)
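
For anyone who wants to repeat this kind of comparison, a "roundoff-level" check on two sets of flux values could look something like the Python sketch below. It is not the tooling used for the comparison described above. The function names are hypothetical, the inputs are plain lists of numbers, and the tolerance of 1e-12 is an arbitrary assumption that would need adjusting for real output.

```python
def max_relative_diff(values_a, values_b):
    """Largest relative difference between paired values.

    Pairs where both values are exactly zero are skipped, since they
    contribute no difference and have no scale to normalize by.
    """
    worst = 0.0
    for a, b in zip(values_a, values_b):
        scale = max(abs(a), abs(b))
        if scale > 0.0:
            worst = max(worst, abs(a - b) / scale)
    return worst


def is_roundoff_level(values_a, values_b, tol=1e-12):
    """True if the two series agree to within the assumed tolerance."""
    return max_relative_diff(values_a, values_b) <= tol


# Made-up values standing in for a flux from two runs.
run_a = [1.0, -2.5e-5, 0.0]
run_b = [1.0000000000000002, -2.5e-5, 0.0]
print(is_roundoff_level(run_a, run_b))
```

A relative measure is used here so that grid cells with very small fluxes are held to the same standard as cells with large ones. An absolute threshold may be more appropriate when many values sit near zero.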

I'm happy to share more details if you're interested.

@billsacks billsacks merged commit 2cb54b5 into dynlakes_master_notools Sep 28, 2020
@billsacks billsacks deleted the dynlakes_avoid_tws_changes branch September 29, 2020 01:55
billsacks pushed a commit that referenced this pull request Apr 19, 2021
billsacks pushed a commit that referenced this pull request Jun 11, 2021
billsacks pushed a commit that referenced this pull request Dec 3, 2021
Run fsurdat_modifier via an appropriate python version on cheyenne
billsacks pushed a commit that referenced this pull request Jan 30, 2022
Updating flags wording to avoid confusions
billsacks added a commit that referenced this pull request Mar 7, 2022
Ideally we would do year-2000 tests to have more crop cover and thus
potentially be more useful tests. However, there are problems running a
year-2000 ciso test with crop. These problems exist even with an SMS
test on master:

I tried tests like
SMS_Ly1_P72x1.f10_f10_mg37.I2000Clm45BgcCrop.cheyenne_gnu.clm-ciso--clm-cropMonthOutput,
in both debug & non-debug, intel & gnu versions.

Debug tests fail like this (from SMS_D_Ly1_P72x1.f10_f10_mg37.I2000Clm45BgcCrop.cheyenne_gnu.clm-ciso--clm-cropMonthOutput):

30:Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
30:
30:Backtrace for this error:
13:
13:Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
13:
13:Backtrace for this error:
13:#0  0x2b9d1acc4aff in ???
30:#0  0x2b9d1acc4aff in ???
13:#1  0xf63fff in cisofluxcalc
13:     at /glade/work/sacks/ctsm_code/ctsm/src/biogeochem/CNCIsoFluxMod.F90:1555
30:#1  0xf63fff in cisofluxcalc
30:     at /glade/work/sacks/ctsm_code/ctsm/src/biogeochem/CNCIsoFluxMod.F90:1555
30:#2  0xf6b489 in __cncisofluxmod_MOD_cisoflux1
30:     at /glade/work/sacks/ctsm_code/ctsm/src/biogeochem/CNCIsoFluxMod.F90:153
13:#2  0xf6b489 in __cncisofluxmod_MOD_cisoflux1
13:     at /glade/work/sacks/ctsm_code/ctsm/src/biogeochem/CNCIsoFluxMod.F90:153
13:#3  0xe45657 in __cndrivermod_MOD_cndrivernoleaching
13:     at /glade/work/sacks/ctsm_code/ctsm/src/biogeochem/CNDriverMod.F90:559
30:#3  0xe45657 in __cndrivermod_MOD_cndrivernoleaching
30:     at /glade/work/sacks/ctsm_code/ctsm/src/biogeochem/CNDriverMod.F90:559

An intel test dies in the same place.

Non-debug versions die like this (both for gnu and intel):

30: set_curr_delta ERROR: found unexpected non-zero delta mid-year
30: Dribbler name: hrv_xsmrpool_to_atm_c_13
30: i, delta =            2                       NaN
30: Start of time step date (yr, mon, day, tod) =         2000           1          15       57600
30: This indicates that some non-zero flux was generated at a time step
30: other than the first time step of the year, which this dribbler was told not to expect.
30: If this non-zero mid-year delta is expected, then you can suppress this error
30: by setting allows_non_annual_delta to .true. when constructing this dribbler.
30:iam = 30: local  gridcell index = 2
30:iam = 30: global gridcell index = 103
30:iam = 30: gridcell longitude    =  285.0000000
30:iam = 30: gridcell latitude     =  -10.0000000
30: ENDRUN:
30: ERROR: set_curr_delta: found unexpected non-zero delta mid-year: ERROR in /glade/work/sacks/ctsm_code/ctsm/src/utils/AnnualFluxDribbler.F90 at line 276

So there is some issue with year-2000 ciso tests with crop. This issue
exists on master, for clm45 and clm50 tests. (e.g., for clm50, I tried
SMS_D_Ly1_P72x1.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_gnu.clm-ciso--clm-cropMonthOutput.)
slevis-lmwg pushed a commit that referenced this pull request Jul 9, 2024
updating input data file paths
billsacks pushed a commit that referenced this pull request Aug 24, 2024
Update `fates_harvest_mode` to use characters for namelist option select
2 participants