-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grid cell-level error check for H2O #1228
Conversation
Works for non-transient simulations. In this attempt I convert most column-level variables to grid cell-level locally in subr. BalanceCheck. Using grid cell-level variables calculated in subr. lnd2atm did not work because subr. lnd2atm is called after subr. BalanceCheck.
@slevisconsulting please let me know when you feel this is ready for review. |
@billsacks I would first like to discuss the fact that
Bill clarified that this comment does not pertain to |
Add ACTIVE (T/F) column to master hist fields table and alphabetize
@slevisconsulting I don't understand the connection between that compiler bug comment and the difference in gridcell-level values between balance check and lnd2atm. Also, I looked at what is called between those two points, and the main thing that I could imagine changing some water terms is the call to ch4. Is there really some changes to water fluxes or states happening in the ch4 routine? Maybe we need to talk about this further. |
@billsacks somehow I confused myself during our call today. Now I am convinced again that calling |
Sorry, I don't understand what you mean here. Let me know if you'd like to talk more tomorrow. |
I hope this explains it better... I am considering splitting my existing
This differs from our approach for C, N, and CH4 where we used a single subroutine to perform both the column and the grid cell checks. |
@slevisconsulting thanks for explaining. That is more clear. Your plan sounds fine, but I wonder if another possibility would be to move the column-level balance checks also to after the call to lnd2atm: At least for water, I would think that should give the same result, unless ch4 is doing something to the water variables... and in that case, we may run into trouble with the gridcell-level balance checks, too. My gut feeling is to slightly prefer that (moving the column balance checks to later), but I don't feel strongly at this point if you feel that would be a big can of worms. |
Since I haven't tried either, yet, I will start with your preferred option of moving the whole |
As agreed, I moved the I ran my single point test twice. The first time I kept the locally calculated grid cell-level variables. The second time I switched to the non-local endwb_grc (only). Both times I got the same result, which seems promising. I want to show you the test result @billsacks before I replace the remaining non-local variables: So the test
Show stopper or expected in your opinion @billsacks ? |
@slevisconsulting thanks a lot for the update and for your continued work on this. This is somewhere in between show stopper and expected. It definitely is NOT a show stopper: ERRH2O is the small difference between large terms, so it is normal for the normalized RMS difference to be large for that variable even when there are only very small changes in the underlying terms. However, I think our expectation was that moving the end balance check call shouldn't change things at all, because we didn't expect any changes to water fluxes or states in between the old placement and the new placement; do you agree? If so, I think this could warrant a bit more investigation so that we understand why this is changing, to make sure that it is safe to move this call (e.g., looking at which component(s) of errh2o are changing, and seeing where in the code they change between the old and new placement). Does that make sense to you? |
Yes, this makes sense, and I will try to track it down. |
I have solved it. Subroutine lnd2atm changes qrgwl like this:
The alternative of not commenting out the above line and removing qflx_liq_from_ice_col from the accounting, results in a change of order of operations and a diff in ERRH2O that is smaller than the one I posted on Dec 10. If we want no diffs from the baseline, we need to split subr. BalanceCheck into the column part and the grid cell part. I am open to your recommendations @billsacks |
@slevisconsulting I feel it's totally fine to have answer changes in ERRH2O for this PR, as long as we understand them and are fairly confident that the column-level balance check is still working correctly. If I understand your comment correctly, one of your options involves:
If so, I'd support that solution. In looking into this, though, I saw another possible solution that may be a bit more intuitive for the sake of long-term maintenance and would keep the column-level balance check more similar to the gridcell-level, at the expense of adding an extra column-level variable to waterlnd2atm_type: I noticed that the column-level errh2o calculation currently includes: - qflx_ice_runoff_snwcp(c) &
- qflx_ice_runoff_xs(c) & In lnd2atm, it looks like the sum of those two terms gets put into qflx_ice_runoff_col; then either (a) qflx_ice_runoff_col is kept as is and qflx_liq_from_ice_col is kept at 0, or (b) qflx_liq_from_ice_col is set to qflx_ice_runoff_col and qflx_ice_runoff_col is set to 0. So I think what this means (but it would be good to double-check it) is that you could:
I'd slightly prefer this alternative. But since you've been looking at this for a while, I'm fine going with whichever seems better to you. |
I did it the way that you suggested and got bfb the same ERRH2O diff in the cprnc.out file as with my earlier test. Next I will replace the rest of the locally calculated _grc terms with their non-local counterparts. |
These tests PASS: ERP_D.f10_f10_musgs.IHistClm50Bgc.cheyenne_gnu.clm-decStart ERS_Ly3_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropQianRs.cheyenne_gnu.clm-cropMonthOutput
@billsacks |
src/biogeophys/BalanceCheckMod.F90
Outdated
! qflx_liq_dynbal_grc => waterflux_inst%qflx_liq_dynbal_grc , & ! Input: [real(r8) (:) ] slevis: place holder | ||
! qflx_ice_dynbal_grc => waterflux_inst%qflx_ice_dynbal_grc , & ! Input: [real(r8) (:) ] slevis: place holder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oups, I will remove these two in the next commit.
Fix ndep from coupler There was a bug in ndep forcing (forc_ndep_grc) when receiving ndep from CAM (ndep_from_cpl true). This first appeared in ctsm5.1.dev002. NDEP forcings have been garbage since then when receiving ndep from CAM. This tag fixes that issue.
doc/ChangeLog
Outdated
|
||
[X] ctsm5_0-nwp | ||
|
||
[X] clm4_5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am realizing that I should have considered the rms diff (not the normalized diff), and I should remove the Xs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE
Test suite on izumi: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good. Thanks so much for your work on it, and I'm really sorry it took me so long to review it! I have some relatively minor comments below. In addition, I have two other comments:
-
I think it's important to add the handling discussed here: Perform gridcell-level water balance checks bracketing the entire run loop #201 (comment) . It looks like
for_testing_zero_dynbal_fluxes
isn't triggered in any aux_clm tests, but it is used in some fully-active (B compset) tests, via the allactive-cism-test_coupling testmod. I think (but am not positive) that you could reproduce the behavior in an I compset by running a 5-day smoke test with a compset with active CISM (compset ending with "G"), by creating a testmods directory that has the shell_commands, user_nl_cism and user_nl_clm from https://github.com/ESCOMP/CESM/tree/master/cime_config/testmods_dirs/allactive/cism/test_coupling . I don't think we need to add a test like this to the test suite, but it would be good to run this as a one-time thing to verify that we can get it to pass. I expect it will fail currently from this branch. My suggestion in the above comment was to add a new variable to track the lack of conservation, but I'm now thinking that it would be okay to just bypass the gridcell-level balance check if this flag is set. I'm okay with either approach, if you have a preference. -
In the original issue (Perform gridcell-level water balance checks bracketing the entire run loop #201 ) we discussed the possibility of only doing the column-level balance checks if the gridcell-level checks fail. The only purpose of this would be for performance reasons. You could check the overall timing of balchk and/or put a temporary timer block just around the column-level balance checks to check if the timing of this is a concern. Maybe check this in the PFS test as well as one of the NWP configuration tests. If the timing of this column-level balance check is insignificant, then let's not worry about it, but if it is significant, then we may want to rework this logic somewhat to only do the column-level balance check if the gridcell-level check fails. See Perform gridcell-level water balance checks bracketing the entire run loop #201 (comment) and Perform gridcell-level water balance checks bracketing the entire run loop #201 (comment) - particularly, the suggestion of always doing the col-level checks in debug mode.
src/main/lnd2atmMod.F90
Outdated
c2l_scale_type= 'urbanf', l2g_scale_type='unity' ) | ||
do g = bounds%begg, bounds%endg | ||
water_inst%waterdiagnosticbulk_inst%tws_grc(g) = water_inst%waterdiagnosticbulk_inst%tws_grc(g) + water_inst%wateratm2lndbulk_inst%volr_grc(g) / grc%area(g) * 1.e-3_r8 | ||
water_inst%waterdiagnosticbulk_inst%tws_grc(g) = water_inst%waterbalancebulk_inst%endwb_grc(g) + water_inst%wateratm2lndbulk_inst%volr_grc(g) / grc%area(g) * 1.e-3_r8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [optional] I noticed that this line and some of the others in this file are really long. I don't understand why nag didn't complain, but would you be willing to break these long lines so they are less than 132 characters?
src/biogeophys/WaterBalanceType.F90
Outdated
@@ -124,6 +133,9 @@ subroutine InitAllocate(this, bounds, tracer_vars) | |||
call AllocateVar1d(var = this%errh2o_col, name = 'errh2o_col', & | |||
container = tracer_vars, & | |||
bounds = bounds, subgrid_level = BOUNDS_SUBGRID_COLUMN) | |||
call AllocateVar1d(var = this%errh2o_grc, name = 'errh2o_grc', & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [optional] If we don't output this to the history file, then we could make it subroutine-local rather than adding it to the class.
src/biogeophys/BalanceCheckMod.F90
Outdated
! wa(c) gets added to liquid_mass in ComputeLiqIceMassNonLake | ||
if(use_aquifer_layer) then | ||
do fc = 1, num_nolakec | ||
c = filter_nolakec(fc) | ||
if (col%hydrologically_active(c)) then | ||
if(zwt(c) <= zi(c,nlevsoi)) then | ||
wa(c) = aquifer_water_baseline | ||
end if | ||
end if | ||
end do | ||
endif | ||
|
||
! NOTES subroutines Compute*Mass* are in TotalWaterAndHeatMod.F90 | ||
! endwb is calculated in HydrologyDrainageMod & LakeHydrologyMod | ||
call ComputeWaterMassNonLake(bounds, num_nolakec, filter_nolakec, & | ||
waterstate_inst, waterdiagnostic_inst, & | ||
subtract_dynbal_baselines = .false., & | ||
water_mass = begwb_col(begc:endc)) | ||
|
||
call ComputeWaterMassLake(bounds, num_lakec, filter_lakec, & | ||
waterstate_inst, lakestate_inst, & | ||
add_lake_water_and_subtract_dynbal_baselines = .false., & | ||
water_mass = begwb_col(begc:endc)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It looks like this block duplicates code in BeginWaterColumnBalanceSingle. There's enough subtle complexity here that I'd be more comfortable if this were moved into a shared subroutine, like ComputeBegwbCol. (It looks like you'll need to pass in a begwb_col variable, since in the column version you are setting
waterbalance_inst%begwb_col
whereas in the gridcell version you are setting a local variable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, as I think about this more: the main subtlety relates to the setting of wa(c) = aquifer_water_baseline under certain conditions. I forget exactly what this is doing – I remember this was a real subtlety that tripped me up a few years ago, and I know I have some notes about this that I could dig up if needed – but I'm thinking that this isn't directly related to balance checks, so should maybe be extracted to its own routine (like reset_wa) that is called early in the driver sequencing (exactly when may depend on when wa and zwt are set... my guess is that you'd want to do this before initializing the gridcell balance checks, but you may want to confirm that wa and zwt aren't changed between there and the initialization of the column balance checks). If you do that, then I think the remaining duplication (the two calls to ComputeWaterMass*) can stay as is - i.e., you don't need to do my above suggestion of introducing a shared routine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented out the if(use_aquifer_layer)
code here (see cb6a840)
...and this test passed:
ERP_D.f10_f10_musgs.IHistClm50Bgc.cheyenne_gnu.clm-decStart
If the izumi test-suite also passes, I will move the if(use_aquifer_layer)
code to its own routine as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Izumi test-suite OK (expected diffs from baseline, which for now is dev019 in my case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just spent too much time trying to get my head back into this issue. I think this is okay the way you've done it now - just doing this reset in BeginWaterGridcellBalanceSingle. The one question I have is this: When we initialize a new column (due to dynamic landunits/columns), we set wa in the new column to wa in some template column on that grid cell. This initialization happens after the gridcell balance check initialization. With the old logic (where wa was reset to aquifer_water_baseline in the column-level balance check initialization), this resetting of wa happened after the dynamic landunit column initialization.
So my question is this... I think mainly for @swensosc -
- Is it a problem when use_aquifer_layer=true, if for one time step (the first timestep after initialization of a new column in a transient case) we have not reset wa(c) = aquifer_water_baseline? i.e., is it critical that wa(c) is reset to aquifer_water_baseline at the beginning of every time step when use_aquifer_layer = true, or is it okay for this not to be the case for one time step?
I think that, if the answer is that we need to do this resetting every time step, then the right place to do it is in dynInitColumnsMod:copy_state (I think that doing this resetting at the start of the column balance check, where it was done before, could potentially break the gridcell-level balance check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slevisconsulting I would give a different explanation than the one you gave. First off, while the dynamic landuse adjustment did play in to our discussion of this (and influenced our choice of how to implement this), the non-conservation term in the latest implementation really is unrelated to dynamic landuse, at least the way that I'm thinking about it: the adjustments in wa due to dynamic landuse are already accounted for by the dynbal dribblers. Instead, I would explain this as follows:
We want to adjust wa back to the baseline under certain conditions. The right way to do this might be to use explicit fluxes from some other state, but in this case we don't have a source to pull from, so we just adjust wa without any explicit fluxes. Because we do this adjustment before initializing the column-level balance check, the column-level check is completely unaware of the adjustment. However, since this adjustment happens after initializing the gridcell-level balance check, we do have to account for this adjustment in the gridcell-level balance check. The normal way to account for an adjustment like this would be to include the flux in the balance check. But here we don't have an explicit flux, so instead track the non-conservation state. (In principle, we could calculate an explicit flux and use that, but we don't gain anything from using an explicit flux in this case.)
(This doesn't explain why we chose to use this implementation. That explanation would tie-in with subtleties of dynamic land use. I can take a stab at explaining that, too, if you'd like.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included your explanation in the latest commit. Thanks, @billsacks
Explaining the subtleties of dynamic landuse that come into play seems like a good idea for when we look at this again later. I for one will not remember these subtleties even a couple of months from now :-)
In the explanation, I am interested in understanding why it makes sense to place the code below at the initialization rather than the finalizing of the grc-level balance check:
if (use_aquifer_layer) then
call c2g( bounds, &
wa_reset_nonconservation_gain_col(begc:endc), &
wa_reset_nonconservation_gain_grc(begg:endg), &
c2l_scale_type='urbanf', l2g_scale_type='unity' )
else
wa_reset_nonconservation_gain_grc(begg:endg) = 0._r8
end if
[...]
do g = begg, endg
begwb_grc(g) = begwb_grc(g) - qflx_liq_dynbal_left_to_dribble(g) &
- qflx_ice_dynbal_left_to_dribble(g) &
- wa_reset_nonconservation_gain_grc(g)
end do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be abs. clear, if this were moved to the finalizing of the grc-level bal. check, the sign in front of wa_reset_... would change to +
.
So again, does this code need to be run at initialization instead of when finalizing? If either is ok, I'm wondering whether it is more intuitive at the finalizing because the wa_reset_... term is from the same time step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Ah, yes, sorry: I missed that when reviewing your latest changes. Indeed, I was imagining that wa_reset_nonconservation_gain_grc would adjust endwb_grc, not begwb_grc. I think you want to subtract it from endwb_grc. (Imagine that wa was 4000 at the start of the time step, and the baseline is 5000. Then wa would have been increased by 1000, and wa_reset_nonconservation_gain will also be 1000. To keep the balance check working, I think we need to subtract this 1000 from endwb... if you feel otherwise, please say so, because I'm not 100% sure about this.)
If you want to say something about the relationship to dynamic landuse, I would probably put a comment like this in BeginWaterColumnBalanceSingle: This resetting of wa needs to be done after dynamic landuse so that it is done after initializing any new columns; therefore, we cannot do this resetting before initializing the gridcell-level balance check (since that is done before dynamic landuse). We do it here before initializing the column-level balance check so that we don't need to account for this non-conservation in the column-level balance check, although we still need to account for this non-conservation in the gridcell-level balance check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, and I apologize for misunderstanding the first time around. Now I am less inclined to include this most recent explanation, so I will leave it out.
src/biogeophys/BalanceCheckMod.F90
Outdated
do g = begg, endg | ||
begwb_grc(g) = begwb_grc(g) - qflx_liq_dynbal_left_to_dribble(g) & | ||
- qflx_ice_dynbal_left_to_dribble(g) | ||
end do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The sign here (- instead of +) looks wrong. It very well may actually be right (I remember there being some not-entirely-intuitive things about the sign convention of these dynbal fluxes), but assuming it's right, it would be worth adding a comment here asserting this, and on the similar lines for endwb_grc. I can help you with the wording if you aren't sure yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled with this during testing. I had a hard time convincing myself that the +
sign was wrong. But as soon as I decided to flip the sign, the previously failing conservation test passed. Two hints led me to flip the sign:
- The size of the conservation error
- The
-
signs here:
Lines 376 to 379 in 097ff98
do g = bounds%begg, bounds%endg water_inst%waterlnd2atmbulk_inst%qflx_rofliq_qgwl_grc(g) = water_inst%waterlnd2atmbulk_inst%qflx_rofliq_qgwl_grc(g) - water_inst%waterfluxbulk_inst%qflx_liq_dynbal_grc(g) water_inst%waterlnd2atmbulk_inst%qflx_rofliq_grc(g) = water_inst%waterlnd2atmbulk_inst%qflx_rofliq_grc(g) - water_inst%waterfluxbulk_inst%qflx_liq_dynbal_grc(g) enddo
Lines 396 to 398 in 097ff98
do g = bounds%begg, bounds%endg water_inst%waterlnd2atmbulk_inst%qflx_rofice_grc(g) = water_inst%waterlnd2atmbulk_inst%qflx_rofice_grc(g) - water_inst%waterfluxbulk_inst%qflx_ice_dynbal_grc(g) enddo
I would not mind a suggestion regarding the wording, since you offered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like this (I don't love it, but it's something...):
These dynbal dribblers actually store the delta state, (end - beg). Thus, the amount dribbled out is the negative of the amount stored in the dribblers. Similarly, to achieve conservation, we need to subtract (rather than add) the amount remaining to dribble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @billsacks
I slightly reworded the comment and added to both locations.
I will not investigate this further unless you want me to; however, I can't help but wonder why the C-balance code uses +
signs in CNBalanceCheck. Without digging deeper, I suspect an inconsistency in the handling of the C vs. H2O dribble terms. Let me know if you'd like me to look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that the two are inconsistent, and that's my fault. In retrospect, I think the convention I used for the water and energy dribblers is counter-intuitive. Probably what we should do is to store the negative of what we currently store in those dribblers. To be honest, I'm not sure I have the mental energy to make this change right now, but I'm certainly open to your doing it if you think the benefit would be significant. One of the things stopping me from doing it is that it will cause incorrect fluxes in the first year of any transient run that starts mid-year from an old restart file (with the current sign convention). That probably won't impact many people (and the impact should be small), so that is probably okay if you are inclined to tackle this. I'm fine either way (leaving it for now or your spending the time to change this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine not addressing it now. If you don't mind, I can include some of this info in the comment that I'm adding about the -
sign.
src/biogeophys/BalanceCheckMod.F90
Outdated
|
||
end if | ||
|
||
! Snow balance check at the grid cell level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is this comment right ("at the grid cell level")? It looks to me like this check is column-level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops...
src/biogeophys/BalanceCheckMod.F90
Outdated
call waterflux_inst%qflx_liq_dynbal_dribbler%get_amount_left_to_dribble_end(bounds, qflx_liq_dynbal_left_to_dribble(bounds%begg:bounds%endg)) | ||
call waterflux_inst%qflx_ice_dynbal_dribbler%get_amount_left_to_dribble_end(bounds, qflx_ice_dynbal_left_to_dribble(bounds%begg:bounds%endg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please break these long lines
@billsacks I am revisiting this conversation bc I see failures in these glcMEC tests:
Our conversation left me with the impression that I would NOT need to add |
I have two other tests failing: In the first one of these, might you have expected the "prescribed" testmods to fail at the grid cell level? The second seems like a legitimate error that I will have to investigate and it shows an error in the thousands of mm. This may be the answer to my previous post. The issue affecting the glcMEC cases may be the same as this one. |
Good point about the prescribed soil moisture streams. Indeed, I see that prescribed soil moisture is applied after the gridcell-level balance check is initialized but before the column-level balance check is initialized (in PrescribedSoilMoistureInterp). I guess the two options are (1) bypass gridcell-level balance checks when use_soil_moisture_streams is true, or (2) add an extra tracking term that tracks the change in h2osoi_liq and h2osoi_ice due to prescribed soil moisture. From looking at PrescribedSoilMoistureInterp (relevant lines seem to be the last few lines of that routine) it looks like it would be pretty straightforward to do (2), but I'm not sure how much value that has. I'm pretty equal on the two so will defer to you if you have any preference. Regarding the other failing tests, let me know if you want to talk about these. Your understanding is right that you should not need to add |
I agree with @billsacks that there's not a lot of value in the grid-cell level checks for prescribed soil moisture, so either of his solutions would be fine. And his analysis sounds right to me as well. |
@slevisconsulting I looked at one of your failing cases (/glade/scratch/slevis/tests_0211-104010ch/SMS_Lm37.f10_f10_musgs.I1850Clm50SpG.cheyenne_intel.clm-glcMEC_long.GC.0211-104010ch_int/run). It looks like the error happens at the start of the year, which suggests to me that it's coming from dynamic landunits. (I was wondering if it's coming from the lnd -> glc fluxes or dynamic landunits; I think that if it were coming from the lnd -> glc fluxes, we'd likely see the error in other time steps.) Please let me know if you'd like me to look into this. |
Thank you for your feedback @billsacks and @ekluzek So the prescribed soil moisture is the easy one. With the others, I now see that the common element in the compsets is the "G". I have opted to pursue the problem in the test |
@billsacks I will list a whole bunch of numbers here to show what I have found and to ask for guidance. First the error message. The numbers suggest to me that the error comes from a "jump" in endwb_grc. My suspicion is supported when I look at the progression of begwb_grc --> endwb_grc --> begwb_grc for the last few tsteps. I have removed here any terms equal to 0 for brevity:
For g = 172 I show next the numbers that go through subroutine c2g. First for begwb_col --> begwb_grc and then for endwb_col --> endwb_grc. I only show the last three
The last garr value agrees with begwb_grc in the error msg above, and the last garr value in the next set agrees with endwb_grc in the error msg:
Notice that carr(c) has not changed while wtgcell(c) has changed from the first set to the second set of last three values. Now come the questions: |
@slevisconsulting Thanks a lot for digging into this. What you show agrees with what's expected for dynamic landunits - well, except for the balance error in the end: we do not adjust the column-level states when areas change, so we expect a jump in the total water state when areas change. This should be compensated by the "dynbal" adjustment terms: these adjustment terms are calculated by summing the gridcell water balance before and after the area adjustments (using the same routines as are used to compute begwb and endwb), and then the difference is put in the dynbal flux. In standard cases, where the dynamic glaciers just happen once a year, most of this should end up in the dynbal dribbler; this should be true in the multi-year case, but not in this short test case you're running. However, in the short test case you're running here, the fluxes aren't generated on the year boundary, so the dribbling logic is different: any flux not generated on the year boundary should get fluxed out immediately. This should show up in this term:
via qflx_liq_dynbal_grc. It would be interesting to look at that particular term in your test case. By the way: In the multi-year case, I think that the offset by the dynbal dribblers here: endwb_grc(g) = endwb_grc(g) - qflx_liq_dynbal_left_to_dribble(g) &
- qflx_ice_dynbal_left_to_dribble(g) &
- wa_reset_nonconservation_gain_grc(g) should make the endwb close to the begwb, but apparently something isn't working right there, either. I'd be happy to help look into this next week if you'd like. |
I got the five G tests listed in this and its preceding post to PASS by introducing this change: In subroutine dyn_water_content I changed subtract_dynbal_baselines from .true. to .false. The change causes additional variables to differ in comparison to the baseline (ERRH2O was expected): Recommendations? |
@slevisconsulting Ah, good work tracking this down. This makes total sense to me, and I'm kicking myself for not realizing this would be an issue. Given #658 , I think what we need to do is to compute the begwb_grc and endwb_grc slightly differently from what is done for begwb_col and endwb_col: For the gridcell water balance, I think we need to include the dynbal baseline terms, so that the water balance check is consistent with what's used to calculate those dynbal terms. For begwb_grc, that should be straightforward: Since you use a different subroutine for calculating begwb_grc vs. begwb_col, you can change BeginWaterGridcellBalanceSingle to set subtract_dynbal_baselines and add_lake_water_and_subtract_dynbal_baselines to true rather than false. For endwb_grc, I think some slightly more extensive changes are needed: For the sake of calculating tws in lnd2atm, we want to go back to the old version, where we did the c2g call directly into tws_grc rather than into endwb_grc: call c2g( bounds, &
water_inst%waterbalancebulk_inst%endwb_col(bounds%begc:bounds%endc), &
water_inst%waterdiagnosticbulk_inst%tws_grc (bounds%begg:bounds%endg), &
c2l_scale_type= 'urbanf', l2g_scale_type='unity' )
do g = bounds%begg, bounds%endg
water_inst%waterdiagnosticbulk_inst%tws_grc(g) = water_inst%waterdiagnosticbulk_inst%tws_grc(g) + water_inst%wateratm2lndbulk_inst%volr_grc(g) / grc%area(g) * 1.e-3_r8
enddo This is important because tws needs to NOT take the dynbal baseline terms into account, due to #658 . But then we need to calculate endwb_grc similarly to begwb_grc - i.e., setting subtract_dynbal_baselines and add_lake_water_and_subtract_dynbal_baselines to true rather than false. Probably the best thing to do is to extract the core of BeginWaterGridcellBalanceSingle into a routine that has an extra argument giving the variable to set; then you can call it from BeginWaterGridcellBalanceSingle setting begwb_col and then again later setting endwb_col. This will have the side-benefit of consolidating some code that is currently duplicated between BeginWaterGridcellBalanceSingle and BalanceCheck. Does that make sense? If not, we can chat. By the way: I considered an alternative solution of making begwb_col and endwb_col also taking the dynbal baselines into account, and then doing a separate call for the sake of tws (rather than using endwb_col to set tws). I think that could mostly work, but it would also impact qflx_qrgwl (by roundoff, I think), leading to answer changes that would probably be best to avoid in this PR. But if you think that's a significantly better solution, I'm open to that as long as we can convince ourselves that the resulting answer changes are okay. |
1) Addressed the former with some code reorg. At the crux of it though was the need to pass subtract_dynbal_baselines = .true. and add_lake_water_and_subtract_dynbal_baselines = .true. instead of .false. when calling ComputeWaterMassNonLake and ComputeWaterMassLake at the beginning and end of the time step. 2) Addressed the *clm-prescribed test failure by bypassing the grc-level water balance check when use_soil_moisture_streams = .false.
The two test suites are running. I will let you know when OK. |
@slevisconsulting thank you for these changes. I am mostly very happy with how your latest set of changes look! There is one change I would make. If you need to redo testing anyway, it would be great if you could fold this change in, if you agree with my suggestion. But if your testing all passes, we can fold this change into a follow-up tag (it's not worth redoing all the testing just for this change):
Let me know what you think. Again, if your testing passes, we can defer this to a follow-up tag. |
I never hope for tests to fail, but I also dislike putting off small changes like this :-) (Summary file located here: |
No problem putting it off: I have a backlog of some simple bfb changes, so I really need to make a simple bfb tag soon anyway, and this could go in there. Regarding the one test with larger diffs: that seems really weird. Wasn't this passing before your latest changes? It seems weird that your latest changes would cause a failure in that test and nothing else. I'd probably try rerunning it before spending any time digging in, just in case there was some machine glitch. If you're still seeing diffs, I have no idea what could be going on. |
I'm afraid the test still fails. I last ran the summary comparison on the izumi tests prior to updating to dev023 and, you're right that the test had passed at the time. Anyway, I'll investigate. |
Then I'd start by confirming that it passes if you run it from a fresh checkout of master. |
Same error! This suggests that the archived dev023 test is wrong. How do you go about correcting such a situation? |
@slevisconsulting if the baseline is wrong it sounds like it's my fault since I made dev023. So I should redo it. Is it just one test thats bad? I'll look into why the baseline would be wrong. Possibly I reran it or something? I'm not quite sure... |
OK, there was some sort of glitch in my test case. But, I was able to rerun and it and now it passed as expected. PASS SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50Sp.izumi_nag.clm-ptsRLA CREATE_NEWCASE It looks like it archived the baseline as well. So I think you should be good to go now. Let me know if there's more trouble with it... |
@ekluzek for some reason I still get the large diffs between dev023 master and baseline when I run |
Hmmm. OK. Let me do some more work and make sure there isn't something different on the branch I tested and dev023. I'll try to replicate what you did as well. Sorry about this trouble. Let me get back to you... |
I'd suggest at this point:
@slevisconsulting I have moved my one outstanding request to #1286 . I'll take care of that since I have a clear vision of what I want and it will be quick to implement - and it's about time to make a simple bfb tag anyway. |
I have confirmed (1). |
Description of changes
Introducing a grid cell-level error check for H2O to address issue #201
Following the approach of pull requests #984 and #1022
Specific notes
Contributors other than yourself, if any:
@billsacks
CTSM Issues Fixed (include github issue #): #201
Are answers expected to change (and if so in what way)?
No
Any User Interface Changes (namelist or namelist defaults changes)?
No
Testing performed, if any:
ERI_D_Ld9.1x1_camdenNJ.I2000Clm50BgcCruGs.cheyenne_intel.clm-default PASS
ERI_D_Ld9.f10_f10_musgs.I1850Clm50Bgc.cheyenne_gnu.clm-default PASS
ERP_D_Ld5.f10_f10_musgs.IHistClm50BgcCrop.cheyenne_intel.clm-allActive PASS
ERS_Ly3_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropQianGs.cheyenne_gnu.clm-cropMonthOutput FAIL
The last test fails because I have not handled the special case of transient simulations, yet. I will address this in this PR.