-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow nlevgrnd less than nlevurb #674
Comments
@ekluzek I just reworded the title to make it sound more like an enhancement. |
From a quick look through the code, I think it will take some care to ensure that we find and fix all relevant places. For example, consider this loop: This loop seems to implicitly assume that nlevurb <= nlevgrnd (if nlevurb > nlevgrnd, not all urban layers would be handled currently). We'll need to be vigilant to find and fix any similar looping structures. |
This is needed because currently we need to have nlevurb <= nlevgrnd, and we're using nlevurb = 5: see ESCOMP#674
@olyson can you please take this on at some point? |
Ok. Start by branching off of master? |
Yes, we want this to go onto master. So starting with latest master would be good. |
I've gone through the code and made appropriate changes to remove this assumption (I think). I'm still getting bfb for a test case (with nlevgrnd>nlevurb; ERP_Ld3.f09_g17.I1850Clm50BgcCropCru.cheyenne_intel.clm-ciso). |
Thank you @olyson ! It's possible that you could just set nlevgrnd to 4, but the easiest thing to do may be to start with the NWP configuration and tweak that. The NWP configuration is supported starting with ctsm1.0.dev038, so you'd need to merge your branch up to that (hopefully fairly easy since it's just a few tags ahead of your starting point). Then you could use a compset like I2000Ctsm50NwpSpGswpGs (with any grid). Note that this is an SP case: in that version of the code, I believe NWP and BGC are incompatible. Then search the code for Actually, probably it's a good idea to first reproduce that you get an error with this configuration from this code version without your changes, before verifying that your code changes make this problem go away. (Because I'm not sure exactly what configuration generated the error before.) If any of this is unclear, I'm happy to help. |
Thanks, that all makes sense, I'll give it a try. |
I tried compset I2000Ctsm50NwpSpGswpGs with a 1deg global grid with ctsm1.0.dev038. This ran for 5 days. I then tried changing nlevsoi and nlevgrnd to 4 in clm_varpar.F90, and removed dzsoi(5)=1.0_r8 in initVerticalMod.F90. This simulation crashed right away with a bunch of large snow balance errors (mostly associated with glacier columns but some with vegetated landunits). |
Actually, looking more closely, the snow balance errors are only warnings (because DAnstep <= skip_steps) and the model is crashing because of a NaN passed to the coupler. |
I got more information from debug mode. There is still a bounds problem in initVerticalMod.F90 where, e.g., col%z for urban columns is assumed to have vertical dimension of nlevurb (5), but is assigned a dimension of nlevgrnd (4). Since col%z and other similar arrays are global, one way to handle this is to assign col%z to spval for soil columns when nlevgrnd < nlevurb (something similar is done for col%z for urban points when nlevgrnd > nlevurb), although this seems like it has the possibility of propagating unnecessarily through the code and increase memory requirements because there are a great deal more soil than urban points. Another option might be to use explicit arrays for urban similar to how lakes are handled (zlake, dzlake) I had been focused on the biogeophysics code itself to handle nlevgrnd < nlevurb and hadn't noticed this fundamental problem. |
Turns out there are a lot of arrays that are shared between urban points and other landunit types. I've chased these down and essentially modified them with arrays dimensioned by the maximum of nlevgrnd and nlevurb. This runs for a single point urban case and a single point veg case, as well as a global NWP case, all with nlevgrnd=nlevsoi=4 and nlevurb=5, but only for a cold start. |
From discussion with @olyson here is a path forward: (1) Keith will rework some of his changes to introduce a new variable in the code and on the restart file, named something like (2) Keith will share with me the test case he has been running to confirm that he can get nlevgrnd < nlevurb to work, and which also demonstrated problems with interpolation. (3) I will add a test to the test suite that mimics (2). (4) I will make any changes needed to get init_interp working in the case where nlevgrnd < nlevurb. Once this is done, the new test (3) should pass. (5) I will rework code to remove duplication. This is particularly relevant in SoilTemperature, where on Keith's branch there are now separate loops over nlevgrnd and nlevurb that have similar or identical bodies. I will either merge these back into a single loop, or extract the bodies into shared subroutines in order to avoid having large blocks of duplicated code. This rework should preserve answers for the new test as well as standard tests in our test suite. @olyson can you think of any relevant points from our discussion that I'm missing here? |
I guess we'll have to think through how to ensure backward compatibility with restart files that won't have this new variable on it. |
I've substituted in nlevmaxurbgrnd for max0(nlevgrnd,nlevurb) everywhere. /glade/work/oleson/ctsm_nlevurbgrnd/cime/scripts/I2000Ctsm50NwpSpGswpGs_nlevurbgrnd The error message is here: /glade/scratch/oleson/I2000Ctsm50NwpSpGswpGs_nlevurbgrnd/run/cesm.log.8715431.chadmin1.ib0.cheyenne.ucar.edu.191004-135256 spvals or NaNs found in coordinate array where level_class /= ispval; this is currently unhandled ERROR in initInterpMultilevelInterp.F90 at line 435 Otherwise, it runs and restarts using a cold start. |
I've now addressed 5) by reworking SoilTemperatureMod to eliminate duplicate code. The code is still bfb with ctsm1.0.dev038 with the NWP 5SL_3m configuration (cold start). |
@olyson - My sincere apologies that this has been sitting in a mostly-done state for the last year. I am finally returning to this and can do the last pieces: a final review, maybe a bit of final cleanup and (the biggest piece) getting init_interp working. But before I do this, I wanted to check in to confirm that this is still wanted, and that the branch you pointed me to a year ago (https://github.com/olyson/ctsm/tree/nlevurbgrnd ) is still what we want. |
No worries...Yes, we still want this and the branch should still apply, thanks. |
The main point of this testmod is to exercise the case where nlevgrnd < nlevurb (connected with ESCOMP#674). If/when we have an out-of-the-box configuration with nlevgrnd < nlevurb (such as the NWP configuration once nlevurb = 10 is the standard) then we can remove this test. A secondary consideration is having a test that interpolates from a configuration with nlevgrnd > nlevurb onto one with lower nlevgrnd and nlevgrnd < nlevurb. But I don't think it's critical to maintain this test just for that purpose - e.g., if we start having an out-of-the-box configuration with nlevgrnd < nlevurb but where the finidat file also has nlevgrnd < nlevurb, then I feel it will be okay to remove this test.
Brief summary of bug
There are some bits of code that address the nlevurb level of arrays that are dimensioned by nlevgrnd. Thus an implicit assumption that nlevgrnd >= nlevurb.
@olyson @barlage
General bug information
CTSM version you are using: ctsm1.0.dev031
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected:
Non-standard configurations where you set nlevgrnd to be less than nlevurb.
We want to use this for the NWP configuration where we want to use 4 levels
with nlevurb being 5.
Details of bug
This dependence is especially true of the older simple building temperature. And we likely won't fix that. We should make sure the code aborts for that case though.
For newer cases, I think we can remove this dependence.
The text was updated successfully, but these errors were encountered: