merge to clm > r17x-ish and stable cime tag #46

Closed
bandre-ucar opened this issue Apr 6, 2016 · 59 comments

Comments

@bandre-ucar
Contributor

Summary of Issue:

Merge to a more recent clm trunk tag based on clm4_5_x > r17x-ish. This will include critical bugfixes for clm, new but stable cime user interface scripts and a better starting point for clm5 - ED integration.

Code review and testing needed for:

  • port to LBL, LANL machines

Significant user interfaces changes

  • Case scripts are now 'case.build' instead of '${CASE_NAME}.build'
@bandre-ucar
Contributor Author

bandre-ucar commented Apr 8, 2016

Work taking place on:

andre-ed-clm-16x

ed-clm-master changeset: c3a1f92

Testing: results for the update through r150 were 'ok'. Expected failures for f09 and f19 restarts (#14). Answer changing, no baseline comparison.

@bandre-ucar
Contributor Author

Testing for r159 - massive failures for ed tests related to pio. Expected because of problems with ed/pio in clm trunk. Going to continue merging to current r174. If problems aren't fixed, then backtrack to r159-ish or fix on head....?

@bandre-ucar
Contributor Author

TODO(bja): Check CNWoodProducts change for cnveg nitrogen state intent from Stef's branch is in new CNProducts?

@bandre-ucar
Contributor Author

Testing on yellowstone with c3a1f92 / r175 just to see where things stand.

@bandre-ucar
Contributor Author

The clm-ed parameter file needs to be merged with the standard clm parameter file. Merged clm_params_ed.c151027.nc with clm_params.c160225.nc to generate clm_params_ed.c160225.nc. The resulting parameter set needs science review. slatop had minor conflicts.

TODO(bja, 20160412) add clm_params_ed.c160225.nc to input data repo!

@ckoven
Contributor

ckoven commented Apr 13, 2016

Ben -- what are the differences in the variable lists between these PFT parameter files?

@bandre-ucar
Contributor Author

The major difference seemed to be the addition of flexcn/luna/fun parameters. You can see the files and run ncdump on yellowstone under /glade/p/cesm/cseg/inputdata/lnd/clm2/paramdata/, or I can add the file to the svn input repo so you can download it.

@ckoven
Contributor

ckoven commented Apr 13, 2016

But these PFT parameters ought not to really apply to this model, since it currently has neither LUNA nor FUN nor FlexCN in it. So are the new variables even read when ED is active?

@ckoven
Contributor

ckoven commented Apr 13, 2016

Maybe an alternate solution is to have two possible sets of PFT files: one for the CN veg model and one for the ED veg model? It seems like this would be required during interfacing anyway (since, e.g. CLM and ACME non-demographic veg models will definitely have different PFT parameters), and would reduce any confusion about which PFT parameters are in use in a given model configuration?

@bandre-ucar
Contributor Author

All the parameters are required to be in the file, even if they aren't used.

I think we will definitely want to stop reusing the clm parameter file and create a separate file for ED.

@rosiealice
Contributor

Seems like the parameter file strategy might be a thing we could discuss tomorrow, if we have any time left over?


@bandre-ucar
Contributor Author

bandre-ucar commented Apr 21, 2016

Failing restart tests with the r175 based ed:

FAIL ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_intel.clm-edTest.clm2.h0.nc : test compare clm2.h0 (.base and .rest files) 
FAIL ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_intel.clm-edTest.cpl.hi.nc : test compare cpl.hi (.base and .rest files) 
FAIL ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_intel.clm-edTest : test functionality summary (ERS_test) 

FAIL ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_intel.clm-edTest.clm2.h0.nc : test compare clm2.h0 (.base and .rest files) 
FAIL ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_intel.clm-edTest.cpl.hi.nc : test compare cpl.hi (.base and .rest files) 
FAIL ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_intel.clm-edTest : test functionality summary (ERS_test) 

Bisecting through the history:

r150 - f10 pass
r151 - f10 pass
r152 - f10 pass
r153 - f10 fail
r155 - f10 fail
r158 - f10 fail
r160 - f10 fail
r175 - f10 fail

ed-clm at r153, 56dfb03, introduced the restart failure at f10 and f45.
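
The search above can be sketched as a bisection over the ordered tags. This is only an illustrative sketch: the `first_failing` helper is hypothetical, and the pass/fail results are hard-coded from the list above rather than obtained by launching the f10 restart test.

```python
from typing import Callable, Sequence


def first_failing(tags: Sequence[str], passes: Callable[[str], bool]) -> str:
    """Return the first tag for which passes(tag) is False.

    Assumes tags are in history order, tags[0] passes, tags[-1] fails,
    and that once a tag fails every later tag fails too.
    """
    lo, hi = 0, len(tags) - 1  # invariant: tags[lo] passes, tags[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(tags[mid]):
            lo = mid
        else:
            hi = mid
    return tags[hi]


# Results copied from the list above; a real run would launch the ERS test
# for each probed tag instead of looking the answer up.
observed = {
    "r150": True, "r151": True, "r152": True, "r153": False,
    "r155": False, "r158": False, "r160": False, "r175": False,
}
tags = list(observed)
culprit = first_failing(tags, observed.__getitem__)
print(culprit)
```

With eight candidate tags the loop probes three of them, which matters when each probe is a multi-day restart test.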

From the clm short log:

clm4_5_6_r153    sacks 11/17/2015 Fix snow cover fraction bug

r153 is a very small delta. The CanopyHydrology change:

              !======================  FSCA PARAMETERIZATIONS  ======================
              ! fsca parameterization based on *changes* in swe
              ! first compute change from melt during previous time step
-             if(snowmelt(c) >= 0._r8) then
+             if(snowmelt(c) > 0._r8) then

                 smr=min(1._r8,(h2osno(c))/(int_snow(c)))

When this is reverted, i.e. '>' back to '>=', the failing ED tests pass. Have not yet investigated why....

@bandre-ucar
Contributor Author

All the variables related to the frac_sno already appear to be on the restart file.

Restart is not bit for bit at a single point in the Antarctic....

ED interacts through frac_sno_eff.

@bandre-ucar
Contributor Author

As a sanity check, I tried the other r153, 56dfb03, delta from SnowHydrology:

components/clm/src/biogeophys/SnowHydrologyMod.F90
@@ -625,10 +625,12 @@ subroutine SnowCompaction(bounds, num_snowc, filter_snowc, &
                       ddz3 = max(0._r8,min(1._r8,(swe_old(c,j) - wx)/wx))

                       ! 2nd term is delta fsno over fsno, allowing for negative values for ddz3
-                      wsum = sum(h2osoi_liq(c,snl(c)+1:0)+h2osoi_ice(c,snl(c)+1:0))
-                      fsno_melt = 1. - (acos(2.*min(1._r8,wsum/int_snow(c)) - 1._r8)/rpi)**(n_melt(c))
-
-                      ddz3 = ddz3 - max(0._r8,(fsno_melt - frac_sno(c))/frac_sno(c))
+                      if((swe_old(c,j) - wx) > 0._r8) then
+                         wsum = sum(h2osoi_liq(c,snl(c)+1:0)+h2osoi_ice(c,snl(c)+1:0))
+                         fsno_melt = 1. - (acos(2.*min(1._r8,wsum/int_snow(c)) - 1._r8)/rpi)**(n_melt(c))
+                         
+                         ddz3 = ddz3 - max(0._r8,(fsno_melt - frac_sno(c))/frac_sno(c))
+                      endif
                       ddz3 = -1._r8/dtime * ddz3
                    else
                       ddz3 = - 1._r8/dtime * max(0._r8,(frac_iceold(c,j) - fi)/frac_iceold(c,j))

This does NOT allow the exact restart tests to pass.

Testing done with: ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_intel.clm-edTest

@ckoven
Contributor

ckoven commented Jun 7, 2016

@bandre-ucar so did this isolation of the code that breaks restart allow you to solve the issue? If not, should we reach out to someone with specific knowledge of the snow code to get this sorted out? thanks-

@bandre-ucar
Contributor Author

The problem isn't with snow, it's with ED: either its interaction with snow, or this is just highlighting the generally poor restart capability in ED (see #14 and #43).

I have a workaround, so I'm going to use it and move on. This workaround is in the CLM code and will be unacceptable to put back into mainline CLM, so this will have to be resolved at some point. Further discussion and work on this issue should take place in #74.

@bandre-ucar
Contributor Author

Merging from r175-r180 results in runtime failures.

RUN ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_intel.clm-edTest.06131631-edi 

This seems to happen for all ERS_D tests. There is no useful information in the logs and no core files.

Testing intermediate clm trunk tags:

175 - 06f4619 - runs
176 - f40a5e8 - runs
177 - a2bf304 - runs
178 - 7107a26 - dies at runtime
180 - 6dfd39d - dies at runtime

7107a26 has the following error in the cesm.log file:

   1:Create file ./ed-clm-i46-6dfd39d2.clm2.r.0001-01-06-00000.nc 65536
   1:Abort with message NetCDF: String match to name in use in file in file /glade/scratch/andre/ed-clm-i46-6dfd39d2/bld/intel/mpich2/nodebug/nothreads/pio/src/clib/pio_nc.c at line 393

There is no useful information in the stack trace and no core files.

The last line in the land log file is:

 hist_htapes_wrapup : history tape            1 : no open file to close
 writing restart file ./ed-clm-i46-6dfd39d2.clm2.r.0001-01-06-00000.nc
  for model date = 0001-01-06-00000                

 restFile_open: writing restart dataset at 
 ./ed-clm-i46-6dfd39d2.clm2.r.0001-01-06-00000.nc at nstep =          240

r178 is:

Tag name:  clm4_5_8_r178
 Originator(s):  sacks (Bill Sacks)
 Date: Sun Apr 17 19:28:55 MDT 2016
 One-line Summary: Remove some consistency checks, and merge crop_prog with use_crop in code

There was also a problem with this merge. Need to investigate further if the runtime failure is a result of the merge or r178.

@bandre-ucar
Contributor Author

Reran the failing case in debug mode in the debugger. Error is:

Abort with message NetCDF: String match to name in use in file in file /glade/scratch/andre/ed-clm-i46-7107a26a/bld/intel/mpich2/debug/nothreads/pio/src/clib/pio_nc.c at line 393

From the stack, the variable is ED_GDD0_VALUE coming out of accumulMod::accumulrest line 596.
Next TODO item is to investigate why ED_GDD0 appears to be added twice.
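
NetCDF reports "String match to name in use" when a name is defined a second time in the same file, so one quick check is to look for repeated names among the fields registered for restart. A minimal sketch, assuming the registered names can be collected into a plain list; only `ED_GDD0_VALUE` comes from the stack trace above, and the other names and the `duplicated_names` helper are hypothetical placeholders.

```python
from collections import Counter


def duplicated_names(names):
    """Return the names that appear more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# Hypothetical registration list: only ED_GDD0_VALUE is taken from the thread.
registered = [
    "ED_GDD0_VALUE",
    "HYPOTHETICAL_A_VALUE",
    "ED_GDD0_VALUE",
    "HYPOTHETICAL_B_VALUE",
]
dups = duplicated_names(registered)
print(dups)
```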

@ckoven
Contributor

ckoven commented Jun 16, 2016

This sounds like a conflict with the phenology refactoring. I deleted any interaction between fates and the accum machinery in that, so it sounds like the issue is there.

Charlie


@bandre-ucar
Contributor Author

Yep, it looks like I didn't handle the merge conflicts correctly in TemperatureType.F90. Running the full test suite on the fixed version of the code.

@bandre-ucar
Contributor Author

I've merged the fix from @rosiealice for the ed r153 restart problem into my branch for the clm trunk merge. The current status of the branch is: based on clm-ed master at c3a1f92, up to date with clm trunk r180. All tests pass. Next step is a series of merges to bring it up to the current clm-ed master.

@rosiealice
Contributor

Nice job Ben!


@bandre-ucar
Contributor Author

bandre-ucar commented Jun 22, 2016

clm-ed master commits that need to be merged:

c3a1f92 - done, all tests pass
90c3758 - done, issue template only, no tests run
8740a1a - done, all tests pass, merge conflicts in machines
19fe567 - done, all tests pass
d8a9ee5 - done, all tests pass
c23cf02 - done, all tests pass. conflicts require scientific review of commit eebdf4d by @ckoven
534d152 - done, all tests pass
5d12066 - machines update, single test passed.
763a722 - done, all tests pass
89b8709 - done, all tests pass
584f2a1 - done, all tests pass
0dec8c3 - done, doc only no testing.
f881721 - done, all tests pass
0471ef9 - done, all tests pass
5421da5 - done, all tests pass
1aaba89 - done, all tests pass
94118a5 - done, all tests pass
1fc6811 - done, all tests pass
57c533c - done, all tests pass
18613d1 - done, all tests pass; pio2 issue with r180, resolved by merging branch to clm-r181. Columnization issues resolved by @rgknox. (runtime failures in restart tests related to columnization. review of f7c3eee @rgknox)
c0654db - done, introduced new expected failure in tests, see #88
bd4719b - done, all ed and clm_short tests pass
f8e6313 - done, all ed and clm_short tests pass

@ckoven
Contributor

ckoven commented Jun 28, 2016

@bandre-ucar,

I'm looking through the commits to eebdf4d as you suggested. a couple things:

eebdf4d#diff-495103cd15f8049be0952b4736d75bf5R2407 -- the logic here ought to be related to use_vertsoilc rather than use_century_decomp. if in SP mode, then none of this ought to be on

http://github.com/NGEET/ed-clm/commit/eebdf4dc5a4a55edefd09a74b95b4597f77d4181#diff-86b7b164f9607a4d86d626947a92af4bR95 -- only the cnveg_carbonstate_type and cnveg_carbonflux_type are actually used in EDBGCDynMod.F90, so you could delete the occurrences of cnveg_state_inst (and actually cnveg_nitrogenstate_type and cnveg_nitrogenflux_type too) from EDBGCDynMod.F90, but agreed that this could be interfaced in a better way.

@ckoven
Contributor

ckoven commented Jun 30, 2016

@bandre-ucar I am confused about which options are triggering the error. Is this happening in SP mode or in ED mode?

@bandre-ucar
Contributor Author

bandre-ucar commented Jun 30, 2016

@ckoven The default conditions used by all the ed tests appear to be ed mode with sp. You can recreate it with:

cd cime/scripts
./create_test -testname ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_intel.clm-edTest -testid junk-r180

The snippet of the lnd_in above came from that test.

@ckoven
Contributor

ckoven commented Jun 30, 2016

@bandre-ucar ok, so I think the issue is the scripting logic. "ed" ought to be a value for bgc_mode, as ED and SP are mutually exclusive possibilities. The old logic may have allowed that since in both cases the soil biogeochemistry is off, but now the soil biogeochemistry is on (or a subset of it anyway) whenever ED is on, so the mutual incompatibility of those is crashing the model. I've made a new commit, f47cd81, which I think sets the switches the way they ought to be set.

I tried running the test on yellowstone but am running into other issues -- could you try testing the new commit? thanks

@bandre-ucar
Contributor Author

@ckoven ok, testing.

@bandre-ucar
Contributor Author

@ckoven You have a typo in the xml file you modified:

$ git log -1
commit f47cd8179421049e7e31cf60cd60b4a2b1626e24
Author: Charlie Koven <[email protected]>
Date:   Thu Jun 30 13:10:44 2016 -0700

    set bgc_mode to 'ed' whenever use_ed = .true. to avoid the old default behavior of bgc_mode = 'sp' when ED was on which doesn't make sense
$ git diff
diff --git a/components/clm/bld/namelist_files/namelist_defaults_clm4_5.xml b/components/clm/bld/namelist_files/namelist_defaults_clm4_5.xml
index 75d2cb6..0a4ab56 100644
--- a/components/clm/bld/namelist_files/namelist_defaults_clm4_5.xml
+++ b/components/clm/bld/namelist_files/namelist_defaults_clm4_5.xml
@@ -1802,7 +1802,7 @@ lnd/clm2/surfdata_map/surfdata_ne120np4_78pfts_simyr1850_c160216.nc</fsurdat>
 <use_ed_spit_fire use_ed=".true.">.true.</use_ed_spit_fire>
 <use_lch4 use_ed=".true.">.false.</use_lch4>
 <use_nitrif_denitrif use_ed=".true.">.false.</use_nitrif_denitrif>
-<bgc_mode use_ed=".true.">ed</<bgc_mode>
+<bgc_mode use_ed=".true.">ed</bgc_mode>


 </namelist_defaults>

Even fixing that, your branch still has the same runtime error with decomp_depth_efolding described above.
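
A malformed closing tag like the one in the diff above can be caught before a test run with a well-formedness check. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper and the trimmed-down two-element documents are illustrative only and are not part of the CLM build scripts.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Trimmed-down stand-ins for the real defaults file (assumption: a single
# entry is enough to show the failure mode).
broken = (
    '<namelist_defaults>'
    '<bgc_mode use_ed=".true.">ed</<bgc_mode>'
    '</namelist_defaults>'
)
fixed = (
    '<namelist_defaults>'
    '<bgc_mode use_ed=".true.">ed</bgc_mode>'
    '</namelist_defaults>'
)
print(is_well_formed(broken), is_well_formed(fixed))
```

This only checks syntax; it says nothing about whether "ed" is an accepted value for bgc_mode.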

@rgknox
Contributor

rgknox commented Jul 3, 2016

Hi @bandre-ucar,

I have a build here that I think is passing the restart read phase. Lawrencium has been very troublesome the last week due to IO problems, so it's been hard to test. But I'm curious if the restart reads will work for you as well, and if there is any difference with how you implemented your fix.

https://github.com/bandre-ucar/ed-clm/pull/4

@ckoven
Contributor

ckoven commented Jul 8, 2016

@ekluzek, not sure how much background @bandre-ucar has given you on this, but the baseline tag that has the problem ought to be 853f006. I've been trying to test it via Ben's suggested test command:
./create_test -testname ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_intel.clm-edTest -testid junk-r180
I'll paste my original email describing the problem to you below--
Thanks!
Charlie

Erik,

I am trying to fix a scripting issue that has come up in Ben’s recent merges of the FATES code to the CLM trunk. Basically the issue is that when use_ed is set to .true., the perl logic is somehow setting bgc_mode to “sp”. This doesn’t logically make sense, since ED ought to be mutually exclusive with SP. I think this used to work because ED was effectively a biophysics model only, but now that ED is connected to the soil biogeochemistry code this creates errors. So what I’d like is to define a new bgc_mode option of “ed”, so that whenever ED is on, that is the BGC mode as well.

I tried to do that in the xml file components/clm/bld/namelist_files/namelist_defaults_clm4_5.xml
by adding the line:
<bgc_mode use_ed=".true.">ed</bgc_mode>
and then adding “ed” as a valid value for bgc_mode in components/clm/bld/namelist_files/namelist_definition_clm4_5.xml.

I also did the same in the testing scripts components/clm/bld/test_build_namelist/t/input/namelist_defaults_clm4_5_test.xml and components/clm/bld/test_build_namelist/t/input/namelist_definition_clm4_5_test.xml.

But none of this seems to work; the perl script is still receiving a value of “sp” in components/clm/bld/CLMBuildNamelist.pm. I don’t understand the syntax of the setup_cmdl_bgc subroutine in that script, so I don’t get where that info actually comes from. Any chance you could help me get this sorted out?

Thanks,
Charlie

@ekluzek
Collaborator

ekluzek commented Jul 8, 2016

@ckoven I have a branch on my NGEET fork that is basically functional. The clarification I want to make is that when ed is on, you also by default want the "bgc" settings, right? So you get methane, nitrif-denitrif, century pools, and vertical carbon, right? So you also need "use_cn" on, which also turns on some things you don't need, like the lightning and pop-density namelists. But since we don't have switches for above and below ground processes, it seems that getting the "bgc" switches in addition to "use_ed=T" is what makes sense. I just want to confirm that's correct?

Later at some point we need a way to distinguish the above and below ground processes.

@ckoven
Contributor

ckoven commented Jul 8, 2016

@ekluzek thanks. Of the things you mention, we really only want methane, century pools, and vertical carbon when ED is on. Not the nitrif-denitrif option or the fang-fire-model-specific things (lightning, and pop-density) since ED doesn't have an N cycle yet and has its own fire model. I can't see your branch for whatever reason, but unless you changed the fortran, the nitrogen-cycle-specific things should still be off even though the bgc is flagged?

@ekluzek
Collaborator

ekluzek commented Jul 8, 2016

I haven't pushed my branch to the upstream main repository, so it's just on my fork right now. But, I'll try to get something that works as you've outlined and then push the branch upstream and let you know it's there.

And I'm just messing with the namelist generation.

Also I thought that methane requires nitrif_denitrif to be on?

@bandre-ucar
Contributor Author

As of 6ab0d89 the merge branch is up to date with clm-r181.

@bandre-ucar
Contributor Author

bandre-ucar commented Aug 1, 2016

Testing of the c0654db changes into the r16x branch causes a restart failure in the coupler:

    FAIL ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_gnu.clm-edTest.cpl.hi.nc : test compare cpl.hi (.base and .rest files)
    FAIL ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_gnu.clm-edTest : test functionality summary (ERS_test)

Failing fields from TestStatus.log

/glade/scratch/andre/ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_gnu.clm-edTest.07281644-edg/run/ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_gnu.clm-edTest.07281644-edg.cpl.hi.0001-01-06-00000.nc.rest.cprnc.out had the following fields that are NOT b4b  

    RMS l2x_Sl_anidr                     7.2491E-19            NORMALIZED  9.0283E-19
    RMS l2x_Sl_anidf                     7.2491E-19            NORMALIZED  8.8489E-19
    RMS l2x_Sl_tref                      4.4538E-15            NORMALIZED  1.6760E-17
    RMS l2x_Sl_qref                      5.2556E-18            NORMALIZED  1.0570E-15
    RMS l2x_Sl_t                         8.9077E-15            NORMALIZED  3.3675E-17
    RMS l2x_Sl_fv                        3.1606E-16            NORMALIZED  2.2130E-15
    RMS l2x_Sl_ram1                      2.2047E-13            NORMALIZED  4.5475E-16
    RMS l2x_Sl_u10                       2.4357E-15            NORMALIZED  8.2384E-16
    RMS l2x_Fall_swnet                   2.9692E-15            NORMALIZED  6.9590E-17
    RMS l2x_Fall_taux                    1.0439E-16            NORMALIZED  3.5435E-15
    RMS l2x_Fall_tauy                    1.0439E-16            NORMALIZED  3.5435E-15
    RMS l2x_Fall_lat                     3.6596E-13            NORMALIZED  3.0123E-14
    RMS l2x_Fall_sen                     1.8567E-13            NORMALIZED  1.3053E-14
    RMS l2x_Fall_lwup                    6.0869E-14            NORMALIZED  2.0968E-16
    RMS l2x_Fall_evap                    1.4619E-19            NORMALIZED  3.0530E-14
    RMS l2x_Fall_flxdst1                 7.3133E-25            NORMALIZED  6.5983E-16
    RMS l2x_Fall_flxdst2                 3.9265E-24            NORMALIZED  6.5999E-16
    RMS l2x_Fall_flxdst3                 9.2073E-24            NORMALIZED  6.5999E-16
    RMS l2x_Fall_flxdst4                 8.6713E-24            NORMALIZED  6.5986E-16

Note: all tests in the clm_short suite pass for yellowstone intel, gnu, pgi.

I poked at this a bit. Since the merged change incorporates three different bug fixes, I think the sanest way to debug the issues is going to be to back them out one at a time to narrow down the problem. For now I'm going to mark this test as an expected fail.
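
When scanning a listing like the one above, it can help to rank the fields by size of difference. A minimal sketch, assuming each relevant line has the form `RMS <field> <rms> NORMALIZED <normalized>` as shown above; the `parse_rms` helper is hypothetical and only a subset of the lines is copied in.

```python
import re

# A subset of the lines from the listing above, copied verbatim.
CPRNC_LINES = """\
    RMS l2x_Sl_t                         8.9077E-15            NORMALIZED  3.3675E-17
    RMS l2x_Fall_lat                     3.6596E-13            NORMALIZED  3.0123E-14
    RMS l2x_Fall_sen                     1.8567E-13            NORMALIZED  1.3053E-14
    RMS l2x_Fall_evap                    1.4619E-19            NORMALIZED  3.0530E-14
"""

RMS_LINE = re.compile(r"RMS\s+(\S+)\s+(\S+)\s+NORMALIZED\s+(\S+)")


def parse_rms(text):
    """Return (field, rms, normalized_rms) tuples for every matching line."""
    rows = []
    for line in text.splitlines():
        match = RMS_LINE.search(line)
        if match:
            rows.append((match.group(1), float(match.group(2)), float(match.group(3))))
    return rows


rows = parse_rms(CPRNC_LINES)
by_rms = sorted(rows, key=lambda row: row[1], reverse=True)
by_normalized = sorted(rows, key=lambda row: row[2], reverse=True)

for field, rms, normalized in by_rms:
    print(f"{field:20s} rms={rms:.4e} normalized={normalized:.4e}")
```

The two orderings need not agree: in this subset l2x_Fall_lat has the largest RMS, while l2x_Fall_evap has the largest normalized difference.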

@rgknox
Contributor

rgknox commented Aug 1, 2016

In that last commit we modified the ice_mask argument to setFilters(). Yet it looks like the second argument to setFilters is no longer the ice_mask and is now a "glc_behavior".

EDIT: I am trying to determine if we really need that call to setFilters (as some of the comments imply); that seems the most likely culprit for messing up something in the coupler.

@rgknox
Contributor

rgknox commented Aug 1, 2016

also, @bandre-ucar, I'm glad that we caught a new error, but where did the ERS_D_Ld5.f45_g37.ICLM45ED come from?

@bandre-ucar
Contributor Author

@rgknox The setFilters call was one of the conflicts, and I resolved it the same way you did in the prototype merge you made last week. I need to look more closely at it.

@bandre-ucar
Contributor Author

@rgknox It looks like it's been part of the standard test suite since January.

git blame components/clm/cime_config/testdefs/testlist_clm.xml > tmp.txt

Line 770.

@rgknox
Contributor

rgknox commented Aug 1, 2016

wow, I think one of my wires short-circuited, yeah, it's been there the whole time. It's been so agreeable up until now.

I looked through setFilters, and it doesn't seem like ED is changing any of the vectors that dictate what happens in setFilters, so the call does not appear to be needed (but I'm probably wrong). ED does change frac_veg_nosno_alb_patch, but that is not used in that setFilters() call.

@bandre-ucar
Contributor Author

@rgknox If you look at the list of changed variables above, do any of them jump out at you as directly related to the changes in c0654db?

@rgknox
Contributor

rgknox commented Aug 1, 2016

These seem to be the largest differences:
RMS l2x_Fall_lat 3.6596E-13 NORMALIZED 3.0123E-14
RMS l2x_Fall_sen 1.8567E-13 NORMALIZED 1.3053E-14

Nothing is jumping out at me.

Just to be sure, these are not b4b regressions you're showing, right? The hard-coded stomatal slope parameters that we no longer use do not match what is in the PFT file. To get b4b regressions, we need to either retroactively change the hard-coded stomatal slope or change the PFT file to match. The differences are minor and the parameter is the quintessential tuning parameter (i.e. no physical analogue), so whether we use the value of 9 that is hard-coded or the 8 in the file is insignificant until someone starts optimizing (see Chonggang's experiments).

@bandre-ucar
Contributor Author

Nope, no baselines, it is just an exact restart issue with info that is being handed to the coupler.

@bandre-ucar
Contributor Author

Yellowstone is down on Tuesday 8/2. I checked test results and all gnu and intel passed. All pgi tests fail with a compiler error. Documenting for Wednesday when I have access to pgi again.

CFAIL ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edNoFire.08011831-edp
CFAIL SMS_D_Mmpi-serial_Ld5.5x5_amazon.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL SMS_Ld5.f19_g16.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL ERS_D_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL ERS_D_Ld5.f45_g37.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL SMS_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL ERI_D_Ld9.f09_g16.ICLM45BGC.yellowstone_pgi.clm-default.08011831-edp
CFAIL ERS_D_Mmpi-serial_Ld5.1x1_brazil.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL ERS_D_Ld5.5x5_amazon.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL SMS_D_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL ERS_D_Ld5.f19_g16.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
CFAIL SMS_D_Ld3.f10_f10.ICLM45BGC.yellowstone_pgi.clm-default.08011831-edp
CFAIL ERS_D_Ld5.f09_g16.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp
mpif90  -c -I. -I/glade/scratch/andre/sharedlibroot.08011831-edp/pgi/mpich2/nodebug/nothreads/include -I/glade/scratch/andre/sharedlibroot.08011831-edp/pgi/mpich2/nodebug/nothreads/MCT/noesmf/a1l1r1i1o1g1w1e1/csm_share -I/glade/apps/opt/netcdf-mpi/4.3.3.1/pgi/15.10/include -I/glade/apps/opt/pnetcdf/1.6.1/pgi/default/include -I/glade/scratch/andre/sharedlibroot.08011831-edp/pgi/mpich2/nodebug/nothreads/include -I/glade/p/work/andre/ed/ed-clm-r16x/cime/share/csm_share/shr -I/glade/p/work/andre/ed/ed-clm-r16x/cime/share/csm_share/include -I/glade/p/work/andre/ed/ed-clm-r16x/cime/share/shr_RandNum/include -I/glade/scratch/andre/sharedlibroot.08011831-edp/pgi/mpich2/nodebug/nothreads/MCT/noesmf/clm/obj -I. -I/glade/scratch/andre/tests-ed-20160801-1831/SMS_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp/SourceMods/src.clm -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/main -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/biogeophys -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/biogeochem -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/soilbiogeochem -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/dyn_subgrid -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/init_interp -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/biogeophys -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/biogeochem -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/fire -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/utils -I/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/cpl -I/glade/scratch/andre/SMS_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp/bld/lib/include -i4 -gopt  -time -Mextend -byteswapio  -Mflushz -Kieee   -O   -DLINUX -DNDEBUG -DMCT_INTERFACE -DHAVE_MPI -DFORTRANUNDERSCORE -DNO_SHR_VMATH -DNO_R16   -DLINUX -DCPRPGI  -DHAVE_SLASHPROC -Mfree  -DUSE_CONTIGUOUS= /glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main/FatesInterfaceMod.F90



PGF90-S-0155-Attempt to use private component: numswbands (/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main/FatesInterfaceMod.F90: 165)
PGF90-S-0155-Attempt to use private component: numswbands (/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main/FatesInterfaceMod.F90: 166)
  0 inform,   0 warnings,   2 severes, 0 fatal for allocate_bcin
PGF90-S-0155-Attempt to use private component: numswbands (/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main/FatesInterfaceMod.F90: 267)
PGF90-S-0155-Attempt to use private component: numswbands (/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main/FatesInterfaceMod.F90: 272)
PGF90-S-0155-Attempt to use private component: numswbands (/glade/p/work/andre/ed/ed-clm-r16x/components/clm/src/ED/main/FatesInterfaceMod.F90: 293)
  0 inform,   0 warnings,   3 severes, 0 fatal for set_fates_ctrlparms
/glade/scratch/andre/tests-ed-20160801-1831/SMS_Ld5.f10_f10.ICLM45ED.yellowstone_pgi.clm-edTest.08011831-edp/Tools/Makefile:739: recipe for target 'FatesInterfaceMod.o' failed

@bandre-ucar
Contributor Author

yellowstone is back up. The pgi problem was a quick simple fix. Re-running tests.

@bandre-ucar
Contributor Author

bandre-ucar commented Aug 3, 2016

Update merge branch to more recent clm trunk tags:

r182 - done, all ed and clm_short tests pass.
r183
r184
r185
r186
r187

@bandre-ucar
Contributor Author

closing. new clm tags will be integrated periodically.

@bandre-ucar bandre-ucar removed their assignment Apr 6, 2018