Balance Check failure in fire runs #378
@ekluzek @rosiealice @rgknox @ckoven Erik - does this look at all similar to the balance check error we saw in the past? |
Some things I'm noticing: The radiation solution errors are quite large, so if they are that large, I would not be surprised if they generate a NaN or cause anarchy anywhere in the code downstream. These errors appear to be triggered over and over again in the same patch. The patch area is e-11 in size, which seems like maybe it should be culled? In the arrays that are printed out (lai_change, elai, ftweight, etc.), I'm surprised that there are some lai_change values (which is change in light level per change in lai, maybe) where I see no tai. But it's hard to tell why this is so. I'm wondering if perhaps the "ftweight" variable is being filled incorrectly, maybe because there is something special about the grasses. I can't really tell exactly what is happening, though; also, the diagnostic that writes this stuff uses canopy layer 1 for ftweight, but ncl_p for the others... Do these runs have grasses with some structural biomass, or are they 0 structure/sap? |
allom_latosa_int = zero, but I had a variant with allom_agb1=zero and allom_agb1=0.0001 (both variants failed). Will try a variant with allom_latosa_int set to default and allom_agb1=0.0001.
|
Run which uses allom_latosa_int = default and allom_agb1=0.0001 for grass also fails in year 5 with fire. (This is a bad case name, as it uses default allometry; will fix that...) /glade2/scratch2/jkshuman/Fire0507_Obrienh_Saldaa_Saldal_latosa_int_default_2PFT_1x1_2dba074_f8d7693/run WARNING:: BalanceCheck, solar radiation balance error (W/m2) |
that is the right case name. Obrien Salda is the default allometry... |
@rgknox @rosiealice I did another set of runs for single and 2 PFTs for a regional run in South America. Both fails have the same set of solar radiation balance check errors. I include pieces of the cesm.log for the failed runs.
General case statement: 1 PFT (no fire) for Grass and Trop Tree completed to year 21 with reasonable biomass and distribution. 2 PFT (fire) for Trop Tree and Grass failed at year 5. (cesm.log piece after the fire grass log)
/glade2/scratch2/jkshuman/Fire_Grass_1x1_2dba074_f8d7693/run
(and from further within the cesm.log...)
/glade2/scratch2/jkshuman/Fire0507_Obrienh_Saldaa_Saldal_2PFT_1x1_2dba074_f8d7693/run
WARNING:: BalanceCheck, solar radiation balance error (W/m2)
529: gridcell longitude = 290.000000000000 |
It is a merge between the memory leak commit and my added crown area history field. Here is a link, but this may not have the memory leak commit. I don't recall if I pushed those changes to my link. Cheyenne is still down, so I can't update at the moment. |
Cheyenne is still down, so putting my link to my crown area history variable branch in this issue as well. The failing runs were on a merge branch created from master branch #372 memory leak fix and my crown area branch (link below). |
I updated the sync branch with the failing branch code. https://github.com/jkshuman/fates/tree/hio_crownarea_si_pft_sync |
Did you try the run with just the new master branch? That way we can see if
the issues are caused by stuff on the branch?
|
Running 1PFT grass, 1PFT trop tree, and 2PFT, all with fire, on CLM4.5 (paths below)
./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
/glade2/scratch2/jkshuman/Fire_Grass_1x1_2dba074_5dda57b |
The crown area stuff is just a history variable, so it's unlikely to cause this failure? But I can run with master to test that as well.
|
looks like my single site run at: gridcell longitude = 290.000000000000 did not generate the error after 30 years. I will try to look through and see if I added some configuration that was different. Run directory: /glade2/scratch2/rgknox/jkstest-1pt-v0/run Uses this parameter file: /glade/u/home/rgknox/param_file_2PFT_Obrienh_Saldaa_Saldal_05042018.nc |
this was with fire for clm45? |
I noticed this in the parameter file: fates_leaf_xl = 0.1, 0.1, -0.3 This may be fine, it just caught my eye. xl is orientation index, which I think I recall allowing negatives. But we should double check if our formulation does. |
yeah, that parameter seems fine, false alarm |
My runs are a 1-degree regional subset for South America. Surface and domain files are here:
/glade2/scratch2/jkshuman/sfcdata
|
OK, thanks. New single-site run on Cheyenne is going, now using SPITFIRE. My current guess as to what is happening is that we are running into a problem with near-zero biomass or leaves, which is the product of fire turning over an all-grass patch? It's possible the recent bug fix addressed this, but we will see. |
@rgknox another set of runs going with pull request 382. 1 PFT runs with fire are still going (tree at year 21, grass at year 2 - slow in queue?). 2PFT run (trop tree and grass) failed in year 6. Similar set of errors: BalanceCheckMod.F90 line 543, BalanceCheck, solar radiation balance error. From cesm.log |
I feel like ftweight should not ever be >1, but here it's like 93, 143,
etc. I've got a bunch of slides to do for tomorrow morning still, but
that's the thing that strikes me most about this. Maybe worth checking the
ftweight calculations...
2018-05-14 21:28 GMT-06:00 jkshuman <[email protected]>:
… @rgknox <https://github.com/rgknox> another set of runs going with pull
request 382. 1 PFT runs with fire are still going (tree at year 21, grass
at year 2 - slow in queue?). 2PFT run (trop tree and grass) failed in year
6. Similar set of errors. BalanceCheckMod.f90 line 543, BalanceCheck, solar
radiation balance error.
/glade/scratch/jkshuman/archive/Fire_Obrienh_Saldaa_Saldal_2PFT_SA1x1_2dba074_0f0c41c/
New location:
gridcell longitude = 305.000000000000
gridcell latitude = -23.0890052356021
From cesm.log
WARNING:: BalanceCheck, solar radiation balance error (W/m2)
235: nstep = 119564
235: errsol = -1.108547849071329E-007
252: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
252: nstep = 119565
252: errsol = -1.065200194716454E-007
0: memory_write: model date = 71029 0 memory = 128919.57 MB (highwater)
101.85 MB (usage) (pe= 0 comps= ATM ESP)
467: trimming patch area - is too big 1.818989403545856E-012
545: trimming patch area - is too big 1.818989403545856E-012
353: trimming patch area - is too big 1.818989403545856E-012
390: trimming patch area - is too big 1.818989403545856E-012
513: trimming patch area - is too big 1.818989403545856E-012
506: trimming patch area - is too big 1.818989403545856E-012
535: trimming patch area - is too big 1.818989403545856E-012
446: trimming patch area - is too big 1.818989403545856E-012
469: trimming patch area - is too big 1.818989403545856E-012
477: trimming patch area - is too big 1.818989403545856E-012
326: trimming patch area - is too big 1.818989403545856E-012
403: trimming patch area - is too big 1.818989403545856E-012
69: trimming patch area - is too big 1.818989403545856E-012
239: trimming patch area - is too big 1.818989403545856E-012
70: trimming patch area - is too big 1.818989403545856E-012
218: trimming patch area - is too big 1.818989403545856E-012
257: trimming patch area - is too big 1.818989403545856E-012
75: trimming patch area - is too big 1.818989403545856E-012
330: trimming patch area - is too big 1.818989403545856E-012
170: trimming patch area - is too big 1.818989403545856E-012
200: trimming patch area - is too big 1.818989403545856E-012
198: trimming patch area - is too big 1.818989403545856E-012
255: trimming patch area - is too big 1.818989403545856E-012
80: trimming patch area - is too big 1.818989403545856E-012
219: trimming patch area - is too big 1.818989403545856E-012
118: trimming patch area - is too big 1.818989403545856E-012
119: trimming patch area - is too big 1.818989403545856E-012
202: >5% Dif Radn consvn error -1.05825538715178 1 2
202: diags 7.96359955072742 -54.6696896639910 38.3301532002546
202: lai_change 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: elai 0.796415587611356 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.234465085324267
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: esai 9.096157657329497E-002 0.000000000000000E+000
3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 9.398288976575598E-003
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: ftweight 1.267302001703947E-002 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: cp 6.405767903805394E-010 1
202: bc_in(s)%albgr_dif_rb(ib) 0.190858817093915
202: rhol 0.100000001490116 0.100000001490116 0.100000001490116
202: 0.449999988079071 0.449999988079071 0.349999994039536
202: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
202: 0.000000000000000E+000
202: present 1 0 0
202: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
331: Large Dir Radn consvn error 87300236774.1395 1 2
331: diags 35545013833.8197 -1.718567028306606E-002 -793747809365.306
331: 496278040697.993
331: lai_change 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: elai 0.776682425289442 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.227539226615268
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: esai 9.093202219977818E-002 0.000000000000000E+000
3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 9.101385150350671E-003
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: ftweight 0.143517787345916 0.000000000000000E+000
331: 0.856482212654084 0.000000000000000E+000 0.000000000000000E+000
331: 0.856482212654084 0.000000000000000E+000 0.000000000000000E+000
331: 0.856482212654084 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: cp 2.006325586387992E-009 1
331: bc_in(s)%albgr_dir_rb(ib) 0.220000000000000
331: dif ground absorption error 1 1 -2.968510966153521E+017
331: -2.968510966153521E+017 2 2 1.00000000000000
331: >5% Dif Radn consvn error 4.270016056591235E+016 1 2
331: diags 1.669646990961853E+016 -3.805783289940412E+017
2.374544661398212E+017
331: lai_change 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: elai 0.776682425289442 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.227539226615268
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: esai 9.093202219977818E-002 0.000000000000000E+000
3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 9.101385150350671E-003
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: ftweight 7.801052745940848E-002 0.000000000000000E+000
331: 143.470563918829 0.000000000000000E+000 0.000000000000000E+000
331: 143.470563918829 0.000000000000000E+000 0.000000000000000E+000
331: 143.470563918829 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: cp 2.006325586387992E-009 1
331: bc_in(s)%albgr_dif_rb(ib) 0.220000000000000
331: rhol 0.100000001490116 0.100000001490116 0.100000001490116
331: 0.449999988079071 0.449999988079071 0.349999994039536
331: ftw 1.00000000000000 0.143517787345916 0.000000000000000E+000
331: 0.856482212654084
331: present 1 0 1
331: CAP 0.143517787345916 0.000000000000000E+000 0.856482212654084
331: there is still error after correction 1.00000000000000 1
331: 2
202: >5% Dif Radn consvn error -1.07307654594231 1 2
202: diags 8.03407121904317 -55.1147964199711 38.6409503555679
202: lai_change 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: elai 0.796415587611356 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.234465085324267
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: esai 9.096157657329497E-002 0.000000000000000E+000
3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 9.398288976575598E-003
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: ftweight 1.267302001703947E-002 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: cp 6.405767903805394E-010 1
202: bc_in(s)%albgr_dif_rb(ib) 0.190744628923151
202: rhol 0.100000001490116 0.100000001490116 0.100000001490116
202: 0.449999988079071 0.449999988079071 0.349999994039536
202: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
202: 0.000000000000000E+000
202: present 1 0 0
202: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
331: energy balance in canopy 26844 , err= -11.9593662381158
331: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
331: nstep = 119588
331: errsol = -1323.30638249407
331: clm model is stopping - error is greater than 1e-5 (W/m2)
331: fsa = -7.745702732785249E+017
331: fsr = 7.745702732785236E+017
331: forc_solad(1) = 5.51145480639649
331: forc_solad(2) = 8.61256572561393
331: forc_solai(1) = 16.1417364406403
331: forc_solai(2) = 13.0406255214228
331: forc_tot = 43.3063824940735
331: clm model is stopping
331: calling getglobalwrite with decomp_index= 26844 and clmlevel= pft
331: local patch index = 26844
331: global patch index = 9516
331: global column index = 4795
331: global landunit index = 1267
331: global gridcell index = 296
331: gridcell longitude = 305.000000000000
331: gridcell latitude = -23.0890052356021
331: pft type = 1
331: column type = 1
331: landunit type = 1
331: ENDRUN:
331: ERROR in BalanceCheckMod.F90 at line 543
|
Agreed @rosiealice, whatever is wrong seems to be mediated by ftweight |
I will try to reproduce the errors in that last post. @jkshuman, could you post your create_case execution and any environment modifiers? Relevant parameters:
|
OK. I have it down to days. It seems to be hung up, but I will restart from this case in debug mode and take a close look at ftweight. Going to use the 2PFT case, as the 1 PFT trop tree run made it out to 51 years with fire. Seems to be a grass-and-fire issue. But may try the grass single PFT as well... /glade2/scratch2/jkshuman/archive/Fire_Grass_SA_1x1_2dba074_0f0c41c/ |
path to restart files for 2PFT case: path to my script for creating the case, and relevant params below:
./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
./xmlchange JOB_WALLCLOCK_TIME=1:00
./xmlchange DATM_MODE=CLMGSWP3v1
./xmlchange RTM_MODE=NULL
./xmlchange NTASKS_ATM=-1 |
Relevant parameters in user_nl_clm are as you have them listed above. |
I think we need to look at why ftweight is >1. ftweight is the same as canopy_area_profile, which is set at: fates/biogeochem/EDCanopyStructureMod.F90 Line 1337 in e522527
I'd put a write statement there to catch anything going over 1... (or a slightly bigger number, so we don't get all these 10^-12 edge cases), and then print out the c_area, total_canopy_area, etc. if that happens. If you've got the runs down to days it shouldn't take long to find the culprit there. I'd be quite surprised if the ftweight wasn't the culprit here. |
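To make that suggestion concrete, here is a minimal, self-contained sketch of the write-statement guard. It is not a patch to EDCanopyStructureMod.F90: the array shape, the 1e-9 tolerance, and the sample values (lifted from the log above) are all illustrative assumptions.

```fortran
! Sketch of the suggested diagnostic: after canopy_area_profile
! (ftweight) is filled, flag any entry noticeably above 1 and print
! the indices so the offending cohort/patch can be inspected.
program check_ftweight
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: nclmax = 2   ! canopy layers (assumed)
  integer, parameter :: numpft = 3   ! pfts (assumed)
  real(r8), parameter :: area_tol = 1.0e-9_r8  ! skip the 1e-12 edge cases
  real(r8) :: canopy_area_profile(nclmax, numpft)
  integer :: cl, ft

  ! stand-in for the values the canopy structure code computed;
  ! 29.16 is one of the bad ftweight values from the log above
  canopy_area_profile = 0.0_r8
  canopy_area_profile(1, 1) = 1.267302e-2_r8
  canopy_area_profile(1, 3) = 29.1624_r8

  do cl = 1, nclmax
     do ft = 1, numpft
        if (canopy_area_profile(cl, ft) > 1.0_r8 + area_tol) then
           write (*, '(a,2i4,es14.6)') ' canopy_area_profile > 1 at cl,ft: ', &
                cl, ft, canopy_area_profile(cl, ft)
           ! in the real code, also dump c_area, total_canopy_area, patch area
        end if
     end do
  end do
end program check_ftweight
```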
So I was able to trigger an error using just cell -20.09N 305E, and your 2PFT case. The fail happens on April 17th of the 7th year (FATES Dynamics: 7-04-17), with the traceback pointing here: 0:cesm.exe 0000000002B8581B dynpatchstateupda 189 dynPatchStateUpdaterMod.F90
|
That's interesting. My run with the rest option set to days is still going, into month 9 day 18 last I checked...
Progress!
|
Got it to day of failure (October 30, year 7). Will kick it off in debug to see if I get the same error as you did @rgknox (similar error as previous, and same location: long = 305, lat = -23.089). |
Here is a print message at the time of fail; this is from subroutine set_new_weights() in dynPatchStateUpdaterMod.F90. The problem is triggered because, from the second-to-last step to the last, that bare-ground patch goes to a weight of zero, and somehow its old (previous) area was negative?
|
The interface call wrap_update_hlmfates_dyn(), in clmfates_interfaceMod.F90, is responsible for calculating these weights. We sum up the canopy fractions via this output boundary condition: this%fates(nc)%bc_out(s)%canopy_fraction_pa(1:npatch). But if this sum is above 1, which it shouldn't be, we will have problems and calculate a negative bare-patch size. Somehow that is happening in this run. I put a break-point where this endrun used to be: https://github.com/ESCOMP/ctsm/blob/master/src/utils/clmfates_interfaceMod.F90#L830 |
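For illustration, a hedged sketch of that failure mode with the interface reduced to a flat array; canopy_fraction_pa and npatch echo the names above, while the program structure and values are assumptions, not the actual clmfates_interfaceMod.F90 code.

```fortran
! Sketch: sum the FATES patch canopy fractions and derive the
! bare-ground weight; if the sum exceeds 1 the bare patch goes
! negative, which is the condition the break-point is catching.
program bare_fraction_check
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: npatch = 3
  real(r8) :: canopy_fraction_pa(npatch)
  real(r8) :: total_canopy, baresoil_frac

  canopy_fraction_pa = (/ 0.40_r8, 0.35_r8, 0.30_r8 /)  ! sums to 1.05: the bad case

  total_canopy  = sum(canopy_fraction_pa(1:npatch))
  baresoil_frac = 1.0_r8 - total_canopy

  if (baresoil_frac < 0.0_r8) then
     write (*, *) 'canopy fractions sum to ', total_canopy, &
          ' giving negative bare fraction ', baresoil_frac
     ! the real interface should trap (or renormalize) here rather
     ! than hand a negative patch weight to set_new_weights()
  end if
end program bare_fraction_check
```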
I think one bug is that we are not zeroing out bc_out(s)%canopy_fraction_pa(1:npatch) in the subroutine that is filling it, update_hlm_dynamics(). So if we shrink the total number of patches, we have an extra index that is contributing to total patch area. I will test this. |
Actually, that probably wasn't the problem... although zeroing would have been better, we should be only passing the used indexes in that array... |
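A small sketch of both pieces of hygiene just discussed (zeroing the buffer, and summing only the used slice); the fixed-size buffer and patch counts are made-up stand-ins for the real boundary-condition structure.

```fortran
! Sketch: zero the whole boundary-condition buffer before refilling
! it, and only ever sum over the used indexes, so a shrinking patch
! count can't leave a stale entry contributing to total area.
program stale_patch_entries
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: maxpatch = 10
  real(r8) :: canopy_fraction_pa(maxpatch)
  integer :: npatch

  canopy_fraction_pa = 0.5_r8   ! stale values from an earlier, larger patch count

  npatch = 2                        ! the patch count shrank this step
  canopy_fraction_pa(:) = 0.0_r8    ! zeroing: stale entries can no longer leak
  canopy_fraction_pa(1:npatch) = (/ 0.6_r8, 0.3_r8 /)

  ! summing only the used slice protects against the same bug
  write (*, *) 'total canopy fraction = ', sum(canopy_fraction_pa(1:npatch))
end program stale_patch_entries
```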
Are we sure that the bug is fire-specific? Has it shown up in any non-fire runs @jkshuman?
If it is fire, my suspicion is that it might be to do with how the model handles completely burned patches.
|
I have been focusing on the fire runs. With the updates to master and continued testing, the fail still occurs for grass and for tree/grass runs with fire. I had a tree fire run which completed through year 51 with reasonable biomass. My 2PFT debug fire run is still in queue, so no update there. With grass the difference is that when it burns, it burns completely. So this could be a response to the grass flammability specifically and, as @rosiealice said, to completely burned patches. |
For the problem I'm currently working through (which may or may not be related to what is ultimately killing Jackie's runs), one issue is that total_canopy_area is exceeding patch area. We currently don't force total_canopy_area to be equal to or less than patch area. I'm also noticing that when we do canopy promotion/demotion, we have a fairly relaxed tolerance on layer-area exceedance of patch area: 1e-4. I'm wondering if grasses give the canopy demotion/promotion scheme a particularly challenging time at layering? Maybe in this specific case we are left with not-so-precise canopy area, which is creating weirdness? |
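As a sketch of the missing constraint, assuming proportional rescaling is an acceptable remedy (the variable names echo this discussion, not the actual FATES code):

```fortran
! Sketch: force total_canopy_area to be no larger than the patch
! area, rescaling proportionally, before it is used to build the
! ftweight / canopy_area_profile fractions.
program cap_canopy_area
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8) :: patch_area, total_canopy_area, scale

  patch_area        = 6.4e-10_r8            ! tiny patch, like the cp values in the log
  total_canopy_area = 1.3_r8 * patch_area   ! 130% of the patch, as in the cohort dump

  if (total_canopy_area > patch_area) then
     scale = patch_area / total_canopy_area
     total_canopy_area = patch_area
     ! the same factor would be applied to each cohort's crown area
     write (*, *) 'canopy exceeded patch area; rescaled by ', scale
  end if
end program cap_canopy_area
```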
Here is an error log that I think corroborates the ftweight issue. During leaf_area_profile(), we construct several canopy-layer x pft x leaf-layer arrays. cpatch%canopy_area_profile(cl,ft,iv) is converted directly into ftweight. We have a few checks in the scheme, which can be switched on, one of which fails gracefully if canopy_area_profile exceeds 1.0 for any given layer.
In this case, we have a few cohorts contributing crown area to the offending layer, layer 1. Layer 1 is also the top layer, and it should be assumed there is an understory layer as well. The cohorts appear to be normal: no NaNs, no garbage values... Note that the area fraction of the last cohort is 130% of the area. I'm not sure why the other cohorts are sharing the top layer (cl==1) with it, if this cohort, which is the largest, is filling that layer completely. This is particularly strange/wrong because we have grasses sharing that layer with a couple of 5 cm cohorts. I'm wondering if this is a precision problem, as indicated in a post above. The area on this patch is very small, but large enough to keep. Although, the promotion/demotion precision is about 4 orders of magnitude larger than the size of the patch... |
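To put numbers on that precision argument: the tolerance and patch size below are taken from this thread, and treating them as directly comparable area quantities is my assumption.

```fortran
! Sketch: the promotion/demotion tolerance (1e-4) versus the area of
! the offending patch (~2e-9, the "cp" value in the log). An area
! error the scheme tolerates can be ~10^4-10^5 times the whole patch.
program tolerance_vs_patch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: layer_tol  = 1.0e-4_r8   ! allowed layer-area exceedance
  real(r8), parameter :: patch_area = 2.0e-9_r8   ! patch size from the log above

  write (*, *) 'tolerance / patch area = ', layer_tol / patch_area  ! ~5.0e4
end program tolerance_vs_patch
```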
New runs using 1) rgknox promotion/demotion updates PR 388, 2) updated API 4.0.0, 3) updated CTSM changes. Two runs, one each on clm45 and clm5, with 2PFTs (TropTree and Grass) and active fire. clm45 completed to year 63 and is still running, in queue at the moment. /glade2/scratch2/jkshuman/archive/Fire_rgknox_area_fixes_clm45_2PFT_1x1_692ba82_992e968/lnd/hist clm5 failed in year 6 with an error in EDPatchDynamicsMod.F90 associated with high fire area and patch trimming. From cesm.log |
@jkshuman, that new fail is an error check that I put into the branch you are currently testing. What happened is that the model determined that the total patch area exceeded 10,000 m2, and so it simply removes the excess from one of its patches. But we have been removing it from the oldest patch. However, up until now, we have never checked whether that patch has the area to donate. This can be solved by removing the area from the largest patch instead of the oldest patch. I will make a correction and update the branch. |
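A sketch of that correction, with the FATES patch linked list flattened to an array for illustration; the 10,000 m2 site area is from the comment above, and the patch values are invented.

```fortran
! Sketch: when total patch area exceeds the site area, trim the
! excess from the largest patch (guaranteed to have area to donate)
! instead of the oldest patch, which may be nearly empty.
program trim_largest_patch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: area_site = 10000.0_r8   ! m2, per the comment
  integer, parameter :: npatch = 3
  real(r8) :: patch_area(npatch)
  real(r8) :: excess
  integer :: ibig

  ! oldest patch first: near-zero area, so it cannot donate the excess
  patch_area = (/ 1.8e-12_r8, 4000.0_r8, 6000.5_r8 /)

  excess = sum(patch_area) - area_site
  if (excess > 0.0_r8) then
     ibig = maxloc(patch_area, dim=1)              ! largest, not oldest
     patch_area(ibig) = patch_area(ibig) - excess
  end if
  write (*, *) 'total area after trim = ', sum(patch_area)
end program trim_largest_patch
```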
hold a moment before testing though, it needs a quick tweak, forgot to declare "nearzero" |
Hi Ryan,
Thanks for this. Should we have a call, or hold off until the tests go?
|
@jkshuman @rosiealice and I had a review and discussion of changes in PR #388. Added some updates to code per our discussion. @jkshuman I'm going to pass it through the regression tests now. |
Revising this to correct my mistaken runs from earlier. Confirmed that the branch code pulled in the correct changes from rgknox repo. clm5: /glade/scratch/jkshuman/archive/Fire_rgknox_areafixes_0607_2PFT_1x1_fdce2b2_26542ea/ |
Runs are up to year 92 for clm5 and year 98 for clm45. I am going to call this closed, and open a new issue if anything else comes up, as the code has diverged since opening this...
To summarize: fixes included pull requests PR382 and PR388 and @rgknox fixes in repo branches for fates and ctsm.
ctsm branch from rgknox_ctsm_repo-protectbaresoilfrac
fates branch from rgknox-area-fix merged with master sci.1.14.0_api.4.0.0
Branch details for ctsm and fates below.
fates git log details:
26542ea (HEAD, rgknox-areafix-0607_api4.0.0) Merge branch 'rgknox-area-fixes' into rgknox-areafix-0607_api4.0.0
ce689da (rgknox-area-fixes) Merge branch 'rgknox-area-fixes' of https://github.com/rgknox/fates into rgknox-area-fixes
658064e (rgknox_repo/rgknox-area-fixes) Updated some comments, added back protections on patch canopy areas exceeding 1 during the output boundary condition preparations.
c357399 Merge branch 'rgknox-area-fixes' of github.com:rgknox/fates into rgknox-area-fixes
e85b681 Fixed area checking logic on their sum to 10k
0f2003b Merge remote-tracking branch 'rgknox_repo/rgknox-area-fixes' into rgknox-area-fixes
34bfcdb Resolved conflict in EDCanopyStructureMod, used HEAD over master
5e92e69 (master) Merge remote-tracking branch 'ngeet_repo/master'
14aeb4f (tag: sci.1.14.0_api.4.0.0, ngeet_repo/master) Merge pull request #381 from rgknox/rgknox-soildepth-clm5
ctsm git log details:
fdce2b2 (HEAD, rgknox_ctsm_repo/rgknox-fates-protectbaresoilfrac, rgknox-fates-protectbaresoilfrac, fates_next_api_rgknox_protectbaresoilfrac) Protected fates calculation of bare-soil area to not go below 0
692ba82 (origin/fates_next_api, fates_next_api) Merge pull request #375 from rgknox/rgknox-fates-varsoildepth
1cdd0e6 Merge pull request #390 from ckoven/fateshistdims
8eb90b1 (rgknox_ctsm_repo/rgknox-fates-varsoildepth) Changed a 1.0 r4 to r8
e9b7b68 Updating fates external to sci.1.14.0_api.4.0.0 |
Great!
|
Getting a fail in fire runs. Seems to be due to a Balance Check. This happens in both CLM45 runs and CLM5 runs at year 5 with 2PFTs (Trop tree and Grass). Non-fire runs haven't failed through year 10, but will resubmit longer.
ctsm git hash: 2dba074 fates git hash: f8d7693
Here is the create case statement:
./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
from within cesm.log (and end of cesm.log below)
396: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
396: nstep = 96934
396: errsol = -1.031027636599902E-007
529: Large Dir Radn consvn error 87346.4733653322 1 2
529: diags 46218.1932574409 -0.338494232152740 589450.614042712
529: -394259.718697869
529: lai_change 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 6.38062653664038 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: elai 0.000000000000000E+000 0.000000000000000E+000 0.961064260932761
529: 0.000000000000000E+000 0.000000000000000E+000 0.958469792135196
529: 0.000000000000000E+000 0.000000000000000E+000 0.122722763358372
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: esai 0.000000000000000E+000 0.000000000000000E+000 3.893573906723917E-002
529: 0.000000000000000E+000 0.000000000000000E+000 3.883117669682943E-002
529: 0.000000000000000E+000 0.000000000000000E+000 4.984874625802597E-003
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: ftweight 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: cp 9.580078716659667E-011 1
529: bc_in(s)%albgr_dir_rb(ib) 0.557730205770928
529: >5% Dif Radn consvn error -2474470293.77894 1 2
529: diags 639144447.809849 -10366553911.8306 6420139512.41898
529: lai_change 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 6.38062653664038 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: elai 0.000000000000000E+000 0.000000000000000E+000 0.961064260932761
529: 0.000000000000000E+000 0.000000000000000E+000 0.958469792135196
529: 0.000000000000000E+000 0.000000000000000E+000 0.122722763358372
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: esai 0.000000000000000E+000 0.000000000000000E+000 3.893573906723917E-002
529: 0.000000000000000E+000 0.000000000000000E+000 3.883117669682943E-002
529: 0.000000000000000E+000 0.000000000000000E+000 4.984874625802597E-003
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: ftweight 0.000000000000000E+000 0.000000000000000E+000
529: 37.4271707468345 0.000000000000000E+000 0.000000000000000E+000
529: 37.4271707468345 0.000000000000000E+000 0.000000000000000E+000
529: 31.0465442101942 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: cp 9.580078716659667E-011 1
529: bc_in(s)%albgr_dif_rb(ib) 0.557730205770928
529: rhol 0.100000001490116 0.100000001490116 0.100000001490116
529: 0.449999988079071 0.449999988079071 0.349999994039536
529: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000
529: present 1 0 0
529: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
465: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
465: nstep = 96935
465: errsol = -1.048202307174506E-007
433: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
433: nstep = 96935
433: errsol = -1.017730255625793E-007
358: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
358: nstep = 96936
358: errsol = -1.278503987123258E-007
432: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
432: nstep = 96936
432: errsol = -1.040576194100140E-007
431: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
431: nstep = 96936
431: errsol = -1.129041606873216E-007
466: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
466: nstep = 96936
466: errsol = -1.248336616299639E-007
433: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
433: nstep = 96936
433: errsol = -1.003071474769968E-007
529: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
529: nstep = 96936
529: errsol = 1.383552742595384E-005
529: clm model is stopping - error is greater than 1e-5 (W/m2)
529: fsa = 12787101170.2958
529: fsr = -12787101148.9356
529: forc_solad(1) = 2.30644280577964
529: forc_solad(2) = 3.71261017842798
529: forc_solai(1) = 8.37364785641270
529: forc_solai(2) = 6.96748048376436
529: forc_tot = 21.3601813243847
529: clm model is stopping
529: calling getglobalwrite with decomp_index= 39670 and clmlevel= pft
529: local patch index = 39670
529: global patch index = 15897
529: global column index = 8008
529: global landunit index = 2104
529: global gridcell index = 494
529: gridcell longitude = 290.000000000000
529: gridcell latitude = -15.5497382198953
529: pft type = 1
529: column type = 1
529: landunit type = 1
529: ENDRUN:
529: ERROR in BalanceCheckMod.F90 at line 543
529: ERROR: Unknown error submitted to shr_abort_abort.
413: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
413: nstep = 96936
413: errsol = -1.288894111439731E-007
397: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
397: nstep = 96937
397: errsol = -1.022812625706138E-007
319: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
319: nstep = 96937
319: errsol = -1.036731305248395E-007
395: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
395: nstep = 96937
395: errsol = -1.211479911944480E-007
432: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
432: nstep = 96937
432: errsol = -1.264885440832586E-007
464: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
464: nstep = 96937
464: errsol = -1.101450379792368E-007
431: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
431: nstep = 96937
431: errsol = -1.387476800118748E-007
433: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
433: nstep = 96937
433: errsol = -1.261905708815902E-007
529:Image PC Routine Line Source
529:cesm.exe 0000000001237DAD Unknown Unknown Unknown
529:cesm.exe 0000000000D1B432 shr_abort_mod_mp_ 114 shr_abort_mod.F90
529:cesm.exe 0000000000503CD5 abortutils_mp_end 77 abortutils.F90
529:cesm.exe 0000000000677E2D balancecheckmod_m 543 BalanceCheckMod.F90
529:cesm.exe 000000000050AF77 clm_driver_mp_clm 924 clm_driver.F90
529:cesm.exe 00000000004F9516 lnd_comp_mct_mp_l 451 lnd_comp_mct.F90
529:cesm.exe 0000000000430E14 component_mod_mp_ 688 component_mod.F90
529:cesm.exe 0000000000417D59 cime_comp_mod_mp_ 2652 cime_comp_mod.F90
529:cesm.exe 0000000000430B3D MAIN__ 68 cime_driver.F90
529:cesm.exe 0000000000415C5E Unknown Unknown Unknown
529:libc-2.19.so 00002AAAB190AB25 libc_start_main Unknown Unknown
529:cesm.exe 0000000000415B69 Unknown Unknown Unknown
529:MPT ERROR: Rank 529(g:529) is aborting with error code 1001.
529: Process ID: 53637, Host: r12i2n18, Program: /glade2/scratch2/jkshuman/Fire0504_Obrienh_Saldaa_Saldal_agb1zero_2PFT_1x1_2dba074_f8d7693/bld/cesm.exe
529: MPT Version: SGI MPT 2.15 12/18/16 02:58:06
529:
529:MPT: --------stack traceback-------
0: memory_write: model date = 60715 0 memory = 65749.16 MB (highwater) 102.04 MB (usage) (pe= 0 comps= ATM ESP)
529:MPT: Attaching to program: /proc/53637/exe, process 53637
529:MPT: done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=3d290be00d48b823d3b71df2249e80d881bc473d"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=5409c48fdb15e90649c1407e444fbe31d6dc8ec1"
529:MPT: (no debugging symbols found)...done.
529:MPT: [Thread debugging using libthread_db enabled]
529:MPT: Using host libthread_db library "/glade/u/apps/ch/os/lib64/libthread_db.so.1".
529:MPT: Try: zypper install -C "debuginfo(build-id)=e97cfdb062d6f0c41073f2109a7605d0ae991c03"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=f43d7754940a14ffe3d9bd8fc9472ffbbfead544"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=0ea764119690f32c98faae9a63a73f35ed8b1099"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=15916519d9dbaea26ec88427460b4cedb9c0a6ab"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=79264652a62453da222372a430cd9351d4bbcbde"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=68682e9ac223d269cbecb94315fcec5e16b32bfb"
529:MPT: (no debugging symbols found)...done.
529:MPT: 0x00002aaaafac141c in waitpid () from /glade/u/apps/ch/os/lib64/libpthread.so.0
529:MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.19-35.1.x86_64
529:MPT: (gdb) #0 0x00002aaaafac141c in waitpid ()
529:MPT: from /glade/u/apps/ch/os/lib64/libpthread.so.0
529:MPT: #1 0x00002aaab16215d6 in mpi_sgi_system (
529:MPT: #2 MPI_SGI_stacktraceback (
529:MPT: header=header@entry=0x7ffffffeeb70 "MPT ERROR: Rank 529(g:529) is aborting with error code 1001.\n\tProcess ID: 53637, Host: r12i2n18, Program: /glade2/scratch2/jkshuman/Fire0504_Obrienh_Saldaa_Saldal_agb1zero_2PFT_1x1_2dba074_f8d7693/bld"...) at sig.c:339
529:MPT: #3 0x00002aaab1574d6f in print_traceback (ecode=ecode@entry=1001)
529:MPT: at abort.c:227
529:MPT: #4 0x00002aaab1574fda in PMPI_Abort (comm=, errorcode=1001)
529:MPT: at abort.c:66
529:MPT: #5 0x00002aaab157528d in pmpi_abort ()
529:MPT: from /opt/sgi/mpt/mpt-2.15/lib/libmpi.so
529:MPT: #6 0x0000000000e191a9 in shr_mpi_mod_mp_shr_mpi_abort_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/share/util/shr_mpi_mod.F90:2132
529:MPT: #7 0x0000000000d1b4d8 in shr_abort_mod_mp_shr_abort_abort_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/share/util/shr_abort_mod.F90:69
529:MPT: #8 0x0000000000503cd5 in abortutils_mp_endrun_globalindex_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/main/abortutils.F90:77
529:MPT: #9 0x0000000000677e2d in balancecheckmod_mp_balancecheck_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/biogeophys/BalanceCheckMod.F90:543
529:MPT: #10 0x000000000050af77 in clm_driver_mp_clm_drv_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/main/clm_driver.F90:924
529:MPT: #11 0x00000000004f9516 in lnd_comp_mct_mp_lnd_run_mct_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/cpl/lnd_comp_mct.F90:451
529:MPT: #12 0x0000000000430e14 in component_mod_mp_component_run_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/drivers/mct/main/component_mod.F90:688
529:MPT: #13 0x0000000000417d59 in cime_comp_mod_mp_cime_run_ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/drivers/mct/main/cime_comp_mod.F90:2652
529:MPT: #14 0x0000000000430b3d in MAIN__ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/drivers/mct/main/cime_driver.F90:68
529:MPT: #15 0x0000000000415c5e in main ()
529:MPT: (gdb) A debugging session is active.
529:MPT:
529:MPT: Inferior 1 [process 53637] will be detached.
529:MPT:
529:MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
529:MPT: Detaching from program: /proc/53637/exe, process 53637
529:
529:MPT: -----stack traceback ends-----
-1:MPT ERROR: MPI_COMM_WORLD rank 529 has terminated without calling MPI_Finalize()
-1: aborting job