Numerical Overflows in carbon accounting #168

Closed
rgknox opened this issue Jan 4, 2017 · 3 comments

Contributor

rgknox commented Jan 4, 2017

Summary of Issue:

There is a check on the total FATES carbon balance that occurs once each day, when vegetation dynamics is called. This happens in ChecksBalancesMod.F90: FATES_BGC_Carbon_Balancecheck().

This routine updates sites(s)%cbal_err_fates, which is the variable that blows up. It inherits the bad value from sites(s)%fates_to_bgc_this_ts, which in turn inherits it from currentPatch%root_litter_out(ft). By "blowing up", I mean it suddenly takes a value of order E+248.

This occurs only the first time a new patch is created.

currentPatch%root_litter_out() is zeroed and set in cwd_out(), via:

    do ft = 1,numpft_ed
       currentPatch%leaf_litter_out(ft) = max(0.0_r8,currentPatch%leaf_litter(ft)* SF_val_max_decomp(dg_sf) * &
            currentPatch%fragmentation_scaler )
       currentPatch%root_litter_out(ft) = max(0.0_r8,currentPatch%root_litter(ft)* SF_val_max_decomp(dg_sf) * &
            currentPatch%fragmentation_scaler )
       if ( currentPatch%leaf_litter_out(ft)<0.0_r8.or.currentPatch%root_litter_out(ft)<0.0_r8)then
          write(iulog,*) 'root or leaf out is negative?',SF_val_max_decomp(dg_sf),currentPatch%fragmentation_scaler
       endif
    enddo
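
The guard in the loop above tests only for negative values, so a flux of order E+248 would pass it unnoticed. Below is a minimal sketch, in Python rather than the model's Fortran, of a check that also flags non-finite or implausibly large fluxes. The function names, the bound, and the sample inputs are hypothetical and chosen for illustration; none of them come from the FATES code.

```python
import math

# Hypothetical upper bound on a daily litter flux; illustrative only.
MAX_PLAUSIBLE_FLUX = 1.0e6


def flux_is_suspect(value, upper=MAX_PLAUSIBLE_FLUX):
    """Return True if a flux is NaN, infinite, negative, or implausibly large."""
    if not math.isfinite(value):
        return True
    return value < 0.0 or value > upper


def litter_out(litter, max_decomp, fragmentation_scaler):
    """Mirror the max(0, litter * rate * scaler) form of the loop above."""
    return max(0.0, litter * max_decomp * fragmentation_scaler)


if __name__ == "__main__":
    # Made-up inputs: one ordinary pool and one garbage pool of the size reported.
    good = litter_out(0.2, 0.52, 0.8)
    bad = litter_out(1.0e248, 0.52, 0.8)
    print(flux_is_suspect(good), flux_is_suspect(bad))
```

A bound like this is a diagnostic aid, not a fix: it only helps locate where a bad value first appears.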

The order of operations seems fine. cwd_out is called from vegetation_dynamics, and the balance checks are called from EDBGCDynSummary() which is immediately after vegetation_dynamics in clm_drv().

ed_ecosystem_dynamics -> ed_integrate_state_variables -> non_canopy_derivs() -> cwd_out()
EDBGCDynSummary() -> wrap_bgc_summary() -> SummarizeNetFluxes()
EDBGCDynSummary() -> wrap_bgc_summary() -> FATES_BGC_Carbon_BalanceCheck()

Contributor

ckoven commented Jan 4, 2017

Does it work to just zero root_litter_out at the new-patch creation step, here?
https://github.com/NGEET/ed-clm/blob/master/components/clm/src/ED/biogeochem/EDPatchDynamicsMod.F90#L868

Contributor Author

rgknox commented Jan 5, 2017

That seems like a good idea, I will look into that too.
I've also tracked down some NaNs being generated in patch%root_litter during cohort termination. It's not yet clear why the NaNs occur, since the components used to calculate the value all look reasonable at the time of failure.

At the end of EDCohortDynamicsMod.F90:terminate_cohorts(), I added a NaN catcher for root_litter:

             currentPatch%root_litter(currentCohort%pft) = currentPatch%root_litter(currentCohort%pft) + currentCohort%n* &
                  (currentCohort%br+currentCohort%bstore)/currentPatch%area 

             if(currentPatch%root_litter(currentCohort%pft).ne.currentPatch%root_litter(currentCohort%pft)) then
                print*,'ROOT LITTER PROB',currentCohort%n,currentCohort%br,currentCohort%bstore,currentPatch%area 
                stop
             end if

And it does catch a NaN, but the output is not unusual:

ROOT LITTER PROB   1.5980184922696298E-006   5.3619791069715643E-002  0.10930903672984571       0.36255274778408553   

UPDATE: This was a red herring. I tracked the NaN back to earlier in the code, to subroutine mortality_litter_fluxes(), variable currentPatch%canopy_mortality_root_litter(p).
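
One way a NaN catcher can fire while every printed term looks normal is that the pool being incremented was already NaN: under IEEE arithmetic, NaN plus any finite number is still NaN. Below is a minimal sketch in Python (not the model's Fortran) using the values printed above. The helper names are hypothetical, and the assumption that the pool was contaminated upstream is an illustration, not something the output itself proves.

```python
import math


def add_increment(pool, increment):
    """Accumulate an increment into a pool, as in the root_litter update."""
    return pool + increment


def first_bad_term(**terms):
    """Return the name of the first NaN term, or None if every term is a number."""
    for name, value in terms.items():
        if math.isnan(value):
            return name
    return None


if __name__ == "__main__":
    # Values taken from the ROOT LITTER PROB line above.
    n, br, bstore, area = 1.5980184922696298e-06, 5.3619791069715643e-02, \
        0.10930903672984571, 0.36255274778408553
    increment = n * (br + bstore) / area   # finite and small
    pool = math.nan                        # assumed: contaminated before this step
    result = add_increment(pool, increment)
    # The self-inequality test used in the Fortran catcher also works here.
    print(result != result)
    print(first_bad_term(pool=pool, n=n, br=br, bstore=bstore, area=area))
```

Checking the pool before the update, as well as the terms of the increment, separates "this line produced the NaN" from "this line merely propagated it".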

Contributor Author

rgknox commented Jan 6, 2017

The problem was traced all the way back to cohort%npp_acc. This value was not being copied correctly when cohorts are copied, an operation used in various places, including when spawning newly disturbed patches.
PR with fix coming soon.
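
A hand-written copy routine that omits one field can leave that field in the new object holding whatever its fresh allocation contained. Python has no uninitialized memory, so the sketch below models the situation by defaulting every field to NaN and then reporting any field a copy failed to overwrite. The class, both copy functions, and the checker are hypothetical; only the field names n, br, bstore and npp_acc come from this thread, and this is not the fix that went into the PR.

```python
import math
from dataclasses import dataclass, fields


@dataclass
class Cohort:
    # NaN defaults stand in for "never assigned" memory.
    n: float = math.nan
    br: float = math.nan
    bstore: float = math.nan
    npp_acc: float = math.nan


def copy_cohort_buggy(src):
    """Field-by-field copy that forgets npp_acc (the kind of bug described)."""
    dst = Cohort()
    dst.n = src.n
    dst.br = src.br
    dst.bstore = src.bstore
    return dst


def copy_cohort(src):
    """Copy every declared field, so a newly added field cannot be missed."""
    dst = Cohort()
    for f in fields(Cohort):
        setattr(dst, f.name, getattr(src, f.name))
    return dst


def uncopied_fields(cohort):
    """List the fields that still hold the NaN sentinel."""
    return [f.name for f in fields(cohort) if math.isnan(getattr(cohort, f.name))]


if __name__ == "__main__":
    src = Cohort(n=1.0, br=2.0, bstore=3.0, npp_acc=4.0)
    print(uncopied_fields(copy_cohort_buggy(src)))
    print(uncopied_fields(copy_cohort(src)))
```

The sentinel approach assumes NaN is never a legitimate value for any field in the source object.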

bandre-ucar added a commit that referenced this issue Feb 6, 2017
Merge branch 'rgknox-leafnpp-diag-fix'

This pull request mostly deals with bug fixes to carbon
accounting. @jenniferholm noticed that the diagnostics for leaf_npp
were periodically showing negative values. Through #164 we identified
that this occurred because daily carbon balances during the allocation
sequences were sometimes negative due to high respiration and low
gpp. The model was correctly using storage carbon to "pay" the
negative daily carbon balance and allow maintenance respiration to
occur, but it was not correctly diagnosing this flow. The
accounting of this process is a little complicated because storage can
be used to pay for maintenance respiration as well as maintenance
turnover demand.

While verifying that carbon accounting errors were low, I hit
spurious values in output variables that triggered netcdf write
errors. This led to the identification that cohort%npp_accum was not
being properly copied when cohorts are copied during patch fission.

A fix was needed to lawrencium machine files for its lr2 partition for
serial runs. While these changes are unrelated to carbon accounting,
they are trivial and simple, so I bundled them here.

Fixes: #164, #169, #168 and possibly #154

User interface changes?: no

Code review: requesting @jenniferholm and @serbinsh for evaluation in
their science algorithms.

Testing:
  rgknox:
    Test suite: lawrencium lr3 (baseline) and lawrencium-lr2 (non-baseline) edTest, Rapid Science Check tool (single site multi-decadal analysis)
    Test baseline: 5c5928f
    Test namelist changes: none
    Test answer changes:
    Test summary: all PASS

  andre:
    Test suite: ed - yellowstone gnu, intel, pgi
                     hobart nag
    Test baseline: 30f84d7
    Test namelist changes: none
    Test answer changes: bit for bit
    Test summary: all tests pass

    Test suite: clm_short - yellowstone gnu, intel, pgi
    Test baseline: clm4_5_12_r195
    Test namelist changes: none
    Test answer changes: bit for bit
    Test summary: all tests pass
rgknox closed this as completed Mar 1, 2017
rgknox pushed a commit that referenced this issue Apr 21, 2017
Merge branch 'rgknox-leafnpp-diag-fix'

rgknox pushed a commit that referenced this issue May 23, 2017
Merge branch 'rgknox-leafnpp-diag-fix'
