restarting issue in FATES - bug solved #165

Closed
xuyi02 opened this issue Dec 22, 2016 · 10 comments

Comments

@xuyi02
Contributor

xuyi02 commented Dec 22, 2016

Summary of Issue:
I am using the compset 'ICLM45ED' and want to spin up FATES for initialization (e.g., 10 years) and then restart it from the resulting restart files for another 10 years.

I tried two different methods to restart FATES:

  1. I create a single new case and set RESUBMIT=1 in env_run.xml. The restarted results look good. However, this approach does not let me use a different output frequency for the restarted segment; e.g., I want yearly output during initialization and monthly output after the restart. Can I set RESUBMIT=1 and put two hist_nhtfrq values in the same user_nl_clm to control the two frequencies? That does not seem to work.

  2. I create two separate cases and restart manually: one for initialization and one for the restart. In the restarting case I use the 'finidat' namelist variable to point to the restart file from the first case. Whether I use the same or different output frequencies in the two cases, the restarted results do not look correct.

In short, is there a way to restart FATES with different output frequencies for the initialization and restart phases?
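For reference, `hist_nhtfrq` in CLM takes one value per history tape (h0, h1, ...): `0` means monthly averages, a positive value is a number of model time steps, and a negative value is a number of hours, so `-8760` is yearly on the default 365-day (noleap) calendar. Two values such as `hist_nhtfrq = 0, -8760` therefore configure two history files written at the same time, not one frequency per run segment. The helper below is a hypothetical sketch (`nhtfrq_for` is a made-up name, not part of CIME or CLM) that maps a frequency name to the value to put in `user_nl_clm`, assuming the noleap calendar.

```shell
# Hypothetical helper (not part of CIME/CLM): print the CLM hist_nhtfrq
# value for a named output frequency. Assumes the 365-day (noleap) calendar.
#   0        -> monthly averages
#   negative -> averaging period in hours
#   positive -> averaging period in model time steps
nhtfrq_for() {
  case "$1" in
    monthly) echo 0 ;;
    daily)   echo -24 ;;                # 24 hours
    yearly)  echo $(( -365 * 24 )) ;;   # -8760 hours
    *)       echo "unknown frequency: $1" >&2; return 1 ;;
  esac
}

# Example: monthly output on tape h0 for the continuation case.
echo "hist_nhtfrq = $(nhtfrq_for monthly)"
```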

Expected behavior and actual behavior:
Results from the first method (a 10-year simulation with yearly output and RESUBMIT=1 for the restart). The restart worked, but I don't know how to change the output frequency for the restarted segment within a single case:

I list only TLAI here for comparison; GPP, ED_biomass, etc. show the same behavior.

TLAI=
0.06422824,
0.07443716,
0.08802477,
0.1114918,
0.138454,
0.1647888,
0.2127144,
0.2666956,
0.3208471,
0.4176061,
0.526724,
0.634316,
0.8275706,
1.031733,
1.161206,
1.501366,
1.824748,
1.769158,
1.990662,
1.968585,
1.533405,
1.393863,
1.314905,
1.081113,
1.032848,
1.026625,
0.9234439 ;

Results from the second method. The restarted results are clearly wrong, even though I set yearly output for both the initial and the restarting cases:
initial case:
TLAI =
0.06422824,
0.07443716,
0.08802477,
0.1114918,
0.138454,
0.1647888,
0.2127144,
0.2666956,
0.3208471,
0.4176061,
0.526724 ;

restarting case:
TLAI =
0.06422824,
0.07675284,
0.09022225,
0.1129024,
0.137653,
0.1605633,
0.2040686,
0.2528495,
0.3001052,
0.3864403,
0.4834436 ;
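A quick way to spot this kind of discontinuity is to check whether the first value of the continuation case is closer to the spin-up's first value (a reset to the initial state) or to its last value (a genuine continuation). The sketch below is a rough heuristic applied to the TLAI values listed above; `check_continuity` is a made-up name, and a real check would read the values from the history files rather than take them as arguments.

```shell
# Rough heuristic (illustrative only): did a continuation run pick up the
# spin-up state, or was it reset? Compares the continuation's first value
# with the spin-up's first value (reset) and last value (continued).
# Arguments: spin-up first value, spin-up last value, continuation first value.
check_continuity() {
  awk -v s0="$1" -v s1="$2" -v c0="$3" 'BEGIN {
    d_first = c0 - s0; if (d_first < 0) d_first = -d_first
    d_last  = c0 - s1; if (d_last  < 0) d_last  = -d_last
    if (d_first < d_last) print "reset"; else print "continued"
  }'
}

# Second method: the restarting case starts exactly where the spin-up started.
check_continuity 0.06422824 0.526724 0.06422824   # prints reset
# First method: the value that follows 0.526724 in the single-case series.
check_continuity 0.06422824 0.526724 0.634316     # prints continued
```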

Steps to reproduce the problem (should include create_newcase or create_test command along with any user_nl or xml changes):

The script for the first method:

```shell
COMPILER=intel
export CESM_inputdir="/global/project/projectdirs/m2422/xuyi/inputdata/km67"
export CESM_dir="/global/project/projectdirs/m2422/xuyi/git/ed-clm/cime/scripts/"
export CASENAME="1x1pt_km67"
export CASEDIR="clm4_5_12_r195_ED_spinup_resubmit"
export CASEROOT=/global/project/projectdirs/m2420/test/case/ed_clm/KM67/${CASEDIR}
rm -rf ${CASEROOT}

cd ${CESM_dir}

./create_newcase -case ${CASEROOT} -res CLM_USRDAT -compset ICLM45ED -mach edison -compiler ${COMPILER}

cd ${CASEROOT}

./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_LND -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_ICE -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_CPL -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_GLC -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_ROF -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_WAV -val 1
./xmlchange -file env_mach_pes.xml -id MAX_TASKS_PER_NODE -val 1
./xmlchange -file env_mach_pes.xml -id TOTALPES -val 1

./xmlchange -file env_build.xml -id EXEROOT -val ${CASEROOT}/bld
./xmlchange -file env_build.xml -id CESMSCRATCHROOT -val ${CASEROOT}/SCRATCH

./xmlchange -file env_run.xml -id STOP_N -val 10
./xmlchange -file env_run.xml -id STOP_OPTION -val nyears
./xmlchange -file env_run.xml -id RUN_STARTDATE -val '2002-01-01'
./xmlchange -file env_run.xml -id DATM_CLMNCEP_YR_START -val 2002
./xmlchange -file env_run.xml -id DATM_CLMNCEP_YR_END -val 2004
./xmlchange -file env_run.xml -id DIN_LOC_ROOT -val ${CESM_inputdir}
./xmlchange -file env_run.xml -id RUNDIR -val ${CASEROOT}/run
# Resubmit once: a second STOP_N-year segment runs after the first
./xmlchange -file env_run.xml -id RESUBMIT -val 1
./xmlchange -file env_run.xml -id DIN_LOC_ROOT_CLMFORC -val ${CESM_inputdir}/atm/datm7
./xmlchange -file env_run.xml -id CLM_USRDAT_NAME -val ${CASENAME}
./xmlchange -file env_run.xml -id ATM_NCPL -val 24
./xmlchange -file env_run.xml -id DOUT_S_SAVE_INTERIM_RESTART_FILES -val TRUE
./xmlchange -file env_run.xml -id DOUT_S -val TRUE
./xmlchange -file env_run.xml -id DOUT_S_ROOT -val ${CASEROOT}/restarts
./xmlchange -file env_run.xml -id PIO_DEBUG_LEVEL -val 0
./xmlchange -file env_run.xml -id PIO_TYPENAME -val 'netcdf'
./xmlchange -file env_run.xml -id DATM_CLMNCEP_YR_ALIGN -val 2002

cat >> user_nl_clm << EOF
fsurdat = '${CESM_inputdir}/lnd/clm2/surfdata_map/surfdata_${CASENAME}_simyr2000.nc'
hist_nhtfrq = -8760

paramfile = '/global/project/projectdirs/m2422/xuyi/inputdata/km67/lnd/clm2/paramdata/clm_params_ed.c160824.nc'
EOF

cat >> user_nl_datm << EOF
taxmode = 'cycle','cycle'

EOF

cd ${CASEROOT}

./case.setup
./case.clean_build
./case.build
```
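In the first method, each submission runs `STOP_N` units of `STOP_OPTION`, and `RESUBMIT` counts the additional submissions after the first, each run as a continuation. The total simulated length is therefore `STOP_N × (RESUBMIT + 1)`. A tiny sketch of that arithmetic for the settings above (`total_years` is a made-up name used only for illustration):

```shell
# Illustrative arithmetic only: total simulated length of a case that
# resubmits itself. Each submission runs STOP_N years (STOP_OPTION=nyears);
# RESUBMIT counts the extra submissions after the first one.
total_years() {
  local stop_n=$1 resubmit=$2
  echo $(( stop_n * (resubmit + 1) ))
}

# Settings from the first-method script: STOP_N=10, RESUBMIT=1, start 2002.
years=$(total_years 10 1)
echo "simulated years: ${years}, ending at $(( 2002 + years ))-01-01"
```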

The scripts for the second method:
Initial case:

```shell
COMPILER=intel

export CESM_inputdir="/global/project/projectdirs/m2422/xuyi/inputdata/km67"
export CESM_dir="/global/project/projectdirs/m2422/xuyi/git/ed-clm/cime/scripts/"
export CASENAME="1x1pt_km67"
export CASEDIR="clm4_5_12_r195_ED_spinup"
export CASEROOT=/global/project/projectdirs/m2420/test/case/ed_clm/KM67/${CASEDIR}

rm -rf ${CASEROOT}

cd ${CESM_dir}

./create_newcase -case ${CASEROOT} -res CLM_USRDAT -compset ICLM45ED -mach edison -compiler ${COMPILER}

cd ${CASEROOT}

./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_LND -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_ICE -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_CPL -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_GLC -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_ROF -val 1
./xmlchange -file env_mach_pes.xml -id NTASKS_WAV -val 1
./xmlchange -file env_mach_pes.xml -id MAX_TASKS_PER_NODE -val 1
./xmlchange -file env_mach_pes.xml -id TOTALPES -val 1

./xmlchange -file env_build.xml -id EXEROOT -val ${CASEROOT}/bld
./xmlchange -file env_build.xml -id CESMSCRATCHROOT -val ${CASEROOT}/SCRATCH

./xmlchange -file env_run.xml -id STOP_N -val 10
./xmlchange -file env_run.xml -id STOP_OPTION -val nyears
./xmlchange -file env_run.xml -id RUN_STARTDATE -val '2002-01-01'
./xmlchange -file env_run.xml -id DATM_CLMNCEP_YR_START -val 2002
./xmlchange -file env_run.xml -id DATM_CLMNCEP_YR_END -val 2004
./xmlchange -file env_run.xml -id DIN_LOC_ROOT -val ${CESM_inputdir}
./xmlchange -file env_run.xml -id RUNDIR -val ${CASEROOT}/run
./xmlchange -file env_run.xml -id DIN_LOC_ROOT_CLMFORC -val ${CESM_inputdir}/atm/datm7
./xmlchange -file env_run.xml -id CLM_USRDAT_NAME -val ${CASENAME}
./xmlchange -file env_run.xml -id ATM_NCPL -val 24

./xmlchange -file env_run.xml -id DOUT_S_SAVE_INTERIM_RESTART_FILES -val TRUE
./xmlchange -file env_run.xml -id DOUT_S -val TRUE
./xmlchange -file env_run.xml -id DOUT_S_ROOT -val ${CASEROOT}/restarts
./xmlchange -file env_run.xml -id PIO_DEBUG_LEVEL -val 0
./xmlchange -file env_run.xml -id PIO_TYPENAME -val 'netcdf'
./xmlchange -file env_run.xml -id DATM_CLMNCEP_YR_ALIGN -val 2002

cat >> user_nl_clm << EOF
fsurdat = '${CESM_inputdir}/lnd/clm2/surfdata_map/surfdata_${CASENAME}_simyr2000.nc'
hist_nhtfrq = -8760

paramfile = '/global/project/projectdirs/m2422/xuyi/inputdata/km67/lnd/clm2/paramdata/clm_params_ed.c160824.nc'
EOF

cat >> user_nl_datm << EOF
taxmode = 'cycle','cycle'

EOF

cd ${CASEROOT}

./case.setup
./case.clean_build
./case.build
```

The script for the second method, restarting case:
It is the same script as the initial case; the only difference is that I add the restart file path to user_nl_clm:

```
finidat = '${CASEROOT}/CaseSpin/clm4_5_12_r195_ED_spinup.clm2.r.2012-01-01-00000.nc'
```
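The restart file names in this thread follow the pattern `<case>.clm2.r.YYYY-MM-DD-SSSSS.nc`, dated at the end of the run: here, a start in 2002 plus 10 years gives `2012-01-01`. A small pre-flight check along these lines can catch a wrong `finidat` path before submitting. `restart_name` and `check_finidat` are hypothetical helpers, and the sketch assumes a run that starts on January 1 and stops after whole years.

```shell
# Hypothetical helpers (not part of CIME): build the expected CLM restart
# file name for a run that starts on Jan 1 of START_YEAR and lasts N_YEARS
# whole years, and check that a file exists before using it as finidat.
restart_name() {
  local case_name=$1 start_year=$2 n_years=$3 year
  # expr does decimal arithmetic, so zero-padded years like 0001 are safe
  year=$(expr "$start_year" + "$n_years")
  printf '%s.clm2.r.%04d-01-01-00000.nc\n' "$case_name" "$year"
}

check_finidat() {
  if [ -f "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Spin-up case from the second method: RUN_STARTDATE=2002-01-01, STOP_N=10.
restart_name clm4_5_12_r195_ED_spinup 2002 10
```

In the restarting case, the `finidat` line would only be appended to `user_nl_clm` if `check_finidat` succeeds on the full path to the archived file.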

What is the changeset ID of the code, and the machine you are using:
Machine: Edison
Have you modified the code? If so, it must be committed and available for testing:

Screen output or output files showing the error message and context:

@ckoven
Contributor

ckoven commented Jan 5, 2017

Yi-
I can confirm that I reproduce your bug with the simplified script pasted below. As you report, the soil carbon and related variables (e.g., TOTSOMC) are continuous from one case to the next, but the ED variables (e.g., ED_biomass or ED_NCOHORTS) are not; instead they reset to bare-ground vegetation at the start of the second case.

Basically, restarting from within a single case works as it should, but when you daisy-chain a new case that starts from a prior case's restart file, the CLM side loads the data properly and the FATES side doesn't. I haven't sorted out the solution yet, but my guess is that the FATES logic that decides when to read from the restart file isn't working correctly.

Script below:

```shell
#!/usr/bin/env bash

SRCDIR=$HOME/clmed/ed-clm/
INPUTDATADIR=$HOME/cesm_input_data
SCRATCHDIR=/global/scratch/cdkoven/
cd ${SRCDIR}
# Short hash of the current commit, used in the case name
GITHASH=$(git log -n 1 --format=%h)

# Run once with SETUP_CASE=bareground, then again with SETUP_CASE=restart
#SETUP_CASE=bareground
SETUP_CASE=restart

#####################################
if [ "${SETUP_CASE}" == "bareground" ]; then
CASE_NAME=${SETUP_CASE}_${GITHASH}
basedir=$SRCDIR/cime/scripts

cd $basedir
./create_newcase -case ${CASE_NAME} -res 1x1_brazil -compset ICLM45ED -mach lawrencium-lr3 -project ac_ngeet
cd ${CASE_NAME}

./xmlchange -file env_run.xml -id STOP_OPTION -val nyears
./xmlchange -file env_run.xml -id STOP_N -val 10
./xmlchange -file env_run.xml -id REST_N -val 5
./xmlchange -file env_run.xml -id DIN_LOC_ROOT -val $INPUTDATADIR
./xmlchange -file env_build.xml -id EXEROOT -val $SCRATCHDIR/$CASE_NAME/bld
./xmlchange -file env_run.xml -id RUNDIR -val $SCRATCHDIR/$CASE_NAME/run
./xmlchange -file env_run.xml -id DOUT_S_ROOT -val $SCRATCHDIR/archive/$CASE_NAME
./xmlchange -file env_batch.xml -id JOB_WALLCLOCK_TIME -val 2:59:00
./case.setup

cat > user_nl_clm <<EOF

EOF
./case.build
./case.submit
fi

####################################
if [ "${SETUP_CASE}" == "restart" ]; then
CASE_NAME=${SETUP_CASE}_${GITHASH}
basedir=$SRCDIR//cime/scripts

cd $basedir
./create_newcase -case ${CASE_NAME} -res 1x1_brazil -compset ICLM45ED -mach lawrencium-lr3 -project ac_ngeet
cd ${CASE_NAME}

./xmlchange -file env_run.xml -id STOP_OPTION -val nyears
./xmlchange -file env_run.xml -id STOP_N -val 10
./xmlchange -file env_run.xml -id REST_N -val 5
./xmlchange -file env_run.xml -id DIN_LOC_ROOT -val $INPUTDATADIR
./xmlchange -file env_build.xml -id EXEROOT -val $SCRATCHDIR/$CASE_NAME/bld
./xmlchange -file env_run.xml -id RUNDIR -val $SCRATCHDIR/$CASE_NAME/run
./xmlchange -file env_run.xml -id DOUT_S_ROOT -val $SCRATCHDIR/archive/$CASE_NAME
./xmlchange -file env_batch.xml -id JOB_WALLCLOCK_TIME -val 2:59:00
./case.setup

cat > user_nl_clm <<EOF

finidat='/global/home/users/cdkoven/scratch/bareground_5c5928f/run/bareground_5c5928f.clm2.r.0011-01-01-00000.nc'
EOF
./case.build
./case.submit
fi
```

@xuyi02
Contributor Author

xuyi02 commented Jan 5, 2017

Hi Charlie,

Yes, that is exactly the issue I ran into. It is great that you can reproduce the bug. Please let me know if there is anything I can do to work around or fix it.

Thanks

@rgknox
Contributor

rgknox commented Jan 5, 2017

I am giving this a "bug" tag. We certainly want this functionality, and I agree with Charlie's assessment: it seems to be a matter of making sure the FATES restart sequence is activated under the right run specifications.

@ckoven
Contributor

ckoven commented Jan 5, 2017

It looks like the restart is being read correctly, but is then overwritten by a subsequent call that forces a cold start. The one-line change in 8e8dece appears to solve the problem.
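The failure mode described here (restart data read correctly, then overwritten by a forced cold start) is an initialization-ordering bug. The shell sketch below only illustrates that general pattern with made-up function and variable names; it is not the FATES code, and the actual one-line fix is the one in 8e8dece.

```shell
# Illustration of the ordering bug only; all names are hypothetical.
read_restart() { state="restart"; }      # stands in for reading finidat
cold_start()   { state="bare_ground"; }  # stands in for bare-ground init

init_buggy() {
  local finidat=$1
  if [ -n "$finidat" ]; then read_restart; fi
  cold_start                # unconditional: clobbers the restart state
  echo "$state"
}

init_fixed() {
  local finidat=$1
  if [ -n "$finidat" ]; then
    read_restart
  else
    cold_start              # only when there is nothing to restart from
  fi
  echo "$state"
}

init_buggy some_restart.nc   # prints bare_ground
init_fixed some_restart.nc   # prints restart
```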

@xuyi02
Contributor Author

xuyi02 commented Jan 5, 2017

Great! Thank you, Charlie, for solving the problem so quickly. I will run some tests with the new code.

@ckoven
Contributor

ckoven commented Jan 5, 2017

No worries. I'm running the test suite on that commit right now, and assuming it doesn't break the normal flow of things, I'll issue a PR later. In the meantime, it's probably best to merge that commit into your branch and go from there.

@xuyi02
Contributor Author

xuyi02 commented Jan 5, 2017

Sure, I will. Anyway, it is good news!

xuyi02 changed the title from "restarting issue in FATES" to "restarting issue in FATES - bug solved" on Jan 10, 2017
@xuyi02
Contributor Author

xuyi02 commented Jan 10, 2017

Thanks, Charlie, for solving the restart problem in FATES. It is really helpful!

@ckoven
Contributor

ckoven commented Jan 10, 2017

Awesome, glad that solved it.

bandre-ucar added a commit that referenced this issue Jan 18, 2017
Merge branch 'restart_bugfix'

fix to properly restart when using finidat from a prior case

Fixes: NGT-ED Github issue #165

User interface changes?: No

Code review: ckoven, rgknox

Testing:
  ckoven:
    Test suite: ED test suite on lawrencium-lr3
    Test baseline: don't have any recent baselines, so didn't run; I can't imagine this would be an issue but please double-check when testing on Yellowstone
    Test namelist changes: none
    Test answer changes: ought to be b4b
    Test summary: all smoke, restart, etc tests pass

  andre:
    Test suite: ed - yellowstone gnu, intel, pgi
                     hobart nag
    Test baseline: ed-clm-7f67d19
    Test namelist changes: none
    Test answer changes: bit for bit
    Test summary: all tests pass

    Test suite: clm_short - yellowstone gnu, intel, pgi
    Test baseline: clm4_5_12_r195
    Test namelist changes: none
    Test answer changes: bit for bit
    Test summary: all tests pass
@ckoven
Contributor

ckoven commented Jan 18, 2017

closed via #171

ckoven closed this as completed Jan 18, 2017