RRFS debug & 2threads variants fixed plus many boundary condition bugs (#1437)

* fixes for rrfs debug tests (uninitialized memory in fv_regional_bc and module_bl_mynn)

* rrfs 13km debug tests

* smoke bug fixes for restart

* RRFS tests, but smoke takes too long due to 2hr wallclock limit, needed for restart test

* remove smoke test variants

* remove workarounds and fix remaining known bugs in ps_reg

* a few more surface pressure bug fixes; now the test case runs in debug mode

* update conus13km test list

* workarounds and bug fixes from gnu compiler testing

* atmos_cubed_sphere fixes&tweaks; ccpp/physics fix for precision issue that fails gfortran -DDEBUG=ON

* 120s timestep for conus13km tests

* atmos_cubed_sphere: simplify comments and explain snan

* move task calculations to compute_petbounds_and_tasks in rt_utils.sh; call it from rt.sh

* disable conus13km decomp and restart tests that are known to not match the control

* hera.gnu tests pass, except conus13km decomp and restart which are expected to fail

* move sanity checks to lsm_ruc and add "snow on ice" check

* use i-1 & j-1 for two-point averages, when available

* hera.gnu tests pass against new baseline after atmos_cubed_sphere i-j change

* Replace many changes with atmos_cubed_sphere PR #220

* update stochastic_physics url

Co-authored-by: JONG KIM <[email protected]>
Co-authored-by: Brian Curtis <[email protected]>
3 people authored Oct 17, 2022
1 parent 2539086 commit 87c8ea9
Showing 29 changed files with 6,579 additions and 5,264 deletions.
2 changes: 1 addition & 1 deletion .gitmodules
@@ -8,7 +8,7 @@
branch = dev/ufs-weather-model
[submodule "stochastic_physics"]
path = stochastic_physics
-url = https://github.com/noaa-psd/stochastic_physics
+url = https://github.com/NOAA-PSL/stochastic_physics
branch = master
[submodule "CMakeModules"]
path = CMakeModules
2 changes: 1 addition & 1 deletion FV3
Submodule FV3 updated 2 files
+1 −1 atmos_cubed_sphere
+1 −1 ccpp/physics
582 changes: 313 additions & 269 deletions tests/RegressionTests_cheyenne.gnu.log
1,640 changes: 850 additions & 790 deletions tests/RegressionTests_cheyenne.intel.log
1,650 changes: 855 additions & 795 deletions tests/RegressionTests_gaea.intel.log
586 changes: 315 additions & 271 deletions tests/RegressionTests_hera.gnu.log
1,670 changes: 865 additions & 805 deletions tests/RegressionTests_hera.intel.log
1,694 changes: 849 additions & 845 deletions tests/RegressionTests_jet.intel.log
1,684 changes: 872 additions & 812 deletions tests/RegressionTests_orion.intel.log
1,250 changes: 655 additions & 595 deletions tests/RegressionTests_wcoss2.intel.log

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions tests/default_vars.sh
@@ -370,6 +370,7 @@ export NSSL_INVERTCCN=.true.

# Smoke
export RRFS_SMOKE=.false.
+export RRFS_RESTART=NO
export SEAS_OPT=2

# GWD
26 changes: 25 additions & 1 deletion tests/fv3_conf/rrfs_warm_run.IN
@@ -3,7 +3,31 @@ mkdir INPUT RESTART

OPNREQ_TEST=${OPNREQ_TEST:-false}
SUFFIX=${RT_SUFFIX}
-cp -r @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/* INPUT/
+
+if [[ "${RRFS_RESTART:-NO}" == YES ]] ; then
+# cp -r ../${DEP_RUN}${SUFFIX}/RESTART/${RESTART_FILE_PREFIX}.* ./INPUT
+# rm -f INPUT/fv_core.res.*
+# rm -f INPUT/fv_srf_wnd.res.*
+# rm -f INPUT/fv_tracer.res.*
+# rm -f INPUT/phy_data.*
+# rm -f INPUT/sfc_data.*
+cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/grid_spec.nc INPUT/.
+cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/*_grid.tile*.nc INPUT/.
+cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/oro_data*.nc INPUT/.
+for RFILE in ../${DEP_RUN}${SUFFIX}/RESTART/${RESTART_FILE_PREFIX}.*; do
+[ -e $RFILE ] || exit 1
+RFILE_OLD=$(basename $RFILE)
+RFILE_NEW="${RFILE_OLD//${RESTART_FILE_PREFIX}./}"
+cp $RFILE "INPUT/$RFILE_NEW"
+done
+for x in emi_data.nc SMOKE_GBBEPx_data.nc dust12m_data.nc gfs_ctrl.nc gfs_data.nc \
+grid.tile7.halo4.nc ; do
+cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/$x INPUT/.
+done
+cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/gfs_bndy.* INPUT/.
+else
+cp -r @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/* INPUT/
+fi

for x in global_glacier.2x2.grb global_h2oprdlos.f77 global_maxice.2x2.grb \
global_o3prdlos.f77 global_snoclim.1.875.grb global_zorclim.1x1.grb \
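The restart branch in the hunk above copies each file from the control run's RESTART directory into INPUT/ under a name with the timestamp prefix removed. A minimal sketch of that renaming step follows. It assumes a prefix of the form `20210512.170000` (the actual value of `RESTART_FILE_PREFIX` is not shown in this diff), and `strip_restart_prefix` is a hypothetical helper name, not something the commit defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): remove a restart timestamp
# prefix, plus the dot that follows it, from a file name.
strip_restart_prefix() {
  local name=$1 prefix=$2
  # ${var//pattern/} deletes every match of the pattern. Digits and dots
  # are literal in shell patterns, so "<prefix>." is matched verbatim.
  printf '%s\n' "${name//${prefix}./}"
}

# Assumed example prefix; the real one comes from the test configuration.
strip_restart_prefix "20210512.170000.fv_core.res.tile1.nc" "20210512.170000"
# prints: fv_core.res.tile1.nc
```

Because the double-slash form replaces every occurrence, a file name that happened to contain the prefix twice would lose both copies; the single-hash form `${name#${prefix}.}` would strip only a leading match.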
1 change: 1 addition & 0 deletions tests/parm/model_configure_rrfs_conus13km.IN
@@ -5,6 +5,7 @@ start_hour: @[SHOUR]
start_minute: 0
start_second: 0
nhours_fcst: @[FHMAX]
+fhrot: @[FHROT]

dt_atmos: @[DT_ATMOS]
calendar: 'julian'
16 changes: 15 additions & 1 deletion tests/rt.conf
@@ -103,8 +103,19 @@ RUN | rrfs_v1nssl
RUN | rrfs_v1nssl_nohailnoccn | | fv3 |

RUN | rrfs_conus13km_hrrr_warm | | fv3 |
-RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
RUN | rrfs_smoke_conus13km_hrrr_warm | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
+
+# These do not match the control yet:
+# RUN | rrfs_conus13km_hrrr_warm_decomp | | |
+# RUN | rrfs_conus13km_radar_tten_warm_decomp | | |
+
+RUN | rrfs_conus13km_hrrr_warm_2threads | | |
+RUN | rrfs_conus13km_radar_tten_warm_2threads | | |
+
+# These do not match the control yet:
+# RUN | rrfs_conus13km_hrrr_warm_restart | | | rrfs_conus13km_hrrr_warm
+# RUN | rrfs_conus13km_radar_tten_warm_restart | | | rrfs_conus13km_radar_tten_warm

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16_csawmg,FV3_GFS_v16_ugwpv1,FV3_GFS_v16_ras,FV3_GFS_v16_noahmp | | fv3 |
RUN | control_csawmg | - gaea.intel | fv3 |
@@ -121,6 +132,9 @@ RUN | control_wam

COMPILE | -DAPP=ATM -DDEBUG=ON -D32BIT=ON | | fv3 |

+RUN | rrfs_conus13km_hrrr_warm_debug | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm_debug | | fv3 |
+
RUN | control_debug | | fv3 |
RUN | control_2threads_debug | | |
RUN | control_CubedSphereGrid_debug | | fv3 |
4 changes: 3 additions & 1 deletion tests/rt.sh
@@ -443,7 +443,7 @@ if [[ $TESTS_FILE =~ '35d' ]] || [[ $TESTS_FILE =~ 'weekly' ]]; then
TEST_35D=true
fi

-BL_DATE=20221007
+BL_DATE=20221012

RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-${BL_DATE}/${RT_COMPILER^^}}

@@ -715,6 +715,8 @@ EOF
(
source ${PATHRT}/tests/$TEST_NAME

+compute_petbounds_and_tasks
+
TPN=$(( TPN / THRD ))
NODES=$(( TASKS / TPN ))
if (( NODES * TPN < TASKS )); then
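The second rt.sh hunk divides TPN by THRD and then derives a node count from TASKS, which the newly added call to compute_petbounds_and_tasks exports. The body of the final `if` is cut off in this view. The sketch below assumes it rounds the node count up when the tasks do not fill the nodes evenly; that assumption, the function name `nodes_for`, and the sample numbers are illustrative and not taken from the commit.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the node arithmetic; not the rt.sh code.
nodes_for() {
  local tasks=$1 tpn=$2 thrd=$3
  # Each MPI task uses thrd threads, so fewer tasks fit on one node.
  local per_node=$(( tpn / thrd ))
  local nodes=$(( tasks / per_node ))
  # Assumed completion of the truncated branch: round up on a remainder.
  if (( nodes * per_node < tasks )); then
    nodes=$(( nodes + 1 ))
  fi
  echo "$nodes"
}

nodes_for 170 40 2   # 20 tasks per node, 170 tasks; prints 9
```

With one thread per task the same 170 tasks would need only 5 nodes of 40 cores, which is why the thread count has to be known before the node count is computed.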
16 changes: 13 additions & 3 deletions tests/rt_gnu.conf
@@ -25,9 +25,16 @@ RUN | hrrr_control_2threads
RUN | hrrr_control_decomp | | |
RUN | hrrr_control_restart | | | hrrr_control
RUN | rrfs_v1beta | | fv3 |
-RUN | rrfs_conus13km_hrrr_warm | | fv3 |
-RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
-RUN | rrfs_smoke_conus13km_hrrr_warm | | fv3 |
+
+RUN | rrfs_conus13km_hrrr_warm | | fv3 |
+RUN | rrfs_smoke_conus13km_hrrr_warm | | fv3 |
+
+RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm_2threads | | |
+
+# These two are known to not match the control:
+#RUN | rrfs_conus13km_radar_tten_warm_decomp | | |
+#RUN | rrfs_conus13km_radar_tten_warm_restart | | | rrfs_conus13km_radar_tten_warm

##################################################################################################################################################################
# CCPP DEBUG tests #
@@ -48,6 +55,9 @@ RUN | control_ras_debug
RUN | control_stochy_debug | | fv3 |
RUN | control_debug_p8 | | fv3 |

+RUN | rrfs_conus13km_hrrr_warm_debug | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm_debug | | fv3 |
+
COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16_fv3wam -D32BIT=ON -DMULTI_GASES=ON -DDEBUG=ON | | fv3 |
RUN | control_wam_debug | | fv3 |

69 changes: 69 additions & 0 deletions tests/rt_utils.sh
@@ -15,6 +15,75 @@ qsub_id=0
slurm_id=0
bsub_id=0

+function compute_petbounds_and_tasks() {
+
+# each test MUST define ${COMPONENT}_tasks variable for all components it is using
+# and MUST NOT define those that it's not using or set the value to 0.
+
+# ATM is a special case since it is running on the sum of compute and io tasks.
+# CHM component and mediator are running on ATM compute tasks only.
+
+if [[ $DATM_CDEPS = 'false' ]]; then
+if [[ ${ATM_compute_tasks:-0} -eq 0 ]]; then
+ATM_compute_tasks=$((INPES * JNPES * NTILES))
+fi
+if [[ $QUILTING = '.true.' ]]; then
+ATM_io_tasks=$((WRITE_GROUP * WRTTASK_PER_GROUP))
+fi
+fi
+
+local n=0
+unset atm_petlist_bounds ocn_petlist_bounds ice_petlist_bounds wav_petlist_bounds chm_petlist_bounds med_petlist_bounds aqm_petlist_bounds
+
+# ATM
+ATM_io_tasks=${ATM_io_tasks:-0}
+if [[ $((ATM_compute_tasks + ATM_io_tasks)) -gt 0 ]]; then
+atm_petlist_bounds="${n} $((n + ATM_compute_tasks + ATM_io_tasks -1))"
+n=$((n + ATM_compute_tasks + ATM_io_tasks))
+fi
+
+# OCN
+if [[ ${OCN_tasks:-0} -gt 0 ]]; then
+ocn_petlist_bounds="${n} $((n + OCN_tasks - 1))"
+n=$((n + OCN_tasks))
+fi
+
+# ICE
+if [[ ${ICE_tasks:-0} -gt 0 ]]; then
+ice_petlist_bounds="${n} $((n + ICE_tasks - 1))"
+n=$((n + ICE_tasks))
+fi
+
+# WAV
+if [[ ${WAV_tasks:-0} -gt 0 ]]; then
+wav_petlist_bounds="${n} $((n + WAV_tasks - 1))"
+n=$((n + WAV_tasks))
+fi
+
+# CHM
+chm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
+
+# MED
+med_petlist_bounds="0 $((ATM_compute_tasks - 1))"
+
+# AQM
+aqm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
+
+UFS_tasks=${n}
+
+echo "ATM_petlist_bounds: ${atm_petlist_bounds:-}"
+echo "OCN_petlist_bounds: ${ocn_petlist_bounds:-}"
+echo "ICE_petlist_bounds: ${ice_petlist_bounds:-}"
+echo "WAV_petlist_bounds: ${wav_petlist_bounds:-}"
+echo "CHM_petlist_bounds: ${chm_petlist_bounds:-}"
+echo "MED_petlist_bounds: ${med_petlist_bounds:-}"
+echo "AQM_petlist_bounds: ${aqm_petlist_bounds:-}"
+echo "UFS_tasks : ${UFS_tasks:-}"
+
+# TASKS is now set to UFS_TASKS
+export TASKS=$UFS_tasks
+}
+
interrupt_job() {
set -x
if [[ $SCHEDULER = 'pbs' ]]; then
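The function added to rt_utils.sh lays components out on consecutive PET (task) ranges: ATM takes the first block (compute plus I/O tasks), OCN, ICE and WAV follow, and the mediator, CHM and AQM share the ATM compute range. A reduced sketch of that bookkeeping for ATM, OCN and MED only is shown here; `petbounds_demo` and the task counts are made up for illustration and are not values from any test in this commit.

```shell
#!/usr/bin/env bash
# Illustrative reduction of the PET-bounds logic to three components.
petbounds_demo() {
  local atm_compute=$1 atm_io=$2 ocn=$3
  local n=0   # index of the next unassigned PET

  # ATM runs on compute + I/O tasks, starting at PET 0.
  if (( atm_compute + atm_io > 0 )); then
    echo "ATM: ${n} $(( n + atm_compute + atm_io - 1 ))"
    n=$(( n + atm_compute + atm_io ))
  fi

  # OCN gets the next contiguous block, if it is used at all.
  if (( ocn > 0 )); then
    echo "OCN: ${n} $(( n + ocn - 1 ))"
    n=$(( n + ocn ))
  fi

  # The mediator shares the ATM compute PETs and adds no tasks.
  echo "MED: 0 $(( atm_compute - 1 ))"
  echo "TOTAL: ${n}"
}

petbounds_demo 144 6 20
# prints:
# ATM: 0 149
# OCN: 150 169
# MED: 0 143
# TOTAL: 170
```

The running total is what the real function exports as TASKS, which is why rt.sh can call it before computing node counts instead of relying on a per-test TASKS value.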
72 changes: 1 addition & 71 deletions tests/run_test.sh
@@ -32,64 +32,6 @@ remove_fail_test() {
fi
}

-function compute_petbounds() {
-
-# each test MUST define ${COMPONENT}_tasks variable for all components it is using
-# and MUST NOT define those that it's not using or set the value to 0.
-
-# ATM is a special case since it is running on the sum of compute and io tasks.
-# CHM component and mediator are running on ATM compute tasks only.
-
-local n=0
-unset atm_petlist_bounds ocn_petlist_bounds ice_petlist_bounds wav_petlist_bounds chm_petlist_bounds med_petlist_bounds aqm_petlist_bounds
-
-# ATM
-ATM_io_tasks=${ATM_io_tasks:-0}
-if [[ $((ATM_compute_tasks + ATM_io_tasks)) -gt 0 ]]; then
-atm_petlist_bounds="${n} $((n + ATM_compute_tasks + ATM_io_tasks -1))"
-n=$((n + ATM_compute_tasks + ATM_io_tasks))
-fi
-
-# OCN
-if [[ ${OCN_tasks:-0} -gt 0 ]]; then
-ocn_petlist_bounds="${n} $((n + OCN_tasks - 1))"
-n=$((n + OCN_tasks))
-fi
-
-# ICE
-if [[ ${ICE_tasks:-0} -gt 0 ]]; then
-ice_petlist_bounds="${n} $((n + ICE_tasks - 1))"
-n=$((n + ICE_tasks))
-fi
-
-# WAV
-if [[ ${WAV_tasks:-0} -gt 0 ]]; then
-wav_petlist_bounds="${n} $((n + WAV_tasks - 1))"
-n=$((n + WAV_tasks))
-fi
-
-# CHM
-chm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
-
-# MED
-med_petlist_bounds="0 $((ATM_compute_tasks - 1))"
-
-# AQM
-aqm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
-
-UFS_tasks=${n}
-
-echo "ATM_petlist_bounds: ${atm_petlist_bounds:-}"
-echo "OCN_petlist_bounds: ${ocn_petlist_bounds:-}"
-echo "ICE_petlist_bounds: ${ice_petlist_bounds:-}"
-echo "WAV_petlist_bounds: ${wav_petlist_bounds:-}"
-echo "CHM_petlist_bounds: ${chm_petlist_bounds:-}"
-echo "MED_petlist_bounds: ${med_petlist_bounds:-}"
-echo "AQM_petlist_bounds: ${aqm_petlist_bounds:-}"
-echo "UFS_tasks : ${UFS_tasks:-}"
-
-}
-
if [[ $# != 5 ]]; then
echo "Usage: $0 PATHRT RUNDIR_ROOT TEST_NAME TEST_NR COMPILE_NR"
exit 1
@@ -174,22 +116,10 @@ fi

atparse < ${PATHRT}/parm/${MODEL_CONFIGURE:-model_configure.IN} > model_configure

-if [[ $DATM_CDEPS = 'false' ]]; then
-if [[ ${ATM_compute_tasks:-0} -eq 0 ]]; then
-ATM_compute_tasks=$((INPES * JNPES * NTILES))
-fi
-if [[ $QUILTING = '.true.' ]]; then
-ATM_io_tasks=$((WRITE_GROUP * WRTTASK_PER_GROUP))
-fi
-fi
-
-compute_petbounds
+compute_petbounds_and_tasks

atparse < ${PATHRT}/parm/${NEMS_CONFIGURE:-nems.configure} > nems.configure

-# TASKS is now set to UFS_TASKS
-export TASKS=$UFS_tasks
-
if [[ "Q${INPUT_NEST02_NML:-}" != Q ]] ; then
INPES_NEST=$INPES_NEST02; JNPES_NEST=$JNPES_NEST02
NPX_NEST=$NPX_NEST02; NPY_NEST=$NPY_NEST02
2 changes: 1 addition & 1 deletion tests/tests/rrfs_conus13km_hrrr_warm
@@ -25,7 +25,7 @@ export SMONTH=5
export SDAY=12
export SHOUR=16
export FHMAX=2
-export DT_ATMOS=60
+export DT_ATMOS=120
export RESTART_INTERVAL=1
export QUILTING=.true.
export WRITE_GROUP=1