RRFS debug & 2threads variants fixed plus many boundary condition bugs #1437

Merged
57 commits
1bc290c
fixes for rrfs debug tests (uninitialized memory in fv_regional_bc an…
SamuelTrahanNOAA Sep 18, 2022
0970b5d
rrfs 13km debug tests
SamuelTrahanNOAA Sep 18, 2022
05a4275
smoke bug fixes for restart
SamuelTrahanNOAA Sep 19, 2022
e17f67f
RRFS tests, but smoke takes too long due to 2hr wallclock limit, need…
SamuelTrahanNOAA Sep 19, 2022
c73b94d
remove smoke test variants
SamuelTrahanNOAA Sep 20, 2022
b626819
remove workarounds and fix remaining known bugs in ps_reg
SamuelTrahanNOAA Sep 20, 2022
a7b1803
a few more surface pressure bug fixes; now the test case runs in debu…
SamuelTrahanNOAA Sep 20, 2022
7fcd28d
update conus13km test list
SamuelTrahanNOAA Sep 20, 2022
fc49cf6
merge develop into FV3
SamuelTrahanNOAA Sep 20, 2022
51f2a72
bug fixes to my bug fixes
SamuelTrahanNOAA Sep 20, 2022
418217f
update to top of dev/emc atmos_cubed_sphere
SamuelTrahanNOAA Sep 20, 2022
9964f4b
Merge remote-tracking branch 'origin/develop' into bugfix/rrfs-debug-…
SamuelTrahanNOAA Sep 20, 2022
6789ebd
update atmos_cubed_sphere to dev/emc
SamuelTrahanNOAA Sep 20, 2022
7a7774e
update atmos_cubed_sphere
SamuelTrahanNOAA Sep 20, 2022
e61592d
workarounds and bug fixes from gnu compiler testing
SamuelTrahanNOAA Sep 21, 2022
6b7cb3b
atmos_cubed_sphere fixes&tweaks; ccpp/physics fix for precision issue…
SamuelTrahanNOAA Sep 21, 2022
939948b
120s timestep for conus13km tests
SamuelTrahanNOAA Sep 21, 2022
6eaabeb
atmos_cubed_sphere: simplify comments and explain snan
SamuelTrahanNOAA Sep 21, 2022
4fb44cd
move task calculations to compute_petbounds_and_tasks in rt_utils.sh;…
SamuelTrahanNOAA Sep 22, 2022
ef3aa86
Merge remote-tracking branch 'sam/bugfix/rt-sh-tasks' into bugfix/rrf…
SamuelTrahanNOAA Sep 22, 2022
1199227
Merge remote-tracking branch 'sam/bugfix/rt-sh-tasks' into bugfix/rrf…
SamuelTrahanNOAA Sep 22, 2022
0476261
disable conus13km decomp and restart tests that are known to not matc…
SamuelTrahanNOAA Sep 22, 2022
9c41ded
Merge branch 'bugfix/rrfs-debug-mode' of ssh://github.com/SamuelTraha…
SamuelTrahanNOAA Sep 22, 2022
05149a7
hera.gnu tests pass, except conus13km decomp and restart which are ex…
SamuelTrahanNOAA Sep 22, 2022
7a354c4
Merge branch 'bugfix/rrfs-debug-mode' of ssh://github.com/SamuelTraha…
SamuelTrahanNOAA Sep 22, 2022
b6e7012
Point to Sam's branches of fv3atm, atmos cubed sphere, and ccpp physics
SamuelTrahanNOAA Sep 22, 2022
54a70e9
Merge branch 'bugfix/rrfs-debug-mode' of ssh://github.com/SamuelTraha…
SamuelTrahanNOAA Sep 22, 2022
096b2e9
merge upstream/develop change for ccpp/physics url
SamuelTrahanNOAA Sep 22, 2022
d7ec03a
hera.intel tests pass
SamuelTrahanNOAA Sep 22, 2022
ceb1d69
jet.intel tests passed
SamuelTrahanNOAA Sep 22, 2022
68e8e02
move sanity checks to lsm_ruc and add "snow on ice" check
SamuelTrahanNOAA Sep 26, 2022
11179b8
hera.gnu tests passed again.
SamuelTrahanNOAA Sep 26, 2022
5ae327d
use i-1 & j-1 for two-point averages, when available
SamuelTrahanNOAA Sep 27, 2022
fe0b042
hera.gnu tests pass against new baseline after atmos_cubed_sphere i-j…
SamuelTrahanNOAA Sep 27, 2022
b3bba20
jet intel tests passed
SamuelTrahanNOAA Sep 27, 2022
2186010
Replace many changes with atmos_cubed_sphere PR #220
SamuelTrahanNOAA Oct 3, 2022
5174c04
hera.gnu tests passed
SamuelTrahanNOAA Oct 3, 2022
4dc1960
merge develop
SamuelTrahanNOAA Oct 10, 2022
807bfa8
hera gnu tests passed
SamuelTrahanNOAA Oct 10, 2022
5f8e1bf
jet intel tests passed
SamuelTrahanNOAA Oct 10, 2022
4972fb6
hera intel tests passed
SamuelTrahanNOAA Oct 11, 2022
369460b
intel hera tests passed again
SamuelTrahanNOAA Oct 11, 2022
22313cc
missing from prior commit: merge upstream to ccpp/physics; latest tes…
SamuelTrahanNOAA Oct 11, 2022
c0dc489
satisfy git's glitchiness
SamuelTrahanNOAA Oct 11, 2022
829b41f
hera gnu tests passed
SamuelTrahanNOAA Oct 11, 2022
b00ee13
merge upstream fv3
SamuelTrahanNOAA Oct 12, 2022
f04bcc2
merge upstream (except hera&jet test logs)
SamuelTrahanNOAA Oct 12, 2022
568f36e
update stochastic_physics url
SamuelTrahanNOAA Oct 12, 2022
1ae96af
add new BL_DATE
jkbk2004 Oct 12, 2022
94e257e
[AutoRT] hera.gnu Job Completed.
BrianCurtis-NOAA Oct 12, 2022
22ab217
[AutoRT] hera.intel Job Completed.
BrianCurtis-NOAA Oct 12, 2022
f9c6433
[AutoRT] orion.intel Job Completed.
BrianCurtis-NOAA Oct 13, 2022
fddfea1
[AutoRT] gaea.intel Job Completed.
BrianCurtis-NOAA Oct 13, 2022
292c65a
add jet.intel RT log: passed
jkbk2004 Oct 13, 2022
1349c2c
add cheyenne intel/gnu RT logs: passed
jkbk2004 Oct 13, 2022
e391921
WCOSS2 Intel RT Log
BrianCurtis-NOAA Oct 13, 2022
bb3da3c
point to EMC fv3atm
SamuelTrahanNOAA Oct 17, 2022
6 changes: 3 additions & 3 deletions .gitmodules
@@ -1,14 +1,14 @@
[submodule "FV3"]
path = FV3
-url = https://github.com/NOAA-EMC/fv3atm
-branch = develop
+url = https://github.com/SamuelTrahanNOAA/fv3atm
+branch = bugfix/rrfs-debug-mode
[submodule "WW3"]
path = WW3
url = https://github.com/NOAA-EMC/WW3
branch = dev/ufs-weather-model
[submodule "stochastic_physics"]
path = stochastic_physics
-url = https://github.com/noaa-psd/stochastic_physics
+url = https://github.com/NOAA-PSL/stochastic_physics
branch = master
[submodule "CMakeModules"]
path = CMakeModules
2 changes: 1 addition & 1 deletion FV3
Submodule FV3 updated 3 files
+4 −4 .gitmodules
+1 −1 atmos_cubed_sphere
+1 −1 ccpp/physics
582 changes: 313 additions & 269 deletions tests/RegressionTests_cheyenne.gnu.log

Large diffs are not rendered by default.

1,640 changes: 850 additions & 790 deletions tests/RegressionTests_cheyenne.intel.log

Large diffs are not rendered by default.

1,650 changes: 855 additions & 795 deletions tests/RegressionTests_gaea.intel.log

Large diffs are not rendered by default.

586 changes: 315 additions & 271 deletions tests/RegressionTests_hera.gnu.log

Large diffs are not rendered by default.

1,670 changes: 865 additions & 805 deletions tests/RegressionTests_hera.intel.log

Large diffs are not rendered by default.

1,694 changes: 849 additions & 845 deletions tests/RegressionTests_jet.intel.log

Large diffs are not rendered by default.

1,684 changes: 872 additions & 812 deletions tests/RegressionTests_orion.intel.log

Large diffs are not rendered by default.

1,250 changes: 655 additions & 595 deletions tests/RegressionTests_wcoss2.intel.log

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions tests/default_vars.sh
@@ -370,6 +370,7 @@ export NSSL_INVERTCCN=.true.

# Smoke
export RRFS_SMOKE=.false.
+export RRFS_RESTART=NO
export SEAS_OPT=2

# GWD
26 changes: 25 additions & 1 deletion tests/fv3_conf/rrfs_warm_run.IN
@@ -3,7 +3,31 @@ mkdir INPUT RESTART

OPNREQ_TEST=${OPNREQ_TEST:-false}
SUFFIX=${RT_SUFFIX}
-cp -r @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/* INPUT/
+
+if [[ "${RRFS_RESTART:-NO}" == YES ]] ; then
+  # cp -r ../${DEP_RUN}${SUFFIX}/RESTART/${RESTART_FILE_PREFIX}.* ./INPUT
+  # rm -f INPUT/fv_core.res.*
+  # rm -f INPUT/fv_srf_wnd.res.*
+  # rm -f INPUT/fv_tracer.res.*
+  # rm -f INPUT/phy_data.*
+  # rm -f INPUT/sfc_data.*
+  cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/grid_spec.nc INPUT/.
+  cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/*_grid.tile*.nc INPUT/.
+  cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/oro_data*.nc INPUT/.
+  for RFILE in ../${DEP_RUN}${SUFFIX}/RESTART/${RESTART_FILE_PREFIX}.*; do
+    [ -e $RFILE ] || exit 1
+    RFILE_OLD=$(basename $RFILE)
+    RFILE_NEW="${RFILE_OLD//${RESTART_FILE_PREFIX}./}"
+    cp $RFILE "INPUT/$RFILE_NEW"
+  done
+  for x in emi_data.nc SMOKE_GBBEPx_data.nc dust12m_data.nc gfs_ctrl.nc gfs_data.nc \
+      grid.tile7.halo4.nc ; do
+    cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/$x INPUT/.
+  done
+  cp @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/gfs_bndy.* INPUT/.
+else
+  cp -r @[INPUTDATA_ROOT]/FV3_input_data_conus13km/INPUT/* INPUT/
+fi

for x in global_glacier.2x2.grb global_h2oprdlos.f77 global_maxice.2x2.grb \
global_o3prdlos.f77 global_snoclim.1.875.grb global_zorclim.1x1.grb \
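The renaming loop in the `RRFS_RESTART` branch relies on bash pattern substitution to drop the timestamp prefix from each restart file name before copying it into `INPUT/`. The sketch below isolates that expansion. The prefix value `20210512.170000`, the file names, and the helper `strip_prefix` are hypothetical and chosen only for illustration; the real `RESTART_FILE_PREFIX` is set outside this diff.

```shell
#!/usr/bin/env bash
# Hypothetical prefix and file names, for illustration only.
RESTART_FILE_PREFIX=20210512.170000

# strip_prefix is an illustrative helper, not part of rrfs_warm_run.IN.
strip_prefix() {
  local old=$1
  # Same expansion as in the diff: delete every occurrence of "<prefix>."
  echo "${old//${RESTART_FILE_PREFIX}./}"
}

a=$(strip_prefix "20210512.170000.fv_core.res.tile1.nc")
b=$(strip_prefix "20210512.170000.coupler.res")
echo "$a"
echo "$b"
```

Because `//` removes every match rather than only a leading one, `${old#"${RESTART_FILE_PREFIX}."}` would be a stricter alternative; for names that carry the prefix once, both forms give the same result.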
1 change: 1 addition & 0 deletions tests/parm/model_configure_rrfs_conus13km.IN
@@ -5,6 +5,7 @@ start_hour: @[SHOUR]
start_minute: 0
start_second: 0
nhours_fcst: @[FHMAX]
+fhrot: @[FHROT]

dt_atmos: @[DT_ATMOS]
calendar: 'julian'
16 changes: 15 additions & 1 deletion tests/rt.conf
@@ -103,8 +103,19 @@ RUN | rrfs_v1nssl
RUN | rrfs_v1nssl_nohailnoccn | | fv3 |

RUN | rrfs_conus13km_hrrr_warm | | fv3 |
-RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
RUN | rrfs_smoke_conus13km_hrrr_warm | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm | | fv3 |

+# These do not match the control yet:
+# RUN | rrfs_conus13km_hrrr_warm_decomp | | |
+# RUN | rrfs_conus13km_radar_tten_warm_decomp | | |

+RUN | rrfs_conus13km_hrrr_warm_2threads | | |
+RUN | rrfs_conus13km_radar_tten_warm_2threads | | |

+# These do not match the control yet:
+# RUN | rrfs_conus13km_hrrr_warm_restart | | | rrfs_conus13km_hrrr_warm
+# RUN | rrfs_conus13km_radar_tten_warm_restart | | | rrfs_conus13km_radar_tten_warm

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16_csawmg,FV3_GFS_v16_ugwpv1,FV3_GFS_v16_ras,FV3_GFS_v16_noahmp | | fv3 |
RUN | control_csawmg | - gaea.intel | fv3 |
@@ -121,6 +132,9 @@ RUN | control_wam

COMPILE | -DAPP=ATM -DDEBUG=ON -D32BIT=ON | | fv3 |

+RUN | rrfs_conus13km_hrrr_warm_debug | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm_debug | | fv3 |

RUN | control_debug | | fv3 |
RUN | control_2threads_debug | | |
RUN | control_CubedSphereGrid_debug | | fv3 |
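Each `rt.conf` entry is a pipe-delimited record. Judging only from the entries in this diff, the first field is the record type (`RUN` or `COMPILE`), the second is the test name, and the last field, when present, names the test that a restart run depends on. That reading is an assumption, and the real parser in `rt.sh` may work differently. A minimal sketch of splitting such a line in bash, with the hypothetical helper `trim` and placeholder names `field3` and `field4` for the columns whose meaning is not evident here:

```shell
#!/usr/bin/env bash
# Example record copied from the diff (whitespace as it appears in this page).
line='RUN | rrfs_conus13km_hrrr_warm_restart | | | rrfs_conus13km_hrrr_warm'

# trim is an illustrative helper: strip leading and trailing whitespace.
trim() {
  local s=$1
  s="${s#"${s%%[![:space:]]*}"}"
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Split on '|' only; empty columns are preserved as whitespace-only strings.
IFS='|' read -r kind name field3 field4 dep <<< "$line"

kind=$(trim "$kind")
name=$(trim "$name")
field3=$(trim "$field3")
field4=$(trim "$field4")
dep=$(trim "$dep")

echo "kind=$kind name=$name dep=$dep"
```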
4 changes: 3 additions & 1 deletion tests/rt.sh
@@ -443,7 +443,7 @@ if [[ $TESTS_FILE =~ '35d' ]] || [[ $TESTS_FILE =~ 'weekly' ]]; then
TEST_35D=true
fi

-BL_DATE=20221007
+BL_DATE=20221012

RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-${BL_DATE}/${RT_COMPILER^^}}

@@ -715,6 +715,8 @@ EOF
(
source ${PATHRT}/tests/$TEST_NAME

+compute_petbounds_and_tasks
+
TPN=$(( TPN / THRD ))
NODES=$(( TASKS / TPN ))
if (( NODES * TPN < TASKS )); then
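The new call to `compute_petbounds_and_tasks` sits directly before the node calculation, presumably so that `TASKS` is already set when `NODES` is derived from it. The arithmetic can be sketched as follows. All numeric values are made up, and the body of the `if` is cut off in this diff, so the round-up step shown is an assumption about what the elided branch does.

```shell
#!/usr/bin/env bash
# Assumed example values; real ones come from the machine setup and the test file.
TPN=40      # cores per node
THRD=2      # threads per task
TASKS=62    # total MPI tasks, as exported by compute_petbounds_and_tasks

TPN=$(( TPN / THRD ))       # tasks that fit on one node: 20
NODES=$(( TASKS / TPN ))    # integer division: 3
if (( NODES * TPN < TASKS )); then
  # Assumption: the elided branch rounds up to cover the remainder.
  NODES=$(( NODES + 1 ))
fi

echo "TPN=$TPN NODES=$NODES"
```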
16 changes: 13 additions & 3 deletions tests/rt_gnu.conf
@@ -25,9 +25,16 @@ RUN | hrrr_control_2threads
RUN | hrrr_control_decomp | | |
RUN | hrrr_control_restart | | | hrrr_control
RUN | rrfs_v1beta | | fv3 |
-RUN | rrfs_conus13km_hrrr_warm | | fv3 |
-RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
-RUN | rrfs_smoke_conus13km_hrrr_warm | | fv3 |

+RUN | rrfs_conus13km_hrrr_warm | | fv3 |
+RUN | rrfs_smoke_conus13km_hrrr_warm | | fv3 |

+RUN | rrfs_conus13km_radar_tten_warm | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm_2threads | | |

+# These two are known to not match the control:
+#RUN | rrfs_conus13km_radar_tten_warm_decomp | | |
+#RUN | rrfs_conus13km_radar_tten_warm_restart | | | rrfs_conus13km_radar_tten_warm

##################################################################################################################################################################
# CCPP DEBUG tests #
@@ -48,6 +55,9 @@ RUN | control_ras_debug
RUN | control_stochy_debug | | fv3 |
RUN | control_debug_p8 | | fv3 |

+RUN | rrfs_conus13km_hrrr_warm_debug | | fv3 |
+RUN | rrfs_conus13km_radar_tten_warm_debug | | fv3 |

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16_fv3wam -D32BIT=ON -DMULTI_GASES=ON -DDEBUG=ON | | fv3 |
RUN | control_wam_debug | | fv3 |

69 changes: 69 additions & 0 deletions tests/rt_utils.sh
@@ -15,6 +15,75 @@ qsub_id=0
slurm_id=0
bsub_id=0

+function compute_petbounds_and_tasks() {
+
+  # each test MUST define ${COMPONENT}_tasks variable for all components it is using
+  # and MUST NOT define those that it's not using or set the value to 0.
+
+  # ATM is a special case since it is running on the sum of compute and io tasks.
+  # CHM component and mediator are running on ATM compute tasks only.
+
+  if [[ $DATM_CDEPS = 'false' ]]; then
+    if [[ ${ATM_compute_tasks:-0} -eq 0 ]]; then
+      ATM_compute_tasks=$((INPES * JNPES * NTILES))
+    fi
+    if [[ $QUILTING = '.true.' ]]; then
+      ATM_io_tasks=$((WRITE_GROUP * WRTTASK_PER_GROUP))
+    fi
+  fi
+
+  local n=0
+  unset atm_petlist_bounds ocn_petlist_bounds ice_petlist_bounds wav_petlist_bounds chm_petlist_bounds med_petlist_bounds aqm_petlist_bounds
+
+  # ATM
+  ATM_io_tasks=${ATM_io_tasks:-0}
+  if [[ $((ATM_compute_tasks + ATM_io_tasks)) -gt 0 ]]; then
+    atm_petlist_bounds="${n} $((n + ATM_compute_tasks + ATM_io_tasks -1))"
+    n=$((n + ATM_compute_tasks + ATM_io_tasks))
+  fi
+
+  # OCN
+  if [[ ${OCN_tasks:-0} -gt 0 ]]; then
+    ocn_petlist_bounds="${n} $((n + OCN_tasks - 1))"
+    n=$((n + OCN_tasks))
+  fi
+
+  # ICE
+  if [[ ${ICE_tasks:-0} -gt 0 ]]; then
+    ice_petlist_bounds="${n} $((n + ICE_tasks - 1))"
+    n=$((n + ICE_tasks))
+  fi
+
+  # WAV
+  if [[ ${WAV_tasks:-0} -gt 0 ]]; then
+    wav_petlist_bounds="${n} $((n + WAV_tasks - 1))"
+    n=$((n + WAV_tasks))
+  fi
+
+  # CHM
+  chm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
+
+  # MED
+  med_petlist_bounds="0 $((ATM_compute_tasks - 1))"
+
+  # AQM
+  aqm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
+
+  UFS_tasks=${n}
+
+  echo "ATM_petlist_bounds: ${atm_petlist_bounds:-}"
+  echo "OCN_petlist_bounds: ${ocn_petlist_bounds:-}"
+  echo "ICE_petlist_bounds: ${ice_petlist_bounds:-}"
+  echo "WAV_petlist_bounds: ${wav_petlist_bounds:-}"
+  echo "CHM_petlist_bounds: ${chm_petlist_bounds:-}"
+  echo "MED_petlist_bounds: ${med_petlist_bounds:-}"
+  echo "AQM_petlist_bounds: ${aqm_petlist_bounds:-}"
+  echo "UFS_tasks : ${UFS_tasks:-}"
+
+  # TASKS is now set to UFS_TASKS
+  export TASKS=$UFS_tasks
+}
+
interrupt_job() {
set -x
if [[ $SCHEDULER = 'pbs' ]]; then
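`compute_petbounds_and_tasks` hands out PET ranges sequentially: ATM takes the first block (compute plus I/O tasks), each further component starts where the previous one ended, and CHM, MED and AQM reuse the ATM compute range. A small worked example of that allocation follows. The task counts are invented and `next_bounds` is a hypothetical helper, not something defined in `rt_utils.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inclusive "first last" PET range for ntasks starting at start.
next_bounds() {
  local start=$1 ntasks=$2
  echo "${start} $((start + ntasks - 1))"
}

# Assumed values, for illustration only.
INPES=2; JNPES=3; NTILES=6
WRITE_GROUP=1; WRTTASK_PER_GROUP=6
OCN_tasks=20

ATM_compute_tasks=$((INPES * JNPES * NTILES))       # 36
ATM_io_tasks=$((WRITE_GROUP * WRTTASK_PER_GROUP))   # 6

n=0
atm_bounds=$(next_bounds "$n" $((ATM_compute_tasks + ATM_io_tasks)))
n=$((n + ATM_compute_tasks + ATM_io_tasks))

ocn_bounds=$(next_bounds "$n" "$OCN_tasks")
n=$((n + OCN_tasks))

# The mediator shares the ATM compute PETs and does not advance n.
med_bounds="0 $((ATM_compute_tasks - 1))"

UFS_tasks=$n
echo "ATM: $atm_bounds  OCN: $ocn_bounds  MED: $med_bounds  total: $UFS_tasks"
```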
72 changes: 1 addition & 71 deletions tests/run_test.sh
@@ -32,64 +32,6 @@ remove_fail_test() {
fi
}

-function compute_petbounds() {
-
-  # each test MUST define ${COMPONENT}_tasks variable for all components it is using
-  # and MUST NOT define those that it's not using or set the value to 0.
-
-  # ATM is a special case since it is running on the sum of compute and io tasks.
-  # CHM component and mediator are running on ATM compute tasks only.
-
-  local n=0
-  unset atm_petlist_bounds ocn_petlist_bounds ice_petlist_bounds wav_petlist_bounds chm_petlist_bounds med_petlist_bounds aqm_petlist_bounds
-
-  # ATM
-  ATM_io_tasks=${ATM_io_tasks:-0}
-  if [[ $((ATM_compute_tasks + ATM_io_tasks)) -gt 0 ]]; then
-    atm_petlist_bounds="${n} $((n + ATM_compute_tasks + ATM_io_tasks -1))"
-    n=$((n + ATM_compute_tasks + ATM_io_tasks))
-  fi
-
-  # OCN
-  if [[ ${OCN_tasks:-0} -gt 0 ]]; then
-    ocn_petlist_bounds="${n} $((n + OCN_tasks - 1))"
-    n=$((n + OCN_tasks))
-  fi
-
-  # ICE
-  if [[ ${ICE_tasks:-0} -gt 0 ]]; then
-    ice_petlist_bounds="${n} $((n + ICE_tasks - 1))"
-    n=$((n + ICE_tasks))
-  fi
-
-  # WAV
-  if [[ ${WAV_tasks:-0} -gt 0 ]]; then
-    wav_petlist_bounds="${n} $((n + WAV_tasks - 1))"
-    n=$((n + WAV_tasks))
-  fi
-
-  # CHM
-  chm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
-
-  # MED
-  med_petlist_bounds="0 $((ATM_compute_tasks - 1))"
-
-  # AQM
-  aqm_petlist_bounds="0 $((ATM_compute_tasks - 1))"
-
-  UFS_tasks=${n}
-
-  echo "ATM_petlist_bounds: ${atm_petlist_bounds:-}"
-  echo "OCN_petlist_bounds: ${ocn_petlist_bounds:-}"
-  echo "ICE_petlist_bounds: ${ice_petlist_bounds:-}"
-  echo "WAV_petlist_bounds: ${wav_petlist_bounds:-}"
-  echo "CHM_petlist_bounds: ${chm_petlist_bounds:-}"
-  echo "MED_petlist_bounds: ${med_petlist_bounds:-}"
-  echo "AQM_petlist_bounds: ${aqm_petlist_bounds:-}"
-  echo "UFS_tasks : ${UFS_tasks:-}"
-
-}
-
if [[ $# != 5 ]]; then
echo "Usage: $0 PATHRT RUNDIR_ROOT TEST_NAME TEST_NR COMPILE_NR"
exit 1
@@ -174,22 +116,10 @@ fi

atparse < ${PATHRT}/parm/${MODEL_CONFIGURE:-model_configure.IN} > model_configure

-if [[ $DATM_CDEPS = 'false' ]]; then
-  if [[ ${ATM_compute_tasks:-0} -eq 0 ]]; then
-    ATM_compute_tasks=$((INPES * JNPES * NTILES))
-  fi
-  if [[ $QUILTING = '.true.' ]]; then
-    ATM_io_tasks=$((WRITE_GROUP * WRTTASK_PER_GROUP))
-  fi
-fi
-
-compute_petbounds
+compute_petbounds_and_tasks

atparse < ${PATHRT}/parm/${NEMS_CONFIGURE:-nems.configure} > nems.configure

-# TASKS is now set to UFS_TASKS
-export TASKS=$UFS_tasks

if [[ "Q${INPUT_NEST02_NML:-}" != Q ]] ; then
INPES_NEST=$INPES_NEST02; JNPES_NEST=$JNPES_NEST02
NPX_NEST=$NPX_NEST02; NPY_NEST=$NPY_NEST02
2 changes: 1 addition & 1 deletion tests/tests/rrfs_conus13km_hrrr_warm
@@ -25,7 +25,7 @@ export SMONTH=5
export SDAY=12
export SHOUR=16
export FHMAX=2
-export DT_ATMOS=60
+export DT_ATMOS=120
export RESTART_INTERVAL=1
export QUILTING=.true.
export WRITE_GROUP=1
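Doubling `DT_ATMOS` from 60 to 120 seconds halves the number of atmospheric time steps in this 2-hour test. A quick check of the arithmetic, assuming the step count is simply the forecast length in seconds divided by `DT_ATMOS`; the helper `steps` is hypothetical and any sub-stepping inside the model is ignored:

```shell
#!/usr/bin/env bash
FHMAX=2   # forecast length in hours, from the test file

# steps is an illustrative helper: forecast seconds divided by the time step.
steps() {
  echo $(( FHMAX * 3600 / $1 ))
}

old_steps=$(steps 60)    # before this change
new_steps=$(steps 120)   # after this change
echo "old=$old_steps new=$new_steps"
```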