GFSv16.2.0 - WCOSS2 transition #892
Build updates based on NCO feedback and new WAFS tag
- Move enkf out of gdas and rename it to enkfgdas. Include all ecflow definition files job name Include all ecflow scripts name and job/log name - Move "model=gfs" to the top on each job except all jobs under obsproc. obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose. - Remove the source of model_ver from each ecflow script except all jobs under obsproc. obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose.
Update analysis ecflow script to use 128 for wcoss2 Remove extra CDATE
Update ecflow package for wcoss2 GFS transition
- update ROTDIR setting in NCO mode base config to use compath.py in its definition - this change supports the removal of a RUN_ENVIR=nco if-block in the JJOB scripts that set ROTDIR Refs: #399
- remove a RUN_ENVIR=nco if-block in JJOB scripts that set ROTDIR via COMROOT - ROTDIR is now set from the configuration level using compath.py - also update COMIN[COMOUT]wave paths in JGLOBAL_FORECAST to use compath.py for defaults Refs: #399
- update GLDAS tag to gldas_gfsv16_release.v1.24.0 - update WAFS to gfs_wafs.v6.2.6 Refs: #399
- remove "-o" after compath.py in COMIN definitions - add "${envir}" and move closing ")" forward in line Refs: #399
Incorporate ecflow feedback from NCO - part 1
- update WAFS tag in sorc/checkout.sh and release notes Refs: #399
…NOAA/global-workflow into feature/ops-wcoss2 * 'feature/ops-wcoss2' of https://github.com/KateFriedman-NOAA/global-workflow: revert ecflow include files to NCO versions. Will adapt as necessary for proper use
Add PBS debug directive.
- update COMIN paths in GEMPAK JJOB scripts for COMINukmet, COMINecmwf, and COMINnam to add the respective systems to the end of the path definition Refs: #399
Updates to ROTDIR/COMIN definitions related to compath.py and new GLDAS/WAFS tags
…e and intel. Ignore swp files
…nv-intel, craype and intel
…nv-intel, craype and intel
…nv-intel, craype and intel
Load compiler env. and modules in the ecf scripts.
…gnore in enkfgdas/post to ignore links
- The HOMEobsproc setting in config.base.nco.static is not used in operations and thus not needed in this version of config.base. Refs: #399
- An update on Dogwood implemented cgroups, which means memory limits are now enforced. - Exclusive jobs must now use "place=exclhost" instead of "place=excl". - Associated exclusive ecf script PBS statements are updated to exclhost. Refs: #399
Need to set exclhost for exclusive jobs on WCOSS2 now after cgroups was implemented. Matches updates to exclusive job ecf script PBS statements. Refs: #399
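The "excl" to "exclhost" change described in these commits could be applied in bulk along the following lines. This is only a sketch, not the procedure actually used for this PR: the function name `update_place_excl` and its directory argument are hypothetical, and it assumes the placement appears as a `place=` token on `#PBS` lines of `.ecf` files. The substitution is written so that lines already using `exclhost` are left unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of global-workflow): switch PBS "excl"
# placement to "exclhost" in every .ecf script under a directory.
# Usage: update_place_excl <directory>
update_place_excl() {
    dir=$1
    find "$dir" -type f -name '*.ecf' | while IFS= read -r f; do
        # Only #PBS lines are edited. "excl" optionally followed by "host"
        # is rewritten to "exclhost", so the edit is idempotent.
        sed -e '/^#PBS/ s/\(place=[^ ]*\)excl\(host\)\{0,1\}/\1exclhost/' \
            "$f" > "$f.tmp" &&
            cat "$f.tmp" > "$f" &&   # overwrite in place to keep permissions
            rm -f "$f.tmp"
    done
}
```

Running it a second time on the same directory should produce no further changes, which makes it safe to re-run after new ecf scripts are added.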
Will consider removing at later date. Refs: #399
Update "excl" to "exclhost" for exclusive jobs on WCOSS2
- Update config.efcs to run EnKF forecast job with serial netcdf instead of parallel netcdf. Based on joint decision between NCO and EMC. - Update C384 config.fv3.nco.static block to set DELTIM=200 (NCO request). - Update C384 config.fv3.nco.static block to set WRITE_GROUP=2 to speed up serial EnKF forecast jobs to fit inside needed window in ops. Refs: #399
- remove hyper=true in jgdas_atmos_analysis_calc.ecf - add export nth_echgres=$nth_echgres_gfs when CDUMP=gfs in config.analcalc; for correct thread setting at runtime - add export nth_echgres=4 to analcalc block in config.resources - add export nth_echgres_gfs=12 to analcalc block in config.resources Refs: #399
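The thread settings listed in this commit can be pictured roughly as below. The values (4 and 12) and the variable names come from the commit message; the wrapper function `set_nth_echgres` is hypothetical and exists only so both cases can be shown in one shell session. In the actual package the exports live in config.resources and the CDUMP check lives in config.analcalc.

```shell
#!/bin/sh
# Hypothetical wrapper combining the two config fragments for illustration.
# Usage: set_nth_echgres <CDUMP>
set_nth_echgres() {
    CDUMP=$1

    # analcalc block of config.resources (values from the commit message)
    export nth_echgres=4
    export nth_echgres_gfs=12

    # config.analcalc: the gfs suite uses the gfs-specific thread count
    if [ "$CDUMP" = "gfs" ]; then
        export nth_echgres=$nth_echgres_gfs
    fi
}
```

With this arrangement the gdas run keeps the default thread count, while the gfs run picks up the larger gfs-specific value at runtime.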
Hand-off tag to NCO is now EMC-v16.2.0.7 Refs: #399
Final pre-production freeze updates for GFSv16.2.0 package on WCOSS2
- NCO updated the default path for HOMENHC and tested it in prod on WCOSS2 during NHC test Refs: #399
- Based on testing on Dogwood after some WCOSS2 updates some memory and resource adjustments were made by NCO. - Memory updates to the gempak, awips, and fbwnd job ecf scripts. - Resource adjustments to remedy oversubscription errors in the post and postsnd jobs. Refs: #399
The gfspostsnd job was oversubscribing CPUs on WCOSS2 after updates on Dogwood. Updating resources settings to get them matching and working. Refs: #399
- Add updated memory values for awips and gempak jobs into resource configs to match similar updates in ecf scripts Refs: #399
WCOSS2 GFSv16.2.0 resource updates and NHC change
Sync merge from operations to get release notes for v16.1.8. All other v16.1.8 updates are already in v16.2.0 component tags. Refs: #399
* feature/ops-wcoss2_v16.2.0: (415 commits) Add GFSv16.1.8 release notes Matching memory updates for awips/gempak in config Update prior GFS version in v16.2.0 release notes Update gfspostsnd job resources - oversubscribing Memory and resource adjustments to some jobs (NCO) Update to HOMENHC default path in JGLOBAL_ATMOS_TROPCY_QC_RELOC Update EMC tag name in v16.2.0 release notes Resource updates for analysis_calc job on WCOSS2 Updated error handling in gfs_bufr script Add -g and -traceback flags to utility builds if missing EnKf forecast serial netcdf updates and DELTIM=200 Add HOMEobsproc back to config.base.nco.static Update "excl" to "exclhost" in workflow_utils.py Update ecf PBS excl to exclhost Remove reference to HOMEobsproc in NCO config.base Update GFSv16.2.0 release notes for new hand-off tag Update WCOSS2 env file cpu-bind flags for threading Update UPP tag to upp_v8.1.2 Remove nco_ver from build.ver - not needed Update release notes to update prior version Update GFSv16.2.0 release notes to reflect new tag Increase post_master job to 126 tasks Update enkfgdas_sfc job to use 60GB Add gsl module load needed by nco module Set hyper=true for gdas_atmos_analysis_calc job Optimized gfs_forecast job resource configuration Add WCOSS2 operations gfs defs files Add missing --init flag to GSI checkout submodule update file Release_Notes.gfs.v16.1.7.txt add Release_Notes.gfs.v16.1.7.txt Code update to syndat_getjtbul.fd for v16.1.7 Update HOMEobsproc paths in config.base.nco.static Update obsproc package settings in dev config.base Update prep.sh to use new WCOSS2 obsproc packages Add obsproc/prepobs run versions to wcoss2.ver Add needed gempak subfolder to gempak ush scripts Update GSI submodule command and release notes GFS v16.1.6 update: Turn off uv 224 VADWND Update GLDAS tag to gldas_gfsv16_release.v.2.0.0 Update DMPDIR and BASE_GIT paths for WCOSS2 Update Externals.cfg with GFSv16.2.0 component versions Update release notes for current ops version Move all 
PBS place settings to separate line Remove commented out lines from transfer lists Update WAFS tag to gfs_wafs.v6.2.8 GFS v16.1.6: Update release notes and comment in config.anal Update npe_node_fcst_gfs in config.resources.emc.dyn Updates to support wcoss2.ver Fold in transfer parm list updates from NCO Move transfer lists into new transfer folder Update wave job resources with NCO feedback Update EMC tag name in v16.2.0 release notes Updates to run.ver and create wcoss2.ver Script updates from NCO GFS v16.1.6: GSI update to add commercial GPSRO in DO-4 Move excl setting into resource line in ecf scripts Update gfs_forecast job resources Update several versions in run.ver Add OMP_PLACES=cores for fcst block in WCOSS2.env Update compilation flags for gaussian_sfcanl build Add the following scripts changed to remove module load libjpeg: jgdas_wave_prep.ecf jgfs_wave_prep.ecf Remove hardwired DELL path util/ush/make_tif.sh Remove esmf from enkf fcst 1. A check on job/ush/script from HOMEgfs, I found the following reference to USE_CFP: gldas_forcing.sh exgdas_atmos_chgres_forenkf.sh exgdas_atmos_gldas.sh exgdas_enkf_update.sh exglobal_atmos_analysis.sh exglobal_diag.sh Correct analysis job walltimes in config.resources The following scripts changed to remove module load wgrib2: jenkfgdas_sfc.ecf jgfs_wave_prdgen_bulls.ecf jgdas_wave_postsbs.ecf Adjust analysis job walltimes for ops Add missing EXPDIR setting to JGDAS_ATMOS_GEMPAK Remove non-WCOSS2 references in nco.static configs Update EMC tag name in release notes Change npe_analdiag to 96 Remove npe_node_eupd=9 setting on WCOSS2 Update several GSI/EnKF job resources Update to correct infinite loop in gempak script Add missing character to GLDAS tag in release notes Update GLDAS tag to gldas_gfsv16_release.v.1.28.0 Remove excl for gfswaveprep job PBS directive Update GFSv16.2.0 release notes GEMPAK_META script updates from Boi Update GEMPAK scripts Update gempak job in setup_workflow scripts Back out comment of job 
variable in awips scripts Resource adjustments for eobs, waveprep, gfspost Resource updates for analysis and eobs Change RUN to RUN2 in awips scripts Change RUN to RUN2 in gempak pgrb2 spec script Correct config list for wavepostbndpntbll job Comment out job variable in awips ecf scripts Reduce gdas analysis job walltime back to 40mins Remove nth_max usage in WCOSS2.env A few resource updates from NCO and WCOSS_C removal Update analysis job walltime to 50mins Optimization resource updates from NCO remove obsproc ecfs and there references from suite.def. work needs to find a proper trigger for the remaining dump job bringing in changes from @WeiWei-NCO after his testing Update analysis job walltime to 50 mins Update gdasechgres job resources Update esfc and analysis job resources Update C384 and C768 values in config.fv3.emc.dyn Update config.resource.emc.dyn with tested values Cleanup of config.fv3.nco.static Add COMIN_OBS/COMIN_GES_OBS and related xml support Add COMIN_OBS/COMIN_GES_OBS and related xml support Update resources for gdasesfc job in ecf script Numerous resource updates based on optimization Update for C768 gdasfcst job resource settings Update analysis job ecf resource settings Update to WCOSS2 env file for waveprep job Add missing get_awipsgroups function to fcstonly Update memory setting in workflow_utils for gfs update resources for more jobs in ecf scripts from NCO update resources for more jobs in ecf scripts from NCO update resources for jgfs_atmos_tropcy_qc_reloc.ecf. Remove developer overwrite section update resources for jgdas_atmos_tropcy_qc_reloc.ecf. Remove developer overwrite section update resources for more jobs to include memory in ecf scripts. 
wave init jobs need modules for Intel loaded update resources for wave jobs to include memory in ecf scripts update resources for atmos chgres for enkf in ecf scripts update resources for atmos pp wafs_gcip in ecf scripts update resources for atmos gempak_meta in ecf scripts update resources for atmos gempak in ecf scripts update resources for wave init, post and prep jobs in ecf scripts fix resource allocations for some jobs that NCO flagged were allocating too many cores Remove unneeded COM paths from wavepostsbs JJOB Update GLDAS tag in release notes Update GLDAS tag to gldas_gfsv16_release.v1.25.0 wave init jobs just need cray-pals per NCO. remove rest put NCO identified changes from the global-workflow in a branch remove gdas remnant from enkfgdas jobnames some PBS jobnames were hardwired gdas or gfs, while some inherited from %RUN%. This commit uses %RUN% to make it consistent and possibly will open the door for further consolidation between gdas and gfs families post/anl job is the same as the forecast hour. create a link, instead of having a copy Update post job resources in config.resources.nco.static Update post jobs ecf script resources Update release notes for new EMC tag Update workflow_utils.py to support exclusive Add imagemagick_ver=7.0.8-7 to run.ver Update NCO resource config for memory Update v16.2.0 release notes for ecf script linking Add memory setting to jgfs_atmos_wafs_master.ecf Add memory setting to jgfs_atmos_wafs_grib2_0p25.ecf Add memory setting to jgfs_atmos_wafs_grib2.ecf Add memory setting to jgfs_atmos_wafs_blending_0p25.ecf Add memory setting to jgfs_atmos_wafs_blending.ecf Add memory setting to jgfs_atmos_awips_g2_master.ecf Add memory setting to jgfs_atmos_awips_master.ecf Add excl tag to jgfs_atmos_gempak.ecf Add excl tag to jgfs_forecast.ecf add pesky blank lines at the end of script. 
reviewers are brutal add script that sets up the links to the master.ecf that loop over forecast hours add gitignore in appropriate places to ignore links. update defs to the consistent grib_wafs ecf tasks remove duplicate jgfs_atmos_wafs_f*.ecf files and rename f00 as master remove duplicate jgfs_atmos_awips_g2_f*.ecf files and rename f000 as master remove duplicate jgfs_atmos_awips_f*.ecf files and rename f000 as master remove duplicate gfs_atmos_post_fxxx.ecf files and rename f000 as master fix typo that causes the opposite effect ignore gdas/atmos/post/ forecast hour ecf links remove duplicate gdas_atmos_post_fxxx.ecf files and rename f000 as master remove remnant from PR555 that copied gdas/enkf to enkfgdas. add gitignore in enkfgdas/post to ignore links remove duplicate enkfgdas_post_fxxx.ecf files and rename f003 as master add script that sets up the links to the master.ecf that loop over forecast hours add gitignore in appropriate places to ignore links. update defs to the consistent grib_wafs ecf tasks remove duplicate jgfs_atmos_wafs_f*.ecf files and rename f00 as master remove duplicate jgfs_atmos_awips_g2_f*.ecf files and rename f000 as master remove duplicate jgfs_atmos_awips_f*.ecf files and rename f000 as master remove duplicate gfs_atmos_post_fxxx.ecf files and rename f000 as master fix typo that causes the opposite effect ignore gdas/atmos/post/ forecast hour ecf links remove duplicate gdas_atmos_post_fxxx.ecf files and rename f000 as master remove remnant from PR555 that copied gdas/enkf to enkfgdas. 
add gitignore in enkfgdas/post to ignore links remove duplicate enkfgdas_post_fxxx.ecf files and rename f003 as master request exclusive node where ncpus=128 remove memory requests of 500gb and request exclusive node instead every ecf script that loads compiler dependent module, now loads PrgEnv-intel, craype and intel every ecf script that loads compiler dependent module, now loads PrgEnv-intel, craype and intel every ecf script that loads compiler dependent module, now loads PrgEnv-intel, craype and intel every ecf script that loads cray-mpich, now loads PrgEnv-intel, craype and intel. Ignore swp files Correct COMIN paths in GEMPAK driver scripts Update COMIN paths for ukmet, ecmwf, and nam Update WAFS tag to gfs_wafs.v6.2.7 add #PBS -l debug=true to all .ecf files Correct COMIN definitions Update GLDAS and WAFS tags in release notes Update GLDAS tag to gldas_gfsv16_release.v1.24.0 revert ecflow include files to NCO versions. Will adapt as necessary for proper use Remove NCO if-block from JJOB scripts Update ROTDIR in config.base.nco.static Remove ecflow post assignment in envir-p1.h Remove remark from envir-p1.h and head.h Update analysis ecflow script to use 128 for wcoss2 Remove extra CDATE Reference to NCO version: - Move enkf out of gdas and rename it to enkfgdas. Include all ecflow definition files job name Include all ecflow scripts name and job/log name - Move "model=gfs" to the top on each job except all jobs under obsproc. obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose. - Remove the source of model_ver from each ecflow script except all jobs under obsproc. obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose. 
Update enkf structure changes in ecflow definition files Update WAFS tag to gfs_wafs.v6.2.6 Remove envvar from module-setup.*.inc scripts Remove envvar from WCOSS2 driver scripts Update machine-setup based on NCO feedback Update builds and makefiles based on NCO feedback Modulefile updates based on NCO feedback Update build.ver based on NCO feedback Add OMP_STACKSIZE to WCOSS2.env for forecast Update Release_Notes.gfs.v16.2.0.md Fix GFS fcst APRUN_FV3 command ecflow package for wcoss2 GFS transition WCOSS2 Migration and Porting #398 ...
Rubber-stamped, since this is what is in operations
This is a very, very big PR that is nearly impossible to review in one sitting.
However, since this is operations, and operations has been running and tested, it is OK to merge.
I would recommend doing a mock merge and then comparing the merged branch with operations_v16.2.0 to make sure that nothing incorrect is being retained or slipping in.
git checkout operations_v16.2.0
git checkout operations
git checkout -b operations_mock
git merge operations_v16.2.0
git commit -m 'merge operations_v16.2.0 into operations_mock'
git checkout operations_v16.2.0
git diff --name-status operations_mock
should result in zero-diffs.
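If it helps, the zero-diff expectation can be turned into a pass/fail check. The following is a small sketch; the function name `assert_zero_diff` is hypothetical. It relies on `git diff --quiet`, which suppresses output and returns a non-zero exit status when the two trees differ.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two branches have identical trees.
# Usage: assert_zero_diff <branch_a> <branch_b>
assert_zero_diff() {
    # --quiet implies --exit-code: status 1 means differences were found.
    if git diff --quiet "$1" "$2" --; then
        echo "zero diffs between $1 and $2"
    else
        echo "differences found between $1 and $2:" >&2
        git diff --name-status "$1" "$2" -- >&2
        return 1
    fi
}
```

For the mock merge above, this would be called as `assert_zero_diff operations_v16.2.0 operations_mock`.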
Mock merge successful, no diffs encountered. Also diffed |
Description
This (rather large) PR merges the changes made during the WCOSS2 transition into the operations branch. A summary of the incoming changes is below. This PR brings the operations branch from GFSv16.1.8 to GFSv16.2.0.
This PR merges the operations_v16.2.0 branch into the operations branch. The operations_v16.2.0 branch is a combination of the last v16.2.0 hash from the feature/ops-wcoss2 branch, a sync merge with operations to bring in the v16.1.8 release notes, and a pre-merge with operations to resolve conflicts.
Summary of changes:
- .gitignore file; mimics updates made in develop to now ignore component and compilation files.
- Externals.cfg and sorc/checkout.sh to new WCOSS2 component tag versions.
- docs/Release_Notes.gfs.v16.1.4.txt, docs/Release_Notes.gfs.v16.1.5.txt, and docs/Release_Notes.gfs.v16.1.6.txt to remove extraneous spaces.
- docs/Release_Notes.gfs.v16.2.0.md for GFSv16.2.0.
- jlogfile throughout.
- ecflow/ecf folder up one level to replace the ecflow folder: ecflow/ecf -> /ecf
- gfs*.def files.
- envir-p1.h, head.h, and tail.h files for ecflow. Delete envir-p1-old.h and model_ver.h.
- /ecf/setup_ecf_links.sh script (added to installation instructions as well). Add .gitignore files to ignore symlinks.
- gfs.ver file; replaced by new version files.
- ORION.env and WCOSS2.env
- $RUN_ENVIR = "nco" blocks
- ./ before PDY (./PDY)
- SENDCOM where needed
- $NWROOT to $PACKAGEROOT throughout
- compath.py throughout
- $gfs_ver to COM* paths throughout (GFS now installed under $PACKAGEROOT/gfs/$gfs_ver)
- $envir from COM* paths throughout (replaced by $gfs_ver in most cases)
- APRUN variables from mpirun to mpiexec where needed
- (jobs/rocoto) scripts:
- prep.sh (new as of WCOSS2)
- COMINsyn to vrfy.sh
- KEEPDATA removal from wave rocoto scripts
- module-setup.*.inc scripts.
- config.anal
- $CDUMP == gfs block in config.analcalc to set nth_echgres=$nth_echgres_gfs when gfs suite
- QUEUE_ARCH to QUEUE_SERVICE in config.base.emc.dyn
- PARTITION_BATCH to config.base.emc.dyn
- NWPROD with PACKAGEROOT in config.base.emc.dyn and config.base.nco.static
- COMROOT to config.base.emc.dyn
- HOMEOBSPROC_PREP with HOMEobsproc and HOMEobsproc_network[global] with HOMEprepobs in config.base.emc.dyn and config.base.nco.static
- ROTDIR using compath.py in config.base.nco.static
- CHGRES_RSTPROD to config.base.emc.dyn
- FDATE to config.base.emc.dyn
- EXP_WARM_START to config.base.emc.dyn
- COMIN_OBS and COMIN_GES_OBS to config.base.emc.dyn
- WCOSS_DELL_P3 in config.base.nco.static with WCOSS2 (hardcoded)
- OUTPUT_FILTYPES and *chunk*d variable blocks in config.efcs and config.fcst
- config.fv3.emc.dyn to separate dev settings from ops settings
- config.fv3 to config.fv3.nco.static to hold ops settings; set npe_node_max=128 for WCOSS2
- config.fv3.nco.static for running in WCOSS2 ops
- GEMPAKSH in config.gempak
- FINDDATE in config.gldas
- GESROOT block in config.prepbufr
- config.resources to config.resources.emc.dyn to hold dev settings
- config.resources.emc.dyn and config.resources.nco.static for running the GFS on WCOSS2 and the R&D platforms
- config.vrfy
- FHMAX_WAV_IBP to config.wavepostbndpnt
- parm/transfer folder
- build_all.sh
- USE_PREINST_LIBS blocks in build scripts
- ifort to ftn, traceback, and/or HDF5 compile flags
- link_fv3gfs.sh
- link_fv3gfs.sh for config.fv3.* and config.resources.*
- machine-setup.sh
- build.ver to machine-setup.sh
- load_fv3gfs_modules.sh
- run.ver to load_fv3gfs_modules.sh
- /versions folder and build.ver/run.ver versions files; also add wcoss2.ver file for dev usage
Type of change
Port to the new WCOSS2 HPC machines Cactus and Dogwood.
How Has This Been Tested?
These changes have been extensively tested on WCOSS2 by EMC, NCO, and GDIT ahead of production go-live on June 28th, 2022.
Checklist
Refs #399, #398
MANY THANKS TO EVERYONE INVOLVED IN THE TRANSITION OF THE GFS TO WCOSS2!