Clone, build, and run C48_ATM, C48_S2SW, and C96_atm3DVar on Gaea C5 and C6 #3106

Draft
wants to merge 21 commits into
base: develop

DavidBurrows-NCO
Contributor

@DavidBurrows-NCO DavidBurrows-NCO commented Nov 15, 2024

Description

What:
Correct build/run for C48_ATM, C48_S2SW, and C96_atm3DVar on Gaea C5. Add build and run capability for C48_ATM, C48_S2SW, and C96_atm3DVar on Gaea C6.
Why:
After the C5 OS upgrade, submodules no longer built in the global-workflow. This PR corrects that and adds build/run capability on C6.

Resolves #3011
Depends on:
ufs-community/ufs-weather-model#2448
ufs-community/UFS_UTILS#995
NOAA-EMC/gfs-utils#87
NOAA-EMC/UPP#1070
NOAA-EMC/GSI#800
NOAA-EMC/GSI-utils#55
NOAA-EMC/GSI-Monitor#146
NOAA-EMC/GDASApp#1361

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)

Change characteristics

How has this been tested?

C5 and C6: cloned, built, and ran C48_ATM and C48_S2SW successfully.
C96_atm3DVar is hanging in sfcanl jobs.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidBurrows-NCO
Contributor Author

Hi @aerorahul @WalterKolczynski-NOAA We're still waiting on build merges for some submodules, so I've left this PR in draft. From our conversation Tuesday, I've pointed the submodules that were merged to their respective head of develop and the others to my commit for now. Should I be pointing to my submodule commits instead to limit the number of changes coming into GW? Thanks

@jswhit
Contributor

jswhit commented Nov 15, 2024

sorc/build_all.sh needs the following update:

--- sorc/build_all.sh
+++ sorc/build_all.sh
@@ -149,7 +149,7 @@ build_opts["ww3prepost"]="${_wave_opt} ${_verbose_opt} ${_build_ufs_opt} ${_buil

 # Optional DA builds
 if [[ "${_build_ufsda}" == "YES" ]]; then
-   if [[ "${MACHINE_ID}" != "orion" && "${MACHINE_ID}" != "hera" && "${MACHINE_ID}" != "hercules" && "${MACHINE_ID}" != "wcoss2" && "${MACHINE_ID}" != "noaacloud" && "${MACHINE_ID}" != "gaea" ]]; then
+   if [[ "${MACHINE_ID}" != "orion" && "${MACHINE_ID}" != "hera" && "${MACHINE_ID}" != "hercules" && "${MACHINE_ID}" != "wcoss2" && "${MACHINE_ID}" != "noaacloud" && "${MACHINE_ID}" != "gaeac5" && "${MACHINE_ID}" != "gaeac6" ]]; then
       echo "NOTE: The GDAS App is not supported on ${MACHINE_ID}.  Disabling build."
    else
       build_jobs["gdas"]=8
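
The condition above grows by one comparison for every supported platform. As a sketch only (the `gdas_supported` function and the `GDAS_MACHINES` list are hypothetical and not part of `sorc/build_all.sh`), the same test could be written as a membership check against a single list:

```shell
# Hypothetical helper, not part of sorc/build_all.sh.
# Machine IDs are the ones named in the condition above.
GDAS_MACHINES="orion hera hercules wcoss2 noaacloud gaeac5 gaeac6"

# Succeeds (returns 0) when the machine ID is in the supported list.
gdas_supported() {
  case " ${GDAS_MACHINES} " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

for id in gaea gaeac5 gaeac6; do
  if gdas_supported "${id}"; then
    echo "${id}: build gdas"
  else
    echo "${id}: GDAS App not supported, disabling build"
  fi
done
```

With a list like this, adding a platform means editing one string rather than extending the `&&` chain.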

@jswhit
Contributor

jswhit commented Nov 15, 2024

also ush/load_ufsda_modules.sh needs

--- a/ush/load_ufsda_modules.sh
+++ b/ush/load_ufsda_modules.sh
@@ -34,13 +34,13 @@ source "${HOMEgfs}/ush/module-setup.sh"
 module use "${HOMEgfs}/sorc/gdas.cd/modulefiles"

 case "${MACHINE_ID}" in
-  ("hera" | "orion" | "hercules" | "wcoss2")
+  ("hera" | "orion" | "hercules" | "gaeac5" | "gaeac6" | "wcoss2")
     module load "${MODS}/${MACHINE_ID}"
     ncdump=$( command -v ncdump )
     NETCDF=$( echo "${ncdump}" | cut -d " " -f 3 )
     export NETCDF
     ;;
-  ("jet" | "gaea" | "s4" | "acorn")
+  ("jet" | "s4" | "acorn")
     echo WARNING: UFSDA NOT SUPPORTED ON THIS PLATFORM
     ;;
   *)
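
With this change the bare ID `gaea` no longer matches either listed branch and falls through to the default branch, whose body is not shown in the diff. A minimal sketch of the dispatch, using a hypothetical `ufsda_status` function that only prints which branch would be taken and loads no modules:

```shell
# Hypothetical illustration of the case dispatch after the change.
# It prints the branch name instead of loading any modules.
ufsda_status() {
  case "$1" in
    "hera" | "orion" | "hercules" | "gaeac5" | "gaeac6" | "wcoss2")
      echo "supported" ;;
    "jet" | "s4" | "acorn")
      echo "unsupported" ;;
    *)
      echo "default" ;;
  esac
}

for id in gaeac5 gaeac6 jet gaea; do
  echo "${id}: $(ufsda_status "${id}")"
done
```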

@DavidBurrows-NCO
Contributor Author

> also ush/load_ufsda_modules.sh needs

Thanks @jswhit I pushed changes to ush/load_ufsda_modules.sh and sorc/build_all.sh

@jswhit
Contributor

jswhit commented Nov 15, 2024

also...

workflow/hosts/gaeac6.yaml and gaeac5.yaml:

-QUEUE_SERVICE: normal
+QUEUE_SERVICE: hpss
 PARTITION_BATCH: batch
-PARTITION_SERVICE: batch
+PARTITION_SERVICE: dtn_f5_f6

and modulefiles/module_gwsetup.gaeac6.lua:

-prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
+prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
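
The old and new paths differ only by the `c6/` component, so a wrong entry is easy to miss until a module load fails. A small sketch of a pre-check that could be run on the target machine (the `check_modulepath` function is hypothetical; the path is the one from the proposed change above):

```shell
# Hypothetical pre-check, not part of the workflow.
# Prints "ok" or "missing" for a candidate MODULEPATH entry and
# returns non-zero when the directory does not exist.
check_modulepath() {
  if [ -d "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Path taken from the proposed change to module_gwsetup.gaeac6.lua.
check_modulepath "/ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core" || true
```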

@jswhit
Contributor

jswhit commented Nov 15, 2024

env/GAEAC5.env and GAEAC6.env seem to be missing a bunch of stuff. I just copied HERCULES.env for both, and made some minor mods (see https://github.com/jswhit2/global-workflow/blob/develop/env/GAEAC5.env)

@jswhit
Contributor

jswhit commented Nov 15, 2024

build_ww3prepost is failing for me on both c5 and c6 (using ufs-wx-model 2448)

@DavidBurrows-NCO
Contributor Author

> missing a bunch of stuff

@jswhit They're not really missing anything; they were intentionally minimized at EMC's request for porting to a new machine. We started from a nearly blank canvas and have been building up. Currently, the GAEAC5.env and GAEAC6.env files are set up for the C48_ATM, C48_S2SW, and C96_atm3DVar jobs. The 3DVarAOWCDA configuration you're running will definitely have some additional jobs. If you send those particular job names (the "step" in the env file), I will add them to the files.

@JessicaMeixner-NOAA
Contributor

> build_ww3prepost is failing for me on both c5 and c6 (using ufs-wx-model 2448)

@jswhit - can you point me to a log file? Maybe I can look and see if something is easy to fix with this.

@jswhit
Contributor

jswhit commented Nov 18, 2024

> > build_ww3prepost is failing for me on both c5 and c6 (using ufs-wx-model 2448)
>
> @jswhit - can you point me to a log file? Maybe I can look and see if something is easy to fix with this.

@JessicaMeixner-NOAA here is the error:

/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(451): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [WAV_RESTART_MOD]
    use wav_restart_mod, only : read_restart
--------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(975): error #6632: Keyword arguments are invalid without an explicit interface.   [VA]
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
-------------------------------------------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(975): error #6632: Keyword arguments are invalid without an explicit interface.   [MAPSTA]
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
--------------------------------------------------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(975): error #6632: Keyword arguments are invalid without an explicit interface.   [MAPST2]
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
-----------------------------------------------------------------^
/gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90(451): error #6580: Name in only-list does not exist or is not accessible.   [READ_RESTART]
    use wav_restart_mod, only : read_restart
--------------------------------^
compilation aborted for /gpfs/f6/ira-da/proj-shared/Jeffrey.S.Whitaker/global-workflow-jswhit2/sorc/ufs_model.fd/WW3/model/src/w3initmd.F90 (code 1)
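
Error #7002 means the compiler could not find the compiled module file for `wav_restart_mod` when it reached `w3initmd.F90`; the #6632 and #6580 errors that follow are consequences of the same missing module. One generic way to start debugging is to locate the source file that defines the module and then confirm the build compiles it before `w3initmd.F90`. A sketch, where `find_module_source` is a hypothetical helper and the search directory is an assumption based on the paths in the log:

```shell
# Hypothetical helper: list files under a directory that contain a
# "module <name>" statement (case-insensitive). Lines such as
# "end module <name>" or "use <name>" do not match.
find_module_source() {
  dir="$1"
  mod="$2"
  grep -rilE "^[[:space:]]*module[[:space:]]+${mod}[[:space:]]*(!.*)?$" "${dir}" 2>/dev/null
}

# Example call; the relative directory is an assumption, adjust as needed.
find_module_source sorc/ufs_model.fd/WW3/model/src wav_restart_mod || true
```

If the file is found but its module is still missing at compile time, the build's source list or compile order is the next place to look.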

@JessicaMeixner-NOAA
Contributor

@jswhit - Okay, I know what the issue is, but it'll take a minute to get it fixed. The issue crept in with ufs-community/ufs-weather-model#2445 and we didn't catch it. If you go back one commit of ufs-weather-model, hopefully things will run. We'll get a fix in as soon as possible.

@jswhit
Contributor

jswhit commented Nov 19, 2024

@JessicaMeixner-NOAA I'm seeing this error in the gdas_fcst step on c6 when I run with ufs-wx-model 2448

424:  (abort_ice)ABORTED:
424:  (abort_ice) error =
424:  (construct_filename) ERROR: history filename already used for another history s
424:  tream iceh_inst.2021-03-24-10800.nc

and the traceback looks like this

473: ufs_model.x        0000000005E9CD8B  ice_broadcast_mp_         252  ice_broadcast.F90
473: ufs_model.x        0000000005F055E3  ice_history_write         169  ice_history_write.F90
473: ufs_model.x        0000000005C2A4E2  ice_history_mp_ac        4134  ice_history.F90
473: ufs_model.x        0000000005EE77FC  cice_runmod_mp_ci         367  CICE_RunMod.F90
473: ufs_model.x        0000000005B7DA06  ice_comp_nuopc_mp        1204  ice_comp_nuopc.F90
473: ufs_model.x        0000000000D05438  Unknown               Unknown  Unknown

Do you know of any recent CICE changes that could cause this?

@JessicaMeixner-NOAA
Contributor

I don't know, and I'm not as caught up on all the recent ufs-weather-model changes as I normally am, but a quick look at ufs-weather-model shows that CICE hasn't been updated in 2 months.

@jswhit
Contributor

jswhit commented Nov 19, 2024

For some more context on the cice error, from ice_diag.d:

(ice_comp_nuopc):(ModelAdvance) cice istep, nextsw_cday =         15      0.83111111111111D+02
 (ice_pio_init) create file ./CICE_OUTPUT/iceh_inst.2021-03-24-09600.nc

 Finished writing ./CICE_OUTPUT/iceh_inst.2021-03-24-09600.nc
(ice_comp_nuopc):(ModelAdvance) cice istep, nextsw_cday =         16      0.83118055555556D+02
 (ice_pio_init) create file ./CICE_OUTPUT/iceh_inst.2021-03-24-10200.nc

 Finished writing ./CICE_OUTPUT/iceh_inst.2021-03-24-10200.nc
(ice_comp_nuopc):(ModelAdvance) cice istep, nextsw_cday =         17      0.83125000000000D+02
 (ice_pio_init) create file ./CICE_OUTPUT/iceh_inst.2021-03-24-10800.nc

 Finished writing ./CICE_OUTPUT/iceh_inst.2021-03-24-10800.nc
 (construct_filename) history stream =            4
 (construct_filename) history filename = iceh_inst.2021-03-24-10800.nc
 (construct_filename) filename in use for stream            3
 (construct_filename) filename for stream iceh_inst.2021-03-24-10800.nc
 (construct_filename) Use namelist hist_suffix so history filenames are unique
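
The five-digit suffix in these filenames appears to be seconds of day, which agrees with `nextsw_cday` in the log (0.125 of a day is 10800 s). Under that assumption, a small sketch converts the suffix to a clock time, showing that the three files were written at 10-minute intervals and that streams 3 and 4 collided on the 03:00 file. The `sod_to_hhmm` name is hypothetical:

```shell
# Hypothetical helper: convert a seconds-of-day field (e.g. 10800) to HH:MM.
# Leading zeros are stripped first so that a value like 09600 is not
# misread as an octal constant by shell arithmetic.
sod_to_hhmm() {
  s=$(echo "$1" | sed 's/^0*//')
  s=${s:-0}
  printf '%02d:%02d\n' $(( ${s} / 3600 )) $(( (${s} % 3600) / 60 ))
}

# Filenames taken from the ice_diag.d excerpt above.
for f in iceh_inst.2021-03-24-09600.nc iceh_inst.2021-03-24-10200.nc iceh_inst.2021-03-24-10800.nc; do
  sod=${f%.nc}      # drop the .nc extension
  sod=${sod##*-}    # keep the text after the last hyphen
  echo "${f}: $(sod_to_hhmm "${sod}")"
done
```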

@jswhit2 jswhit2 mentioned this pull request Nov 21, 2024
@jswhit2
Contributor

jswhit2 commented Nov 21, 2024

The problem with the ice model (and a potential fix) are documented in PR #3121

Successfully merging this pull request may close these issues.

GW submodules no longer building on Gaea-C5 after OS upgrade; Also add Gaea-C6 build
4 participants