Allow building of the ufs-weather-model, WW3 pre/post execs for GFS, GEFS, SFS in the same clone of global-workflow #3098

Open
wants to merge 34 commits into
base: develop

aerorahul
Contributor

@aerorahul aerorahul commented Nov 14, 2024

Description

GFS and GEFS (and now SFS) use different compile-time options for the ufs-weather-model. For the purposes of CI testing, a multi-build pipeline under Jenkins was created by @TerrenceMcGuinness-NOAA. This served well until now. With the inclusion of SFS, a third variety of the model is being built. Under the multi-build pipeline paradigm, a second (or third) clone and build of the global-workflow is required, which adds cloning and compilation time.

This PR allows compiling the ufs-weather-model for all three in a single clone of the global-workflow. Building with the options for GFS, GEFS, and SFS produces the model executables gfs_model.x, gefs_model.x, and sfs_model.x, respectively. The forecast script uses the right executable.
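As a rough illustration of how a script could pick the matching executable, here is a minimal sketch. It assumes the choice is keyed on a NET-like value; the function name `select_model_exec` and the variable `net` are hypothetical and are not taken from the forecast script in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): map a NET-like value to the
# model executable names described above.
select_model_exec() {
  local net="${1:?usage: select_model_exec <net>}"
  case "${net}" in
    gfs|gefs|sfs) echo "${net}_model.x" ;;
    *) echo "unsupported NET: ${net}" >&2; return 1 ;;
  esac
}

# Example: resolve the executable name for a GEFS run.
net="gefs"
model_exec="$(select_model_exec "${net}")"
echo "forecast would run: ${model_exec}"
```

Rejecting unknown values outright keeps a typo in the configuration from silently falling back to the wrong build.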

This PR also differentiates the WW3 pre/post executables with a gfs_ or gefs_ prefix. SFS variants will need to be introduced when NET=sfs is added.
In the process of updating the WW3 pre/post executable names, it was discovered (via grep) that the following are not used:

  • ww3_prep
  • ww3_outf
  • ww3_ounf
  • ww3_ounp
These need to be confirmed by running the workflow to ensure they are indeed not used at runtime.
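The kind of static search described above could look roughly like the following. This is only a sketch: `find_references` is a hypothetical helper, and the temporary directory with its one-line script is a made-up stand-in for the real script directories. A search like this only finds literal mentions, which is why the runtime confirmation is still needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under the given directories that mention
# an executable name as a fixed string. Prints nothing when there are no hits.
find_references() {
  local name="$1"; shift
  grep -rlF -- "${name}" "$@" 2>/dev/null || true
}

# Throwaway directory standing in for the real script directories.
demo_dir="$(mktemp -d)"
printf '%s\n' 'gfs_ww3_grid.x > grid.out' > "${demo_dir}/demo_script.sh"

for exe in ww3_grid ww3_prep ww3_outf ww3_ounf ww3_ounp; do
  if [[ -n "$(find_references "${exe}" "${demo_dir}")" ]]; then
    echo "${exe}: referenced"
  else
    echo "${exe}: no static references found"
  fi
done
rm -rf "${demo_dir}"
```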

This PR also updates the Jenkinsfile to use the multi-build from the single location.

Type of change

  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

This PR does not update any submodules.

How has this been tested?

In progress

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@aerorahul aerorahul changed the title Allow building of the ufs-weather-model for GFS, GEFS, SFS in the same clone of global-workflow Allow building of the ufs-weather-model, WW3 pre/post execs for GFS, GEFS, SFS in the same clone of global-workflow Dec 5, 2024
@aerorahul
Contributor Author

I am going to mark this as ready for review to gather feedback on the code changes before I commence extensive testing.

@aerorahul
Contributor Author

That's the thing, we don't currently have the capability of doing quick diffs on a few CI tests. It's coming, but until then it's manual.

@JessicaMeixner-NOAA
Contributor

@aerorahul Might be worth a manual test given the scope of the change here? Maybe 1 gfs and 1 gefs test?

The error from @WalterKolczynski-NOAA is in /lfs/h2/emc/stmp/walter.kolczynski/RUNDIRS/C48_S2SW_3098/gfs.2021032312/waveinit.24886/moddef_glo_200/grid_glo_200.out

which says:

/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/ush/wave_grid_moddef.sh: line 96: /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/exec/gfs_ww3_grid.x: No such file or directory

Looking at some files, we link things:
ls -l /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/exec/gfs_ww3_grid.x
lrwxrwxrwx 1 walter.kolczynski emc 115 Dec 11 21:00 /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/exec/gfs_ww3_grid.x -> /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/sorc/ufs_model.fd/WW3/install/pdlib_ON/bin/ww3_grid.x

but seems like things got cleaned up:
ls /lfs/h2/emc/ptmp/walter.kolczynsPR_3098/global-workflow/sorc/ufs_model.fd/WW3/install/pdlib_ON/bin/ww3_grid.x
ls: cannot access '/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/sorc/ufs_model.fd/WW3/install/pdlib_ON/bin/ww3_grid.x': No such file or directory
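A link whose target has been removed, as seen here, can be detected before a job runs. The following is a minimal sketch of such a check; `check_dangling_links` is a hypothetical helper and the demo directory is a throwaway stand-in for an exec directory, not something that exists in the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report symlinks in a directory whose
# targets no longer exist. Returns non-zero if any are found.
check_dangling_links() {
  local dir="$1" link rc=0
  for link in "${dir}"/*; do
    # -L: the path is a symlink; ! -e: its target does not resolve.
    if [[ -L "${link}" && ! -e "${link}" ]]; then
      echo "dangling: ${link} -> $(readlink "${link}")" >&2
      rc=1
    fi
  done
  return "${rc}"
}

# Demo: a link that points at a file which was never created.
demo_dir="$(mktemp -d)"
ln -s "${demo_dir}/missing/ww3_grid.x" "${demo_dir}/gfs_ww3_grid.x"
check_dangling_links "${demo_dir}" || echo "directory has broken links"
rm -rf "${demo_dir}"
```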

Co-authored-by: Walter Kolczynski - NOAA <[email protected]>
@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed labels Dec 11, 2024
@WalterKolczynski-NOAA
Contributor

@JessicaMeixner-NOAA The results will be available in the path given above (/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/RUNTESTS). I'll keep it until you have a chance to review.

@JessicaMeixner-NOAA
Contributor

@JessicaMeixner-NOAA The results will be available in the path given above (/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/RUNTESTS). I'll keep it until you have a chance to review.

Where will the develop output be for the comparisons?

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Dec 12, 2024
@WalterKolczynski-NOAA
Contributor

WalterKolczynski-NOAA commented Dec 12, 2024

C48_S2SW is still failing, now at wavepostsbs when it tries to execute gfs_ww3_grib.x. Error code 41

Digging in, I find a couple of different things. The ww3_grib error seems to come from an inability to open a file (output_20210323120000/glo_200_grib/grib2_global_000.out):

 Additional GRIB parameters : 
 -----------------------------------------------------
      Run time           : 2021/03/23 12:00:00 UTC
      GRIB center ID     :    7
      GRIB gen. proc. ID :   11
      GRIB grid ID       :  255
      GRIB GDS parameter :    0

 *** WAVEWATCH III ERROR IN W3IOGO : 
     ERROR IN OPENING FILE
     IOSTAT =   29

There is a zero-size gribfile and a dangling symlink for out_grd.ww3

There is also a failure in wave_grid_interp_sbs because it is trying to write to fix, which I think is supposed to be the source of the out_grd.ww3 target:

+ wave_grid_interp_sbs.sh[147](glo_200): '[' no = no ']'
+ wave_grid_interp_sbs.sh[149](glo_200): cp -f ./WHTGRIDINT.bin /lfs/h2/emc/stmp/walter.kolczynski/RUNDIRS/C48_S2SW_3098/gfs.2021032312/wavepostsbs.181086/ww3_gint.WHTGRIDINT.bin.glo_200
+ wave_grid_interp_sbs.sh[150](glo_200): cp -f ./WHTGRIDINT.bin /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/fix/wave/ww3_gint.WHTGRIDINT.bin.glo_200
cp: cannot create regular file '/lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3098/global-workflow/fix/wave/ww3_gint.WHTGRIDINT.bin.glo_200': Permission denied
+ wave_grid_interp_sbs.sh[1](glo_200): postamble wave_grid_interp_sbs.sh 1733980516 1
+ preamble.sh[70](glo_200): set +x
End wave_grid_interp_sbs.sh at 05:15:28 with error code 1 (time elapsed: 00:00:12)

This doesn't immediately fail, but continues until the ww3_grib failure later.

wave_grid_interp_sbs.sh definitely shouldn't be writing to fix/wave, but I don't understand why this is only a problem now.

The logs don't all get piped to STDOUT, so you have to root around in the DATA directory to find them (here's one: /lfs/h2/emc/stmp/walter.kolczynski/RUNDIRS/C48_S2SW_3098/gfs.2021032312/wavepostsbs.181086).

@WalterKolczynski-NOAA
Contributor

I'll also note the C48_S2SWA_gefs case completes successfully.

  export err=$?;err_chk

  # Write interpolation file to main TEMP dir area if not there yet
- if [ "wht_OK" = 'no' ]
+ if [ "${wht_OK}" = 'no' ]
Contributor

I suspect that this might be the error @WalterKolczynski-NOAA mentions here #3098 (comment). Perhaps the best thing is to back this change out and make an issue, so it's not holding up this PR?

Contributor

The GEFS job succeeding would make sense because it wouldn't be doing any interpolation so it'd skip this part of the code.

There might be other issues - but my guess is this is our problem.

Contributor Author

I reverted the change and added FIXME back to be addressed later. The script will no longer enter this if-block until the logic is fixed.
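For readers following the thread, the difference between the two forms of the test can be seen in isolation. This is a standalone sketch, not code from wave_grid_interp_sbs.sh; the variables `literal_branch` and `expanded_branch` are hypothetical and exist only to record which branch was taken.

```shell
#!/usr/bin/env bash
wht_OK='no'

# Literal string: compares the text "wht_OK" with "no", which is never equal,
# so the block guarded by this test is never entered.
if [ "wht_OK" = 'no' ]; then literal_branch='entered'; else literal_branch='skipped'; fi

# Parameter expansion: compares the value of the variable with "no".
if [ "${wht_OK}" = 'no' ]; then expanded_branch='entered'; else expanded_branch='skipped'; fi

echo "literal comparison:  ${literal_branch}"
echo "expanded comparison: ${expanded_branch}"
```

With the literal form the copy into fix/wave is dead code, which is consistent with the permission error only appearing once the expansion was introduced.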

@TerrenceMcGuinness-NOAA
Collaborator

This failure is from PR 307 in my forked repo for testing this build using the updated CI pipeline:

running the C48mx500_hybAOWCDA FAILED on Hercules in Build# 6 with error logs:

/work2/noaa/global/CI/HERCULES/307/RUNTESTS/COMROOT/C48mx500_hybAOWCDA_22a4e28b/logs/2021032500/gdas_marineanlletkf.log

Follow link here to view the contents of the above file(s): (link)

All the other CI tests are languishing in the Priority queues on Hercules

…s another bug that needs evaluation on the purpose of the logic. A FIXME tag has been added
@aerorahul
Contributor Author

@guillaumevernieres

This failure is from PR 307 in my forked repo for testing this build using the updated CI pipeline:

running the C48mx500_hybAOWCDA FAILED on Hercules in Build# 6 with error logs:

/work2/noaa/global/CI/HERCULES/307/RUNTESTS/COMROOT/C48mx500_hybAOWCDA_22a4e28b/logs/2021032500/gdas_marineanlletkf.log

Follow link here to view the contents of the above file(s): (link)

All the other CI tests are languishing in the Priority queues on Hercules

@TerrenceMcGuinness-NOAA
I am not sure why this is the case. It seems like files are missing. I will wait for 3149 to finish running on Hercules to confirm it ran this test successfully in that PR.

Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA left a comment

The updates to the Jenkinsfile pipeline script look good. The try/catch on the scm checkout looks strong too. An initial test of the pipeline worked fine in a forked branch in the development project. When CM (@WalterKolczynski-NOAA) is ready we should launch a CI test directly in this PR using multiple labels.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Wcoss2-Failed **Bot use only** CI testing on WCOSS for this PR has failed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Dec 12, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Dec 12, 2024
Labels
CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully
6 participants