Implementing CI #207

Merged 2 commits into NOAA-GFDL:main on Sep 14, 2022

Conversation

@laurenchilutti (Contributor) commented Sep 2, 2022

Description

Parallelworks was down when this PR was opened, so I kept it as a draft until the service came back online. This is now resolved.

This PR brings in a GitHub Actions YAML file that uses a self-hosted runner (Parallelworks RDHPCS cloud resources) to automatically compile Solo SHiELD and run idealized tests on pull requests to the main branch of the GFDL_atmos_cubed_sphere repository. The actual scripts referenced in this YAML file are stored on the Parallelworks resource. The CI works as follows (see the workflow sketch after this list):

  • The CI triggers automatically when anyone submits a PR to the main branch.
  • The CI consists of 3 stages:
  1. Build
  2. Test
  3. Shut down
  • The Build stage has 4 steps:
  1. Check out the code
  2. Build the sw Solo SHiELD configuration
  3. Build the nh Solo SHiELD configuration
  4. Build the hydro Solo SHiELD configuration
  • The Test stage contains only one test in this initial release. It runs the test and then compares restarts against a baseline; the test fails if the run fails or if the restarts differ.
  • We are working to speed up these tests on the cloud (currently they are too slow to run all idealized tests).
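
For orientation, here is a minimal sketch of what a three-stage workflow like this could look like. The runner label, the python invocation, and the test and shutdown script names are assumptions for illustration; only the four StartClusters script names are taken from the workflow under review.

  # Hypothetical sketch of the three-stage CI described above.
  name: Solo SHiELD CI
  on:
    pull_request:
      branches: [main]                # trigger only on PRs targeting main
  jobs:
    build:
      runs-on: self-hosted            # Parallelworks-provisioned runner
      steps:
        - run: python FV3checkoutStartClusters.py   # 1. check out code
        - run: python FV3swStartClusters.py         # 2. build sw
        - run: python FV3nhStartClusters.py         # 3. build nh
        - run: python FV3hydroStartClusters.py      # 4. build hydro
    test:
      needs: build
      runs-on: self-hosted
      steps:
        - run: ./run_idealized_test.sh    # hypothetical: run the test and
                                          # compare restarts against a baseline
    shutdown:
      needs: test
      if: always()                    # tear the clusters down even on failure
      runs-on: self-hosted
      steps:
        - run: ./shutdown_clusters.sh     # hypothetical teardown script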

How Has This Been Tested?

This has been tested by generating test PRs in my fork:

laurenchilutti#14 shows that when a PR targets a branch not named "main", the tests do not run.

laurenchilutti#15 shows that when a PR targets the main branch, the tests run.


@laurenchilutti changed the title from "Draft: Implementing CI" to "DRAFT: Implementing CI" on Sep 2, 2022
@laurenchilutti marked this pull request as a draft on September 2, 2022 at 13:50
Comment on lines 16 to 17
runscript: [FV3checkoutStartClusters.py, FV3swStartClusters.py, FV3nhStartClusters.py, FV3hydroStartClusters.py]
steps:
@thomas-robinson (Member) commented Sep 2, 2022:
Does each of these run through the cluster spin-up part of the API script? Do they all build something?

@laurenchilutti (Author) replied:

They all run through the start-up script and just have a different testcmd.

@laurenchilutti (Author) added:

The first only checks out code; the next three each build a different configuration.
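
Putting this exchange together with the snippet under discussion, the job presumably runs each matrix entry through the same start-up path; a minimal sketch, in which the run step and the python invocation are assumptions:

  # Sketch: one matrix entry per script; each StartClusters script runs the
  # shared cluster start-up and then its own testcmd.
  strategy:
    matrix:
      runscript: [FV3checkoutStartClusters.py, FV3swStartClusters.py, FV3nhStartClusters.py, FV3hydroStartClusters.py]
  steps:
    - run: python ${{ matrix.runscript }}   # assumed invocation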

name: SOLO SHiELD build
strategy:
fail-fast: true
max-parallel: 1
@thomas-robinson (Member) commented:
How many runners do you have? You have 4 jobs; do they all need to run one after the other? You could set up 4 runners to run all 4 at the same time.

@laurenchilutti (Author) replied:
I only have one right now, but I will set up multiple runners so that we can run multiple tests at the same time. For the build step I need them all to build in series (not in parallel).

A repository contributor replied:
I believe the builds could be done in parallel if we have the CI system build the FMS library first.

@laurenchilutti (Author) replied:
Would there be an issue with the builds occurring simultaneously when they all use the same Build/exec folder? I could alter the build system so that they all build with different exec folders.

The contributor replied:
The current build system uses exec/${config}_${compiler}/. It could be altered to use a combination of config, hydro, comp, bit, and compiler.

@laurenchilutti (Author) replied:
I will start making those changes; building in parallel would make everything much faster.

The contributor replied:
Remember to run the FMS build as a standalone process. Once we bring in the updates to build the few NCEPlibs needed by SHiELD, that would become part of that first step.
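
A minimal sketch of the shape this could take, assuming a standalone FMS build job and the exec/${config}_${compiler}/ naming mentioned above (the job names, script name, and compile invocation are assumptions):

  # Hypothetical sketch: FMS builds once, then the three configurations
  # build in parallel, each into its own exec directory.
  jobs:
    build_fms:
      runs-on: self-hosted
      steps:
        - run: ./build_fms.sh               # assumed standalone FMS build script
    build_solo:
      needs: build_fms                      # all configurations share the FMS library
      runs-on: self-hosted
      strategy:
        matrix:
          config: [sw, nh, hydro]           # no max-parallel: entries may run concurrently
      steps:
        - run: ./COMPILE ${{ matrix.config }} intel
          # assumed invocation; writing to exec/${{ matrix.config }}_intel/
          # keeps simultaneous builds from clobbering each other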

@laurenchilutti changed the title from "DRAFT: Implementing CI" to "Implementing CI" on Sep 2, 2022
@laurenchilutti marked this pull request as ready for review on September 2, 2022 at 15:00
@laurenchilutti (Author) commented:

I have created PR NOAA-GFDL/SHiELD_build#15, which adds scripts to build libFMS and NCEPlibs. These libraries can also be stored external to SHiELD_build, so libFMS does not need to be rebuilt with every new CI instance. I have also updated the compile step so that each configuration gets a unique exec/ subdirectory, which lets us run all 3 builds in parallel.

I have added more GitHub runners (3 in total) so we can run builds and tests in parallel. Once this PR from SHiELD_build is merged, I will rerun the CI on this PR so we can see how it works.

@thomas-robinson (Member) commented:
@laurenchilutti After this gets merged, submit a PR with code that should fail, to make sure the tests fail when expected.

@laurenchilutti (Author) commented:
Tests passed and builds ran in parallel. Future improvements I want to make:

  • Add more tests.
  • Add more runners if needed (currently there are 3, so we can only run 3 jobs in parallel).
  • Make it such that any nh configuration tests rely only on the completion of nhcompile, and the same for hydro and sw (see the sketch after this list).
  • Add tests that compare different layouts.
  • Add tests that run from restarts to ensure restart functionality is as expected.
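
For the third item, the dependency change amounts to pointing each test job's needs at its own build job rather than at all of them; a minimal sketch, assuming job names like nhcompile and a hypothetical test script:

  # Hypothetical sketch: each test job depends only on its own build job,
  # so an nh test can start as soon as nhcompile finishes.
  nh_test:
    needs: nhcompile            # not [swcompile, nhcompile, hydrocompile]
    runs-on: self-hosted
    steps:
      - run: ./run_nh_test.sh   # hypothetical nh test driver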

@laurenchilutti laurenchilutti merged commit ebd5a46 into NOAA-GFDL:main Sep 14, 2022
laurenchilutti added a commit that referenced this pull request Oct 13, 2022
* updated fv_regional_bc.F90 to read levsp from GFS_ctl file (#152)

* Updating a continue statement in fv_nudge (#153)

* update in fv_nudge to fix 666 label logic

* fix logic errors (#138)

* update to external_ic::remap_scale to skip remapping non-existent IC tracers (#159)

* Fix nudge logic (#157)

* fix logic errors

* fix answer changes

* adding back a line that was mistakenly deleted in a previous commit (#166)

* Release 042022 (#184)

* Merge of updates from GFDL Weather and Climate Dynamics Division (202204):

    * add license header to missing files and fix typo in header
    * updates needed for fv3_gfsphysics to have access to bounded_domain
    * remove obsoleted driver/SHiELD files
    * updating to fix bug where long_name and units attributes were not being captured in the RESTARTS
    * remove unused function fv_diagnostics::max_vorticity_hy1
    * remove avec timer remnants
    * adding ability to specify prefix and directory when reading and writing restarts
    * remove old style namelist read in favor of read from internal character variable
    * Added option for a mean wind
    * The planetary radius and rotation rate are now re-scalable by a namelist parameter (small_earth_scale) instead of using exclusively the hard-coded FMS constant.
    * fv_mapz: Cleanup and added helpful annotations to avoid getting lost so easily
    * remove duplicate code and fix lstatus on all grids depending on gfs_data and gfs_data.tile1
    * New idealized tests
    * Makes the non-hydrostatic restart variables optional for reads to allow hydrostatic ICs
    * Fix the hydrostatic TE remapping; Add GMAO cubic for TE remapping, which is used if kord_tm=0 and remap_te=.true.
    * Add a TE remapping option (kord_tm=0)
    * Addressing GNU Warnings
    * Add the L75 vertical config from HAFS
    * clean up fms_mp_mod and remove mp_bcst

* cherry-pick 5193c6b from dev/emc

* Attempt at integrating fixes on top of dev/emc branch. (#173)

* remove outdated GFDL_tools/fv_diag_column.F90 which exists as the result of an improper merge.  The correct file is tools/fv_diag_column.F90 (#191)

* various io fixes (#192)

* fixes io performance issues by making everyone a reader when io_layout=1,1

* adds capability to use FMS feature to ignore data integrity checksums in restarts

* rename enforce_rst_cksum to ignore_rst_cksum and change the default value for compatibility

* fix index error (#196)

* New notebooks (#190)

* Moving source files into source directory

* Added advection notebook

* Fixed subplot spacing

* New 3D case notebooks

* New idealized tests

* Updated mtn_wave_tests_1km to restore missing graphics.

* first try at held-suarez

* Updated H-S superrotation notebook

* New level placement tool in an interactive note

* Minor updates to notebooks; deleted fv_eta binary.

* Regional decomposition test fix (when nrows_blend > 0) (#194) (#200)

* Regional bc blend changes to extend into interior halos and overlap on corners. Still not working for u and v.

dyn_core.F90 : Fix from Jun Wang to correct sync of u,v
fv_regional_bc.F90 : add check for nrows_blend > tile size; fix error when nrows_blend=1

Conflicts (not taken during cherry-pick):
	driver/SHiELD/atmosphere.F90
	model/fv_sg.F90

Co-authored-by: Ted Mansell <[email protected]>

* Implementing CI (#207)

* CI Parallelworks update (#211)

* Update call to read_input_nml and remove unnecessary code. (#161)

* Removing use of INTERNAL_FILE_NML and cleaning up read_namelist_test_cases to remove unused argument

* deleting duplicate call to read_namelist_test_case_nml in fv_control

* removing commented code in fv_control

Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: menzel-gfdl <[email protected]>
Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: MatthewPyle-NOAA <[email protected]>
Co-authored-by: lharris4 <[email protected]>
Co-authored-by: Ted Mansell <[email protected]>