Merge #395
395: Use ClimaAtmos new config file interface r=LenkaNovak a=valeriabarra

## Purpose 
This PR updates ClimaAtmos to v0.16.0 and adopts its new configuration-file interface (in place of CLI options), while keeping the Coupler's existing ability to use CLI options.

The new interface reduces the maintenance burden and code duplication we used to have (we manually copied and pasted ClimaAtmos' default CLI options). Now, if any ClimaAtmos default changes in the future, the Coupler won't need to overwrite it; the change will be reflected directly in the Coupler.
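
As an illustration of the switch, the flags of the old `slabplanet_default_longrun` command can be written as a config file along the following lines. This is a sketch only: it assumes the YAML keys mirror the former CLI flag names one-to-one and that values keep the same spelling; the actual file under `config/longrun_configs/` may differ in keys, types, or ordering.

```yaml
# Hypothetical contents of config/longrun_configs/slabplanet_default_longrun.yml,
# transcribed from the old CLI invocation. Key names and value types are assumptions.
run_name: "slabplanet_default_longrun"
job_id: "slabplanet_default_longrun"
FLOAT_TYPE: "Float64"
mode_name: "slabplanet"
coupled: true
surface_setup: "PrescribedSurface"
moist: "equil"
vert_diff: true
rad: "gray"
precip_model: "0M"
energy_check: true
mono_surface: true
h_elem: 6
dt: "200secs"
dt_cpl: 200
t_end: "60days"
dt_save_to_sol: "10days"
anim: true
```

Any option left out of such a file would fall back to the ClimaAtmos default, which is what removes the need to copy those defaults into the Coupler.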

Closes #388 

## To-do
- [x] Update all buildkite regular and longrun pipeline jobs
- [x] Complete tests in interactive mode
- [x] Test & Debug


## Content
- [x] Updated to latest ClimaAtmos release (v0.16.0)
- [x] Got rid of the ClimaAtmos parsed_args table (keeping only the one entry that exists now, `--config_file`)
- [x] Renamed `"AMIP - modular, Float32 test"` -> `"AMIP - modular, Float64 test"`, since we were using Float64 in that test
- [x] Cleaned up `flame.jl` and `flame_diff.jl` scripts (they were referring to an old `job_id` `target_amip_n32_shortrun` that we don't use anymore) 
 
## Remap and MPI troubleshooting (issues that came up after cluster upgrade)
- [x] MPI/HDF5 circular dependencies throwing wrong-library errors
  - need to use the JuliaProject.toml and specify `JULIA_LOAD_PATH` (soon to be resolved with the TempestRemap new release, but for now we need to load packages in a specific order)
- [x] bus errors, race conditions when writing new regrid files
  - tests exhibit different behaviour with different modules (solution above)
  - adding barriers was necessary
- [x] overloaded ApplyOfflineMap remapping 
  - apply the correct `const comms_ctx = ClimaComms.context(ClimaComms.CPUSingleThreaded())`
- [x] coupler AMIP race conditions
  - ensure regrid directory contains the run-specific `run_name` 
- [x] tempest remap file error
  - TempestRemap (TR) has a character limit, so the regrid directory path must be kept short.
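
For the `JULIA_LOAD_PATH` item above, one possible arrangement is to append the `.buildkite` directory (which holds the `JuliaProject.toml` added in this commit) to the load path in the pipeline's `env` block. This is a sketch, not necessarily the exact line used in this PR; it assumes the standard Buildkite variable `BUILDKITE_BUILD_CHECKOUT_PATH` points at the repository checkout.

```yaml
env:
  # Assumption: putting the checkout's .buildkite directory on the load path lets
  # Julia find the [preferences] in .buildkite/JuliaProject.toml
  # (system MPI, system HDF5, local CUDA runtime).
  JULIA_LOAD_PATH: "${JULIA_LOAD_PATH}:${BUILDKITE_BUILD_CHECKOUT_PATH}/.buildkite"
```

If `JULIA_LOAD_PATH` is unset, the value starts with an empty entry, which Julia expands to its default load path, so the usual environments remain available.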

Review checklist

I have:
- followed the codebase contribution guide: https://clima.github.io/ClimateMachine.jl/latest/Contributing/
- followed the style guide: https://clima.github.io/ClimateMachine.jl/latest/DevDocs/CodeStyle/
- followed the documentation policy: https://github.com/CliMA/policies/wiki/Documentation-Policy
- checked that this PR does not duplicate an open PR.

In the Content, I have included 
- relevant unit tests, and integration tests, 
- appropriate docstrings on all functions, structs, and modules, and included relevant documentation.

-->

----
- [x] I have read and checked the items on the review checklist.


Co-authored-by: Valeria Barra <[email protected]>
bors[bot] and valeriabarra authored Sep 21, 2023
2 parents c435636 + 45f86f6 commit 5f678b9
Showing 49 changed files with 1,213 additions and 849 deletions.
18 changes: 18 additions & 0 deletions .buildkite/JuliaProject.toml
@@ -0,0 +1,18 @@
[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
HDF5_jll = "0234f1f7-429e-5d53-9886-15a909be8d59"
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"

[preferences.CUDA_Runtime_jll]
version = "local"

[preferences.HDF5_jll]
libhdf5_path = "libhdf5"
libhdf5_hl_path = "libhdf5_hl"

[preferences.MPIPreferences]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
libmpi = "libmpi"
mpiexec = "mpiexec"
18 changes: 10 additions & 8 deletions .buildkite/longruns/pipeline.yml
@@ -8,6 +8,8 @@ env:
BUILDKITE_COMMIT: "${BUILDKITE_COMMIT}"
BUILDKITE_BRANCH: "${BUILDKITE_BRANCH}"
JULIA_MAX_NUM_PRECOMPILE_FILES: 100
CONFIG_PATH: "config/longrun_configs"
PERF_CONFIG_PATH: "config/perf_configs"
# JULIA_DEPOT_PATH: "${BUILDKITE_BUILD_PATH}/${BUILDKITE_PIPELINE_SLUG}/depot/cpu"

agents:
@@ -56,7 +58,7 @@ steps:

- label: "Slabplanet: default"
key: "slabplanet_default_longrun"
command: "julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name slabplanet_default_longrun --FLOAT_TYPE Float64 --coupled true --surface_setup PrescribedSurface --moist equil --vert_diff true --rad gray --energy_check true --mode_name slabplanet --t_end 60days --dt_save_to_sol 10days --dt_cpl 200 --dt 200secs --mono_surface true --h_elem 6 --precip_model 0M --anim true --job_id slabplanet_default_longrun"
command: "julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/slabplanet_default_longrun.yml"
artifact_paths: "experiments/AMIP/modular/output/slabplanet/slabplanet_default_longrun_artifacts/*"
env:
BUILD_HISTORY_HANDLE: ""
@@ -77,7 +79,7 @@ steps:

- label: "MPI AMIP FINE: target longrun"
key: "amip_longrun_target"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name amip_longrun_target --coupled true --anim true --surface_setup PrescribedSurface --dt_cpl 150 --energy_check false --mode_name amip --mono_surface false --vert_diff true --moist equil --rad clearsky --precip_model 0M --z_elem 35 --dz_bottom 50 --h_elem 12 --kappa_4 3e16 --rayleigh_sponge true --alpha_rayleigh_uh 0 --dt 150secs --t_end 140days --job_id amip_longrun_target --dt_save_to_sol 5days --dt_save_to_disk 1days --FLOAT_TYPE Float64"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/amip_longrun_target.yml"
artifact_paths: "experiments/AMIP/modular/output/amip/amip_longrun_target_artifacts/*"
env:
CLIMACORE_DISTRIBUTED: "MPI"
@@ -89,7 +91,7 @@ steps:
# MPI performance scaling (10 days)
- label: "MPI AMIP FINE: n64"
key: "mpi_amip_fine_n64"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name amip_n64_shortrun --coupled true --surface_setup PrescribedSurface --moist equil --vert_diff true --rad gray --z_elem 50 --dz_top 3000 --dz_bottom 30 --h_elem 16 --kappa_4 1e16 --z_stretch false --rayleigh_sponge true --alpha_rayleigh_uh 0 --alpha_rayleigh_w 10 --dt_cpl 150 --dt 150secs --dt_rad 1hours --FLOAT_TYPE Float64 --energy_check false --mode_name amip --t_end 10days --dt_save_to_sol 100days --mono_surface false --precip_model 0M"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/amip_n64_shortrun.yml"
artifact_paths: "experiments/AMIP/modular/output/amip/amip_n64_shortrun_artifacts/*"
env:
CLIMACORE_DISTRIBUTED: "MPI"
@@ -101,7 +103,7 @@ steps:

- label: "MPI AMIP FINE: n32"
key: "mpi_amip_fine_n32"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name amip_n32_shortrun --coupled true --surface_setup PrescribedSurface --moist equil --vert_diff true --rad gray --z_elem 50 --dz_top 3000 --dz_bottom 30 --h_elem 16 --kappa_4 1e16 --z_stretch false --rayleigh_sponge true --alpha_rayleigh_uh 0 --alpha_rayleigh_w 10 --dt_cpl 150 --dt 150secs --dt_rad 1hours --FLOAT_TYPE Float64 --energy_check false --mode_name amip --t_end 10days --dt_save_to_sol 100days --mono_surface false --precip_model 0M"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/amip_n32_shortrun.yml"
artifact_paths: "experiments/AMIP/modular/output/amip/amip_n32_shortrun_artifacts/*"
env:
CLIMACORE_DISTRIBUTED: "MPI"
@@ -113,7 +115,7 @@ steps:

- label: "MPI AMIP FINE: n8"
key: "mpi_amip_fine_n8"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name amip_n8_shortrun --coupled true --surface_setup PrescribedSurface --moist equil --vert_diff true --rad gray --z_elem 50 --dz_top 3000 --dz_bottom 30 --h_elem 16 --kappa_4 1e16 --z_stretch false --rayleigh_sponge true --alpha_rayleigh_uh 0 --alpha_rayleigh_w 10 --dt_cpl 150 --dt 150secs --dt_rad 1hours --FLOAT_TYPE Float64 --energy_check false --mode_name amip --t_end 10days --dt_save_to_sol 100days --mono_surface false --precip_model 0M"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/amip_n8_shortrun.yml"
artifact_paths: "experiments/AMIP/modular/output/amip/amip_n8_shortrun_artifacts/*"
env:
CLIMACORE_DISTRIBUTED: "MPI"
@@ -125,7 +127,7 @@ steps:

- label: "MPI AMIP FINE: n2" # 10d take 21h, so reducing to 1d
key: "mpi_amip_fine_n2"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name amip_n2_shortrun --coupled true --surface_setup PrescribedSurface --moist equil --vert_diff true --rad gray --z_elem 50 --dz_top 3000 --dz_bottom 30 --h_elem 16 --kappa_4 1e16 --z_stretch false --rayleigh_sponge true --alpha_rayleigh_uh 0 --alpha_rayleigh_w 10 --dt_cpl 150 --dt 150secs --dt_rad 1hours --FLOAT_TYPE Float64 --energy_check false --mode_name amip --t_end 1days --dt_save_to_sol 100days --mono_surface false --precip_model 0M"
command: "mpiexec julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/amip_n2_shortrun.yml"
artifact_paths: "experiments/AMIP/modular/output/amip/amip_n2_shortrun_artifacts/*"
env:
CLIMACORE_DISTRIBUTED: "MPI"
@@ -137,7 +139,7 @@ steps:

- label: "MPI AMIP FINE: n1" # also reported by longruns with a flame graph; 10d take 21h, so reducing to 1d
key: "mpi_amip_fine_n1"
command: "julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --run_name amip_n1_shortrun --coupled true --surface_setup PrescribedSurface --moist equil --vert_diff true --rad gray --z_elem 50 --dz_top 3000 --dz_bottom 30 --h_elem 16 --kappa_4 1e16 --z_stretch false --rayleigh_sponge true --alpha_rayleigh_uh 0 --alpha_rayleigh_w 10 --dt_cpl 150 --dt 150secs --dt_rad 1hours --FLOAT_TYPE Float64 --energy_check false --mode_name amip --t_end 1days --dt_save_to_sol 100days --mono_surface false --precip_model 0M"
command: "julia --color=yes --project=experiments/AMIP/modular/ experiments/AMIP/modular/coupler_driver_modular.jl --config_file $CONFIG_PATH/amip_n1_shortrun.yml"
artifact_paths: "experiments/AMIP/modular/output/amip/amip_n1_shortrun_artifacts/*"
env:
BUILD_HISTORY_HANDLE: ""
@@ -146,7 +148,7 @@ steps:

# mpi_amip_fine_n1 flame graph report (NB: arguments passed from the ci pipeline.yml)
- label: ":rocket: performance: flame graph diff: perf_target_amip_n1_shortrun"
command: "julia --color=yes --project=perf perf/flame_diff.jl --run_name 4"
command: "julia --color=yes --project=perf perf/flame_diff.jl --config_file $PERF_CONFIG_PATH/perf_diff_target_amip_n1_shortrun.yml"
artifact_paths: "perf/output/perf_diff_target_amip_n1_shortrun/*"
agents:
slurm_mem: 20GB