Add serialised data to ci #338

Merged
merged 18 commits into from
Jan 5, 2024
Fixes
samkellerhals committed Jan 3, 2024
commit a22537dc66f3ac3619d3903e6491e898e6f89798
2 changes: 1 addition & 1 deletion .gitignore
@@ -4,7 +4,7 @@ _local
 _external_src
 _reports
 tmp
-serialized_data
+testdata
 simple_mesh*.nc

 ### GT4Py ####
20 changes: 10 additions & 10 deletions ci/cscs.yml
@@ -43,7 +43,7 @@ variables:
   NUM_PROCESSES: auto
   VIRTUALENV_SYSTEM_SITE_PACKAGES: 1
   CSCS_NEEDED_DATA: icon4py
-  SERIALIZED_DATA_PATH: "/apps/daint/UES/jenkssl/ciext/icon4py"
+  TEST_DATA_PATH: "/apps/daint/UES/jenkssl/ciext/icon4py"

 build_job:
   extends: .build_template
@@ -52,14 +52,14 @@ test_model_job_roundtrip_simple_grid:
   extends: .test_template
Contributor

Just leaving my 2 cents here: All of these jobs could be easily expressed using https://docs.gitlab.com/ee/ci/yaml/#needsparallelmatrix

Contributor Author

I will try this in a new PR.

   stage: test
   script:
-    - tox -r -c model/ --verbose -- --benchmark-skip -n auto
+    - tox -r -e run_stencil_tests -c model/ --verbose

 test_model_job_dace_cpu_simple_grid:
   extends: .test_template
   stage: test
   script:
     - pip install dace==$DACE_VERSION
-    - tox -r -e run_stencil_tests -c model/ --verbose -- --benchmark-skip -n auto --backend=dace_cpu
+    - tox -r -e run_stencil_tests -c model/ --verbose -- --backend=dace_cpu
   only:
     - main
   allow_failure: true
@@ -69,7 +69,7 @@ test_model_job_dace_gpu_simple_grid:
   stage: test
   script:
     - pip install dace==$DACE_VERSION
-    - tox -r -e run_stencil_tests -c model/ --verbose -- --benchmark-skip -n auto --backend=dace_gpu
+    - tox -r -e run_stencil_tests -c model/ --verbose -- --backend=dace_gpu
   only:
     - main
   allow_failure: true
@@ -78,13 +78,13 @@ test_model_job_gtfn_cpu_simple_grid:
   extends: .test_template
   stage: test
   script:
-    - tox -r -e run_stencil_tests -c model/ --verbose -- --benchmark-skip -n auto --backend=gtfn_cpu
+    - tox -r -e run_stencil_tests -c model/ --verbose -- --backend=gtfn_cpu

 test_model_job_gtfn_gpu_simple_grid:
   extends: .test_template
   stage: test
   script:
-    - tox -r -e run_stencil_tests -c model/ --verbose -- --benchmark-skip -n auto --backend=gtfn_gpu
+    - tox -r -e run_stencil_tests -c model/ --verbose -- --backend=gtfn_gpu

 test_tools_job:
   extends: .test_template
@@ -97,7 +97,7 @@ benchmark_model_dace_cpu_icon_grid:
   stage: benchmark
   script:
     - pip install dace==$DACE_VERSION
-    - tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_cpu --grid=icon_grid
+    - tox -r -e run_benchmarks -c model/ -- --backend=dace_cpu --grid=icon_grid
   only:
Contributor

I was wondering: does our cscs-ci run count as a manual trigger? I guess so... Also, what does the only: - main refer to: the target branch of the PR? Or are these options ignored in our setup, since we run it from outside GitLab?

(I don't know how it works, and currently it always runs all of the jobs, and the benchmarks take quite long... once we add the datatest that will get worse, response-time-wise...)

Contributor Author

I am not sure, to be honest. Since @edopao added these dace jobs, maybe he can explain more. I would assume these benchmarks run only on main but have to be triggered manually; how, I am not sure. Currently the dace jobs do not seem to run when using cscs-ci run.

Contributor

As discussed in today's standup meeting, the intention of this setting was to run the dace benchmark on main after a PR is merged. However, this setting is ignored in our setup, as also noted above. I agree that we could have a separate CI pipeline for benchmarking, triggered automatically after a PR is merged or by a daily job.

     - main
   when: manual
@@ -107,7 +107,7 @@ benchmark_model_dace_gpu_icon_grid:
   stage: benchmark
   script:
     - pip install dace==$DACE_VERSION
-    - tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_gpu --grid=icon_grid
+    - tox -r -e run_benchmarks -c model/ -- --backend=dace_gpu --grid=icon_grid
Contributor

Do you need these double double-dashes? Or did you simply forget to delete them?

Contributor Author

Yes, the double dashes denote the end of the arguments passed to tox itself; any subsequent arguments are treated as positional arguments and passed to whatever command tox invokes, in this case pytest.
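
For readers following the thread, here is a minimal Python sketch of the convention described above. It is not tox's actual implementation: `split_at_double_dash` and `render_command` are hypothetical helpers written only to illustrate how everything after the bare `--` ends up where the `{posargs}` placeholder sits in `model/tox.ini`.

```python
from typing import List, Tuple


def split_at_double_dash(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv into (tool_args, forwarded_args) at the first bare "--".

    Hypothetical helper: it only mimics the convention, not tox internals.
    """
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return argv, []


def render_command(template: str, posargs: List[str]) -> str:
    """Substitute a {posargs} placeholder in a command template (illustrative only)."""
    return template.replace("{posargs}", " ".join(posargs)).strip()


# Arguments as they appear in the benchmark job after this commit.
tox_args, pytest_args = split_at_double_dash(
    ["-r", "-e", "run_benchmarks", "-c", "model/", "--", "--backend=gtfn_cpu", "--grid=icon_grid"]
)

# Command template taken from the run_benchmarks environment in model/tox.ini.
command = render_command(
    'pytest -s -m "not slow_tests" atmosphere/dycore/tests/dycore_stencil_tests '
    "--benchmark-only {posargs}",
    pytest_args,
)

print(tox_args)
print(command)
```

In this sketch the first `--backend=...` token belongs to pytest, not to tox, which is why the line shows two runs of dashes back to back.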

   only:
     - main
   when: manual
@@ -116,10 +116,10 @@ benchmark_model_gtfn_cpu_icon_grid:
   extends: .test_template
   stage: benchmark
   script:
-    - tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=gtfn_cpu --grid=icon_grid
+    - tox -r -e run_benchmarks -c model/ -- --backend=gtfn_cpu --grid=icon_grid

 benchmark_model_gtfn_gpu_icon_grid:
   extends: .test_template
   stage: benchmark
   script:
-    - tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=gtfn_gpu --grid=icon_grid
+    - tox -r -e run_benchmarks -c model/ -- --backend=gtfn_gpu --grid=icon_grid
Original file line number Diff line number Diff line change
@@ -16,23 +16,23 @@
 from icon4py.model.common.decomposition.definitions import get_processor_properties


-DEFAULT_SERIALIZED_DATA_FOLDER = "serialized_data"
+DEFAULT_TEST_DATA_FOLDER = "testdata"


-def get_serialized_data_root_path() -> Path:
+def get_test_data_root_path() -> Path:
     test_utils_path = Path(__file__).parent
     model_path = test_utils_path.parent.parent
     common_path = model_path.parent.parent.parent.parent
-    env_base_path = os.getenv("SERIALIZED_DATA_PATH")
+    env_base_path = os.getenv("TEST_DATA_PATH")

     if env_base_path:
         return Path(env_base_path)
     else:
-        return common_path.parent.joinpath(DEFAULT_SERIALIZED_DATA_FOLDER)
+        return common_path.parent.joinpath(DEFAULT_TEST_DATA_FOLDER)


-SERIALIZED_DATA_ROOT = get_serialized_data_root_path()
-SERIALIZED_DATA_PATH = SERIALIZED_DATA_ROOT.joinpath("ser_icondata")
+TEST_DATA_ROOT = get_test_data_root_path()
+SERIALIZED_DATA_PATH = TEST_DATA_ROOT.joinpath("ser_icondata")

 # TODO: a run that contains all the fields needed for dycore, diffusion, interpolation fields needs to be consolidated
 DATA_URIS = {
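
The lookup order in the hunk above (the `TEST_DATA_PATH` environment variable first, otherwise a `testdata` folder) can be sketched in isolation as follows. This is a simplified illustration, not the code from the PR: `resolve_test_data_root`, its `repo_root` parameter, and the injectable `env` mapping are assumptions made here so the behaviour can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_TEST_DATA_FOLDER = "testdata"


def resolve_test_data_root(
    repo_root: Path, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the test data root: env override if set, else <repo_root>/testdata.

    Hypothetical helper; `repo_root` stands in for the path the PR derives
    from the location of the module file.
    """
    env = os.environ if env is None else env
    override = env.get("TEST_DATA_PATH")
    if override:  # an unset or empty variable falls through to the default
        return Path(override)
    return repo_root / DEFAULT_TEST_DATA_FOLDER


# Local checkout: no override, so the default folder inside the clone is used.
print(resolve_test_data_root(Path("/repo"), env={}))

# CI: the variable points at a shared location, as ci/cscs.yml does.
print(resolve_test_data_root(Path("/repo"), env={"TEST_DATA_PATH": "/data/ci"}))
```

Passing the environment as a parameter is only a convenience for the example; the function in the diff reads `os.getenv` directly.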
4 changes: 2 additions & 2 deletions model/common/tests/grid_tests/conftest.py
@@ -26,10 +26,10 @@
     processor_props,
     ranked_data_path,
 )
-from icon4py.model.common.test_utils.datatest_utils import SERIALIZED_DATA_PATH
+from icon4py.model.common.test_utils.datatest_utils import TEST_DATA_ROOT


-grids_path = SERIALIZED_DATA_PATH.joinpath("grids")
+grids_path = TEST_DATA_ROOT.joinpath("grids")
 r04b09_dsl_grid_path = grids_path.joinpath("mch_ch_r04b09_dsl")
 r04b09_dsl_data_file = r04b09_dsl_grid_path.joinpath("mch_ch_r04b09_dsl_grids_v1.tar.gz").name
 r02b04_global_grid_path = grids_path.joinpath("r02b04_global")
4 changes: 2 additions & 2 deletions model/driver/README.md
@@ -18,7 +18,7 @@ See the general instructions in the [README.md](../../README.md) in the base fol

 ```bash
 export ICON4PY_ROOT=<path to the icon4py clone>
-dycore_driver $ICON4PY_ROOT/serialized_data/ser_icondata/mpitask1/mch_ch_r04b09_dsl/ser_data --run_path=$ICON4PY_ROOT/output
+dycore_driver $ICON4PY_ROOT/testdata/ser_icondata/mpitask1/mch_ch_r04b09_dsl/ser_data --run_path=$ICON4PY_ROOT/output
 ```

 The driver code runs in parallel, in order to do this you need to install the optional communication libraries with:
@@ -32,7 +32,7 @@ pip install -r requirements-dev-opt.txt
 then run

 ```bash
-mpirun -np 2 dycore_driver $ICON4PY_ROOT/serialized_data/ser_icondata/mpitask2/mch_ch_r04b09_dsl/ser_data --mpi=True --run_path=$ICON4PY_ROOT/output
+mpirun -np 2 dycore_driver $ICON4PY_ROOT/testdata/ser_icondata/mpitask2/mch_ch_r04b09_dsl/ser_data --mpi=True --run_path=$ICON4PY_ROOT/output
 ```

 #### Remarks
2 changes: 1 addition & 1 deletion model/driver/src/icon4py/model/driver/dycore_driver.py
@@ -383,7 +383,7 @@ def main(input_path, run_path, mpi):
     """
     Run the driver.
-    usage: python dycore_driver.py abs_path_to_icon4py/serialized_data/ser_icondata/mpitask1/mch_ch_r04b09_dsl/ser_data
+    usage: python dycore_driver.py abs_path_to_icon4py/testdata/ser_icondata/mpitask1/mch_ch_r04b09_dsl/ser_data
     steps:
     1. initialize model from serialized data:
8 changes: 4 additions & 4 deletions model/tox.ini
@@ -28,13 +28,13 @@ allowlist_externals =

 [testenv:run_stencil_tests]
 commands =
-    pytest -v -s -m "not slow_tests" --cov --cov-append atmosphere/diffusion/tests/diffusion_stencil_tests {posargs}
-    pytest -v -s -m "not slow_tests" --cov --cov-append atmosphere/dycore/tests/dycore_stencil_tests {posargs}
+    pytest -v -s -m "not slow_tests" --cov --cov-append atmosphere/diffusion/tests/diffusion_stencil_tests --benchmark-skip -n auto {posargs}
+    pytest -v -s -m "not slow_tests" --cov --cov-append atmosphere/dycore/tests/dycore_stencil_tests --benchmark-skip -n auto {posargs}

 [testenv:run_benchmarks]
 commands =
-    pytest -s -m "not slow_tests" atmosphere/diffusion/tests/diffusion_stencil_tests {posargs}
-    pytest -s -m "not slow_tests" atmosphere/dycore/tests/dycore_stencil_tests {posargs}
+    pytest -s -m "not slow_tests" atmosphere/diffusion/tests/diffusion_stencil_tests --benchmark-only {posargs}
+    pytest -s -m "not slow_tests" atmosphere/dycore/tests/dycore_stencil_tests --benchmark-only {posargs}


 [testenv:dev]