Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embedded backend config edits #636

Open
wants to merge 51 commits into
base: main
Choose a base branch
from
Open

Conversation

nfarabullini
Copy link
Contributor

@nfarabullini nfarabullini commented Jan 9, 2025

setting embedded as default backend and related edits

@nfarabullini nfarabullini marked this pull request as draft January 9, 2025 11:02
ci/default.yml Outdated Show resolved Hide resolved
ci/default.yml Outdated Show resolved Hide resolved
@@ -21,6 +21,7 @@
class TestCalculateNabla2AndSmagCoefficientsForVn(StencilTest):
PROGRAM = calculate_nabla2_and_smag_coefficients_for_vn
OUTPUTS = ("kh_smag_e", "kh_smag_ec", "z_nabla2_e")
MARKERS = (pytest.mark.gtfn_miss_neighbors,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid gtfn_ in the marker, if possible. Do you know why it is failing? Can it be that the numpy reference is incorrect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it only fails with the icon grid it could be one of these examples where we should run over a subset of the domain.

@@ -66,6 +66,7 @@ def calculate_nabla4_numpy(
class TestCalculateNabla4(StencilTest):
PROGRAM = calculate_nabla4
OUTPUTS = ("z_nabla4_e2",)
MARKERS = (pytest.mark.gtfn_miss_neighbors,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. I suspect that for this kind of tests the reference numpy code does not support skip_values in the connectivity table.

model/testing/src/icon4py/model/testing/pytest_config.py Outdated Show resolved Hide resolved
@nfarabullini
Copy link
Contributor Author

cscs-ci run default

@@ -21,6 +21,7 @@
class TestCalculateNabla2AndSmagCoefficientsForVn(StencilTest):
PROGRAM = calculate_nabla2_and_smag_coefficients_for_vn
OUTPUTS = ("kh_smag_e", "kh_smag_ec", "z_nabla2_e")
MARKERS = (pytest.mark.gtfn_miss_neighbors,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it only fails with the icon grid it could be one of these examples where we should run over a subset of the domain.

for m in marker:
match m.markname:
case "embedded_remap_error" if is_embedded(backend):
pytest.xfail("Embedded backend currently fails in remap function.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add the link to the gt4py issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, it is not consistent with the other markers

model/testing/src/icon4py/model/testing/helpers.py Outdated Show resolved Hide resolved
Comment on lines +85 to +86
case "gtfn_miss_neighbors" if backend and ("gtfn" in backend.name):
pytest.xfail("gtfn_gpu and gtfn_cpu do not support missing neighbors.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gtfn_gpu and gtfn_cpu do not support missing neighbors.

Is not true, let's figure out what the issue is.

Comment on lines +83 to +84
case "domain_dims_mismatch" if is_embedded(backend):
pytest.xfail("Stencil does not support missing neighbors.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marker name and xfail message seem unrelated. Please comment what this marker is about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the error:

self = <[AttributeError("'Domain' object has no attribute 'dims'") raised in repr()] Domain object at 0x13daebcd0>
dims = (Dimension(value='Cell', kind=<DimensionKind.HORIZONTAL: 'horizontal'>), Dimension(value='K', kind=<DimensionKind.VERTICAL: 'vertical'>))
ranges = (UnitRange(0, 18),), args = ()

    def __init__(
        self,
        *args: NamedRange[_Rng],
        dims: Optional[Sequence[Dimension]] = None,
        ranges: Optional[Sequence[_Rng]] = None,
    ) -> None:
        if dims is not None or ranges is not None:
            if dims is None and ranges is None:
                raise ValueError("Either specify both 'dims' and 'ranges' or neither.")
            if len(args) > 0:
                raise ValueError(
                    "No extra 'args' allowed when constructing from 'dims' and 'ranges'."
                )
    
            assert dims is not None and ranges is not None  # for mypy
            if not all(isinstance(dim, Dimension) for dim in dims):
                raise ValueError(
                    f"'dims' argument needs to be a 'tuple[Dimension, ...]', got '{dims}'."
                )
            if not all(isinstance(rng, UnitRange) for rng in ranges):
                raise ValueError(
                    f"'ranges' argument needs to be a 'tuple[UnitRange, ...]', got '{ranges}'."
                )
            if len(dims) != len(ranges):
>               raise ValueError(
                    f"Number of provided dimensions ({len(dims)}) does not match number of provided ranges ({len(ranges)})."
                )
E               ValueError: Number of provided dimensions (2) does not match number of provided ranges (1).

Copy link
Contributor

@havogt havogt Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding both tests marked with domain_dims_mismatch should instead be marked with both embedded_remap_error and embedded_as_offset_error. They fail for theta_v(C2E2C[0]) in truly_horizontal_diffusion_nabla_of_theta_over_steep_points.py:32, which is the problem of GridTools/gt4py#1583 but with concrete neighbor. After that would be fixed, it would still need an as_offset.

@@ -66,6 +71,21 @@ def allocate_data(backend, input_data):
return input_data


def _match_marker(marker, backend):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _match_marker(marker, backend):
def _apply_xfail(marker, backend):

maybe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function needs a better name

match m.markname:
case "embedded_remap_error" if is_embedded(backend):
pytest.xfail("Embedded backend currently fails in remap function.")
case "embedded_as_offset_error" if is_embedded(backend):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case "embedded_as_offset_error" if is_embedded(backend):
case "uses_as_offset" if is_embedded(backend):

would be more in the spirit of the gt4py markers

Comment on lines +126 to +128
for marker in item.own_markers:
if marker.name.startswith("embedded") and item.config.getoption("--backend") == "embedded":
pytest.skip("test not compatible with embedded backend")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for all non-StencilTests, right? Then we should probably use the same function as in helpers.py.

model/testing/src/icon4py/model/testing/helpers.py Outdated Show resolved Hide resolved
Comment on lines 12 to 16
rules:
# exclude slow test configurations
- if: $BACKEND == "roundtrip" && $GRID == "icon_grid"
- if: $BACKEND == "embedded" && $GRID == "icon_grid"
when: never
- when: on_success
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edopao Can you help me parse these lines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matrix includes two grids: GRID: [simple_grid, icon_grid]
The combination $BACKEND == "roundtrip" && $GRID == "icon_grid" is never run. All other combinations are run when the parent job (the build stage) is successful.
Therefore, the roundtrip backend was only executed on the simple_grid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would propose to check performance of icon_grid with embedded and if it looks good, remove the conditional.

Copy link

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

@nfarabullini nfarabullini marked this pull request as ready for review January 16, 2025 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants