Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: validate v3 dtypes when loading/creating v3 metadata #2209

Merged
merged 10 commits into from
Sep 23, 2024

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Sep 18, 2024

This PR validates data types in v3 metadata. The behavior implemented raises an error if a data type outside of the current v3 spec is discovered in either a zarr.json metadata document or in the Array constructor.

xref: #2200

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@@ -29,7 +29,7 @@ def __init__(
prototype: BufferPrototype,
) -> None:
shape_parsed = parse_shapelike(shape)
dtype_parsed = parse_dtype(dtype)
dtype_parsed = dtype # parsing is likely not needed here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dtype_parsed = dtype # parsing is likely not needed here
dtype_parsed = dtype

The input type is already np.dtype[Any] so I don't think we need to parse this.

@@ -157,6 +157,11 @@ def update_attributes(self, attributes: dict[str, JSON]) -> Self:
return replace(self, attributes=attributes)


def parse_dtype(data: npt.DTypeLike) -> np.dtype[Any]:
# todo: real validation
return np.dtype(data)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now parse v2 and v3 dtypes differently.

dtype=npst.scalar_dtypes().filter(lambda x: x.kind != "m"),
dtype=npst.scalar_dtypes().filter(
lambda x: (x.kind not in ["m", "M"]) and (x.byteorder not in [">"])
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me how endianness works in v3. As far as I can tell, we are not handling this in a meaningful way today.

cc @dcherian and @d-v-b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur. It's not being handled today

async def test_datetime_metadata(fill_value: int, precision: str) -> None:
# @pytest.mark.parametrize("fill_value", [-1, 0, 1, 2932897])
# @pytest.mark.parametrize("precision", ["ns", "D"])
# async def test_datetime_metadata(fill_value: int, precision: str) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was creating an invalid dtype (datetime). We can leave it in place and bring it back when there is support for this dtype in v3.

(0, "int16"),
(1e10, "uint64"),
(-999, "float32"),
(1e32, "float64"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add a test for NaN fill values

src/zarr/core/metadata/v3.py Outdated Show resolved Hide resolved
@d-v-b
Copy link
Contributor

d-v-b commented Sep 23, 2024

this looks great, thanks!

@jhamman jhamman merged commit 2d3a36c into zarr-developers:v3 Sep 23, 2024
26 checks passed
dcherian added a commit to dcherian/zarr-python that referenced this pull request Sep 24, 2024
* v3:
  chore: update pre-commit hooks (zarr-developers#2222)
  fix: validate v3 dtypes when loading/creating v3 metadata (zarr-developers#2209)
  fix typo in store integration test (zarr-developers#2223)
  Basic Zarr-python 2.x compatibility changes (zarr-developers#2098)
  Make Group.arrays, groups compatible with v2 (zarr-developers#2213)
  Typing fixes to test_indexing (zarr-developers#2193)
  Default to RemoteStore for fsspec URIs (zarr-developers#2198)
  Make MemoryStore serialiazable (zarr-developers#2204)
  [v3] Implement Group methods for empty, full, ones, and zeros (zarr-developers#2210)
  implement `store.list_prefix` and `store._set_many` (zarr-developers#2064)
  Fixed codec for v2 data with no fill value (zarr-developers#2207)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants