Compatibility for zarr-python 3.x #9552

Merged: 93 commits, Oct 23, 2024
Commits
c54a052
Remove zarr pin
TomAugspurger Sep 27, 2024
483eb7f
Define zarr_v3 helper
TomAugspurger Sep 23, 2024
40a746c
zarr-v3: filters / compressors -> codecs
TomAugspurger Sep 23, 2024
6c8d2bb
zarr-v3: update tests to avoid values equal to fillValue
TomAugspurger Sep 23, 2024
531b521
Various test fixes
TomAugspurger Sep 27, 2024
849df40
zarr_version fixes
TomAugspurger Sep 30, 2024
88bd64b
fixup! zarr-v3: filters / compressors -> codecs
TomAugspurger Sep 30, 2024
ef1549a
fixup! fixup! zarr-v3: filters / compressors -> codecs
TomAugspurger Sep 30, 2024
20c22bd
fixup
TomAugspurger Sep 30, 2024
6087e5e
path / key normalization in set_variables
TomAugspurger Sep 30, 2024
15fe55e
fixes
TomAugspurger Oct 1, 2024
8e06bc7
workaround nested consolidated metadata
TomAugspurger Oct 1, 2024
f22100e
Merge remote-tracking branch 'upstream/main' into fix/zarr-v3
TomAugspurger Oct 1, 2024
f8c427f
test: avoid fill_value
TomAugspurger Oct 1, 2024
594d36d
test: Adjust call counts
TomAugspurger Oct 2, 2024
046d37e
zarr-python 3.x Array.resize doesn't mutate
TomAugspurger Oct 2, 2024
6b0ca62
test compatibility
TomAugspurger Oct 2, 2024
d315583
skip ZipStore with_mode
TomAugspurger Oct 2, 2024
389cc82
test: more fill_value avoidance
TomAugspurger Oct 2, 2024
1fe409a
test: more fill_value avoidance
TomAugspurger Oct 2, 2024
7c29ea6
v3 compat for instrumented test
TomAugspurger Oct 2, 2024
efb66dd
Merge remote-tracking branch 'upstream/main' into fix/zarr-v3
TomAugspurger Oct 7, 2024
90c0ae6
Merge remote-tracking branch 'upstream/main' into fix/zarr-v3
TomAugspurger Oct 8, 2024
9b3c288
Handle zarr_version / zarr_format deprecation
TomAugspurger Oct 8, 2024
3717391
wip
rabernat Oct 9, 2024
8d16bb2
most Zarr tests passing
rabernat Oct 9, 2024
bd978b0
unskip tests
rabernat Oct 9, 2024
34c4c24
add custom Zarr _FillValue encoding / decoding
rabernat Oct 9, 2024
118e50e
Merge branch 'fix/zarr-v3' into ryan/fix/zarr-3
TomAugspurger Oct 9, 2024
e6e2066
Merge pull request #1 from rabernat/ryan/fix/zarr-3
TomAugspurger Oct 9, 2024
1d1d9cb
relax dtype comparison in test_roundtrip_empty_vlen_string_array
rabernat Oct 9, 2024
9089508
Merge remote-tracking branch 'tom/fix/zarr-v3' into ryan/fix/zarr-3
rabernat Oct 9, 2024
ea00308
fix test_explicitly_omit_fill_value_via_encoding_kwarg
rabernat Oct 9, 2024
a330e4b
fix test_append_string_length_mismatch_raises
rabernat Oct 9, 2024
bde42ee
fix test_check_encoding_is_consistent_after_append for v3
rabernat Oct 10, 2024
b15705d
skip roundtrip_endian for zarr v3
rabernat Oct 10, 2024
38f43b9
unskip datetimes and fix test_compressor_encoding
rabernat Oct 10, 2024
1cfc458
unskip tests
rabernat Oct 10, 2024
af1a0b8
add back dtype skip
rabernat Oct 10, 2024
d9d6fee
Merge branches 'ryan/fix/zarr-v3-2', 'ryan/fix/zarr-v3-3', 'ryan/fix/…
rabernat Oct 10, 2024
4c54371
Merge pull request #2 from rabernat/ryan/fix/zarr-v3-2
TomAugspurger Oct 10, 2024
1ce8878
Merge pull request #9 from rabernat/ryan/fix/zarr-v3-combined
TomAugspurger Oct 10, 2024
0c2e260
point upstream to v3 branch
TomAugspurger Oct 10, 2024
fc2738a
Merge branch 'main' into fix/zarr-v3
jhamman Oct 11, 2024
0e47c3f
Create temporary directory before using it
TomAugspurger Oct 11, 2024
5b39f42
Avoid zarr.storage.zip on v2
TomAugspurger Oct 11, 2024
7d9fc05
fixed close_store_on_close bug
TomAugspurger Oct 11, 2024
0fa94ee
Remove workaround, fixed upstream
TomAugspurger Oct 11, 2024
c2fd6f1
Restore original `w` mode.
TomAugspurger Oct 11, 2024
ac2ef29
workaround for store closing with mode=w
TomAugspurger Oct 11, 2024
c6be467
typing fixes
TomAugspurger Oct 11, 2024
5b5b77f
compat
TomAugspurger Oct 11, 2024
4f07eb7
Remove unnecessary pop
TomAugspurger Oct 11, 2024
5151bc2
fixed skip
TomAugspurger Oct 11, 2024
00c62d7
fixup types
TomAugspurger Oct 11, 2024
e0390a5
fixup types
TomAugspurger Oct 11, 2024
2e7ec07
[test-upstream]
TomAugspurger Oct 12, 2024
26081d4
Update install-upstream-wheels.sh
jhamman Oct 12, 2024
0350056
set use_consolidated to false when user provides consolidated=False
jhamman Oct 13, 2024
a38bff6
fix: import consolidated_metadata from package root
jhamman Oct 14, 2024
08f0594
fix: relax instrumented store checks for v3
jhamman Oct 14, 2024
0e81edf
Merge pull request #10 from jhamman/fix/zarr-v3-consolidated-false
TomAugspurger Oct 14, 2024
55d852d
Merge pull request #11 from jhamman/fix/cm-import
TomAugspurger Oct 14, 2024
3491137
Merge pull request #12 from jhamman/fix/instrumented-store-check
TomAugspurger Oct 14, 2024
5bf5f2a
Merge remote-tracking branch 'upstream/main' into fix/zarr-v3
TomAugspurger Oct 14, 2024
f2f9fff
Adjust 2.18.3 thresholds
TomAugspurger Oct 14, 2024
a84fa79
skip datatree zarr tests w/ zarr 3 for now
jhamman Oct 14, 2024
c280f24
Merge pull request #13 from jhamman/fix/skip-datatree-tests
TomAugspurger Oct 14, 2024
9fec1d6
fixed kvstore usage
TomAugspurger Oct 14, 2024
04c017e
typing fixes
TomAugspurger Oct 14, 2024
625591e
move zarr.codecs import
TomAugspurger Oct 14, 2024
3795b07
fixup ignores
TomAugspurger Oct 14, 2024
ea2cb57
storage options fix, skip
TomAugspurger Oct 14, 2024
4f617d2
fixed types
TomAugspurger Oct 14, 2024
d1e3c73
Update ci/install-upstream-wheels.sh
jhamman Oct 14, 2024
45d5a78
type fixes
jhamman Oct 14, 2024
f208c39
whats-new
jhamman Oct 14, 2024
968217c
Update xarray/tests/test_backends_datatree.py
jhamman Oct 14, 2024
45a37f6
fix type import
jhamman Oct 14, 2024
c15e856
Merge branch 'fix/zarr-v3' of github.com:TomAugspurger/xarray into fi…
jhamman Oct 14, 2024
c10bfc0
set mapper, chunk_mapper
TomAugspurger Oct 15, 2024
82e6a6d
Pass through zarr_format
TomAugspurger Oct 15, 2024
d752693
Fixup
TomAugspurger Oct 15, 2024
0fd4103
Merge remote-tracking branch 'upstream/main' into fix/zarr-v3
TomAugspurger Oct 15, 2024
c2a47a1
more cleanup
TomAugspurger Oct 15, 2024
26b2661
revert test changes
TomAugspurger Oct 15, 2024
1d73d36
Update xarray/backends/zarr.py
dcherian Oct 21, 2024
be79e88
cleanup
dcherian Oct 21, 2024
ff0f2c0
update docstring
dcherian Oct 21, 2024
5f37042
Merge branch 'main' into fix/zarr-v3
dcherian Oct 21, 2024
268e3eb
fix rtd
dcherian Oct 22, 2024
1abb2ba
Merge branch 'main' into fix/zarr-v3
dcherian Oct 22, 2024
7682bf4
tweak
dcherian Oct 22, 2024
3 changes: 2 additions & 1 deletion doc/user-guide/io.rst
@@ -845,8 +845,9 @@ For example:
.. ipython:: python

import zarr
from numcodecs.blosc import Blosc

compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
compressor = Blosc(cname="zstd", clevel=3, shuffle=2)
ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}})

.. note::
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -42,7 +42,7 @@ dev = [
"ruff",
"xarray[complete]",
]
io = ["netCDF4", "h5netcdf", "scipy", 'pydap; python_version<"3.10"', "zarr<3", "fsspec", "cftime", "pooch"]
io = ["netCDF4", "h5netcdf", "scipy", 'pydap; python_version<"3.10"', "zarr", "fsspec", "cftime", "pooch"]
parallel = ["dask[complete]"]
viz = ["matplotlib", "seaborn", "nc-time-axis"]

@@ -116,6 +116,7 @@ module = [
"nc_time_axis.*",
"netCDF4.*",
"netcdftime.*",
"numcodecs.*",
"opt_einsum.*",
"pint.*",
"pooch.*",
7 changes: 0 additions & 7 deletions xarray/backends/api.py
@@ -1732,13 +1732,6 @@ def to_zarr(
# validate Dataset keys, DataArray names
_validate_dataset_names(dataset)

if zarr_version is None:
# default to 2 if store doesn't specify its version (e.g. a path)
zarr_version = int(getattr(store, "_store_version", 2))

if consolidated is None and zarr_version > 2:
consolidated = False

if mode == "r+":
already_consolidated = consolidated
consolidate_on_close = False
102 changes: 75 additions & 27 deletions xarray/backends/zarr.py
@@ -1,12 +1,14 @@
from __future__ import annotations

import functools
import json
import os
import warnings
from collections.abc import Callable, Iterable
from typing import TYPE_CHECKING, Any
from typing import TYPE_CHECKING, Any, Literal

import numpy as np
import packaging.version
import pandas as pd

from xarray import coding, conventions
@@ -40,8 +42,20 @@
from xarray.core.dataset import Dataset
from xarray.core.datatree import DataTree


@functools.lru_cache
def _zarr_v3() -> bool:
try:
import zarr
except ImportError:
return False
else:
return packaging.version.parse(zarr.__version__).major >= 3


# need some special secret attributes to tell us the dimensions
DIMENSION_KEY = "_ARRAY_DIMENSIONS"
ZarrFormat = Literal[2, 3]


def encode_zarr_attr_value(value):
@@ -75,8 +89,10 @@ def __init__(self, zarr_array):
self.shape = self._array.shape

# preserve vlen string object dtype (GH 7328)
if self._array.filters is not None and any(
[filt.codec_id == "vlen-utf8" for filt in self._array.filters]
if (
not _zarr_v3()
and self._array.filters is not None
and any([filt.codec_id == "vlen-utf8" for filt in self._array.filters])
):
dtype = coding.strings.create_vlen_dtype(str)
else:
@@ -317,6 +333,7 @@ def extract_zarr_variable_encoding(

safe_to_drop = {"source", "original_shape"}
valid_encodings = {
"codecs",
"chunks",
"compressor",
"filters",
@@ -614,9 +631,25 @@ def open_store_variable(self, name, zarr_array=None):
encoding = {
"chunks": zarr_array.chunks,
"preferred_chunks": dict(zip(dimensions, zarr_array.chunks, strict=True)),
"compressor": zarr_array.compressor,
"filters": zarr_array.filters,
}

if _zarr_v3() and zarr_array.metadata.zarr_format == 3:
encoding["codecs"] = [x.to_dict() for x in zarr_array.metadata.codecs]
elif _zarr_v3():
encoding.update(
{
"compressor": zarr_array.metadata.compressor,
"filters": zarr_array.metadata.filters,
}
)
else:
encoding.update(
{
"compressor": zarr_array.compressor,
"filters": zarr_array.filters,
}
)

# _FillValue needs to be in attributes, not encoding, so it will get
# picked up by decode_cf
if zarr_array.fill_value is not None:
@@ -786,7 +819,11 @@ def store(
variables_to_set, check_encoding_set, writer, unlimited_dims=unlimited_dims
)
if self._consolidate_on_close:
zarr.consolidate_metadata(self.zarr_group.store)
kwargs = {}
if _zarr_v3():
# https://github.com/zarr-developers/zarr-python/pull/2113#issuecomment-2386718323
Contributor

Can this be removed at some point in the future? If so, it would be good to add a TODO

Contributor Author

@TomAugspurger TomAugspurger Oct 23, 2024

I'll look more closely later, but for now I think this will be required, following a deliberate change in zarr v3 consolidated metadata.

With v2 metadata, I think that consolidation happened at the store level, and was all-or-nothing. If you have two Groups with Arrays, the consolidated metadata will be placed at the store root, and will contain everything:

# zarr v2

In [1]: import json, xarray as xr

In [2]: store = {}

In [3]: a = xr.tutorial.load_dataset("air_temperature")

In [4]: b = xr.tutorial.load_dataset("rasm")

In [5]: a.to_zarr(store=store, group="A")
/Users/tom/gh/zarr-developers/zarr-v2/.direnv/python-3.10/lib/python3.10/site-packages/xarray/core/dataset.py:2562: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(  # type: ignore[call-overload,misc]
Out[5]: <xarray.backends.zarr.ZarrStore at 0x11113edc0>

In [6]: b.to_zarr(store=store, group="B")
Out[6]: <xarray.backends.zarr.ZarrStore at 0x10cab2440>

In [7]: list(json.loads(store['.zmetadata'])['metadata'])
Out[7]:  # contains nodes from both A and B
['.zgroup',
 'A/.zattrs',
 'A/.zgroup',
 'A/air/.zarray',
 'A/air/.zattrs',
 'A/lat/.zarray',
 'A/lat/.zattrs',
 'A/lon/.zarray',
 'A/lon/.zattrs',
 'A/time/.zarray',
 'A/time/.zattrs',
 'B/.zattrs',
 'B/.zgroup',
 'B/Tair/.zarray',
 'B/Tair/.zattrs',
 'B/time/.zarray',
 'B/time/.zattrs',
 'B/xc/.zarray',
 'B/xc/.zattrs',
 'B/yc/.zarray',
 'B/yc/.zattrs']

With v3, consolidated metadata is scoped to a Group, so we can provide the group we want to consolidate. The zarr-python API does support "consolidate everything in the store at the root", but I don't think we want that: you'd need to open it at the root when reading, and I think it's kinda weird for ds.to_zarr(group="A") to be reading / writing stuff outside of the A prefix.

Member

Potentially it would make sense to have two versions of consolidated metadata:

  • Everything at a specific group/node level
  • Everything in a group and all of its subgroups (i.e., for DataTree)

Contributor Author

Agreed. zarr-developers/zarr-specs#309 has some discussion on adding a depth field to the spec for consolidated metadata. That's currently implicitly depth=None, which is everything below a group. depth=0 or 1 would be just the immediate children. That's not standardized or implemented anywhere yet, but the current implementation is forwards compatible and it shouldn't be a ton of effort.
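
For illustration, the difference between store-level (v2) and group-scoped (v3) consolidation can be sketched with plain key filtering. This is only a sketch under stated assumptions: `keys_under_group` is a hypothetical helper, not part of zarr-python or xarray, and the key list is a shortened copy of the v2 listing earlier in this thread.

```python
def keys_under_group(metadata_keys, group):
    """Hypothetical helper: keep only metadata keys below ``group``.

    An empty group (or "/") means the store root, i.e. every key.
    """
    prefix = group.strip("/")
    if not prefix:
        return sorted(metadata_keys)
    return sorted(k for k in metadata_keys if k.startswith(prefix + "/"))


# Shortened version of the store-level (v2-style) consolidated key listing.
store_level = [
    ".zgroup",
    "A/.zattrs",
    "A/.zgroup",
    "A/air/.zarray",
    "B/.zattrs",
    "B/.zgroup",
    "B/Tair/.zarray",
]

# Group-scoped view: only nodes below "A", nothing from "B" or the root.
scoped_to_a = keys_under_group(store_level, "A")
# Root-scoped view: everything, as in the store-level case.
scoped_to_root = keys_under_group(store_level, "/")
```

With the group-scoped view, writing group "A" never needs to touch metadata outside the `A/` prefix, which is the behaviour the `path` keyword is meant to select here.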

kwargs["path"] = self.zarr_group.name.lstrip("/")
zarr.consolidate_metadata(self.zarr_group.store, **kwargs)

def sync(self):
pass
@@ -850,9 +887,19 @@ def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=No
# - Existing variables already have their attrs included in the consolidated metadata file.
# - The size of dimensions can not be expanded, that would require a call using `append_dim`
# which is mutually exclusive with `region`
kwargs = {}
if _zarr_v3():
kwargs["store"] = self.zarr_group.store
else:
kwargs["store"] = self.zarr_group.chunk_store

# TODO: see if zarr should normalize these strings.
zarr_array = zarr.open(
store=self.zarr_group.chunk_store,
path=f"{self.zarr_group.name}/{name}",
**kwargs,
# path=f"{self.zarr_group.name}/{name}",
path="/".join([self.zarr_group.name.rstrip("/"), name]).lstrip(
"/"
),
write_empty_chunks=self._write_empty,
)
else:
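
A small stand-alone comparison of the two `path` expressions in this hunk, in case it helps review. `old_path` and `array_path` are hypothetical names introduced only for this sketch; the expressions themselves are copied from the diff, and the group names are made-up inputs.

```python
def old_path(group_name: str, name: str) -> str:
    # Expression from the commented-out line: plain f-string concatenation.
    return f"{group_name}/{name}"


def array_path(group_name: str, name: str) -> str:
    # Expression from the new code: drop the group's trailing slash, join,
    # then drop any leading slash so the key is relative to the store root.
    return "/".join([group_name.rstrip("/"), name]).lstrip("/")


root_old = old_path("/", "air")        # "//air": doubled separator at the root
root_new = array_path("/", "air")      # "air"
nested_new = array_path("/A", "air")   # "A/air"
trailing_new = array_path("A/", "air") # "A/air"
```

The f-string form yields a doubled separator for the root group, while the joined form gives the same relative key regardless of leading or trailing slashes in the group name.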
@@ -868,7 +915,10 @@ def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=No

new_shape = list(zarr_array.shape)
new_shape[append_axis] += v.shape[append_axis]
zarr_array.resize(new_shape)
if _zarr_v3():
zarr_array = zarr_array.resize(new_shape)
else:
zarr_array.resize(new_shape)

zarr_shape = zarr_array.shape

@@ -913,6 +963,10 @@ def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=No
else:
encoding["write_empty_chunks"] = self._write_empty

if "codecs" in encoding:
pipeline = encoding.pop("codecs")
encoding["codecs"] = pipeline
Member

am I missing something or is this a circular pop replace?
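
A quick check of that reading, assuming `encoding` is a plain dict (the sample values below are made up for the sketch): popping a key and assigning it straight back leaves the contents unchanged and only moves the key to the end of the insertion order.

```python
# Made-up encoding dict; only the pop / re-assign pattern comes from the diff.
encoding = {"codecs": [{"name": "bytes"}], "chunks": (10,)}
before = dict(encoding)
keys_before = list(encoding)

if "codecs" in encoding:
    pipeline = encoding.pop("codecs")
    encoding["codecs"] = pipeline

# Dict equality ignores insertion order, so the contents compare equal.
same_contents = encoding == before
# The only observable difference: "codecs" is now the last key.
keys_after = list(encoding)
```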


zarr_array = self.zarr_group.create(
name,
shape=shape,
@@ -1396,40 +1450,34 @@ def _get_open_params(
if isinstance(store, os.PathLike):
store = os.fspath(store)

if zarr_version is None:
# default to 2 if store doesn't specify it's version (e.g. a path)
zarr_version = getattr(store, "_store_version", 2)

open_kwargs = dict(
# mode='a-' is a handcrafted xarray specialty
mode="a" if mode == "a-" else mode,
synchronizer=synchronizer,
path=group,
)
open_kwargs["storage_options"] = storage_options
if zarr_version > 2:
open_kwargs["zarr_version"] = zarr_version

if consolidated or consolidate_on_close:
raise ValueError(
"consolidated metadata has not been implemented for zarr "
f"version {zarr_version} yet. Set consolidated=False for "
f"zarr version {zarr_version}. See also "
"https://github.com/zarr-developers/zarr-specs/issues/136"
)

if consolidated is None:
consolidated = False
if _zarr_v3():
open_kwargs["zarr_format"] = zarr_version
else:
open_kwargs["zarr_version"] = zarr_version

if chunk_store is not None:
open_kwargs["chunk_store"] = chunk_store
if consolidated is None:
consolidated = False

if _zarr_v3():
missing_exc: type[Exception] = ValueError
else:
missing_exc = zarr.errors.GroupNotFoundError

if consolidated is None:
try:
zarr_group = zarr.open_consolidated(store, **open_kwargs)
except KeyError:
except (ValueError, KeyError):
# ValueError in zarr-python 3.x, KeyError in 2.x.
try:
zarr_group = zarr.open_group(store, **open_kwargs)
warnings.warn(
@@ -1447,7 +1495,7 @@ def _get_open_params(
RuntimeWarning,
stacklevel=stacklevel,
)
except zarr.errors.GroupNotFoundError as err:
except missing_exc as err:
raise FileNotFoundError(
f"No such file or directory: '{store}'"
) from err