Zarr-v3 Consolidated Metadata #2113

Merged
84 commits
73d53d7
Fixed MemoryStore.list_dir
TomAugspurger Aug 25, 2024
90940a0
fixup s3
TomAugspurger Aug 25, 2024
8ee89f4
recursive Group.members
TomAugspurger Aug 25, 2024
65a8bd4
Zarr-v3 Consolidated Metadata
TomAugspurger Aug 23, 2024
2515ca3
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 3, 2024
cdaf81f
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 6, 2024
5a86789
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 6, 2024
a839f16
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 10, 2024
5a31390
fixup
TomAugspurger Sep 10, 2024
79bf235
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 11, 2024
fc901eb
read zarr-v2 consolidated metadata
TomAugspurger Sep 11, 2024
3a3eb9d
check writablem
TomAugspurger Sep 12, 2024
78af362
Handle non-root paths
TomAugspurger Sep 12, 2024
750668c
Some error handling
TomAugspurger Sep 12, 2024
63697ab
cleanup
TomAugspurger Sep 12, 2024
5d79274
refactor open
TomAugspurger Sep 12, 2024
0c67972
remove dupe file
TomAugspurger Sep 12, 2024
657ad1e
v2 getitem
TomAugspurger Sep 12, 2024
511ff76
fixup
TomAugspurger Sep 12, 2024
b360eb4
Optimzied members
TomAugspurger Sep 12, 2024
abcdbe6
Impl flatten
TomAugspurger Sep 12, 2024
b9bcfe8
Fixups
TomAugspurger Sep 13, 2024
3575cda
doc
TomAugspurger Sep 13, 2024
7b6bd17
nest the tests
TomAugspurger Sep 13, 2024
500a91e
fixup
TomAugspurger Sep 13, 2024
22d501e
Fixups
TomAugspurger Sep 13, 2024
762cf96
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 13, 2024
d6c6cc7
fixup
TomAugspurger Sep 13, 2024
6755fbc
fixup
TomAugspurger Sep 13, 2024
e406f86
fixup
TomAugspurger Sep 13, 2024
07248ea
fixup
TomAugspurger Sep 13, 2024
bdf15ad
fixup
TomAugspurger Sep 13, 2024
18eb172
consistent open_consolidated handling
TomAugspurger Sep 16, 2024
c11f1ad
fixup
TomAugspurger Sep 16, 2024
f6397f4
make clear that flat_to_nested mutates
TomAugspurger Sep 16, 2024
f55aa37
fixujp
TomAugspurger Sep 16, 2024
123dc60
fixup
TomAugspurger Sep 16, 2024
34c7720
fixup
TomAugspurger Sep 17, 2024
4db042b
Fixup
TomAugspurger Sep 17, 2024
8febba3
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 19, 2024
d730350
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 20, 2024
a1f1ebb
fixup
TomAugspurger Sep 20, 2024
35a3832
fixup
TomAugspurger Sep 20, 2024
c1837fd
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 20, 2024
d03f4bd
fixup
TomAugspurger Sep 20, 2024
cddd01f
fixup
TomAugspurger Sep 20, 2024
9303cd0
added docs
TomAugspurger Sep 20, 2024
87b65f1
fixup
TomAugspurger Sep 20, 2024
ee5d130
Ensure empty dict
TomAugspurger Sep 23, 2024
af9788f
fixed name
TomAugspurger Sep 23, 2024
5a08466
fixup nested
TomAugspurger Sep 23, 2024
d236e53
removed dupe tests
TomAugspurger Sep 23, 2024
2824de6
fixup
TomAugspurger Sep 23, 2024
08a7682
doc fix
TomAugspurger Sep 23, 2024
b8b5f51
fixups
TomAugspurger Sep 24, 2024
ba4fb47
fixup
TomAugspurger Sep 24, 2024
10d062f
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 24, 2024
e6142d8
fixup
TomAugspurger Sep 24, 2024
8ad3738
v2 writer
TomAugspurger Sep 24, 2024
fc94933
fixup
TomAugspurger Sep 24, 2024
79246dd
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 25, 2024
a62240b
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 28, 2024
4bfad1b
fixup
TomAugspurger Sep 28, 2024
ae02bb5
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Sep 30, 2024
3265abd
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Oct 1, 2024
8728440
path fix
TomAugspurger Oct 1, 2024
20c97a4
Fixed v2 use_consolidated=False
TomAugspurger Oct 1, 2024
f7e5b3f
fixupg
TomAugspurger Oct 1, 2024
c31f8a1
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Oct 7, 2024
483681b
Special case object dtype
TomAugspurger Oct 9, 2024
7e76e9e
fixup
TomAugspurger Oct 9, 2024
19b9271
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Oct 9, 2024
418bc6b
Merge branch 'tom/fix/dtype-str-special-case' into user/tom/feature/c…
TomAugspurger Oct 9, 2024
97fa2a0
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Oct 9, 2024
6fab362
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Oct 10, 2024
cbffcbb
docs
TomAugspurger Oct 10, 2024
56d2704
pr review
TomAugspurger Oct 10, 2024
8ade87d
must_understand
TomAugspurger Oct 10, 2024
b5fb721
Updated from_dict checking
TomAugspurger Oct 10, 2024
d17f955
cleanup
TomAugspurger Oct 10, 2024
1d17140
cleanup
TomAugspurger Oct 10, 2024
2b2e3da
Fixed fill_value
TomAugspurger Oct 10, 2024
96b274c
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger Oct 10, 2024
c9229d1
fixup
TomAugspurger Oct 10, 2024
59 changes: 52 additions & 7 deletions src/zarr/api/asynchronous.py
@@ -1,15 +1,24 @@
 from __future__ import annotations

 import asyncio
+import dataclasses
 import warnings
 from typing import TYPE_CHECKING, Any, Literal, Union, cast

 import numpy as np
 import numpy.typing as npt

 from zarr.core.array import Array, AsyncArray
-from zarr.core.common import JSON, AccessModeLiteral, ChunkCoords, MemoryOrder, ZarrFormat
-from zarr.core.group import AsyncGroup
+from zarr.core.buffer import NDArrayLike
+from zarr.core.chunk_key_encodings import ChunkKeyEncoding
+from zarr.core.common import (
+    JSON,
+    AccessModeLiteral,
+    ChunkCoords,
+    MemoryOrder,
+    ZarrFormat,
+)
+from zarr.core.group import AsyncGroup, ConsolidatedMetadata
 from zarr.core.metadata import ArrayV2Metadata, ArrayV3Metadata
 from zarr.store import (
     StoreLike,
@@ -129,8 +138,38 @@ def _default_zarr_version() -> ZarrFormat:
     return 3


-async def consolidate_metadata(*args: Any, **kwargs: Any) -> AsyncGroup:
-    raise NotImplementedError
+async def consolidate_metadata(store: StoreLike) -> AsyncGroup:
+    """
+    Consolidate the metadata of all nodes in a hierarchy.
+
+    Upon completion, the metadata of the root node in the Zarr hierarchy will be
+    updated to include all the metadata of child nodes.
+
+    Parameters
+    ----------
+    store: StoreLike
+        The store-like object whose metadata you wish to consolidate.
+
+    Returns
+    -------
+    group: AsyncGroup
+        The group, with the ``consolidated_metadata`` field set to include
+        the metadata of each child node.
+    """
+    group = await AsyncGroup.open(store)
+    members = dict([x async for x in group.members(max_depth=None)])
+    members_metadata = {}
+
+    members_metadata = {k: v.metadata for k, v in members.items()}
+
+    consolidated_metadata = ConsolidatedMetadata(metadata=members_metadata)
+    metadata = dataclasses.replace(group.metadata, consolidated_metadata=consolidated_metadata)
+    group = dataclasses.replace(
+        group,
+        metadata=metadata,
+    )
+    await group._save_metadata()
+    return group


 async def copy(*args: Any, **kwargs: Any) -> tuple[int, int, int]:
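The consolidation step in this hunk can be pictured with a small standalone sketch. It does not use zarr at all: `GroupMeta`, `consolidate`, and the plain `dict` standing in for a store are hypothetical names chosen for illustration, and the key layout (`<path>/zarr.json`) is an assumption modelled on the v3 convention. What it shows is the same shape as the new function: gather every child's metadata, attach it to the root with `dataclasses.replace`, then write the root document back.

```python
from __future__ import annotations

import dataclasses
import json


@dataclasses.dataclass(frozen=True)
class GroupMeta:
    """Hypothetical stand-in for a group's metadata document."""

    attributes: dict
    consolidated_metadata: dict | None = None


def consolidate(store: dict[str, bytes]) -> GroupMeta:
    """Collect every child node's metadata into the root document."""
    root = GroupMeta(**json.loads(store["zarr.json"]))
    # Every key ending in "/zarr.json" is treated as a child node (assumption).
    children = {
        key.removesuffix("/zarr.json"): json.loads(value)
        for key, value in store.items()
        if key.endswith("/zarr.json")
    }
    # The dataclass is frozen, so build an updated copy instead of mutating.
    root = dataclasses.replace(root, consolidated_metadata=children)
    # Persist the updated root so later readers need only one lookup.
    store["zarr.json"] = json.dumps(dataclasses.asdict(root)).encode()
    return root


store = {
    "zarr.json": b'{"attributes": {"title": "root"}}',
    "a/zarr.json": b'{"node_type": "array"}',
    "g/b/zarr.json": b'{"node_type": "array"}',
}
root = consolidate(store)
```

After the call, the root document alone is enough to enumerate the hierarchy, which is the point of consolidating: one read instead of one read per node.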
@@ -232,7 +271,8 @@ async def open(


 async def open_consolidated(*args: Any, **kwargs: Any) -> AsyncGroup:
-    raise NotImplementedError
+    kwargs.setdefault("open_consolidated", True)
+    return await open_group(*args, **kwargs)


 async def save(
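One consequence of using `kwargs.setdefault` here is that an explicit `open_consolidated=False` from the caller is passed through unchanged. The sketch below illustrates that behaviour with a hypothetical stand-in for `open_group` that simply echoes the options it receives; it is not the real zarr function.

```python
import asyncio
from typing import Any


async def open_group(path: str, *, open_consolidated: bool = False) -> dict[str, Any]:
    # Hypothetical stand-in: reports which options it was called with.
    return {"path": path, "open_consolidated": open_consolidated}


async def open_consolidated(*args: Any, **kwargs: Any) -> dict[str, Any]:
    # setdefault fills the key only when the caller did not supply it.
    kwargs.setdefault("open_consolidated", True)
    return await open_group(*args, **kwargs)


result = asyncio.run(open_consolidated("data.zarr"))
override = asyncio.run(open_consolidated("data.zarr", open_consolidated=False))
```

Here `result` carries the default `True`, while `override` keeps the caller's `False`. Whether that pass-through is desirable for a function named `open_consolidated` is a design question for the PR rather than something this sketch settles.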
@@ -492,6 +532,7 @@ async def open_group(
     zarr_format: ZarrFormat | None = None,
     meta_array: Any | None = None,  # not used
     attributes: dict[str, JSON] | None = None,
+    open_consolidated: bool = False,
 ) -> AsyncGroup:
     """Open a group using file-mode-like semantics.

@@ -551,7 +592,9 @@ async def open_group(
         attributes = {}

     try:
-        return await AsyncGroup.open(store_path, zarr_format=zarr_format)
+        return await AsyncGroup.open(
+            store_path, zarr_format=zarr_format, open_consolidated=open_consolidated
+        )
     except (KeyError, FileNotFoundError):
         return await AsyncGroup.create(
             store_path, zarr_format=zarr_format, exists_ok=True, attributes=attributes
@@ -706,7 +749,9 @@ async def create(
             )
         else:
             warnings.warn(
-                "dimension_separator is not yet implemented", RuntimeWarning, stacklevel=2
+                "dimension_separator is not yet implemented",
+                RuntimeWarning,
+                stacklevel=2,
             )
     if write_empty_chunks:
         warnings.warn("write_empty_chunks is not yet implemented", RuntimeWarning, stacklevel=2)
8 changes: 7 additions & 1 deletion src/zarr/core/array.py
@@ -386,9 +386,15 @@ async def open(
         else:
             # V3 arrays are comprised of a zarr.json object
             assert zarr_json_bytes is not None
+            zarr_metadata = json.loads(zarr_json_bytes.to_bytes())
+            if zarr_metadata.get("node_type") != "array":
+                # This KeyError is load bearing for `open`. That currently tries
+                # to open the node as an `array` and then falls back to opening
+                # as a group.
+                raise KeyError
             return cls(
                 store_path=store_path,
-                metadata=ArrayV3Metadata.from_dict(json.loads(zarr_json_bytes.to_bytes())),
+                metadata=ArrayV3Metadata.from_dict(zarr_metadata),
             )

     @property
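The comment about the "load bearing" `KeyError` describes a try-array-then-group fallback. A minimal sketch of that control flow follows, assuming each node is described by a JSON document with a `node_type` field; `open_array`, `open_group`, and `open_node` are hypothetical helpers for illustration, not the zarr functions of the same name.

```python
import json


def open_array(doc: bytes) -> dict:
    meta = json.loads(doc)
    if meta.get("node_type") != "array":
        # Signal "not an array" in the way the caller expects.
        raise KeyError("node is not an array")
    return meta


def open_group(doc: bytes) -> dict:
    meta = json.loads(doc)
    if meta.get("node_type") != "group":
        raise KeyError("node is not a group")
    return meta


def open_node(doc: bytes) -> tuple[str, dict]:
    # Try the array path first; a KeyError triggers the group fallback.
    try:
        return "array", open_array(doc)
    except KeyError:
        return "group", open_group(doc)


kind_a, _ = open_node(b'{"node_type": "array"}')
kind_g, _ = open_node(b'{"node_type": "group"}')
```

If the array path raised a different exception type, or returned a wrongly typed object instead of raising, the fallback would never run, which is presumably why the check is flagged as load bearing.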
31 changes: 31 additions & 0 deletions src/zarr/core/common.py
@@ -16,6 +16,8 @@
     overload,
 )

+import numcodecs
+
 if TYPE_CHECKING:
     from collections.abc import Awaitable, Callable, Iterator

@@ -26,6 +28,7 @@
 ZARRAY_JSON = ".zarray"
 ZGROUP_JSON = ".zgroup"
 ZATTRS_JSON = ".zattrs"
+ZMETADATA_v2_JSON = ".zmetadata"

 BytesLike = bytes | bytearray | memoryview
 ShapeLike = tuple[int, ...] | int
@@ -168,3 +171,31 @@ def parse_order(data: Any) -> Literal["C", "F"]:
     if data in ("C", "F"):
         return cast(Literal["C", "F"], data)
     raise ValueError(f"Expected one of ('C', 'F'), got {data} instead.")
+
+
+def _json_convert(o: Any) -> Any:
+    if isinstance(o, np.dtype):
+        return str(o)
+    if np.isscalar(o):
+        # convert numpy scalar to python type, and pass
+        # python types through
+        if hasattr(o, "dtype") and o.dtype.kind == "M" and hasattr(o, "view"):
+            # https://github.com/zarr-developers/zarr-python/issues/2119
+            # `.item()` on a datetime type might or might not return an
+            # integer, depending on the value.
+            # Explicitly cast to an int first, and then grab .item()
+            out = o.view("i8").item()
+        else:
+            out = getattr(o, "item", lambda: o)()
+        if isinstance(out, complex):
+            # python complex types are not JSON serializable, so we use the
+            # serialization defined in the zarr v3 spec
+            return [out.real, out.imag]
+        return out
+    if isinstance(o, Enum):
+        return o.name
+    # this serializes numcodecs compressors
+    # todo: implement to_dict for codecs
+    elif isinstance(o, numcodecs.abc.Codec):
+        config: dict[str, Any] = o.get_config()
+        return config
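A converter of this kind is typically passed to `json.dumps` through its `default=` hook, which is called only for objects the encoder cannot handle itself. The sketch below is a stdlib-only approximation covering just the complex and `Enum` branches; `json_default` and the `Order` enum are hypothetical names, and the NumPy and numcodecs branches are left out so the example runs without third-party packages.

```python
import json
from enum import Enum
from typing import Any


class Order(Enum):
    # Hypothetical enum used only to exercise the Enum branch.
    C = "row-major"
    F = "column-major"


def json_default(o: Any) -> Any:
    if isinstance(o, complex):
        # Complex numbers become a [real, imag] pair, as in the hunk above.
        return [o.real, o.imag]
    if isinstance(o, Enum):
        # Enums are written by member name, not by value.
        return o.name
    # Anything else is still an error, matching json's default behaviour.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


encoded = json.dumps({"fill_value": 1 + 2j, "order": Order.C}, default=json_default)
decoded = json.loads(encoded)
```

Ending the hook with an explicit `TypeError` keeps unsupported types from being silently written as `null`.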