Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Append References #203

Merged
merged 41 commits into from
Jul 30, 2024
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
8be834f
Append References
mavaylon1 Jul 3, 2024
c54504b
Merge branch 'dev' into append_ref
mavaylon1 Jul 3, 2024
cfd2eba
ruff updates
mavaylon1 Jul 3, 2024
dad5e7d
Merge branch 'append_ref' of https://github.com/hdmf-dev/hdmf-zarr in…
mavaylon1 Jul 3, 2024
c17445a
ruff updates
mavaylon1 Jul 3, 2024
cafc657
ruff updates
mavaylon1 Jul 3, 2024
1fa504b
test
mavaylon1 Jul 3, 2024
d29251f
test
mavaylon1 Jul 3, 2024
dfde2f0
test
mavaylon1 Jul 3, 2024
d5c77a3
test
mavaylon1 Jul 8, 2024
73371b9
test
mavaylon1 Jul 8, 2024
6111b63
Update pyproject.toml
mavaylon1 Jul 8, 2024
7b15f6a
Update requirements-min.txt
mavaylon1 Jul 8, 2024
543225a
Update CHANGELOG.md
mavaylon1 Jul 8, 2024
fe70acf
ruff check
mavaylon1 Jul 8, 2024
732a323
Merge branch 'append_ref' of https://github.com/hdmf-dev/hdmf-zarr in…
mavaylon1 Jul 8, 2024
63e83e3
ruff check
mavaylon1 Jul 8, 2024
d5af631
clean up
mavaylon1 Jul 10, 2024
cc783fa
clear
mavaylon1 Jul 13, 2024
6868473
clear
mavaylon1 Jul 13, 2024
f5e6d7f
clear
mavaylon1 Jul 13, 2024
5373c7c
clear
mavaylon1 Jul 13, 2024
13caa37
Rename ZarrIO.__get_ref to ZarrIO._create_ref
rly Jul 13, 2024
ff6e435
fixed tests
mavaylon1 Jul 22, 2024
5e6a617
Merge branch 'dev' into append_ref
mavaylon1 Jul 22, 2024
c646c6d
clean up
mavaylon1 Jul 22, 2024
a6b5ac7
Merge branch 'dev' into append_ref
rly Jul 25, 2024
19470af
Add failing check to test
rly Jul 25, 2024
e8190ae
Fix
mavaylon1 Jul 26, 2024
2bcc4f6
Update CHANGELOG.md
rly Jul 26, 2024
10f0d06
Update src/hdmf_zarr/backend.py
rly Jul 26, 2024
4b2d32c
Update src/hdmf_zarr/backend.py
rly Jul 26, 2024
f716916
Update src/hdmf_zarr/backend.py
rly Jul 26, 2024
af72b53
Update src/hdmf_zarr/backend.py
rly Jul 26, 2024
7f7bda9
Update src/hdmf_zarr/backend.py
rly Jul 26, 2024
0909867
Update src/hdmf_zarr/backend.py
rly Jul 26, 2024
1d073a3
root
mavaylon1 Jul 28, 2024
8fc669f
Merge branch 'append_ref' of https://github.com/hdmf-dev/hdmf-zarr in…
mavaylon1 Jul 28, 2024
f32632c
notes
mavaylon1 Jul 28, 2024
798917f
loop
mavaylon1 Jul 28, 2024
7c5c0af
Update zarr_utils.py
mavaylon1 Jul 29, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

## 0.9.0 (Upcoming)
### Enhancements
* Added support for appending a dataset of references. @mavaylon1 [#203]([https://github.com/hdmf-dev/hdmf-zarr/pull/172](https://github.com/hdmf-dev/hdmf-zarr/pull/203))
rly marked this conversation as resolved.
Show resolved Hide resolved
* NWBZarrIO load_namespaces=True by default. @mavaylon1 [#204](https://github.com/hdmf-dev/hdmf-zarr/pull/204)
* Added test for opening file with consolidated metadata from DANDI. @mavaylon1 [#206](https://github.com/hdmf-dev/hdmf-zarr/pull/206)

Expand Down
2 changes: 1 addition & 1 deletion docs/source/integrating_data_stores.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ Updating ZarrIO
:py:meth:`~hdmf_zarr.backend.ZarrIO.get_builder_exists_on_disk`
method may need to be updated to ensure
references are opened correctly on read for files stored with your new store. The
:py:meth:`~hdmf_zarr.backend.ZarrIO.__get_ref` function may also need to be updated, in
:py:meth:`~hdmf_zarr.backend.ZarrIO._create_ref` function may also need to be updated, in
particular in case the links to your store also modify the storage schema for links
(e.g., if you need to store additional metadata in order to resolve links to your store).

Expand Down
6 changes: 3 additions & 3 deletions docs/source/storage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -217,7 +217,7 @@ For example:

In :py:class:`~hdmf_zarr.backend.ZarrIO`, links are written by the
:py:meth:`~hdmf_zarr.backend.ZarrIO.__write_link__` function, which also uses the helper functions
i) :py:meth:`~hdmf_zarr.backend.ZarrIO.__get_ref` to construct py:meth:`~hdmf_zarr.utils.ZarrRefernce`
i) :py:meth:`~hdmf_zarr.backend.ZarrIO._create_ref` to construct py:meth:`~hdmf_zarr.utils.ZarrRefernce`
and ii) :py:meth:`~hdmf_zarr.backend.ZarrIO.__add_link__` to add a link to the Zarr file.
:py:meth:`~hdmf_zarr.backend.ZarrIO.__read_links` then parses links and also uses the
:py:meth:`~hdmf_zarr.backend.ZarrIO.__resolve_ref` helper function to resolve the paths stored in links.
Expand Down Expand Up @@ -245,7 +245,7 @@ by their location (i.e., index) in the dataset. As such, object references only
the relative path to the target Zarr file, and the ``path`` identifying the object within the source
Zarr file. The individual object references are defined in the
:py:class:`~hdmf_zarr.backend.ZarrIO` as py:class:`~hdmf_zarr.utils.ZarrReference` object created via
the :py:meth:`~hdmf_zarr.backend.ZarrIO.__get_ref` helper function.
the :py:meth:`~hdmf_zarr.backend.ZarrIO._create_ref` helper function.

By default, :py:class:`~hdmf_zarr.backend.ZarrIO` uses the ``numcodecs.pickles.Pickle`` codec to
encode object references defined as py:class:`~hdmf_zarr.utils.ZarrReference` dicts in datasets.
Expand Down Expand Up @@ -297,7 +297,7 @@ store the definition of the ``region`` that is being referenced, e.g., a slice o
To implement region references will require updating:
1) py:class:`~hdmf_zarr.utils.ZarrReference` to add a ``region`` key to support storing
the selection for the region,
2) :py:meth:`~hdmf_zarr.backend.ZarrIO.__get_ref` to support passing in the region definition to
2) :py:meth:`~hdmf_zarr.backend.ZarrIO._create_ref` to support passing in the region definition to
be added to the py:class:`~hdmf_zarr.utils.ZarrReference`,
3) :py:meth:`~hdmf_zarr.backend.ZarrIO.write_dataset` already partially implements the required
logic for creating region references by checking for :py:class:`hdmf.build.RegionBuilder` inputs
Expand Down
6 changes: 3 additions & 3 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ classifiers = [
"Topic :: Scientific/Engineering :: Medical Science Apps."
]
dependencies = [
'hdmf>=3.9.0',
'hdmf>=3.14.2',
'zarr>=2.11.0, <3.0', # pin below 3.0 until HDMF-zarr supports zarr 3.0
'numpy>=1.24, <2.0', # pin below 2.0 until HDMF supports numpy 2.0
'numcodecs>=0.9.1',
Expand Down Expand Up @@ -109,7 +109,7 @@ force-exclude = '''
'''

[tool.ruff]
select = ["E", "F", "T100", "T201", "T203"]
lint.select = ["E", "F", "T100", "T201", "T203"]
exclude = [
".git",
".tox",
Expand All @@ -128,5 +128,5 @@ line-length = 120
"src/*/__init__.py" = ["F401"]
"test_gallery.py" = ["T201"]

[tool.ruff.mccabe]
[tool.ruff.lint.mccabe]
max-complexity = 17
2 changes: 1 addition & 1 deletion requirements-min.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
hdmf==3.12.0
hdmf==3.14.2
zarr==2.11.0
numcodecs==0.9.1
pynwb==2.5.0
Expand Down
35 changes: 21 additions & 14 deletions src/hdmf_zarr/backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -440,6 +440,7 @@ def write_builder(self, **kwargs):
f_builder, link_data, exhaust_dci, export_source, consolidate_metadata = getargs(
'builder', 'link_data', 'exhaust_dci', 'export_source', 'consolidate_metadata', kwargs
)
# breakpoint()
rly marked this conversation as resolved.
Show resolved Hide resolved
for name, gbldr in f_builder.groups.items():
self.write_group(
parent=self.__file,
Expand Down Expand Up @@ -519,6 +520,7 @@ def write_group(self, **kwargs):
parent, builder, link_data, exhaust_dci, export_source = getargs(
'parent', 'builder', 'link_data', 'exhaust_dci', 'export_source', kwargs
)
# breakpoint()
rly marked this conversation as resolved.
Show resolved Hide resolved

if self.get_written(builder):
group = parent[builder.name]
Expand Down Expand Up @@ -595,13 +597,13 @@ def write_attributes(self, **kwargs):
# TODO: Region References are not yet supported
# if isinstance(value, RegionBuilder):
# type_str = 'region'
# refs = self.__get_ref(value.builder)
# refs = self._create_ref(value.builder)
if isinstance(value, (ReferenceBuilder, Container, Builder)):
type_str = 'object'
if isinstance(value, Builder):
refs = self.__get_ref(value, export_source)
refs = self._create_ref(value, export_source)
else:
refs = self.__get_ref(value.builder, export_source)
refs = self._create_ref(value.builder, export_source)
tmp = {'zarr_dtype': type_str, 'value': refs}
obj.attrs[key] = tmp
# Case 3: Scalar attributes
Expand Down Expand Up @@ -629,6 +631,7 @@ def __get_path(self, builder):
determines the path by constructing it iteratively from the parents of the
builder.
"""
# breakpoint()
rly marked this conversation as resolved.
Show resolved Hide resolved
if builder.location is not None:
path = os.path.normpath(os.path.join(builder.location, builder.name)).replace("\\", "/")
else:
Expand Down Expand Up @@ -734,6 +737,7 @@ def resolve_ref(self, zarr_ref):
target_zarr_obj = self.__open_file_consolidated(store=source_file,
mode='r',
storage_options=self.__storage_options)
# breakpoint()
rly marked this conversation as resolved.
Show resolved Hide resolved
if object_path is not None:
try:
target_zarr_obj = target_zarr_obj[object_path]
Expand All @@ -742,14 +746,15 @@ def resolve_ref(self, zarr_ref):
# Return the create path
return target_name, target_zarr_obj

def __get_ref(self, ref_object, export_source=None):
def _create_ref(self, ref_object, export_source=None):
"""
Create a ZarrReference object that points to the given container

:param ref_object: the object to be referenced
:type ref_object: Builder, Container, ReferenceBuilder
:returns: ZarrReference object
"""
# breakpoint()
rly marked this conversation as resolved.
Show resolved Hide resolved
if isinstance(ref_object, RegionBuilder): # or region is not None: TODO: Add to support regions
raise NotImplementedError("Region references are currently not supported by ZarrIO")
if isinstance(ref_object, Builder):
Expand All @@ -761,7 +766,9 @@ def __get_ref(self, ref_object, export_source=None):
builder = ref_object.builder
else:
builder = self.manager.build(ref_object)

path = self.__get_path(builder)
# breakpoint()
rly marked this conversation as resolved.
Show resolved Hide resolved
# TODO Add to get region for region references.
# Also add {'name': 'region', 'type': (slice, list, tuple),
# 'doc': 'the region reference indexing object', 'default': None},
Expand Down Expand Up @@ -838,7 +845,7 @@ def write_link(self, **kwargs):
name = builder.name
target_builder = builder.builder
# Get the reference
zarr_ref = self.__get_ref(target_builder)
zarr_ref = self._create_ref(target_builder)
# EXPORT WITH LINKS: Fix link source
# if the target and source are both the same, then we need to ALWAYS use ourselves as a source
# When exporting from one source to another, the LinkBuilders.source are not updated, i.e,. the
Expand Down Expand Up @@ -982,7 +989,7 @@ def write_dataset(self, **kwargs): # noqa: C901
elif isinstance(data, HDMFDataset):
# If we have a dataset of containers we need to make the references to the containers
if len(data) > 0 and isinstance(data[0], Container):
ref_data = [self.__get_ref(data[i], export_source=export_source) for i in range(len(data))]
ref_data = [self._create_ref(data[i], export_source=export_source) for i in range(len(data))]
shape = (len(data), )
type_str = 'object'
dset = parent.require_dataset(name,
Expand Down Expand Up @@ -1015,7 +1022,7 @@ def write_dataset(self, **kwargs): # noqa: C901
for i, dts in enumerate(options['dtype']):
if self.__is_ref(dts['dtype']):
refs.append(i)
ref_tmp = self.__get_ref(data[0][i], export_source=export_source)
ref_tmp = self._create_ref(data[0][i], export_source=export_source)
if isinstance(ref_tmp, ZarrReference):
dts_str = 'object'
else:
Expand All @@ -1035,7 +1042,7 @@ def write_dataset(self, **kwargs): # noqa: C901
for j, item in enumerate(data):
new_item = list(item)
for i in refs:
new_item[i] = self.__get_ref(item[i], export_source=export_source)
new_item[i] = self._create_ref(item[i], export_source=export_source)
new_items.append(tuple(new_item))

# Create dtype for storage, replacing values to match hdmf's hdf5 behavior
Expand Down Expand Up @@ -1080,20 +1087,20 @@ def write_dataset(self, **kwargs): # noqa: C901
# if isinstance(data, RegionBuilder):
# shape = (1,)
# type_str = 'region'
# refs = self.__get_ref(data.builder, data.region)
# refs = self._create_ref(data.builder, data.region)
if isinstance(data, ReferenceBuilder):
shape = (1,)
type_str = 'object'
refs = self.__get_ref(data.builder, export_source=export_source)
refs = self._create_ref(data.builder, export_source=export_source)
# TODO: Region References are not yet supported
# elif options['dtype'] == 'region':
# shape = (len(data), )
# type_str = 'region'
# refs = [self.__get_ref(item.builder, item.region) for item in data]
# refs = [self._create_ref(item.builder, item.region) for item in data]
else:
shape = (len(data), )
type_str = 'object'
refs = [self.__get_ref(item, export_source=export_source) for item in data]
refs = [self._create_ref(item, export_source=export_source) for item in data]

dset = parent.require_dataset(name,
shape=shape,
Expand Down Expand Up @@ -1283,7 +1290,7 @@ def __list_fill__(self, parent, name, data, options=None): # noqa: C901
dset.attrs['zarr_dtype'] = type_str

# Write the data to file
if dtype == object:
if dtype == object: # noqa: E721
for c in np.ndindex(data_shape):
o = data
for i in c:
Expand Down Expand Up @@ -1317,7 +1324,7 @@ def __scalar_fill__(self, parent, name, data, options=None):
except Exception as exc:
msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
raise Exception(msg) from exc
if dtype == object:
if dtype == object: # noqa: E721
io_settings['object_codec'] = self.__codec_cls()

dset = parent.require_dataset(name, shape=(1, ), dtype=dtype, **io_settings)
Expand Down
11 changes: 11 additions & 0 deletions src/hdmf_zarr/zarr_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
from zarr import Array as ZarrArray

from hdmf.build import DatasetBuilder
from hdmf.data_utils import append_data
from hdmf.array import Array
from hdmf.query import HDMFDataset, ReferenceResolver, ContainerResolver, BuilderResolver
from hdmf.utils import docval, popargs, get_docval
Expand Down Expand Up @@ -73,6 +74,16 @@ def __iter__(self):
def __next__(self):
return self._get_ref(super().__next__())

def append(self, arg):
# Building the parent first will build the new child.
# This correctly sets the parent of the child builder, which is needed to create the reference.
self.io.manager.build(arg.parent)
builder = self.io.manager.build(arg)
mavaylon1 marked this conversation as resolved.
Show resolved Hide resolved

# Create ZarrReference
ref = self.io._create_ref(builder)
mavaylon1 marked this conversation as resolved.
Show resolved Hide resolved
append_data(self.dataset, ref)


class BuilderResolverMixin(BuilderResolver): # refactor to backend/utils.py
"""
Expand Down
51 changes: 51 additions & 0 deletions tests/unit/test_zarrio.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,12 @@
from zarr.storage import (DirectoryStore,
TempStore,
NestedDirectoryStore)
from tests.unit.utils import (Baz, BazData, BazBucket, get_baz_buildmanager)
import zarr
from hdmf_zarr.backend import ZarrIO
import os
import shutil
import warnings


CUR_DIR = os.path.dirname(os.path.realpath(__file__))
Expand Down Expand Up @@ -181,3 +184,51 @@ def test_force_open_without_consolidated_fails(self):
read_io._ZarrIO__open_file_consolidated(store=self.store, mode='r')
except ValueError as e:
self.fail("ZarrIO.__open_file_consolidated raised an unexpected ValueError: {}".format(e))


class TestDatasetofReferences(ZarrStoreTestCase):
def setUp(self):
self.store_path = "test_io.zarr"
self.store = DirectoryStore(self.store_path)

def tearDown(self):
"""
Remove all files and folders defined by self.store_path
"""
paths = self.store_path if isinstance(self.store_path, list) else [self.store_path, ]
for path in paths:
if os.path.exists(path):
if os.path.isdir(path):
shutil.rmtree(path)
elif os.path.isfile(path):
os.remove(path)
else:
warnings.warn("Could not remove: %s" % path)

def test_append_references(self):
# Setup a file container with references
num_bazs = 10
bazs = [] # set up dataset of references
for i in range(num_bazs):
bazs.append(Baz(name='baz%d' % i))
baz_data = BazData(name='baz_data', data=bazs)
container = BazBucket(bazs=bazs, baz_data=baz_data)
manager = get_baz_buildmanager()

with ZarrIO(self.store, manager=manager, mode='w') as writer:
writer.write(container=container)

with ZarrIO(self.store, manager=manager, mode='a') as append_io:
read_container = append_io.read()
new_baz = Baz(name='new')
read_container.add_baz(new_baz)

DoR = read_container.baz_data.data
DoR.append(new_baz)

append_io.write(read_container)

with ZarrIO(self.store, manager=manager, mode='r') as append_io:
read_container = append_io.read()
self.assertEqual(len(read_container.baz_data.data), 11)
self.assertIs(read_container.baz_data.data[10], read_container.bazs["new"])
Loading