[Feature]: Compatibility with Zarr 3 #1235

Open
QuLogic opened this issue Jan 20, 2025 · 1 comment

QuLogic commented Jan 20, 2025

What would you like to see added to HDMF?

I've run a test of Zarr 3 compatibility across packages, and hdmf 3.14.6 currently fails with Zarr 3 (the same hdmf 3.14.6 passes with Zarr 2.18.4).
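
For context, the exact versions in play can be confirmed with a quick check; both packages expose a `__version__` attribute:

```python
import hdmf
import zarr

# Print the versions under test: hdmf 3.14.6 passes with zarr 2.18.4
# but fails with zarr 3, per the results described above.
print("hdmf:", hdmf.__version__)
print("zarr:", zarr.__version__)
```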

Here are the test failures:
=================================== FAILURES ===================================
_______________ TestWriteHDF5withZarrInput.test_roundtrip_basic ________________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data'
data = array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, ...25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
>           dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1488: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_roundtrip_basic>

    def test_roundtrip_basic(self):
        # Setup all the data we need
        zarr.save(self.zarr_path, np.arange(50).reshape(5, 10))
        zarr_data = zarr.open(self.zarr_path, 'r')
        foo1 = Foo(name='foo1',
                   my_data=zarr_data,
                   attr1="I am foo1",
                   attr2=17,
                   attr3=3.14)
        foobucket = FooBucket('bucket1', [foo1])
        foofile = FooFile(buckets=[foobucket])
    
        with HDF5IO(self.path, manager=self.manager, mode='w') as io:
>           io.write(foofile)

tests/unit/test_io_hdf5_h5tools.py:3630: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:396: in write
    super().write(**kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/io.py:99: in write
    self.write_builder(f_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:843: in write_builder
    self.write_group(self.__file, gbldr, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1030: in write_group
    self.write_dataset(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1335: in write_dataset
    dset = self.__list_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data'
data = array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, ...25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
            dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create dataset %s in %s with shape %s, dtype %s, and iosettings %s. %s" % \
                  (name, parent.name, str(data_shape), str(dtype), str(io_settings), str(exc))
>           raise Exception(msg) from exc
E           Exception: Could not create dataset my_data in /buckets/bucket1/foo_holder/foo1 with shape (5, 10), dtype <class 'numpy.ndarray'>, and iosettings {}. Object dtype dtype('O') has no native HDF5 equivalent

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1492: Exception
___________ TestWriteHDF5withZarrInput.test_roundtrip_empty_dataset ____________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data', data = array([], dtype=int64)
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
>           dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1488: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_roundtrip_empty_dataset>

    def test_roundtrip_empty_dataset(self):
        zarr.save(self.zarr_path, np.asarray([]).astype('int64'))
        zarr_data = zarr.open(self.zarr_path, 'r')
        foo1 = Foo('foo1', zarr_data, "I am foo1", 17, 3.14)
        foobucket = FooBucket('bucket1', [foo1])
        foofile = FooFile(buckets=[foobucket])
    
        with HDF5IO(self.path, manager=self.manager, mode='w') as io:
>           io.write(foofile)

tests/unit/test_io_hdf5_h5tools.py:3645: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:396: in write
    super().write(**kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/io.py:99: in write
    self.write_builder(f_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:843: in write_builder
    self.write_group(self.__file, gbldr, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1025: in write_group
    self.write_group(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1030: in write_group
    self.write_dataset(group, sub_builder, **kwargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1335: in write_dataset
    dset = self.__list_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>, parent = <Closed HDF5 group>
name = 'my_data', data = array([], dtype=int64)
options = {'dtype': <class 'numpy.ndarray'>, 'io_settings': {}}

    @classmethod
    def __list_fill__(cls, parent, name, data, options=None):
        # define the io settings and data type if necessary
        io_settings = {}
        dtype = None
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
    
        # Create the dataset
        try:
            dset = parent.create_dataset(name, shape=data_shape, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create dataset %s in %s with shape %s, dtype %s, and iosettings %s. %s" % \
                  (name, parent.name, str(data_shape), str(dtype), str(io_settings), str(exc))
>           raise Exception(msg) from exc
E           Exception: Could not create dataset my_data in /buckets/bucket1/foo_holder/foo1 with shape (0,), dtype <class 'numpy.ndarray'>, and iosettings {}. Object dtype dtype('O') has no native HDF5 equivalent

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1492: Exception
_______ TestWriteHDF5withZarrInput.test_write_zarr_dataset_compress_gzip _______

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_dataset_compress_gzip>

    def test_write_zarr_dataset_compress_gzip(self):
        base_data = np.arange(50).reshape(5, 10).astype('float32')
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
>       a = H5DataIO(zarr_data,
                     compression='gzip',
                     compression_opts=5,
                     shuffle=True,
                     fletcher32=True)

tests/unit/test_io_hdf5_h5tools.py:3694: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:667: in func_call
    pargs = _check_args(args, kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<hdmf.backends.hdf5.h5_utils.H5DataIO object at 0x7f271755ef50>, <Array file:///tmp/tmpuf5mlz5w shape=(5, 10) dtype=float32>)
kwargs = {'compression': 'gzip', 'compression_opts': 5, 'fletcher32': True, 'shuffle': True}

    def _check_args(args, kwargs):
        """Parse and check arguments to decorated function. Raise warnings and errors as appropriate."""
        # this function was separated from func_call() in order to make stepping through lines of code using pdb
        # easier
    
        parsed = __parse_args(
            loc_val,
            args[1:] if is_method else args,
            kwargs,
            enforce_type=enforce_type,
            enforce_shape=enforce_shape,
            allow_extra=allow_extra,
            allow_positional=allow_positional
        )
    
        parse_warnings = parsed.get('future_warnings')
        if parse_warnings:
            msg = '%s: %s' % (func.__qualname__, ', '.join(parse_warnings))
            warnings.warn(msg, category=FutureWarning, stacklevel=3)
    
        for error_type, ExceptionType in (('type_errors', TypeError),
                                          ('value_errors', ValueError),
                                          ('syntax_errors', SyntaxError)):
            parse_err = parsed.get(error_type)
            if parse_err:
                msg = '%s: %s' % (func.__qualname__, ', '.join(parse_err))
>               raise ExceptionType(msg)
E               TypeError: H5DataIO.__init__: incorrect type for 'data' (got 'Array', expected 'ndarray, list, tuple, Dataset or Iterable')

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:660: TypeError
__________ TestWriteHDF5withZarrInput.test_write_zarr_float32_dataset __________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmpb7gw6onl" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpep7hiok8 shape=(5, 10) dtype=float32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
>           dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_float32_dataset>

    def test_write_zarr_float32_dataset(self):
        base_data = np.arange(50).reshape(5, 10).astype('float32')
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
        io = HDF5IO(self.path, mode='a')
        f = io._file
>       io.write_dataset(f, DatasetBuilder(name='test_dataset', data=zarr_data, attributes={}))

tests/unit/test_io_hdf5_h5tools.py:3671: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
    dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmpb7gw6onl" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpep7hiok8 shape=(5, 10) dtype=float32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
            dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
>           raise Exception(msg) from exc
E           Exception: Could not create scalar dataset test_dataset in /

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
___________ TestWriteHDF5withZarrInput.test_write_zarr_int32_dataset ___________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmp348qnibi" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpbuod7l8k shape=(5, 10) dtype=int32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
>           dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_int32_dataset>

    def test_write_zarr_int32_dataset(self):
        base_data = np.arange(50).reshape(5, 10).astype('int32')
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
        io = HDF5IO(self.path, mode='a')
        f = io._file
>       io.write_dataset(f, DatasetBuilder(name='test_dataset', data=zarr_data, attributes={}))

tests/unit/test_io_hdf5_h5tools.py:3657: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
    dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmp348qnibi" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpbuod7l8k shape=(5, 10) dtype=int32>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
            dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
>           raise Exception(msg) from exc
E           Exception: Could not create scalar dataset test_dataset in /

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception
__________ TestWriteHDF5withZarrInput.test_write_zarr_string_dataset ___________

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmphhu9vfqv" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpyvu6t9_n shape=(2,) dtype=StringDType()>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
>           dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.13/site-packages/h5py/_hl/group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
/usr/lib64/python3.13/site-packages/h5py/_hl/dataset.py:87: in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
h5py/h5t.pyx:1658: in h5py.h5t.py_create
    ???
h5py/h5t.pyx:1682: in h5py.h5t.py_create
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: Object dtype dtype('O') has no native HDF5 equivalent

h5py/h5t.pyx:1742: TypeError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_io_hdf5_h5tools.TestWriteHDF5withZarrInput testMethod=test_write_zarr_string_dataset>

    def test_write_zarr_string_dataset(self):
        base_data = np.array(['string1', 'string2'], dtype=str)
        zarr.save(self.zarr_path, base_data)
        zarr_data = zarr.open(self.zarr_path, 'r')
        io = HDF5IO(self.path, mode='a')
        f = io._file
>       io.write_dataset(f, DatasetBuilder('test_dataset', zarr_data, attributes={}))

tests/unit/test_io_hdf5_h5tools.py:3685: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1338: in write_dataset
    dset = self.__scalar_fill__(parent, name, data, options)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'hdmf.backends.hdf5.h5tools.HDF5IO'>
parent = <HDF5 file "tmphhu9vfqv" (mode r+)>, name = 'test_dataset'
data = <Array file:///tmp/tmpyvu6t9_n shape=(2,) dtype=StringDType()>
options = {'dtype': None, 'io_settings': {}}

    @classmethod
    def __scalar_fill__(cls, parent, name, data, options=None):
        dtype = None
        io_settings = {}
        if options is not None:
            dtype = options.get('dtype')
            io_settings = options.get('io_settings')
        if not isinstance(dtype, type):
            try:
                dtype = cls.__resolve_dtype__(dtype, data)
            except Exception as exc:
                msg = 'cannot add %s to %s - could not determine type' % (name, parent.name)
                raise Exception(msg) from exc
        try:
            dset = parent.create_dataset(name, data=data, shape=None, dtype=dtype, **io_settings)
        except Exception as exc:
            msg = "Could not create scalar dataset %s in %s" % (name, parent.name)
>           raise Exception(msg) from exc
E           Exception: Could not create scalar dataset test_dataset in /

../BUILDROOT/usr/lib/python3.13/site-packages/hdmf/backends/hdf5/h5tools.py:1366: Exception

Strangely, the failures all appear in the HDF5 backend, but I'd guess the root cause is some change in Zarr 3 that produces something h5py cannot map to a native HDF5 type.
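
The log backs that up: in every failure, the resolved options are `{'dtype': <class 'numpy.ndarray'>, ...}`, so hdmf ends up handing h5py the ndarray class itself as a dtype, which numpy coerces to the generic object dtype. A minimal sketch that isolates this h5py behavior, independent of hdmf and zarr (the file name is arbitrary):

```python
import numpy as np
import h5py

# Passing the ndarray class as a dtype reproduces the exact TypeError from
# the tracebacks: "Object dtype dtype('O') has no native HDF5 equivalent",
# because numpy resolves the class to dtype('O') on this code path.
with h5py.File("repro.h5", "w") as f:
    f.create_dataset("my_data", shape=(5, 10), dtype=np.ndarray)
```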

What solution would you like?

I believe the Zarr 3 migration guide is the place to start: https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html
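
In the meantime, a possible user-side workaround (my suggestion, not part of the migration guide) is to materialize the zarr array into a plain numpy array before passing it to hdmf; basic slicing behaves the same in zarr 2 and 3:

```python
import numpy as np
import zarr

zarr_data = zarr.open("data.zarr", mode="r")  # path is illustrative
# Read the whole array into memory; fine for small data, but this gives up
# zarr's chunked/lazy access for large arrays.
materialized = zarr_data[:]
assert isinstance(materialized, np.ndarray)
```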

Do you have any interest in helping implement the feature?

No.

rly (Contributor) commented Jan 21, 2025

hdmf does not yet support zarr v3, which introduces a number of breaking changes that affect the hdmf package. The upcoming release of hdmf 4.0.0 will set the upper bound of the optional zarr dependency to <3 until v3 support is added.

See also the related but separate effort to add support for zarr v3 in hdmf-zarr: hdmf-dev/hdmf-zarr#202
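
Purely as illustration of what that upper bound means in practice (the real constraint will live in hdmf's packaging metadata, and details may differ), a runtime guard equivalent to `zarr<3` could look like:

```python
import zarr
from packaging.version import Version  # 'packaging' assumed to be installed

# Reject zarr 3.x until hdmf gains v3 support; mirrors a zarr<3 pin.
if Version(zarr.__version__).major >= 3:
    raise ImportError("zarr>=3 is not yet supported; please install zarr<3")
```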

rly added the labels "category: proposal" (proposed enhancements or new features) and "priority: medium" (non-critical problem and/or affecting only a small set of users) on Jan 21, 2025
rly added this to the Future milestone on Jan 21, 2025