v0.5
Release notes:
Features:
Array overhaul:
- Support for applying lazy functions to modify the data, with a special emphasis on slicing. You can now do something like `time_slice_array = open_ds("path/to/data.zarr/array").lazy_op(np.s_[0:5])`, which opens your data as a `funlib.persistence.Array` and then slices the first 5 time steps (assuming your data has time as its first dimension). You can also apply functions such as `thresholded_array = open_ds("path/to/data.zarr/array").adapt(lambda x: x > 0.5)`, which will lazily apply the function and appropriately update `thresholded_array.dtype`, so `assert thresholded_array.dtype == bool` should pass. You can write to the array if you only use slicing operations, but once you apply a function to your data it is no longer writable. Arrays are now backed by `dask`, so our support extends to, but is also limited by, the lazy slicing and processing that `dask` supports.
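  A minimal sketch of that workflow end to end (the store path and axis layout are illustrative assumptions):

  ```python
  import numpy as np

  from funlib.persistence import open_ds

  # Opening is lazy: no voxel data is read yet.
  array = open_ds("path/to/data.zarr/array")

  # Slicing is a lazy op; the result stays writable.
  time_slice_array = array.lazy_op(np.s_[0:5])

  # Functions are applied lazily too, but make the result
  # read-only; the dtype is updated to match the output.
  thresholded_array = array.adapt(lambda x: x > 0.5)
  assert thresholded_array.dtype == bool
  ```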
- Slight interface change: `open_ds` and `prepare_ds` now take a single store argument. This is passed directly to `zarr.open`, so we expand our support to anything zarr supports (zipped stores, cloud stores, etc.) but also limit ourselves (no more hdf5, etc.). Note that this limitation only applies to the convenience functions `open_ds` and `prepare_ds`, which come with expectations on data format and metadata format. `Array` will still work with any array-like object that can be converted to a `dask.Array` with `dask.from_array`. If your data does not match our priors, we recommend writing custom `open_ds` and `prepare_ds` alternatives.
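  As a sketch of such an alternative, a thin hdf5 opener might look like the following. The metadata values are illustrative, and the exact `Array` constructor keywords should be treated as an assumption (they mirror the metadata fields described below):

  ```python
  import h5py

  from funlib.persistence import Array

  def open_hdf5_ds(filename: str, dataset: str) -> Array:
      # h5py datasets are array-like, so Array can wrap them
      # lazily (via dask.from_array internally). Note the file
      # must stay open for lazy reads to work.
      data = h5py.File(filename, "r")[dataset]
      return Array(
          data,
          offset=(0, 0, 0),            # illustrative metadata
          voxel_size=(1, 1, 1),
          units=("nm", "nm", "nm"),
          axis_names=("z", "y", "x"),
      )
  ```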
- No longer provide `total_roi` and `num_channels` when using `prepare_ds` or directly calling `Array`. We now just pass `offset` (in units defined by the "units" attribute) and `shape` (in voxels). This means we now support any number of channel dimensions, i.e. you can do `prepare_ds(..., offset=(100, 200, 300), shape=(3, 3, 300, 300, 300))` to have 2 channel dimensions and 3 physical dimensions, which previously wouldn't have been straightforward.
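  A sketch of that call with the remaining arguments filled in (the store path, dtype, and metadata values are illustrative, and the exact keyword set is an assumption):

  ```python
  from funlib.persistence import prepare_ds

  # 2 channel dimensions (3, 3) followed by 3 physical
  # dimensions (300, 300, 300); offset, voxel_size, and units
  # describe only the physical dimensions.
  array = prepare_ds(
      "path/to/data.zarr/array",
      shape=(3, 3, 300, 300, 300),
      offset=(100, 200, 300),
      voxel_size=(4, 4, 4),
      units=("nm", "nm", "nm"),
      dtype="float32",
  )
  ```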
- Expanded metadata. We now have `axis_names`, `units`, `voxel_size`, and `offset`. I have separated out a metadata class and a metadata parsing class that can be modified to cover a fairly large variety of simple metadata schemes, and added some reasonable defaults, so this metadata will always be present, or errors will be thrown if the metadata is contradictory. If your metadata requires special parsing (e.g. you store your metadata on the multiscale group instead of directly on the array you are opening), it is easy to pass in metadata fields to skip the automatic parsing, so you can write your own thin wrapper for your specific data.
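  For example, if your metadata lives on a parent multiscale group, you can read it yourself and pass the fields explicitly so the automatic parsing is skipped (a sketch; the group layout and attribute keys are hypothetical):

  ```python
  import zarr

  from funlib.persistence import open_ds

  # Metadata stored on the multiscale group, not on the array.
  group = zarr.open("path/to/data.zarr", mode="r")
  voxel_size = group.attrs["resolution"]
  offset = group.attrs["offset"]

  # Explicitly provided fields skip the automatic parsing.
  array = open_ds(
      "path/to/data.zarr/s0",
      voxel_size=voxel_size,
      offset=offset,
  )
  ```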
"pyproject.toml"
,"funlib_persistence.toml"
,Path.home() / ".config/funlib_persistence/funlib_persistence.toml"
,"/etc/funlib_persistence/funlib_persistence.toml"
for configs. The attributes that can be provided arevoxel_size_attr
,axis_names_attr
,units_attr
, andoffset_attr
. Whatever attributes you provide will be used for both reading and writing metadata. You can also override the default metadata in each python script viafunlib.persistence.arrays.metadata.set_default_metadata_format(...)
.
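  A sketch of the per-script override; the `MetaDataFormat` class and its constructor are assumptions here, since only `set_default_metadata_format` is named above:

  ```python
  from funlib.persistence.arrays.metadata import (
      MetaDataFormat,  # assumed name of the metadata schema class
      set_default_metadata_format,
  )

  # Map each metadata field to the attribute name used on disk.
  set_default_metadata_format(
      MetaDataFormat(
          voxel_size_attr="resolution",
          axis_names_attr="axes",
          units_attr="units",
          offset_attr="offset",
      )
  )
  ```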
drop-edges: you can now drop just the edges from a graph database, leaving the nodes in place.