
v0.5

@pattonw pattonw released this 29 Aug 23:19
· 16 commits to main since this release

Release notes:

features:
Array overhaul:

  1. Support for lazily applying functions that modify the data, with a special emphasis on slicing. You can now do something like time_slice_array = open_ds("path/to/data.zarr/array").lazy_op(np.s_[0:5]), which opens your data as a funlib.persistence.Array and then slices out the first 5 time steps (assuming time is the first axis of your data). You can also apply functions, such as thresholded_array = open_ds("path/to/data.zarr/array").adapt(lambda x: x > 0.5), which lazily applies the function and updates thresholded_array.dtype accordingly, so assert thresholded_array.dtype == bool should pass. The array stays writable as long as you only use slicing operations; once you apply a function to your data, it is no longer writable. Arrays are now backed by dask, so our support extends to, but is also limited by, the lazy slicing and processing that dask supports.
  2. Slight interface change. open_ds and prepare_ds now take a single store argument, which is passed directly to zarr.open. This expands our support to anything zarr supports (zipped stores, cloud stores, etc.) but also limits us to zarr (no more hdf5, etc.). Note that this limitation only applies to the convenience functions open_ds and prepare_ds, which come with expectations about data format and metadata format. Array will still work with any array-like object that can be converted to a dask array with dask.array.from_array. If your data does not match our priors, we recommend writing custom open_ds and prepare_ds alternatives.
  3. You no longer provide total_roi and num_channels when using prepare_ds or calling Array directly. Instead, you pass offset (in the units defined by the "units" attribute) and shape (in voxels). This means we now support any number of channel dimensions. For example, prepare_ds(..., offset = (100,200,300), shape = (3, 3, 300, 300, 300)) gives you 2 channel dimensions and 3 physical dimensions, which previously wouldn't have been straightforward.
  4. Expanded metadata. We now have axis_names, units, voxel_size, and offset. I have separated out a metadata class and a metadata parsing class that can be modified to cover a fairly large variety of simple metadata schemes, and added some reasonable defaults, so this metadata will always be present, or an error will be thrown if the metadata is contradictory. If your metadata requires special parsing (e.g. you store your metadata on the multiscale group instead of directly on the array you are opening), it is easy to pass in metadata fields explicitly to skip the automatic parsing, so you can write your own thin wrapper for your specific data.
  5. Added support for configuring the default metadata schema. We check the following paths for configs: "pyproject.toml", "funlib_persistence.toml", Path.home() / ".config/funlib_persistence/funlib_persistence.toml", and "/etc/funlib_persistence/funlib_persistence.toml". The attributes that can be provided are voxel_size_attr, axis_names_attr, units_attr, and offset_attr. Whatever attributes you provide will be used for both reading and writing metadata. You can also override the default metadata format in each Python script via funlib.persistence.arrays.metadata.set_default_metadata_format(...).
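
To make the offset/shape convention from item 3 concrete, here is a minimal sketch in plain Python. It does not use funlib.persistence at all: describe_layout is a hypothetical helper written only for illustration, the voxel_size of (4, 4, 4) is an assumed value, and the sketch assumes that channel dimensions come before the physical dimensions, as the example in item 3 suggests. The number of physical dimensions is taken from the length of offset, and any leading dimensions of shape are treated as channels.

```python
def describe_layout(offset, shape, voxel_size):
    """Split `shape` into channel and physical parts and compute the world-space end.

    Hypothetical helper for illustration only; not part of funlib.persistence.
    Assumes channel dimensions lead and physical dimensions trail.
    """
    n_physical = len(offset)
    if len(voxel_size) != n_physical:
        raise ValueError("offset and voxel_size must have the same length")
    if len(shape) < n_physical:
        raise ValueError("shape needs at least as many dimensions as offset")

    # Every dimension of `shape` beyond the physical ones is a channel dimension.
    n_channel = len(shape) - n_physical
    channel_shape = tuple(shape[:n_channel])
    physical_shape = tuple(shape[n_channel:])

    # shape is in voxels, offset is in world units: convert before adding.
    extent = tuple(s * v for s, v in zip(physical_shape, voxel_size))
    end = tuple(o + e for o, e in zip(offset, extent))

    return {
        "channel_shape": channel_shape,
        "physical_shape": physical_shape,
        "begin": tuple(offset),
        "end": end,
    }


# Values from item 3, plus an assumed voxel size of 4 units per voxel.
layout = describe_layout(
    offset=(100, 200, 300),
    shape=(3, 3, 300, 300, 300),
    voxel_size=(4, 4, 4),
)
print(layout["channel_shape"])   # 2 channel dimensions
print(layout["physical_shape"])  # 3 physical dimensions
print(layout["end"])             # offset + physical shape * voxel size
```

Under these assumptions, the two leading 3s are read as channel dimensions and the trailing (300, 300, 300) as the physical volume, so the region covered in world units is what total_roi used to describe explicitly.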

drop-edges: you can now drop just the edges from a graphdb.