
Introduce Pydantic models for NGFF metadata #528

Conversation

@tcompa (Collaborator) commented Sep 19, 2023

close #351
close #501
close #532
close #533

  • Add unit tests for new ngff module
  • Review all docstring descriptions of metadata.
  • Remove debugging statements from lib_ngff.py
  • Remove as many functions as possible from lib_zattrs_utils.py
  • Add docstrings to lib_ngff.py
  • I added an appropriate entry to CHANGELOG.md
  • Open issue on de-duplication of models w.r.t. lib_channels.py - ref De-duplicate Pydantic models in lib_channels.py and lib_ngff.py #540

The main goal is to extract some OME-NGFF-related metadata directly from an OME-Zarr, rather than from the metadata input parameter (ref #351). This is a clear goal that can be achieved in multiple ways. The simplest one is to have several helper functions that each construct a given parameter (e.g. num_levels) from the attributes of an OME-Zarr group.

With the current PR, I'm proposing a different approach that brings in broader changes. I'm introducing Pydantic models that encode (part of) OME-NGFF specs, and then implementing additional logic as part of these models (via properties or methods), e.g. to extract num_levels.
Instead of writing the Pydantic models myself, I generated them automatically from the JSON Schema of NGFF 0.4 (via https://github.com/koxudaxi/datamodel-code-generator) and then tweaked them a bit, resulting e.g. in a NgffImage model; see also the note below on effort duplication.

If we pursue this route, then most of the functions in lib_zattrs_utils.py will become methods of the appropriate NGFF object (e.g. NgffImage or NgffWell). As an example, already included in the preliminary commits of this PR, extract_zyx_pixel_sizes can be re-written more simply as a method of NgffImage:

# Current status
from fractal_tasks_core.lib_zattrs_utils import extract_zyx_pixel_sizes

def yokogawa_to_ome_zarr(...):
    ...
    parameters = get_parameters_from_metadata(
        keys=[
            "original_paths",
            "num_levels",
            "coarsening_xy",
            "image_extension",
            "image_glob_patterns",
        ]
    )
    num_levels = parameters["num_levels"]
    coarsening_xy = parameters["coarsening_xy"]
    pxl_size = extract_zyx_pixel_sizes(f"{zarrurl}/.zattrs")
    ...

# This PR
def yokogawa_to_ome_zarr(...):
    ...
    ngff_image = NgffImage(**zarr.open_group(zarrurl).attrs.asdict())
    num_levels = ngff_image.num_levels
    coarsening_xy = ngff_image.coarsening_xy
    pxl_size = ngff_image.get_pixel_sizes_zyx(level=0)
    ...
    parameters = get_parameters_from_metadata(
        keys=[
            "original_paths",
            "image_extension",
            "image_glob_patterns",
        ]
    )
    ...
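For illustration, the pixel-size lookup used above can be sketched in plain Python. This is not the PR's actual implementation; the dict layout simply mirrors one parsed multiscale entry of the NGFF 0.4 metadata.

```python
# Hedged sketch (not the PR's actual method) of reading z/y/x pixel sizes
# from the scale transformation of a given resolution level.
def get_pixel_sizes_zyx(multiscale: dict, level: int = 0) -> list[float]:
    axes = [axis["name"] for axis in multiscale["axes"]]
    dataset = multiscale["datasets"][level]
    # Pick the first "scale" coordinate transformation of this dataset.
    scale = next(
        t["scale"]
        for t in dataset["coordinateTransformations"]
        if t["type"] == "scale"
    )
    # Keep only the spatial entries, in the order the axes are listed (z, y, x).
    return [s for axis, s in zip(axes, scale) if axis in ("z", "y", "x")]

multiscale = {
    "axes": [{"name": "c"}, {"name": "z"}, {"name": "y"}, {"name": "x"}],
    "datasets": [
        {
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0, 1.0, 0.1625, 0.1625]}
            ]
        }
    ],
}
print(get_pixel_sizes_zyx(multiscale, level=0))  # [1.0, 0.1625, 0.1625]
```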

Scope of this change

To be clear: this new lib_image.py module (to be renamed lib_ngff.py as soon as we also include Well or something else) may become an important one, since it will also contribute to defining our compliance/support w.r.t. OME-NGFF. Some aspects:

  1. We now have multiple places in the code base where we check a given OME-Zarr property (e.g. that we only accept a single multiscale per image). This information would be more centralized if a set of validators were in place for each Well/Image/Plate model.
  2. We now have a couple of checks that the NGFF version is 0.4, and we do not support anything else. In principle, the lib_ngff.py module would also be one way to simplify support for multiple versions in the future (e.g. by using NgffImageV04 or NgffImageV05 based on the version stored in the zarr attributes).
  3. Having a more strict structure of models won't make any difference w.r.t. Support arrays of mixed-dimensionality #150, but in my opinion it could make some internal functions simpler (see for instance the PR's current way to extract pixel sizes, which can "natively" look up axes and their names as object attributes).
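Point 2 could look roughly like the following dispatch sketch. NgffImageV04, NgffImageV05, and load_ngff_image are hypothetical names, not part of this PR (only a 0.4 model exists here), and trivial stand-in classes replace the real Pydantic models.

```python
# Hypothetical sketch of per-version model dispatch. The class and function
# names are illustrative only; stand-in classes replace the real models.

class NgffImageV04:
    def __init__(self, **attrs):
        self.attrs = attrs

class NgffImageV05:
    def __init__(self, **attrs):
        self.attrs = attrs

# Map the version string stored in the zarr attributes to a model class.
_NGFF_IMAGE_MODELS = {"0.4": NgffImageV04, "0.5": NgffImageV05}

def load_ngff_image(attrs: dict):
    version = attrs["multiscales"][0].get("version", "0.4")
    try:
        model = _NGFF_IMAGE_MODELS[version]
    except KeyError:
        raise NotImplementedError(f"NGFF version {version} is not supported")
    return model(**attrs)

image = load_ngff_image({"multiscales": [{"version": "0.4", "axes": [], "datasets": []}]})
print(type(image).__name__)  # NgffImageV04
```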

This PR may also affect:

A note on effort duplication

The same procedure (transforming at least a subset of the NGFF specs into Python models) has already been done elsewhere:

If/when one of these efforts grows and/or becomes the officially sanctioned Pydantic version of OME-NGFF (or something we want to rely on, for whatever reason), then we should consider using their models instead of our "custom" ones. The transition would take place by defining a FractalNgffImage class that inherits from the "official" one and adds the methods/properties that are needed for fractal-tasks-core tasks (e.g. it would expose a num_levels property).

Neglecting the different attribute names, the change could be as smooth as in:

# This PR

class NgffImage(BaseModel):  # <--------- fractal-specific class
    multiscales: list[Multiscale] = Field(
        ...,
        description="The multiscale datasets for this image",
        min_items=1,
        unique_items=True,
    )
    omero: Optional[Omero] = None

    @property
    def multiscale(self) -> Multiscale:
        if len(self.multiscales) > 1:
            raise NotImplementedError(
                "Only images with one multiscale are supported "
                f"(given: {len(self.multiscales)})"
            )
        return self.multiscales[0]

    @property
    def axes(self) -> list[Axe]:
        return self.multiscale.axes

    @property
    def datasets(self) -> list[Dataset]:
        return self.multiscale.datasets

    @property
    def num_levels(self) -> int:
        return len(self.datasets)

# A possible future version

from xyz import NgffImage

class FractalNgffImage(NgffImage):   # <--------- use official model as parent class

    @property
    def multiscale(self) -> Multiscale:
        if len(self.multiscales) > 1:
            raise NotImplementedError(
                "Only images with one multiscale are supported "
                f"(given: {len(self.multiscales)})"
            )
        return self.multiscales[0]

    @property
    def axes(self) -> list[Axe]:
        return self.multiscale.axes

    @property
    def datasets(self) -> list[Dataset]:
        return self.multiscale.datasets

    @property
    def num_levels(self) -> int:
        return len(self.datasets)
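As a concrete illustration, here is a trimmed, self-contained stand-in (not the PR's actual model, which carries more fields and validation) showing how such a model parses a zarr attrs dict into nested Pydantic objects and exposes num_levels:

```python
# Trimmed, hypothetical stand-in for the NgffImage model: just enough fields
# to show parsing of zarr attributes into nested Pydantic models.
from typing import Optional

from pydantic import BaseModel

class Axis(BaseModel):
    name: str
    type: Optional[str] = None
    unit: Optional[str] = None

class Dataset(BaseModel):
    path: str
    coordinateTransformations: list[dict]

class Multiscale(BaseModel):
    axes: list[Axis]
    datasets: list[Dataset]
    version: Optional[str] = None

class NgffImage(BaseModel):
    multiscales: list[Multiscale]

    @property
    def multiscale(self) -> Multiscale:
        if len(self.multiscales) > 1:
            raise NotImplementedError(
                "Only images with one multiscale are supported "
                f"(given: {len(self.multiscales)})"
            )
        return self.multiscales[0]

    @property
    def num_levels(self) -> int:
        return len(self.multiscale.datasets)

# A hand-written attrs dict, shaped like zarr.open_group(...).attrs.asdict().
attrs = {
    "multiscales": [
        {
            "version": "0.4",
            "axes": [
                {"name": "z", "type": "space", "unit": "micrometer"},
                {"name": "y", "type": "space", "unit": "micrometer"},
                {"name": "x", "type": "space", "unit": "micrometer"},
            ],
            "datasets": [
                {
                    "path": "0",
                    "coordinateTransformations": [
                        {"type": "scale", "scale": [1.0, 0.1625, 0.1625]}
                    ],
                },
                {
                    "path": "1",
                    "coordinateTransformations": [
                        {"type": "scale", "scale": [1.0, 0.325, 0.325]}
                    ],
                },
            ],
        }
    ]
}
image = NgffImage(**attrs)
print(image.num_levels)  # 2
```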

@github-actions bot commented Sep 19, 2023

Coverage report

The coverage rate went from 89.89% to 90.2% ⬆️
The branch rate is 83%.

98.19% of new lines are covered.

Diff Coverage details

fractal_tasks_core/lib_ngff.py

98.01% of new lines are covered (94.08% of the complete file).
Missing lines: 414, 415, 419

fractal_tasks_core/lib_zattrs_utils.py

100% of new lines are covered (96.15% of the complete file).

fractal_tasks_core/tasks/apply_registration_to_ROI_tables.py

100% of new lines are covered (90.17% of the complete file).

fractal_tasks_core/tasks/apply_registration_to_image.py

90.9% of new lines are covered (79.22% of the complete file).
Missing lines: 153

fractal_tasks_core/tasks/calculate_registration_image_based.py

100% of new lines are covered (91.22% of the complete file).

fractal_tasks_core/tasks/cellpose_segmentation.py

100% of new lines are covered (85.6% of the complete file).

fractal_tasks_core/tasks/copy_ome_zarr.py

100% of new lines are covered (89.42% of the complete file).

fractal_tasks_core/tasks/create_ome_zarr.py

100% of new lines are covered (82.9% of the complete file).

fractal_tasks_core/tasks/illumination_correction.py

100% of new lines are covered (82.01% of the complete file).

fractal_tasks_core/tasks/maximum_intensity_projection.py

100% of new lines are covered (87.01% of the complete file).

fractal_tasks_core/tasks/napari_workflows_wrapper.py

100% of new lines are covered (90.3% of the complete file).

fractal_tasks_core/tasks/yokogawa_to_ome_zarr.py

100% of new lines are covered (91.37% of the complete file).

@tcompa tcompa changed the title [work in progress] Use NGFF-image metadata Use NGFF-image metadata Sep 20, 2023
@jluethi (Collaborator) commented Sep 20, 2023

Great.
Let's adapt the wording to make clear that we're just loading metadata, not image data.

@tcompa (Collaborator, Author) commented Sep 22, 2023

This PR is essentially ready on my side. Its contents are briefly summarized as:

  1. Introduce (a first version of) Pydantic models for NGFF images and wells, based on some autogenerated ones and then with quite some clean-up. These models also encode several constraints that we already have in place (e.g. the fact that we only support images with a single multiscale), thus reducing the number of checks in the tasks.
  2. Extract the num_levels and coarsening_xy parameters from NGFF objects, rather than from the metadata task input.
  3. Replace several helper functions (get_axes_names, extract_zyx_pixel_sizes and get_acquisition_paths), that are now methods/properties of the Pydantic models.
  4. Load Zarr attributes from groups, rather than from .zattrs files.
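Point 4 can be illustrated with a small contrast. The folder layout and attributes below are made up, and the zarr call is shown only in a comment so the sketch needs nothing beyond the stdlib.

```python
# Contrast between the old and new ways of reading OME-Zarr attributes.
# Paths and attribute contents are illustrative only.
import json
import tempfile
from pathlib import Path

zarrurl = Path(tempfile.mkdtemp()) / "plate.zarr" / "B" / "03" / "0"
zarrurl.mkdir(parents=True)
(zarrurl / ".zattrs").write_text(json.dumps({"multiscales": [{"version": "0.4"}]}))

# Old style: open the hidden .zattrs file by path and parse it as JSON.
attrs_old = json.loads((zarrurl / ".zattrs").read_text())

# New style (as in this PR; requires the zarr package):
#     attrs_new = zarr.open_group(str(zarrurl)).attrs.asdict()

print(attrs_old["multiscales"][0]["version"])  # 0.4
```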

Relevant issues closed by this PR:

It is quite a large PR, but at least a high-level review would be very useful - cc @jluethi.

@tcompa tcompa requested a review from jluethi September 22, 2023 10:41
tcompa added a commit that referenced this pull request Sep 22, 2023
@jluethi (Collaborator) left a comment

@tcompa This looks great! I added a few minor comments on the NGFF Pydantic models and some questions on whether some lines should now be moved to a different part of the task (open to hearing where they'd be placed better; it just stood out that we could move them together with this refactor).
Also, we can use the NGFF Pydantic models to start generalizing axes. Sounds like a good idea to start like this with a pixel_sizes_zyx property and then move to more flexible handling in the Pydantic model that the tasks can adopt :)

I didn't review the test changes.

Review threads on: fractal_tasks_core/lib_ngff.py, fractal_tasks_core/tasks/cellpose_segmentation.py, fractal_tasks_core/tasks/illumination_correction.py, fractal_tasks_core/tasks/napari_workflows_wrapper.py, fractal_tasks_core/tasks/yokogawa_to_ome_zarr.py
@tcompa tcompa marked this pull request as ready for review September 27, 2023 14:37
@tcompa (Collaborator, Author) commented Sep 27, 2023

> I added a few minor comments on the NGFF pydantic models and some questions on whether some lines should now be moved to a different part of the task

These should all be covered now.

> I didn't review the test changes.

That's OK.
I'm mostly unit-testing all the new models, and I updated the existing tests that were relying on the old helper functions.

@tcompa (Collaborator, Author) commented Sep 27, 2023

All seems good, and we have new issues in place for the parts that we postponed (e.g. #540 or #150).
Merging.

@tcompa tcompa merged commit 72ce1bd into main Sep 27, 2023
@tcompa tcompa deleted the 351-extract-attributes-from-ome-zarr-rather-than-from-metadata-whenever-possible branch September 27, 2023 14:41