
Introduce Pydantic models for NGFF metadata #528

Conversation

@tcompa (Collaborator) commented Sep 19, 2023

close #351
close #501
close #532
close #533

  • Add unit tests for new ngff module
  • Review all docstring descriptions of metadata.
  • Remove debugging statements from lib_ngff.py
  • Remove as many functions as possible from lib_zattrs_utils.py
  • Add docstrings to lib_ngff.py
  • I added an appropriate entry to CHANGELOG.md
  • Open issue on de-duplication of models w.r.t. lib_channels.py - ref De-duplicate Pydantic models in lib_channels.py and lib_ngff.py #540

The main goal is to extract some OME-NGFF-related metadata directly from an OME-Zarr, rather than from the metadata input parameter (ref #351). This is a clear goal that can be achieved in multiple ways. The simplest one is to have several helper functions that each construct a given parameter (e.g. num_levels) from the attributes of an OME-Zarr group.

With the current PR, I'm proposing a different approach that brings in broader changes. I'm introducing Pydantic models that encode (part of) OME-NGFF specs, and then implementing additional logic as part of these models (via properties or methods), e.g. to extract num_levels.
Instead of writing the Pydantic models myself, I generated them automatically from the JSON Schema of NGFF 0.4 (via https://github.com/koxudaxi/datamodel-code-generator) and then tweaked them a bit, resulting e.g. in a NgffImage model; see also the note below on effort duplication.

If we pursue this route, then most of the functions in lib_zattrs_utils.py will become methods of the appropriate NGFF object (e.g. NgffImage or NgffWell). As an example, already included in the preliminary commits of this PR, extract_zyx_pixel_sizes can be re-written more simply as a method of NgffImage:

# Current status
from fractal_tasks_core.lib_zattrs_utils import extract_zyx_pixel_sizes

def yokogawa_to_ome_zarr(...):
    ...
    parameters = get_parameters_from_metadata(
        keys=[
            "original_paths",
            "num_levels",
            "coarsening_xy",
            "image_extension",
            "image_glob_patterns",
        ]
    )
    num_levels = parameters["num_levels"]
    coarsening_xy = parameters["coarsening_xy"]
    pxl_size = extract_zyx_pixel_sizes(f"{zarrurl}/.zattrs")
    ...

# This PR
def yokogawa_to_ome_zarr(...):
    ...
    ngff_image = NgffImage(**zarr.open_group(zarrurl).attrs.asdict())
    num_levels = ngff_image.num_levels
    coarsening_xy = ngff_image.coarsening_xy
    pxl_size = ngff_image.get_pixel_sizes_zyx(level=0)
    ...
    parameters = get_parameters_from_metadata(
        keys=[
            "original_paths",
            "image_extension",
            "image_glob_patterns",
        ]
    )
    ...
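For illustration, the pixel-size lookup used above can be sketched in plain Python. This is not the PR's actual implementation; the dict layout simply mirrors one parsed multiscale entry of the NGFF 0.4 metadata.

```python
# Hedged sketch (not the PR's actual method) of reading z/y/x pixel sizes
# from the scale transformation of a given resolution level.
def get_pixel_sizes_zyx(multiscale: dict, level: int = 0) -> list[float]:
    axes = [axis["name"] for axis in multiscale["axes"]]
    dataset = multiscale["datasets"][level]
    # Pick the first "scale" coordinate transformation of this dataset.
    scale = next(
        t["scale"]
        for t in dataset["coordinateTransformations"]
        if t["type"] == "scale"
    )
    # Keep only the spatial entries, in the order the axes are listed (z, y, x).
    return [s for axis, s in zip(axes, scale) if axis in ("z", "y", "x")]

multiscale = {
    "axes": [{"name": "c"}, {"name": "z"}, {"name": "y"}, {"name": "x"}],
    "datasets": [
        {
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0, 1.0, 0.1625, 0.1625]}
            ]
        }
    ],
}
print(get_pixel_sizes_zyx(multiscale, level=0))  # [1.0, 0.1625, 0.1625]
```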

Scope of this change

To be clear: this new lib_image.py module (to be renamed lib_ngff.py as soon as we also include Well or something else) may become an important one, since it will also contribute to defining our compliance/support w.r.t. OME-NGFF. Some aspects:

  1. We now have multiple places in the code base where we check a given OME-Zarr property (e.g. that we only accept a single multiscale per image). This information would be more centralized if a set of validators were in place for each Well/Image/Plate model.
  2. We now have a couple of checks that the NGFF version is 0.4, and we do not support anything else. In principle, the lib_ngff.py module would also be one way to simplify support for multiple versions in the future (e.g. by using NgffImageV04 or NgffImageV05 based on the version stored in the zarr attributes).
  3. Having a more strict structure of models won't make any difference w.r.t. Support arrays of mixed-dimensionality #150, but in my opinion it could make some internal functions simpler (see for instance the PR's current way to extract pixel sizes, which can "natively" look up axes and their names as object attributes).
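Point 2 could look roughly like the following dispatch sketch. NgffImageV04, NgffImageV05, and load_ngff_image are hypothetical names, not part of this PR (only a 0.4 model exists here), and trivial stand-in classes replace the real Pydantic models.

```python
# Hypothetical sketch of per-version model dispatch. The class and function
# names are illustrative only; stand-in classes replace the real models.

class NgffImageV04:
    def __init__(self, **attrs):
        self.attrs = attrs

class NgffImageV05:
    def __init__(self, **attrs):
        self.attrs = attrs

# Map the version string stored in the zarr attributes to a model class.
_NGFF_IMAGE_MODELS = {"0.4": NgffImageV04, "0.5": NgffImageV05}

def load_ngff_image(attrs: dict):
    version = attrs["multiscales"][0].get("version", "0.4")
    try:
        model = _NGFF_IMAGE_MODELS[version]
    except KeyError:
        raise NotImplementedError(f"NGFF version {version} is not supported")
    return model(**attrs)

image = load_ngff_image({"multiscales": [{"version": "0.4", "axes": [], "datasets": []}]})
print(type(image).__name__)  # NgffImageV04
```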

This PR may also affect:

A note on effort duplication

The same procedure (transforming at least a subset of the NGFF specs into Python models) has already been done elsewhere:

If/when one of these efforts grows and/or becomes the officially sanctioned Pydantic version of OME-NGFF (or something we want to rely on, for whatever reason), then we should consider using their models instead of our "custom" ones. The transition would take place by defining a FractalNgffImage class that inherits from the "official" one and adds the methods/properties that are needed for fractal-tasks-core tasks (e.g. it would expose a num_levels property).

Neglecting the different attribute names, the change could be as smooth as in:

# This PR

class NgffImage(BaseModel):  # <--------- fractal-specific class
    multiscales: list[Multiscale] = Field(
        ...,
        description="The multiscale datasets for this image",
        min_items=1,
        unique_items=True,
    )
    omero: Optional[Omero] = None

    @property
    def multiscale(self) -> Multiscale:
        if len(self.multiscales) > 1:
            raise NotImplementedError(
                "Only images with one multiscale are supported "
                f"(given: {len(self.multiscales)})"
            )
        return self.multiscales[0]

    @property
    def axes(self) -> list[Axe]:
        return self.multiscale.axes

    @property
    def datasets(self) -> list[Dataset]:
        return self.multiscale.datasets

    @property
    def num_levels(self) -> int:
        return len(self.datasets)

# A possible future version

from xyz import NgffImage

class FractalNgffImage(NgffImage):   # <--------- use official model as parent class

    @property
    def multiscale(self) -> Multiscale:
        if len(self.multiscales) > 1:
            raise NotImplementedError(
                "Only images with one multiscale are supported "
                f"(given: {len(self.multiscales)})"
            )
        return self.multiscales[0]

    @property
    def axes(self) -> list[Axe]:
        return self.multiscale.axes

    @property
    def datasets(self) -> list[Dataset]:
        return self.multiscale.datasets

    @property
    def num_levels(self) -> int:
        return len(self.datasets)
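As a concrete illustration, here is a trimmed, self-contained stand-in (not the PR's actual model, which carries more fields and validation) showing how such a model parses a zarr attrs dict into nested Pydantic objects and exposes num_levels:

```python
# Trimmed, hypothetical stand-in for the NgffImage model: just enough fields
# to show parsing of zarr attributes into nested Pydantic models.
from typing import Optional

from pydantic import BaseModel

class Axis(BaseModel):
    name: str
    type: Optional[str] = None
    unit: Optional[str] = None

class Dataset(BaseModel):
    path: str
    coordinateTransformations: list[dict]

class Multiscale(BaseModel):
    axes: list[Axis]
    datasets: list[Dataset]
    version: Optional[str] = None

class NgffImage(BaseModel):
    multiscales: list[Multiscale]

    @property
    def multiscale(self) -> Multiscale:
        if len(self.multiscales) > 1:
            raise NotImplementedError(
                "Only images with one multiscale are supported "
                f"(given: {len(self.multiscales)})"
            )
        return self.multiscales[0]

    @property
    def num_levels(self) -> int:
        return len(self.multiscale.datasets)

# A hand-written attrs dict, shaped like zarr.open_group(...).attrs.asdict().
attrs = {
    "multiscales": [
        {
            "version": "0.4",
            "axes": [
                {"name": "z", "type": "space", "unit": "micrometer"},
                {"name": "y", "type": "space", "unit": "micrometer"},
                {"name": "x", "type": "space", "unit": "micrometer"},
            ],
            "datasets": [
                {
                    "path": "0",
                    "coordinateTransformations": [
                        {"type": "scale", "scale": [1.0, 0.1625, 0.1625]}
                    ],
                },
                {
                    "path": "1",
                    "coordinateTransformations": [
                        {"type": "scale", "scale": [1.0, 0.325, 0.325]}
                    ],
                },
            ],
        }
    ]
}
image = NgffImage(**attrs)
print(image.num_levels)  # 2
```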

@github-actions bot commented Sep 19, 2023

Coverage report

The coverage rate went from 89.89% to 90.2% ⬆️
The branch rate is 83%.

98.19% of new lines are covered.

Diff Coverage details

fractal_tasks_core/lib_ngff.py

98.01% of new lines are covered (94.08% of the complete file).
Missing lines: 414, 415, 419

fractal_tasks_core/lib_zattrs_utils.py

100% of new lines are covered (96.15% of the complete file).

fractal_tasks_core/tasks/apply_registration_to_ROI_tables.py

100% of new lines are covered (90.17% of the complete file).

fractal_tasks_core/tasks/apply_registration_to_image.py

90.9% of new lines are covered (79.22% of the complete file).
Missing lines: 153

fractal_tasks_core/tasks/calculate_registration_image_based.py

100% of new lines are covered (91.22% of the complete file).

fractal_tasks_core/tasks/cellpose_segmentation.py

100% of new lines are covered (85.6% of the complete file).

fractal_tasks_core/tasks/copy_ome_zarr.py

100% of new lines are covered (89.42% of the complete file).

fractal_tasks_core/tasks/create_ome_zarr.py

100% of new lines are covered (82.9% of the complete file).

fractal_tasks_core/tasks/illumination_correction.py

100% of new lines are covered (82.01% of the complete file).

fractal_tasks_core/tasks/maximum_intensity_projection.py

100% of new lines are covered (87.01% of the complete file).

fractal_tasks_core/tasks/napari_workflows_wrapper.py

100% of new lines are covered (90.3% of the complete file).

fractal_tasks_core/tasks/yokogawa_to_ome_zarr.py

100% of new lines are covered (91.37% of the complete file).

@tcompa tcompa changed the title [work in progress] Use NGFF-image metadata Use NGFF-image metadata Sep 20, 2023
@jluethi (Collaborator) commented Sep 20, 2023

Great.
Let's adapt the wording to make clear that we're just loading metadata, not image data.

@tcompa (Collaborator, Author) commented Sep 22, 2023

This PR is essentially ready on my side. Its contents are briefly summarized as:

  1. Introduce (a first version of) Pydantic models for NGFF images and wells, based on some autogenerated ones and then with quite some clean-up. These models also encode several constraints that we already have in place (e.g. the fact that we only support images with a single multiscale), thus reducing the number of checks in the tasks.
  2. Extract the num_levels and coarsening_xy parameters from NGFF objects, rather than from the metadata task input.
  3. Replace several helper functions (get_axes_names, extract_zyx_pixel_sizes and get_acquisition_paths), that are now methods/properties of the Pydantic models.
  4. Load Zarr attributes from groups, rather than from .zattrs files.
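Point 4 can be illustrated with a small contrast. The folder layout and attributes below are made up, and the zarr call is shown only in a comment so the sketch needs nothing beyond the stdlib.

```python
# Contrast between the old and new ways of reading OME-Zarr attributes.
# Paths and attribute contents are illustrative only.
import json
import tempfile
from pathlib import Path

zarrurl = Path(tempfile.mkdtemp()) / "plate.zarr" / "B" / "03" / "0"
zarrurl.mkdir(parents=True)
(zarrurl / ".zattrs").write_text(json.dumps({"multiscales": [{"version": "0.4"}]}))

# Old style: open the hidden .zattrs file by path and parse it as JSON.
attrs_old = json.loads((zarrurl / ".zattrs").read_text())

# New style (as in this PR; requires the zarr package):
#     attrs_new = zarr.open_group(str(zarrurl)).attrs.asdict()

print(attrs_old["multiscales"][0]["version"])  # 0.4
```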

Relevant issues closed by this PR:

It is quite a large PR, but at least a high-level review would be very useful - cc @jluethi.

@tcompa tcompa requested a review from jluethi September 22, 2023 10:41
tcompa added a commit that referenced this pull request Sep 22, 2023
@jluethi (Collaborator) left a comment

@tcompa This looks great! I added a few minor comments on the NGFF Pydantic models and some questions on whether some lines should now be moved to a different part of the task (open to hearing where they'd be placed better; it just stood out that we could move them together with this refactor).
Also, we can use the NGFF Pydantic models to start generalizing axes. Sounds like a good idea to start like this with a pixel_sizes_zyx property and then move to more flexible handling in the Pydantic model that the tasks can adopt :)

I didn't review the test changes.

Review threads on: fractal_tasks_core/lib_ngff.py, fractal_tasks_core/tasks/cellpose_segmentation.py, fractal_tasks_core/tasks/illumination_correction.py, fractal_tasks_core/tasks/napari_workflows_wrapper.py, fractal_tasks_core/tasks/yokogawa_to_ome_zarr.py
@tcompa tcompa marked this pull request as ready for review September 27, 2023 14:37
@tcompa (Collaborator, Author) commented Sep 27, 2023

> I added a few minor comments on the NGFF pydantic models and some questions on whether some lines should now be moved to a different part of the task

These should all be covered now.

> I didn't review the test changes.

That's OK.
I'm mostly unit-testing all the new models, and I updated the existing tests that were relying on the old helper functions.

@tcompa (Collaborator, Author) commented Sep 27, 2023

All seems good, and we have new issues in place for the parts that we postponed (e.g. #540 or #150).
Merging.

@tcompa tcompa merged commit 72ce1bd into main Sep 27, 2023
@tcompa tcompa deleted the 351-extract-attributes-from-ome-zarr-rather-than-from-metadata-whenever-possible branch September 27, 2023 14:41