-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate axes types v0.4 #124
Conversation
Codecov Report
@@ Coverage Diff @@
## master #124 +/- ##
==========================================
+ Coverage 80.49% 81.53% +1.04%
==========================================
Files 11 12 +1
Lines 1174 1251 +77
==========================================
+ Hits 945 1020 +75
- Misses 229 231 +2
Continue to review full report at Codecov.
|
Hi @will-moore sorry I'm not more responsive - I'm out of work till Tuesday, will dig in all that then! |
@glyg No problem. No hurry - just pinging you in case it helps avoid any duplicate effort. |
6052098
to
478a4e7
Compare
478a4e7
to
a2b3b3e
Compare
ome_zarr/reader.py
Outdated
axes = multiscales[0].get("axes") | ||
fmt = format_from_version(version) | ||
# Raises ValueError if not valid | ||
validate_axes(None, axes, fmt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are not consuming the return values of validate_axes
, the goal here is "only" to validate an axes
but not modify it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - good point. Looking at this again, I realise that validate_axes(None, axes, fmt)
is really designed for writing. So, what's returned will be valid, even if what it's passed isn't valid. E.g. it will convert "tczyx" to an axis array, and will allow v0.3 to have None
if 2D or 5D.
What we really want here is Axes(axes, fmt).validate()
axes = tuple(multiscales[0].get("axes", ["t", "c", "z", "y", "x"])) | ||
if len(set(axes) - axes_values) > 0: | ||
raise RuntimeError(f"Invalid axes names: {set(axes) - axes_values}") | ||
axes = multiscales[0].get("axes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the main implication of not setting a default value here?
For 0.1/0.2 data, this means, the node.metadata["axes"]
might be None
as opposed to ["t", "c", "z", "y", "x"]
previously i.e. we are preserving the value stored in the metadata? Is there an impact on clients relying on node.metadata["axes"]
?
ome_zarr/writer.py
Outdated
ndim: int, axes: Union[str, List[str]] = None, fmt: Format = CurrentFormat() | ||
) -> Union[None, List[str]]: | ||
"""Returns validated list of axes names or raise exception if invalid""" | ||
def validate_axes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least I find the following lines
axes_obj = Axes(axes)
axes_obj.validate(fmt)
clarifies a lot of the logic happening here. It brings the question of whether additional logic should be moved to the constructor. Said otherwise, what is the added value of calling the validate_axes
API vs the two-liner:
axes = Axes(axes, fmt=fmt, ndim=ddim)
axes.validate()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the outcome of #124 (comment). Should the name of the method be updated to reflect this is a writer/constructor rather than a validator? Since this API is moved to be a public API, this is increasingly important. Alternatively, we can keep it prefixed with _
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few additional comments mostly revolving around API names and behaviors. But I think the bulk of the work looks good
As discussed this morning, we will need to get this consumed by downstream clients e.g. omero-cli-zarr
and/or napari-ome-zarr
. To respect the fact the 0.4 specification is not yet published, I don't think this can land into a public release of ome-zarr-py. Instead, can we try to make the minimal decision to target a 0.3 pre-release?
ome_zarr/writer.py
Outdated
ndim: int, axes: Union[str, List[str]] = None, fmt: Format = CurrentFormat() | ||
) -> Union[None, List[str]]: | ||
"""Returns validated list of axes names or raise exception if invalid""" | ||
def validate_axes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the outcome of #124 (comment). Should the name of the method be updated to reflect this is a writer/constructor rather than a validator? Since this API is moved to be a public API, this is increasingly important. Alternatively, we can keep it prefixed with _
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good. One minor detail question below and one higher-level here: Does anyone have a feeling for when we might want to start adding submodules, e.g. are we approaching a "util" or "internals" with his PR?
axes_names.append(axis["name"]) | ||
return axes_names | ||
|
||
def _validate_03(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to read more here to get my head around the new class, but my immediate reaction to methods that are version specific is that I'd hope they could live in the Format
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hindsight. At some point, I was also wondering whether Axes03
Axes04
should be introduced to toggle between behaviors. It would be interesting to see how to make use the Format
API to handle this as this can be re-used in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think this through... It's possible for everything to become version specific if it changes in a future version, and we wouldn't want the API to change when methods move to a Format
object.
Everything in the Axes
class is version specific, (e.g. _validate_axes_types
will soon become more liberal). So do we move those internal methods to Format
objects when they change?
This will mean that eventually all the code for everything will migrate Format
objects.
Are the methods of Format
objects (e.g. validate_axes()
) expected to be part of the external API?
What's the use-case or benefit of all version-specific code being in Format
objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to think of another example with the 0.4 HCS changes to the wells
element discussed in #157, I could certainly imagine migrating _validate_well_images(wells)
by fmt.validate_well_images(wells)
or similar and let FormatV04
vs FormatV{01,02,03}
override the methods as appropriate.
As discussed yesterday, merging to move forward towards a pre-release of |
This builds on top of #123 to add validation of new axes dicts, based on the axes types.
Based on spec at ome/ngff#57 (not merged yet).
There is a fair bit of validation code here cc @glyg
Breaking changes:
axes
now returns a list of dicts instead of list of strBut the writing API isn't breaking. Anything that was valid for v0.3 is also valid for v0.4, since we automatically add
type
info fortczyx
dimensions.