Data dimensionality and axes metadata #35

constantinpape · 2021-02-28T17:24:57Z

In last week's meeting the question of data dimensionality came up again (it was raised by @jni in the morning, and I think it came up in the afternoon as well).
Currently, the spec demands that all data is 5 dimensional (I think with axis order TCXYZ, but I am not quite sure).

Do we want to lift the restriction and allow data of lower dimensionality? In this case, we would add metadata in multiscales to describe the axes (e.g. "axes": ["x", "y", "z"]).

Note that this is also important for the transformation spec #28, where we need to clarify which axes a transformation applies to.

Independent of the decisions, we should add a field that describes the physical units of the axes, e.g. "units": ["micrometer", "micrometer", "micrometer"].
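A minimal sketch of what such metadata could look like for a 3D volume, written as a Python dict. Only the field names `axes` and `units` come from the proposal above; the helper `describe_axes` and the surrounding layout are hypothetical and just illustrate that the two lists are meant to be read in parallel.

```python
import json

# Hypothetical multiscales entry for a 3D volume; only "axes" and "units"
# are taken from the proposal, the rest of the layout is an assumption.
multiscales_entry = {
    "axes": ["x", "y", "z"],
    "units": ["micrometer", "micrometer", "micrometer"],
}


def describe_axes(entry):
    """Pair each axis name with its unit, checking that the lists line up."""
    axes, units = entry["axes"], entry["units"]
    if len(axes) != len(units):
        raise ValueError("'axes' and 'units' must have the same length")
    return dict(zip(axes, units))


print(json.dumps(describe_axes(multiscales_entry)))
```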


jni · 2021-03-01T12:08:21Z

(I think with axis order TCXYZ, but I am not quite sure).

TCZYX. ;)

Do we want to lift the restriction and allow data of lower dimensionality?

Yes please! And in fact the axis names should be whatever; I think we should not be limited to subsets of "TCZYX". E.g. they could be ["lat", "lon"] or ["left-right", "superior-inferior", "anterior-posterior"].

joshmoore · 2021-03-02T14:46:48Z

@jni: what behavior would you expect for an array with no x or y?

tischi · 2021-03-02T18:33:33Z

Based on how the discussion in #28 evolved, I guess the axis names may be part of the specification of the transformation from data space to physical space, is that right?

tischi · 2021-03-02T18:37:18Z

what behavior would you expect for an array with no x or y?

@joshmoore What do you mean by "behavior", maybe "how it would be rendered in a viewer"?

d-v-b · 2021-03-02T18:48:50Z

@jni: what behavior would you expect for an array with no x or y?

In my opinion, a generic image viewer should have no intrinsic opinion about the particular axis names of the data it displays. If the user has 2D data with axes labelled X and B, then the viewer should display the data (with a default, but overrideable, mapping from data coordinates to viewer coordinates) as an image with one axis labelled "X" and the other axis labelled "B". If the data axis labelled "X" happens to be mapped to a display axis also called "X", then that is just a happy coincidence. A general-purpose data visualization tool should not assign any "meaning" to an axis name like "X" or "T". A more specialized tool might have an opinion about axis names, though.
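To make the "default, but overrideable, mapping" concrete, here is a minimal sketch. The function name `map_axes_to_display` and the display-axis names `horizontal` and `vertical` are assumptions made for illustration, not part of any spec or viewer API; the point is only that data axis names are treated as opaque labels.

```python
def map_axes_to_display(data_axes, display_axes=("horizontal", "vertical"), overrides=None):
    """Assign data axes to display axes, in order, unless overridden.

    `overrides` maps a display axis to the data axis the user wants there.
    Axis names are treated as opaque labels; none has a special meaning.
    """
    overrides = dict(overrides or {})
    for name in overrides.values():
        if name not in data_axes:
            raise ValueError(f"unknown data axis: {name!r}")
    # Data axes not claimed by an override fill the remaining display axes.
    remaining = [a for a in data_axes if a not in overrides.values()]
    mapping = {}
    for display in display_axes:
        if display in overrides:
            mapping[display] = overrides[display]
        elif remaining:
            mapping[display] = remaining.pop(0)
    return mapping


# Default mapping for 2D data with axes labelled "X" and "B".
print(map_axes_to_display(["X", "B"]))
# The user swaps the axes; the viewer never interprets the names.
print(map_axes_to_display(["X", "B"], overrides={"horizontal": "B"}))
```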

tischi · 2021-03-03T08:32:51Z

then the viewer should display the data (with a default, but overrideable, mapping from data coordinates to viewer coordinates)

The way I interpreted the status of our discussion at #28 is that there is no default mapping, but a mapping must always be provided, or did I get this wrong?

d-v-b · 2021-03-03T16:14:58Z

Ah, sorry for causing confusion (and maybe we are straying away from the original question @joshmoore posed) -- Yes, I have the same interpretation of the discussion in #28. My (confusingly stated) point in the comment above was just that general purpose data visualization tools shouldn't have an opinion / preference for specific axis names in the transform metadata.

constantinpape · 2021-03-04T21:16:47Z

The way I interpreted the status of our discussion at #28 is that there is no default mapping, but a mapping must always be provided, or did I get this wrong?

I think this is still up for discussion. @axtimwalde made the point that no transformation could just be interpreted as identity transform. And no axes labels would mean that the data stays in pixel space.
This has the advantage that it's a non-breaking change.
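A rough sketch of how a reader could apply these defaults. The metadata key `transformation`, the scale/translation representation of the identity, and the helper name `resolve_defaults` are all assumptions made for illustration; the discussion has not settled on any of them.

```python
def resolve_defaults(metadata, ndim):
    """Fill in the defaults suggested above for missing fields.

    A missing transformation is read as the identity (here: scale 1 and
    translation 0 per dimension); missing axes labels mean that the data
    stays in pixel space.
    """
    transform = metadata.get("transformation")  # hypothetical key
    if transform is None:
        transform = {"scale": [1.0] * ndim, "translation": [0.0] * ndim}
    axes = metadata.get("axes")
    space = "pixel" if axes is None else "physical"
    return {"transformation": transform, "axes": axes, "space": space}


# Existing data without any of the new fields keeps working unchanged.
print(resolve_defaults({}, ndim=3))
```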

@joshmoore what do you think about also allowing 2d, 3d and 4d data to be saved? I think this is the first important decision to drive #28 (and probably also other discussions) forward.

tischi · 2021-03-04T23:08:01Z

And in fact the axis names should be whatever; I think we should not be limited to subsets of "TCZYX". E.g. they could be ["lat", "lon"] or ["left-right", "superior-inferior", "anterior-posterior"].

@jni based on the state of the discussion in #28 I wonder now whether your comment is about axis names in data space or in physical space. Currently, I would think we simply have no axis names at all in data space. In physical space I think it is nice to know which axis should be the "x" axis such that the viewer can display the data accordingly. Thus I think this information should be there.

What we could think of, on top of the specification of which one is the "x" axis, is to have something like optional axis_names metadata:

"axis_names" : { "x" : "anterior-posterior", "y": "dorsal-ventral" }

Would that work for you?

tischi · 2021-03-04T23:22:18Z

I think this is still up for discussion. @axtimwalde made the point that no transformation could just be interpreted as identity transform. And no axes labels would mean that the data stays in pixel space.

I think I'd prefer that it is required to specify the axes labels, because in practice it makes a big difference whether one displays 3D data as xyz or xyc 😉 Unless we agree that specifying nothing defaults to axes of "type" : "space" with some default order like xyz.

jni · 2021-03-05T06:27:35Z

@tischi as mentioned on #28 we do not want to prescribe here where physical axes go on the screen. There is a third space, which is the screen space, and all kinds of transformations can happen between physical/world space and screen space, not least of which is a 3D -> 2D projection.

I also don't think axis label specification should be a requirement, but a strongly encouraged metadata. As mentioned by others, requirement makes the spec not backward-compatible. Indeed, treating channels as spatial by default is fine: most viewers have the ability to separate out channels. (napari notably doesn't 😅 but we are definitely planning it!)

tischi · 2021-03-05T08:02:47Z

I also don't think axis label specification should be a requirement, but a strongly encouraged metadata

OK, I guess I could live with "strongly encouraged" 😉

tischi · 2021-03-05T08:49:34Z

Indeed, treating channels as spatial by default is fine: most viewers have the ability to separate out channels.

@jni I get the point about requirements and backwards compatibility. But, in practice, let's say the vision is to be able to chain a set of napari plugins into an image processing workflow. My feeling is that it may be necessary to know which axes are spatial and which axis is the channel axis. What do you think?

joshmoore · 2021-03-05T11:48:33Z

#35 (comment) @joshmoore what do you think about also allowing 2d, 3d and 4d data to be saved?

I've been working under the assumption that it would eventually be necessary (cf. the IMS file structure). It certainly has the potential to complicate and possibly slow down implementations, so I'd just urge balancing how soon it's introduced against immediate need.

joshmoore · 2021-03-05T15:56:44Z

On the topic of XYZ or not necessarily XYZ, I have some concern that not having these takes us outside the realm of OME-* specs and closer to underlying numpy/zarr/n5/etc. specs, which is fine, but is something we should consider. If the axes are named arbitrarily, then quite possibly the axes metadata SHOULD additionally define which are orthogonal to one another and in what right-handed order

cf. (har) http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#coordinate-types

Edit: ah, I see while working through issues that this also came up in #28 (comment)

constantinpape · 2021-03-05T16:02:57Z

I've been working under the assumption that it would eventually be necessary (cf. the IMS file structure). It certainly has the potential to complicate and possibly slow down implementations, so I'd just urge balancing how soon it's introduced against immediate need.

As this is quite a big change and has implications for other parts of the spec, I would argue that this change should be done sooner rather than later if deemed necessary.
For example, I am pretty sure that the transformation spec will look different depending on whether we decide on fixed 5d or on 2d, 3d, 4d, 5d.

On the topic of XYZ or not necessarily XYZ, I have some concern that not having these takes us outside the realm of OME-* specs and closer to underlying numpy/zarr/n5/etc. specs, which is fine, but is something we should consider. If the axes are named arbitrarily, then quite possibly the axes metadata SHOULD additionally define which are orthogonal to one another and in what right-handed order

I personally also think we shouldn't allow for arbitrary axis naming and stick to XYZCT.

joshmoore · 2021-03-05T16:08:32Z

I personally also think we shouldn't allow for arbitrary axis naming and stick to XYZCT.

To be clear, I can certainly imagine having additional axes. But if there is no traditional X, Y, or Z axes in a given zarray, I don't know if I would consider it an image in the sense that is currently defined in this repository. (If anyone has a counter-example, I'd love to hear it.)

d-v-b · 2021-03-05T16:20:21Z

Medical imaging often uses anatomical coordinates, which do not involve the letters "X", "Y", or "Z": https://www.slicer.org/wiki/Coordinate_systems

joshmoore · 2021-03-05T16:33:02Z

@d-v-b: I guess I'm less concerned with naming, that's "just metadata". ;) But in all three you are in a 3D, right-handed coordinate system, right? I guess in my head (forgive me if I'm being biased) the ALS and IJK coordinate systems from slicer.org could be equated to XYZ and then one need just provide which system one is under.

For comparison, in the high-content screening case, there are rows and columns but there's additionally metadata to say that the rows are letters and the columns are numbers.

tischi · 2021-03-05T21:40:20Z

I think the axes metadata part of this issue became quite overlapping with the discussion in this issue: #28, where the last posts were also about the handedness of the coordinate system and how much we want to commit to x, y, and z. Could it therefore make sense to continue this discussion on axes metadata in #28 and here just discuss how many data dimensions we would like to support?

axtimwalde · 2021-03-05T23:38:48Z

A data format that supports only 5 dimensions is asking to be obsolete within 2 weeks ;).

glyg · 2021-03-07T08:07:58Z

As a concrete case of more-than-5-D data, a team here is developing polarization microscopy, so each pixel has 7 coordinates: 3 spatial, the 3 components of the polarization vector, and time. Of course you can store the polarization as channels, but it gets tricky to encode a transformation then, as for example a rotation needs to apply to both the spatial and polarization coordinates.
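To illustrate why the rotation is tricky, here is a minimal sketch that rotates one sample about the z axis. It assumes, purely for illustration, that the polarization is stored as three Cartesian components in the same frame as the spatial coordinates; the function names are hypothetical.

```python
import math


def rotate_about_z(vector, angle):
    """Rotate a 3-component vector (x, y, z) about the z axis by `angle` radians."""
    x, y, z = vector
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def rotate_sample(position, polarization, angle):
    """Apply the same rotation to the spatial and the polarization components.

    Rotating only `position` would leave the polarization vector expressed
    in the old frame, which is the problem with storing it as plain channels.
    """
    return rotate_about_z(position, angle), rotate_about_z(polarization, angle)


# A quarter turn moves both the position and the polarization vector.
position, polarization = rotate_sample((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), math.pi / 2)
```

A transformation that only knows about the three spatial axes would update `position` and silently leave `polarization` untouched.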

constantinpape · 2021-03-07T10:06:07Z

Ok, so I think dropping the requirement for 5d is not really controversial, whereas there's still some discussion about the axes labels.

I have been thinking a bit about how to drive the spec forward, and I think it would make most sense to start with a rather small change:

Move the metadata spec from the zarr-specs/issues to the ngff spec, as that's a prerequisite for any changes to the metadata spec.
Lift the restriction to 5d.

What do you think @joshmoore? I can start working on this.

joshmoore · 2021-03-08T13:00:13Z

#35 (comment) A data format that supports only 5 dimensions is asking to be obsolete within 2 weeks ;).

I want this on a 👕 😉

#35 (comment) 7 coordinates: 3 spatial, the 3 components of the polarization vector, and time

How would you optionally encode them?

#35 (comment) What do you think @joshmoore? I can start working on this.

💯

glyg · 2021-03-08T14:07:31Z

How would you optionally encode them?

{
    "axes": ["x", "y", "z", "rho", "theta", "phi", "t"],
    "units": ["micrometer", "micrometer", "micrometer", "radians", "radians", "radians"]
}
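As a quick sanity check, a reader could verify that every axis has a unit. The sketch below uses a hypothetical helper, `missing_units`, and applies it to the example exactly as written above, which lists seven axes but only six units.

```python
import json

# The example metadata, copied verbatim from above.
example = json.loads("""
{
    "axes": ["x", "y", "z", "rho", "theta", "phi", "t"],
    "units": ["micrometer", "micrometer", "micrometer", "radians", "radians", "radians"]
}
""")


def missing_units(metadata):
    """Return the trailing axes that have no corresponding entry in 'units'."""
    axes, units = metadata["axes"], metadata.get("units", [])
    return axes[len(units):]


print(missing_units(example))  # ['t']
```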

tischi · 2021-03-08T15:18:03Z

@glyg That's an interesting use case! As mentioned above, I think this may be quite overlapping with #28 where we discuss how to map from data space (no units, just dimensions) to physical space (e.g. spatial or possibly angles). So maybe it could be useful to look at this issue and maybe re-post your example there.

constantinpape · 2021-03-13T14:52:57Z

I have proposed some initial changes in #39 to lift the 5d requirement, but otherwise did not change anything w.r.t. the current spec.
I will try to summarise the discussion here soon to see how to continue after #39 gets merged.

constantinpape · 2021-03-16T22:45:16Z

#39 now introduces axes as a MUST field in multiscales and allows up to 5 dimensions, with values for axes restricted to x, y, z, c, t. This change will be breaking with 0.1, and in the reviews @joshmoore remarked that it would be a good idea to see whether any of the potential changes we discussed here would be breaking with the (proposed) 0.2 again.

To summarize, I think we have discussed the following possible changes (relative to 0.2):

Allow more than 5 dimensions.
Allow arbitrary names in axes instead of just x, y, z, c, t
Add another field units to specify the physical dimension for each axis (side note: going back to the discussion in Transformation Specification #28, it's unclear if this is necessary here or only in the transformation).

As far as I can see none of these changes would be breaking with the 0.2 proposal.
Anything I forgot here? Can anybody see issues with 0.2 that would require a breaking change in the future?

k-dominik · 2021-04-19T09:38:04Z

Hi - adding in a few cents here as well...

When I was reading it, I was thinking about what viewers would like best. I think this issue/discussion should allow a complete newcomer to design a super simple viewer that enables rudimentary viewing of all data that claims to be ngff. One of the reasons people still go around using pngs, jpgs, tifs and the like is that they can view them with their system image viewer, by simple drag and drop. Ever tried this with an hdf5 with the de-facto image viewer of the bioimage community, Fiji?! No dice. If the outcome of this discussion is that we allow arbitrary data with arbitrary axes, then this is as good as doing nothing. No new developer will be able to come up with a viewer that makes sense based on the specification. I think this encourages fragmentation. No one would be able to "understand" the data. With a fixed, limited set of axes in the data/pixel/image/voxel space you could truly have a format that all viewers could support, where looking into the image space will look more or less the same in all. Isn't this one of the goals?

The semantic meaning of the axes, units and the like can be handled by smarter viewers: depending on the application they might use the transformation (as discussed in #28).

k-dominik · 2021-04-20T08:00:23Z

Adding to the comment above: I think some axes should have a fixed meaning and name: tzyx. The rest could be handled as channels by "naive" consumers, whereas applications closer to the data can handle those in a specialized way.
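A minimal sketch of what such a "naive" consumer could do. The function name `split_axes` is hypothetical, and treating every unknown name as a channel-like axis is just the compromise suggested here, not agreed spec behaviour.

```python
FIXED_AXES = ("t", "z", "y", "x")  # names with a fixed meaning in this proposal


def split_axes(axes):
    """Separate the axes a naive consumer understands from those it treats as channels."""
    known = [a for a in axes if a in FIXED_AXES]
    as_channels = [a for a in axes if a not in FIXED_AXES]
    return known, as_channels


# A classic 5D image: only "c" falls into the channel-like group.
print(split_axes(["t", "c", "z", "y", "x"]))
# Application-specific names are still viewable, just without special handling.
print(split_axes(["x", "y", "z", "rho", "theta", "phi", "t"]))
```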

joshmoore · 2021-04-27T11:57:52Z

See the new PR at #46

imagesc-bot · 2021-09-01T07:53:57Z

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-early-september-2021/55333/14

constantinpape · 2021-09-01T19:49:40Z

To summarize the current state:

Since v0.3 we allow 2 to 5d data and have the axes field, which labels each dimension and has allowed values tczyx (redundancy not allowed!)
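The constraints in this summary can be sketched as a small check. The helper name `validate_axes` is hypothetical, and the sketch covers only the three rules stated here (2 to 5 dimensions, values from tczyx, no repeats); it does not check any ordering rules the spec itself may impose.

```python
ALLOWED_AXES = set("tczyx")


def validate_axes(axes):
    """Check the constraints summarised above; raise ValueError on violation."""
    if not 2 <= len(axes) <= 5:
        raise ValueError(f"expected 2 to 5 axes, got {len(axes)}")
    unknown = [a for a in axes if a not in ALLOWED_AXES]
    if unknown:
        raise ValueError(f"axes not in 'tczyx': {unknown}")
    if len(set(axes)) != len(axes):
        raise ValueError("each axis may appear only once")
    return True


validate_axes(["z", "y", "x"])            # passes
validate_axes(["t", "c", "z", "y", "x"])  # passes
```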

I think it's straightforward to also add an optional field units with the same length as axes and this can be done in one of the next versions.

In addition, I can see two more controversial potential changes that lift the restrictions above:

allow arbitrary axis names (or some compromise with axis names with fixed meaning and implicit treatment of unknown names as channel, see Data dimensionality and axes metadata #35 (comment))
allow more than 5d data

I am personally more in favor of keeping the spec more restrictive, but we need to see if there are some important use-cases that cannot be covered with the current spec. This is also very relevant for the issue of specifying transformations.

constantinpape · 2021-09-01T20:03:16Z

Note also the proposal by @bogovicj and @axtimwalde here, which introduces a label, type and unit per dimension with a list of objects (=map/dict).
This diverges a bit from our current solution of having axes as list.
But it would be easy to have an equivalent solution using 3 lists, e.g. axes, axes_label and unit.
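A minimal sketch of that equivalence, converting between the two layouts. It assumes, for illustration only, that the `axes` list carries what the other proposal calls `type`, that `axes_label` carries `label`, and that `unit` carries `unit`; both function names are hypothetical.

```python
def to_parallel_lists(axes_objects):
    """Convert a list of per-axis dicts into three parallel lists."""
    return {
        "axes": [a["type"] for a in axes_objects],
        "axes_label": [a["label"] for a in axes_objects],
        "unit": [a["unit"] for a in axes_objects],
    }


def to_objects(lists):
    """Convert three parallel lists back into a list of per-axis dicts."""
    return [
        {"label": label, "type": type_, "unit": unit}
        for type_, label, unit in zip(lists["axes"], lists["axes_label"], lists["unit"])
    ]


objects = [
    {"label": "anterior-posterior", "type": "space", "unit": "micrometer"},
    {"label": "dorsal-ventral", "type": "space", "unit": "micrometer"},
]
assert to_objects(to_parallel_lists(objects)) == objects
```

The list-of-objects form keeps everything about one axis together, while the parallel lists stay closer to the current `axes` field but rely on all lists having the same length.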

d-v-b · 2021-09-02T16:06:01Z

Once you've specified a unit (assuming it's an SI unit), you have basically already specified the axis type, no? So it seems like axis_type is unnecessary (and potentially confusing, if someone accidentally does something like {axis_type : time, unit: nm}).

bogovicj · 2021-09-02T18:31:24Z

The below was discussed in the ngff meeting on 01 Sept 2021

A counter example might be channels acquired at different wavelengths (physical unit), which clashes with spatial domain.
Ideas:

Use a more general way of describing the domain that can describe a categorical / discrete axis
- how do we spec this? we should brainstorm
Use spatial frequency units instead of wavelength?
- i can imagine users and microscope vendors not liking this

tischi · 2021-09-06T13:30:12Z

Maybe the word channel is anyway a bit misleading? Maybe setup like in the BDV file format is more appropriate. For example, we sometimes acquire the exact same fluorescence "channel" in terms of emission wavelengths, but with a couple of different exposure times to accommodate for different sample brightness. Another example is to acquire the same emission wavelength but with different exposure wavelengths for some of the ratiometric sensor fluorophores. Thus associating "channel" very strictly with the emission wavelength band is maybe too limiting?

constantinpape · 2021-09-07T13:27:44Z

Follow up from last week's ngff meeting: there was fairly broad consensus that the axes label should be decoupled from the semantic meaning and that, in consequence, a new field should be added for the "semantic" axes type (time, space, channel or similar; see the comment by @tischi above). In addition, we want to add unit, which has some relation to type (e.g. type: time, unit: meter doesn't make sense, but there is not a strict one-to-one correspondence, as @bogovicj pointed out above).
There were some additional discussions about allowing more than 5 dimensions and adding more axes types. My personal preference would be to not include these changes now, but rather make sure that the current changes allow extensibility so that this can be worked on in later versions.
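The relation between type and unit could be checked along the following lines. This is only a sketch: the table of plausible units is an assumed, non-exhaustive example, and the helper name `check_axis` is hypothetical. Types without an entry in the table, such as channel, are deliberately left unchecked because the correspondence is not one-to-one.

```python
# Assumed, non-exhaustive table of units that plausibly go with each type.
PLAUSIBLE_UNITS = {
    "space": {"nanometer", "micrometer", "millimeter", "meter"},
    "time": {"millisecond", "second", "minute", "hour"},
}


def check_axis(axis):
    """Return a warning string if type and unit look inconsistent, else None.

    Types without an entry in the table (e.g. "channel") are not checked,
    since the correspondence between type and unit is not one-to-one.
    """
    units = PLAUSIBLE_UNITS.get(axis["type"])
    unit = axis.get("unit")
    if units is None or unit is None or unit in units:
        return None
    return f"unit {unit!r} is unusual for axis type {axis['type']!r}"


print(check_axis({"type": "time", "unit": "meter"}))         # warning string
print(check_axis({"type": "channel", "unit": "nanometer"}))  # None
```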

I will start to work on spec v0.4 now and begin by making a PR for the changes laid out above; I will implement the solution that seems best to my judgment and try to lay out all discussion points I can see in the PR. We will announce once the PR is ready to be discussed on github and on image.sc.

thewtex · 2021-09-07T19:18:38Z

Thus associating "channel" very strictly with the emission wavelength band is maybe too limiting?

component could also be considered -- it is semantically more general but has the same non-space-time association, and it also starts with a c :-)

unidesigner · 2021-09-14T10:18:51Z

Hi @constantinpape et al. Just wanted to make you aware of some of the discussion around axes metadata in this neuroglancer issue. It'd be good to know how some of the discussions therein could be fed into the discussion/proposal process for the ome-ngff specs on axes metadata.

satra · 2021-09-25T13:36:40Z

as a slight aside: regarding units as text we have found this text representation quite useful: https://people.csail.mit.edu/jaffer/MIXF/CMIXF-12 and we adopted this in the BIDS standard (https://bids-specification.readthedocs.io/en/stable/99-appendices/05-units.html). here is a python library to support parsing: https://github.com/sensein/cmixf

constantinpape · 2021-10-01T09:21:28Z

I have started to put something together for the new axes metadata based on the discussions here in #57.
I am now working on transformations and will start a broader call for feedback once both proposals are done (given that these are linked), but feel free to comment on the axes metadata proposal already.

constantinpape · 2022-02-02T22:07:49Z

This is now implemented with v0.4 :).

This was referenced Feb 28, 2021

Transformation Specification #28

Closed

5D or List< 4D > #20

Closed

Collections Specification #31

Open

constantinpape self-assigned this Mar 7, 2021

constantinpape mentioned this issue Mar 13, 2021

Lift 5d requirement for images and move multiscales description into spec #39

Closed

constantinpape mentioned this issue Apr 12, 2021

add dimension characters i,d and f bioimage-io/spec-bioimage-io#69

Closed

thewtex mentioned this issue May 5, 2021

Compatibility with xarray #48

Open

This was referenced May 19, 2021

Add axes field to multiscale metadata #46

Merged

Support for multi-channel labels #19

Open

joshmoore mentioned this issue Aug 9, 2021

bump to ngff v0.3 (add support for axes) glencoesoftware/bioformats2raw#113

Closed

unidesigner mentioned this issue Sep 14, 2021

What are correct metadata attributes when adding a zarr array google/neuroglancer#333

Open

constantinpape mentioned this issue Oct 1, 2021

Extend the axes fields in multiscales metadata #57

Merged

jni mentioned this issue Dec 20, 2021

Proposal: explicit definition of a scene/depicted_world coordinate system napari/napari#3848

Open

bogovicj mentioned this issue Feb 2, 2022

Proposing spaces and transforms #94

Open

constantinpape closed this as completed Feb 2, 2022

oeway mentioned this issue Jun 14, 2023

BioImage.IO Meeting Minutes bioimage-io/bioimage.io#28

Open
