Plans for converting the data #2

constantinpape · 2020-11-18T16:52:13Z

@tischi, I exchanged a couple of emails with @joshmoore today, and as far as I understand the current plan is the following:
We don't ship the data to Josh; instead we convert and upload it locally.

I have a converter script and I am pretty sure it does the right thing, but I have a couple of other questions:

Should we put the new data in a separate bucket? I can ask Josep to create one.
Do we keep the same folder structure as for the other mobie projects?
I would suggest not to add the full res raw data, but only the 100nm version. Which data do we add apart from that?

P.S. I made a new issue because #1 got a bit crowded.

tischi · 2020-11-18T19:27:30Z

Should we put the new data in a separate bucket? I can ask Josep to create one.

Yes, why not. Let's call it i2k-2020

Do we keep the same folder structure as for the other mobie projects?

From my point of view we don't need any folder structure because there will be only three files (see the very first post here: #1).
But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.

I would suggest not to add the full res raw data, but only the 100nm version.

Yes! Excellent suggestion!

Which data do we add apart from that.

As said above: in terms of files see the very first post here: #1

I am not sure about the table. I don't think @joshmoore has something yet ready to store the table in zarr format?!

And ❤️ for helping!

constantinpape · 2020-11-18T19:42:56Z

From my point of view we don't need any folder structure because there will be only three files (see the very first post here: #1).
But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.

Ok, in that case I would just add a single root zarr file with three multiscale datasets:

platy.zarr/
  em-raw/
     ...
  em-segmentation-cells/
     ...
  prospr-myosin/
    ...
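
A minimal sketch of the layout above, written with the Python standard library only. It assumes the zarr v2 on-disk convention in which a directory counts as a group when it contains a `.zgroup` file; the function name `create_layout` is hypothetical, and the actual converter script is not shown in this thread.

```python
import json
import tempfile
from pathlib import Path

# Dataset names taken from the proposed layout.
DATASETS = ["em-raw", "em-segmentation-cells", "prospr-myosin"]


def create_layout(root: Path, datasets=DATASETS) -> None:
    """Create an empty zarr v2 group hierarchy: one root group, one sub-group per dataset."""
    for group in [root] + [root / name for name in datasets]:
        group.mkdir(parents=True, exist_ok=True)
        # In zarr v2, a directory is a group if it holds a .zgroup file.
        (group / ".zgroup").write_text(json.dumps({"zarr_format": 2}, indent=4))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "platy.zarr"
        create_layout(root)
        # List the group markers relative to the parent of the zarr root.
        for marker in sorted(root.rglob(".zgroup")):
            print(marker.relative_to(root.parent).as_posix())
```

The scale arrays and the multiscale metadata would then go inside each dataset group.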

I am not sure about the table. I don't think @joshmoore has something yet ready to store the table in zarr format?!

We could just store it as a 2d dataset with column names in the header, but I think there is indeed no NGFF format for tables yet.

Anyway, I will start with the volumetric data and let you know once I have something. I will probably start with just the myosin volume, so @joshmoore can check it once I have put it on the bucket; after we have made sure the format is correct, we can add the larger files.

tischi · 2020-11-19T09:00:41Z

Related to this: #3

If we want to use the MoBIE infrastructure, the most straightforward approach would be to have an images.json file somewhere (like this one) pointing to three bdv.xml files (like this one) with <ImageLoader format="bdv.n5.zarr.s3">. If we do this, we may "only" have to get this done (plus some hopefully small add-ons in MoBIE) in order to have a working example to iterate on further.

constantinpape · 2020-11-19T09:20:19Z

pointing to three bdv.xml files (like this one) with <ImageLoader format="bdv.n5.zarr.s3">

If we do this, there are a few questions about the file layout: we cannot simply use what I suggested here, because bdv assumes fixed paths inside the dataset (setup0/timepoint0, ...).

I see three options:

1. We go back to having one root zarr per volume and each has a single setup and timepoint.
2. We use a single root zarr and single xml and store the different volumes as setups.
3. We change the bdv.n5.zarr.s3 format so that we allow specifying a custom pathInFile to support a single root zarr.
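
To make the difference between the options concrete, here is a small sketch of how the path to one scale level could be resolved in each case. It is an illustration only: the function `scale_path` is hypothetical, the `s{scale}` naming is assumed from the bdv.n5 convention, and treating `pathInFile` as a replacement for the fixed `setup{N}/timepoint{M}` prefix is my assumption, not a decided design.

```python
from typing import Optional


def scale_path(scale: int, setup: int = 0, timepoint: int = 0,
               path_in_file: Optional[str] = None) -> str:
    """Return the path of one scale level inside a zarr root.

    Without path_in_file, use the fixed bdv-style prefix (options 1 and 2).
    With path_in_file, use that custom prefix instead (option 3, assumed semantics).
    """
    if path_in_file is None:
        prefix = f"setup{setup}/timepoint{timepoint}"
    else:
        prefix = path_in_file.strip("/")
    return f"{prefix}/s{scale}"


if __name__ == "__main__":
    # Option 1: one root zarr per volume, single setup and timepoint.
    print(scale_path(0))
    # Option 2: single root zarr, volumes stored as different setups.
    print(scale_path(0, setup=2))
    # Option 3: single root zarr with a custom path per volume.
    print(scale_path(0, path_in_file="prospr-myosin"))
```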

joshmoore · 2020-11-19T10:26:25Z

But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.

I don't think so.

I don't think @joshmoore has something yet ready to store the table in zarr format?!

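There is some work now on an initial format:

* [ome/omero-cli-zarr#50](https://github.com/ome/omero-cli-zarr/pull/50)
* [ome/ome-zarr-py#61](https://github.com/ome/ome-zarr-py/pull/61)
* [ome/ome-zarr-py#63](https://github.com/ome/ome-zarr-py/pull/63)

which briefly looks like this: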

/opt/data/6001240.zarr $ cat labels/0/.zattrs
{
    "image-label": {
        "properties": [
            {
                "label-value": 1,
                "class": "foo"
            },
            {
                "label-value": 2,
                "class": "bar"
            }
        ],
        "colors": [
            {
                "label-value": 1,
                "rgba": [
                    128,
                    128,
                    128,
                    128
                ]
            },

Ok, in that case I would just add a single root zarr file with three multiscale datasets:

Also ok.

constantinpape · 2020-11-19T10:42:51Z

But I don't know if for @joshmoore's current vision of the ome.zarr file-format they somehow should be in the same zarr container, because they are part of one dataset. @joshmoore would need to say.

I don't think so.

Ok, let's discuss the layout tomorrow in the meeting.

There is some work now on an initial format:

* [ome/omero-cli-zarr#50](https://github.com/ome/omero-cli-zarr/pull/50)

* [ome/ome-zarr-py#61](https://github.com/ome/ome-zarr-py/pull/61)

* [ome/ome-zarr-py#63](https://github.com/ome/ome-zarr-py/pull/63)

This will produce large jsons in our case :). But we can give it a try; and in the future we can hopefully switch to storing the table as a zarr array.

tischi · 2020-11-19T10:44:57Z

But we can give it a try; and in the future we can hopefully switch to storing the table as a zarr array.

For the testing, you could just write one feature value, like size.
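
A minimal sketch of such a test, assuming the `image-label` / `properties` layout quoted earlier in this thread and a single feature column called `size`. The function name `label_properties` and the example values are hypothetical; they are not taken from the actual table.

```python
import json
from typing import Dict


def label_properties(sizes: Dict[int, float]) -> dict:
    """Build .zattrs content holding one 'size' value per label id."""
    return {
        "image-label": {
            "properties": [
                {"label-value": label, "size": size}
                for label, size in sorted(sizes.items())
            ]
        }
    }


if __name__ == "__main__":
    # Made-up label ids and sizes, for illustration only.
    attrs = label_properties({1: 1520.0, 2: 873.5})
    print(json.dumps(attrs, indent=4))
```

Since every label adds one JSON object, the size of the resulting `.zattrs` grows linearly with the number of labels, which is why this is only meant as a stopgap until the table can be stored as an array.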

tischi · 2020-11-19T11:59:58Z

Personally, if I wanted to get something working within the one week left until i2k, I would do the following:

1. Store data like this on EMBL S3:

   images.json
   a.xml
   b.xml
   c.xml
   a.zarr
   b.zarr
   c.zarr

2. Copy all the code from https://github.com/joshmoore/n5-zarr/tree/s3zarr into a branch of MoBIE.
3. Work within the MoBIE branch until we can read the images into BDV.
4. Take it from there, e.g. factor out the s3zarr stuff into its own repo again, discuss metadata, and so on.

joshmoore · 2020-11-19T12:25:46Z

This will produce large jsons in our case :)

Yup. Definitely aware. I had tried the zarr array solution but ran into saalfeldlab/n5#73 (comment). We also discussed possible integration with Parquet etc. last night on the community call. Open to thoughts.

constantinpape · 2020-11-19T12:31:53Z

@tischi your plan sounds good. I can def. set up 1. :). Will try to do as much as possible there before the meeting tomorrow and then we can finalize the plan before i2k.

constantinpape · 2020-11-19T13:57:04Z

@joshmoore I uploaded one multiscale dataset to our new bucket.

Could you please check that you can access it?
Here are the details:

ServiceEndpoint: https://s3.embl.de
BucketName: i2k-2020
PathInBucket: platy.ome.zarr   (this is the zarr root)

If you can access it, can you check if the dataset at prospr-myosin is compatible with the zarr multiscale format?

Thanks!
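
For a quick check without an S3 client, the three values above can be combined into a plain HTTPS URL. This is a sketch under two assumptions that the thread does not confirm: that the endpoint accepts path-style addressing (`endpoint/bucket/key`) and that the bucket allows anonymous reads. The helper names `object_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL for one object (assumes path-style addressing)."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key.lstrip('/'))}"


def fetch_json(url: str) -> dict:
    """Download and parse a JSON object such as .zattrs or .zarray (needs public read access)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Values from the details above; only the URL is built here, nothing is downloaded.
    print(object_url("https://s3.embl.de", "i2k-2020", "platy.ome.zarr/prospr-myosin/s0/.zarray"))
```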

joshmoore · 2020-11-19T21:05:14Z

Hi @constantinpape,

The .zattrs that's in ...ome.zarr/ will need to be in the prospr-myosin/ directory

aws --no-sign-request --endpoint-url=https://s3.embl.de s3 ls --recursive s3://i2k-2020/platy.ome.zarr/ | grep /.z
2020-11-19 14:49:21        400 platy.ome.zarr/.zattrs
2020-11-19 14:49:21         24 platy.ome.zarr/.zgroup
2020-11-19 14:49:21        327 platy.ome.zarr/prospr-myosin/s0/.zarray
2020-11-19 14:49:21        327 platy.ome.zarr/prospr-myosin/s1/.zarray
2020-11-19 14:49:21        327 platy.ome.zarr/prospr-myosin/s2/.zarray
2020-11-19 14:49:21        321 platy.ome.zarr/prospr-myosin/s3/.zarray
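
The placement rule can be sketched as follows: the `multiscales` metadata is written into the `.zattrs` of the image group itself (here `prospr-myosin/`), not into the zarr root. This is an illustration with the standard library only; the helper names are hypothetical, and the `"version": "0.1"` string and the exact metadata fields are my assumption about the multiscale format at the time, so please check them against the spec.

```python
import json
import tempfile
from pathlib import Path


def write_multiscales(image_group: Path, n_scales: int) -> None:
    """Write multiscales metadata into the .zattrs of the image group itself."""
    attrs = {
        "multiscales": [
            {
                "version": "0.1",  # assumed spec version
                "datasets": [{"path": f"s{i}"} for i in range(n_scales)],
            }
        ]
    }
    image_group.mkdir(parents=True, exist_ok=True)
    (image_group / ".zattrs").write_text(json.dumps(attrs, indent=4))


def groups_with_multiscales(root: Path) -> list:
    """Return the groups (relative to root) whose .zattrs contains 'multiscales'."""
    found = []
    for zattrs in sorted(root.rglob(".zattrs")):
        if "multiscales" in json.loads(zattrs.read_text()):
            found.append(zattrs.parent.relative_to(root).as_posix())
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "platy.ome.zarr"
        write_multiscales(root / "prospr-myosin", n_scales=4)
        print(groups_with_multiscales(root))
```

A result of `"."` from `groups_with_multiscales` would indicate that the metadata still sits at the root, which is the situation shown in the listing above.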

constantinpape · 2020-11-20T08:20:03Z

The .zattrs that's in ...ome.zarr/ will need to be in the prospr-myosin/ directory

Thanks for checking!
I fixed it in the code.

constantinpape · 2020-11-20T12:19:08Z

I added the data according to what we discussed, see #4
