
Best practices for generating multiscale zarr data? #215

GenevieveBuckley opened this issue Jul 24, 2022 · 5 comments

@GenevieveBuckley

What is the current best practice for generating & saving a multiscale zarr array, given a single resolution of that data?

I gather things have changed a lot recently with the improvements to OME NGFF, so I feel like I need to ask the question. I've talked to a few people who say they use a python script they or someone else in the lab wrote, but then say it might be a little bit hacky and they're not completely sure if it's compliant with the latest NGFF.

I've looked at the docs, but they haven't completely clarified things for me. The write_multiscale function seems like the best option, but it requires users to have already generated the resolution levels externally (so the question remains: what is the recommended way to generate them?). Worse, write_multiscale appears to only accept a list of numpy arrays, which is a little odd. If I could reliably fit my high-resolution data in memory as a numpy array, I wouldn't need zarr at all.

The regular function for writing a zarr array seems to have a keyword argument for a downsampling function, but there isn't much information on what that function should look like or how to use the feature. (Unless I've just missed it; please point me to the right section of the docs if there's more info somewhere!)
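For context, the list of arrays that write_multiscale expects can be sketched with plain NumPy like this (build_pyramid is a hypothetical helper, not part of any library; for data that doesn't fit in memory, the same slicing works lazily on a dask array):

```python
# A minimal sketch of building pyramid levels by repeated 2x
# nearest-neighbour downsampling (array slicing). The resulting list
# of arrays is the kind of input write_multiscale expects.
import numpy as np

def build_pyramid(image, n_levels=3):
    """Return [full_res, half_res, quarter_res, ...]."""
    levels = [image]
    for _ in range(n_levels - 1):
        # take every second pixel along each axis ("nearest" downsampling)
        levels.append(levels[-1][::2, ::2])
    return levels

base = np.random.rand(512, 512)
pyramid = build_pyramid(base, n_levels=3)
print([lvl.shape for lvl in pyramid])  # [(512, 512), (256, 256), (128, 128)]
```

Swapping the slicing for a block-mean (e.g. dask.array.coarsen with np.mean) gives smoother levels at the cost of more computation.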

@constantinpape commented Jul 26, 2022

Hi @GenevieveBuckley,
there are convenience functions in ome_zarr for creating the multiscale levels as well. Here's an example workflow script I wrote to demonstrate the usage: https://github.com/ome/ome-ngff-prototypes/blob/main/workflows/spatial-transcriptomics-example/convert_transcriptomics_data_to_ngff.py#L39-L64
(Though I fully agree that overall this needs to be better documented ...)

Also note that the local_mean option is currently not working (see #217), but you can use e.g. nearest instead.

(Sorry, closed by accident)

@joshmoore

@toloudis / @will-moore: thoughts on rolling out (and/or testing) #192 here?

@toloudis

Makes sense to me. At best it will lead to improvements, and it may help verify some of the performance issues I was seeing with large data and dask resizing.

It might also be instructive to look at this pull request in aicsimageio, which builds on top of #192: AllenCellModeling/aicsimageio#381. It includes an .ipynb file demonstrating loading a single-resolution image and saving a multiresolution zarr. Inside the OmeZarrWriter is the code that forwards the arrays to ome-zarr-py.

@joshmoore

👍 @GenevieveBuckley, just one more minor change on that PR and then I'll get it released. Happy to have some testing either before or after.

@toloudis

I'm also interested in optimal implementations for generating downsampled data from large datasets.
There are many alternative implementations of the Scaler. One intriguing one is https://github.com/spatial-image/multiscale-spatial-image, which seems nice and general, and dask-ready, but I have not attempted to use it yet.
