
Best practices for generating multiscale zarr data? #215

GenevieveBuckley opened this issue Jul 24, 2022 · 5 comments

@GenevieveBuckley

What is the current best practice for generating & saving a multiscale zarr array, given a single resolution of that data?

I gather things have changed a lot recently with the improvements to OME NGFF, so I feel like I need to ask the question. I've talked to a few people who say they use a python script they or someone else in the lab wrote, but then say it might be a little bit hacky and they're not completely sure if it's compliant with the latest NGFF.

I've looked at the docs, but they haven't completely clarified things for me. The write_multiscale function seems like the best option, but it requires users to have already generated the resolution levels externally (so the question remains: what is the recommended way to generate them?). Worse, write_multiscale appears to only accept a list of numpy arrays, which is a little odd. If I could reliably fit my high-resolution data in memory as a numpy array, I wouldn't need zarr at all.

The regular function for writing a zarr array seems to have a keyword argument for a downsampling function, but there isn't much information on what that function should look like or how to use the feature. (Unless I've just missed it; please point me to the right section of the docs if there's more info somewhere!)
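For context, the list of arrays that write_multiscale expects can be sketched with plain NumPy like this (build_pyramid is a hypothetical helper, not part of any library; for data that doesn't fit in memory, the same slicing works lazily on a dask array):

```python
# A minimal sketch of building pyramid levels by repeated 2x
# nearest-neighbour downsampling (array slicing). The resulting list
# of arrays is the kind of input write_multiscale expects.
import numpy as np

def build_pyramid(image, n_levels=3):
    """Return [full_res, half_res, quarter_res, ...]."""
    levels = [image]
    for _ in range(n_levels - 1):
        # take every second pixel along each axis ("nearest" downsampling)
        levels.append(levels[-1][::2, ::2])
    return levels

base = np.random.rand(512, 512)
pyramid = build_pyramid(base, n_levels=3)
print([lvl.shape for lvl in pyramid])  # [(512, 512), (256, 256), (128, 128)]
```

Swapping the slicing for a block-mean (e.g. dask.array.coarsen with np.mean) gives smoother levels at the cost of more computation.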

@constantinpape commented Jul 26, 2022

Hi @GenevieveBuckley,
there are convenience functions in ome_zarr for creating the multiscale levels as well. Here's an example workflow script I wrote to demonstrate the usage: https://github.com/ome/ome-ngff-prototypes/blob/main/workflows/spatial-transcriptomics-example/convert_transcriptomics_data_to_ngff.py#L39-L64
(Though I fully agree that overall this needs to be better documented ...)

Also note that the local_mean option is currently not working (see #217), but you can use e.g. nearest instead.

(Sorry, closed by accident)

@joshmoore

@toloudis / @will-moore: thoughts on rolling out (and/or testing) #192 here?

@toloudis

Makes sense to me. At best it will lead to improvements, and it may help verify some of the performance issues I was seeing with large data and dask resizing.

It might also be instructive to look at this pull request in aicsimageio, which builds on top of #192: AllenCellModeling/aicsimageio#381. It includes an .ipynb file demonstrating loading a single-resolution image and saving a multiresolution zarr. Inside the OmeZarrWriter is the code that forwards the arrays to ome-zarr-py.

@joshmoore

👍 @GenevieveBuckley, just one more minor change on that PR and then I'll get it released. Happy to have some testing either before or after.

@toloudis

I'm also interested in optimal implementations for generating downsampled data from large datasets.
There are many alternative implementations of the Scaler. One intriguing one is https://github.com/spatial-image/multiscale-spatial-image, which seems nice and general, and dask-ready, but I have not attempted to use it yet.
