Is this project still actively being developed? #6

Open
JackKelly opened this issue Jan 26, 2023 · 9 comments

Comments

@JackKelly

Hi!

I'm super-excited about trying out Zarr with Rust! I was just curious if you plan to continue developing this project?

(Absolutely no pressure! I totally know how hard it can be to sustain maintenance effort with multiple projects!)

(A tiny bit of background on me: I've been using Zarr in Python for several years. I'm only just learning Rust now. So I'm definitely not good enough with Rust to help yet. But I might be able to help with this Zarr Rust project if I continue learning Rust!)

@rabernat

I am also keen to follow this effort. I see a lot of potential here. We could even bind to this implementation from Python if there are big performance benefits.

@aschampion
Contributor

Python bindings are in progress, generalizing the Python bindings that already exist for the n5 library this evolved from. I don't know if speed will be a factor, given there are certain to be more eyes on the reference Python implementation, but I'm mostly motivated by having a single, consistent backend for Rust/FFI/CLI/WASM and Python, which has proved valuable for me with n5.

@clbarnes
Contributor

For anyone watching this, I have a very rough end-to-end implementation of the accepted ZEP1 zarr v3 spec here, to be discussed here: zarr-developers/zarr-specs#244

I wouldn't anticipate getting significant speed gains out of a rust implementation as the slowest bits are going to be IO and (en|de)coding blocks, which in "python" are practically written in native code. But as said, a single implementation which could be bound to from different frontends is valuable, and trying to push the envelope on rust so we don't have to touch any C++ would be great.

@JackKelly
Author

JackKelly commented May 23, 2023

Sounds awesome, thank you!

On the topic of speed: I definitely agree that, when reading a single chunk, there are unlikely to be any speed increases.

But, when the user requests multiple chunks, my understanding is that zarr-python (when used on its own) currently reads chunks in series, and the most common way to parallelise Zarr in Python is to use zarr-python plus dask. But dask is an enormously heavy-weight bit of kit, and in my limited experience, often spends hundreds of milliseconds computing its plan of action. (Which rules it out for my use-case of loading Zarr on-the-fly during ML training).

So I guess my naive hope was that a Rust implementation of Zarr might read and decompress chunks in parallel, without dask (similar to how tensorstore, TileDB, and the Blosc2 NDim layer (formerly called caterva) parallelise reads).
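To make the idea concrete, here is a minimal sketch of reading and decoding several chunks in parallel using only the Rust standard library. Everything in it is an assumption for illustration: the in-memory `HashMap` stands in for a real store, `decode_rle` is a toy run-length decoder standing in for a real codec, and `read_chunks_parallel` is a hypothetical name, not an API of this project.

```rust
use std::collections::HashMap;
use std::thread;

/// Toy stand-in for a real codec (hypothetical): expands (count, value) byte pairs.
fn decode_rle(encoded: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in encoded.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Reads and decodes the requested chunks, one scoped thread per chunk.
/// Results come back in the same order as `keys`; a missing key yields `None`.
fn read_chunks_parallel(
    store: &HashMap<String, Vec<u8>>,
    keys: &[&str],
) -> Vec<Option<Vec<u8>>> {
    thread::scope(|s| {
        // Spawn every worker first so that they all run concurrently.
        let handles: Vec<_> = keys
            .iter()
            .map(|&key| s.spawn(move || store.get(key).map(|bytes| decode_rle(bytes))))
            .collect();
        // Join in request order so the output lines up with `keys`.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

One thread per chunk keeps the sketch short; a real reader would presumably cap the number of threads rather than spawn one per request.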

I only mention this for context, and to test if my understanding is correct or not! I don't mean to put you under pressure to build a parallelised Zarr reader!!! I'm enormously grateful for all the work you've done already!

@rabernat

FYI, the same capability - parallel chunk fetching and decompression - is being discussed in zarr python (zarr-developers/zarr-python#1398). There is a debate going on over whether the main performance opportunity is from

  • asynchronous fetching of chunks from a remote store (already supported in python somewhat via fsspec) - doesn't necessarily require true parallelism, just async concurrency
  • actual parallel (e.g. multithreaded) decompression of chunks

I'd be keen to get some evidence one way or another.
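One rough way to gather that kind of evidence is to time the two phases separately during an ordinary serial read. The sketch below assumes the caller supplies its own `fetch` and `decode` closures; `PhaseTimes` and `profile_serial_read` are hypothetical names used only for illustration.

```rust
use std::time::{Duration, Instant};

/// Wall-clock time spent in each phase of a serial chunk read.
#[derive(Debug, Default)]
struct PhaseTimes {
    fetch: Duration,
    decode: Duration,
}

/// Fetches and decodes each chunk in series, accumulating the time per phase.
/// `fetch` and `decode` are supplied by the caller (hypothetical stand-ins
/// for a store read and a codec).
fn profile_serial_read<F, D>(
    keys: &[&str],
    mut fetch: F,
    mut decode: D,
) -> (Vec<Vec<u8>>, PhaseTimes)
where
    F: FnMut(&str) -> Vec<u8>,
    D: FnMut(&[u8]) -> Vec<u8>,
{
    let mut times = PhaseTimes::default();
    let mut chunks = Vec::with_capacity(keys.len());
    for &key in keys {
        let start = Instant::now();
        let raw = fetch(key);
        times.fetch += start.elapsed();

        let start = Instant::now();
        let decoded = decode(&raw);
        times.decode += start.elapsed();

        chunks.push(decoded);
    }
    (chunks, times)
}
```

If `fetch` dominates, overlapping requests with async concurrency is the likely win; if `decode` dominates, multithreaded decompression is. Real measurements would need representative stores, codecs, and chunk sizes.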

@mpiannucci

Zarr.js has basically the style of API for getting chunks that I would want from a Rust implementation, where the caller has control over how chunk fetches run concurrently.
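As a sketch of what caller-controlled concurrency could look like in Rust, the function below lets the caller cap how many fetches are in flight at once. It is not modelled on any existing API: `fetch_with_limit`, `max_in_flight`, and the `fetch` closure are all hypothetical, and it uses plain threads rather than async.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Fetches every key with at most `max_in_flight` fetches running at once.
/// `fetch` is a caller-supplied closure (hypothetical stand-in for a store read).
/// Results are returned in the same order as `keys`.
fn fetch_with_limit<F>(keys: &[&str], max_in_flight: usize, fetch: F) -> Vec<Vec<u8>>
where
    F: Fn(&str) -> Vec<u8> + Sync,
{
    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<Vec<u8>>>> = Mutex::new(vec![None; keys.len()]);
    let workers = max_in_flight.clamp(1, keys.len().max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= keys.len() {
                    break;
                }
                // The lock is taken only to store the result, not while fetching.
                let bytes = fetch(keys[i]);
                results.lock().unwrap()[i] = Some(bytes);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every index is filled by a worker"))
        .collect()
}
```

Passing `max_in_flight = 1` degrades to a serial read, which makes it easy to compare against higher limits on the same store.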

@clbarnes
Contributor

clbarnes commented Jun 16, 2023

When writing zarr3, I have so far stayed synchronous for reasons of simplicity, as it was intended as a "does the spec work for strict languages" exploration rather than a production library from the outset. Because rust doesn't have an async runtime built-in, library authors need to decide which async library's ecosystem all of their users will have to buy into, which is a decision I was happy to punt on while trying to bash out an exploratory prototype...
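For readers newer to Rust, the runtime point can be illustrated with a small sketch. The standard library provides the `Future` trait and the waker types, but nothing that actually drives a future to completion, so an `async fn` does no work until some executor polls it. Below is a deliberately minimal single-future executor built on `std::task::Wake`; `block_on` and `read_chunk` are hypothetical names, and this is nowhere near what a production runtime provides (no I/O reactor, timers, or task spawning).

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: polls one future on the current thread until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the future's waker signals that progress is possible.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async chunk reader; a real one would await network or disk I/O.
async fn read_chunk(key: &str) -> Vec<u8> {
    key.as_bytes().to_vec()
}
```

Calling `read_chunk("c/0/0")` on its own only builds a future; `block_on(read_chunk("c/0/0"))` is what runs it. A library that exposes `async fn`s and relies on runtime-specific I/O types ties its users to that runtime, which is the choice being deferred here.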

@mpiannucci

mpiannucci commented Jun 16, 2023

> When writing zarr3, I have so far stayed synchronous for reasons of simplicity, as it was intended as a "does the spec work for strict languages" exploration rather than a production library from the outset. Because rust doesn't have an async runtime built-in, library authors need to decide which async library's ecosystem all of their users will have to buy into, which is a decision I was happy to punt on while trying to bash out an exploratory prototype...

Makes a ton of sense, especially without async traits being stabilized yet.

Sorry if I came off as annoying! Just thinking out loud about the future; tons of excitement about how this could be built upon.

@clbarnes
Contributor

Not at all, the excitement is very welcome! The new spec solves some problems for us so it'll be great to get it working in the wild.

5 participants