Is this project still actively being developed? #6
Comments
I am also keen to follow this effort. I see a lot of potential here. We could even bind to this implementation from Python if there are big performance benefits.
Python bindings are in progress, generalizing the Python bindings that already exist for the n5 library this evolved from. I don't know if speed will be a factor, given there's certain to be more eyes on the reference Python impl, but I'm mostly motivated by having a single, consistent backend for Rust/FFI/CLI/WASM and Python, which has proved valuable for me for n5.
For anyone watching this, I have a very rough end-to-end implementation of the accepted ZEP1 Zarr v3 spec, to be discussed in zarr-developers/zarr-specs#244. I wouldn't anticipate getting significant speed gains out of a Rust implementation, as the slowest bits are going to be IO and (en|de)coding blocks, which in "python" are practically written in native code. But as said, a single implementation which could be bound to from different frontends is valuable, and trying to push the envelope on Rust so we don't have to touch any C++ would be great.
Sounds awesome, thank you! On the topic of speed: I definitely agree that, when reading a single chunk, there are unlikely to be any speed increases. But, when the user requests multiple chunks, my understanding is that zarr-python (when used on its own) currently reads chunks in series, and the most common way to parallelise Zarr in Python is to use zarr-python plus dask. But dask is an enormously heavy-weight bit of kit, and in my limited experience, often spends hundreds of milliseconds computing its plan of action. (Which rules it out for my use-case of loading Zarr on-the-fly during ML training). So I guess my naive hope was that a Rust implementation of Zarr might read and decompress chunks in parallel, without dask (similar to how tensorstore, TileDB, and the Blosc2 NDim layer (formerly called caterva) parallelise reads). I only mention this for context, and to test if my understanding is correct or not! I don't mean to put you under pressure to build a parallelised Zarr reader!!! I'm enormously grateful for all the work you've done already!
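To make the idea of decoding several chunks at once concrete, here is a minimal sketch in Rust using only the standard library. It is not taken from this project or from any Zarr library: `decode_chunk` is a toy run-length decoder standing in for a real codec, and `decode_chunks_parallel` and its `max_workers` parameter are hypothetical names chosen for illustration. Chunk bytes are assumed to be in memory already, so only the decoding step is parallelised.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Toy stand-in for a real codec (an assumption for illustration):
/// decodes a sequence of (count, value) byte pairs.
fn decode_chunk(encoded: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in encoded.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Hypothetical helper: decode all chunks on at most `max_workers`
/// threads and return the results in the same order as the input.
fn decode_chunks_parallel(chunks: &[Vec<u8>], max_workers: usize) -> Vec<Vec<u8>> {
    // Shared counter from which each worker claims the next chunk index.
    let next = AtomicUsize::new(0);
    let workers = max_workers.max(1).min(chunks.len().max(1));

    // Scoped threads may borrow `chunks` and `next` without `Arc`.
    let decoded: Vec<(usize, Vec<u8>)> = thread::scope(|scope| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                scope.spawn(|| {
                    let mut local = Vec::new();
                    loop {
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        if i >= chunks.len() {
                            break;
                        }
                        local.push((i, decode_chunk(&chunks[i])));
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Put each decoded chunk back at its original position.
    let mut results: Vec<Option<Vec<u8>>> = vec![None; chunks.len()];
    for (i, data) in decoded {
        results[i] = Some(data);
    }
    results
        .into_iter()
        .map(|r| r.expect("every index is claimed exactly once"))
        .collect()
}
```

Calling `decode_chunks_parallel(&chunks, 4)` would decode with up to four threads. A real reader would also overlap IO with decoding and use an actual compression codec; this sketch only shows the fan-out and reassembly, with no planning step of the kind dask performs.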
FYI, the same capability - parallel chunk fetching and decompression - is being discussed in zarr python (zarr-developers/zarr-python#1398). There is a debate going on over whether the main performance opportunity is from
I'd be keen to get some evidence one way or another.
Zarr.js has basically the style of API for getting chunks that I would want from a Rust implementation, where the caller controls how concurrently the chunks are fetched.
When writing zarr3, I have so far stayed synchronous for reasons of simplicity, as it was intended as a "does the spec work for strict languages" exploration rather than a production library from the outset. Because Rust doesn't have an async runtime built-in, library authors need to decide which async library's ecosystem all of their users will have to buy into, which is a decision I was happy to punt on while trying to bash out an exploratory prototype...
Makes a ton of sense, especially without async traits being stabilized yet. Sorry if I came off as annoying! Just thinking out loud about the future, tons of excitement about how this could be built upon.
Not at all, the excitement is very welcome! The new spec solves some problems for us so it'll be great to get it working in the wild.
Hi!
I'm super-excited about trying out Zarr with Rust! I was just curious if you plan to continue developing this project?
(Absolutely no pressure! I totally know how hard it can be to sustain maintenance effort with multiple projects!)
(A tiny bit of background on me: I've been using Zarr in Python for several years. I'm only just learning Rust now. So I'm definitely not good enough with Rust to help yet. But I might be able to help with this Zarr Rust project if I continue learning Rust!)