Zarr-Python → Benchmarking and Performance #1479
Replies: 17 comments 65 replies
-
Thanks so much for kicking off this discussion, @MSanKeys963 & @rabernat.
Meetings
To everyone else: If you're interested in helping to measure & improve Zarr's performance, then please join our half-hour meetings, which will be held every two weeks, starting in September. Please don't worry if you don't have time to write code! It will still be very useful to hear different perspectives and use-cases in these meetings.
Let's work on the assumption that we'll hold our subsequent meetings every two weeks, at the same time-of-day and day-of-week as our first meeting (although, if that's not convenient, then by all means raise that in the first meeting).
Other things to read (if you want to!)
If you're itching to get started then here are some more things to read and comment on (if you want!):
Very quick intro to me
Finally, I should give a super-quick intro: I'm a co-founder of Open Climate Fix. I've been using (and loving) Zarr since early 2019. I've written a lot of Python code over the last ~12 years (most recently: data-processing and machine learning for solar power forecasting). But I haven't contributed to Zarr before, so I'm a newbie in terms of the Zarr-Python codebase. I'm super-excited about Rust, and have been learning Rust because I hope that Rust could help us speed up Zarr, but I'm still a few months away from being productive in Rust. I love reading about high-performance code (maximising CPU cache hits, io_uring, the performance characteristics of SSDs vs HDDs etc.) but I haven't actually written that much high-performance IO code. Which is all to say: I'm eager to help. But I don't make any claims about being especially knowledgeable! So the community's help & guidance will be essential 🙂
-
Some thoughts I've had while writing zarr3:
-
Of course it always depends on the use case and array configuration. From my experience with zarrita and sharding, the performance bottleneck is less at the IO layer and more in the (inner) chunk handling. With sharding, we write out relatively few, fairly large objects. With an async IO implementation, this is already fairly fast in Python.
-
I am only just seeing this long thread, so please allow me some time to catch up. I am interested, please include me. I thought I'd quickly point out that rfsspec already supports suffix ranges, and that cramjam shows a nice pattern for encoding/decoding compressors with optional Python bindings. I think it's a mistake to attach to arrow; you will never get them to make the changes we require.
-
One random thought: This thread has already highlighted that there are loads of use-cases and platforms and software dependencies to consider - and I'm sure many more will come to light soon. I don't know how you folks feel, but it feels to me like we have more questions than answers at the moment (which is good!). So I think we shouldn't put undue pressure on ourselves to try to architect the "perfect" all-singing all-dancing high-performance Zarr stack in one go. Instead, after we benchmark some existing Zarr implementations, my guess is that the next step will be to follow the lead of projects like How does that sound?
-
Update: I've included the link for the project board in the description.
-
Some quick updates:
-
Meeting times for our kick-off meetings!
Thank you to everyone who filled out the poll. Unfortunately there is no single time that everyone can make (because timezones 🙂). In fact, 3 people have 3 completely disjoint availabilities! So I'd propose that we have two kick-off meetings in Sept:
If you can make both meetings, then please don't feel under any obligation to attend both! @joshmoore, I'm really sorry but neither of these meeting times works for you. Would 12:00 UTC work for you on Mon 2nd Oct and/or Mon 16th Oct?
-
I wonder if we shouldn't have an explicit mention of super-zarr parallelism, particularly dask? The number of threads, async IO and batching will surely have a different impact when dask is also parallelising over the top. For instance, dask has long set the number of threads spawned by the libraries it calls to one, since there is already about one thread per core at work. Also, there are other parallel libraries out there; they may have the same concerns, but maybe not.
-
(Just to flag up that I'll be going on family holiday within the next few days, and will then be on holiday for the rest of August. But please don't let that stop others discussing things! I just didn't want you to think I was being rude by not replying!)
-
Update: As discussed during the first meeting on 9/18, I've added bi-weekly meetings for the benchmarking & performance group to the Zarr Community Calendar. See #1479 (reply in thread). Meeting details here: https://zarr.dev/community-calls/
-
My personal takeaways from the benchmarking presentation today, as I understood it. The workflow was a single pass over a large amount of uncompressed Zarr data on local disk. The majority of the IO time was in
This suggests to me the low-hanging fruit of:
Whether or not the memcopy would be noticeably faster could be tested by preallocating and reusing a sufficiently large numpy buffer and using .readinto to fill it, which is safe if the IO happens in a single thread. For strided copy, as in the benchmark workflow, I suspect it makes no difference. I suspect that dask threading over the IO would make little or no difference since disk operations block, and the memcopy is probably saturating the bus, but it ought to be tried.
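As a rough illustration of the preallocate-and-reuse idea, here is a minimal sketch using only the standard library. The function name `read_chunks_into` and the file layout are hypothetical, not part of Zarr-Python. A `bytearray` stands in for the numpy buffer; a one-dimensional `uint8` numpy array should be usable the same way, because `readinto` accepts any writable bytes-like object.

```python
import os
import tempfile


def read_chunks_into(paths, buffer):
    """Read each file into the same preallocated buffer.

    Yields a memoryview of the bytes read for each file. The view is only
    valid until the next iteration, because the buffer is reused. Files
    larger than the buffer are truncated to the buffer size.
    """
    view = memoryview(buffer)
    for path in paths:
        # buffering=0 gives a raw file object, so readinto fills our buffer
        # directly rather than going through Python's own buffered reader.
        with open(path, "rb", buffering=0) as f:
            total = 0
            while total < len(view):
                n = f.readinto(view[total:])
                if not n:  # end of file
                    break
                total += n
        yield view[:total]


# Small demo: three 1 KiB "chunks" read through one 4 KiB buffer.
with tempfile.TemporaryDirectory() as tmp:
    chunk_paths = []
    for i in range(3):
        p = os.path.join(tmp, f"chunk{i}")
        with open(p, "wb") as f:
            f.write(bytes([i]) * 1024)
        chunk_paths.append(p)

    shared = bytearray(4096)
    sizes = [len(chunk) for chunk in read_chunks_into(chunk_paths, shared)]
```

Because every file lands in the same memory, this is only safe when a single thread does the IO and each chunk is consumed before the next one is read, which matches the condition stated above.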
-
It occurs to me that parallel decompression in the existing async chunk loading logic should be pretty simple: use run_in_executor to farm such CPU tasks out to threads. So long as the algorithm releases the GIL, this would be enough.
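A minimal sketch of that pattern, under some assumptions: `decode_chunks` and `demo` are hypothetical names, not Zarr-Python functions, and `zlib.decompress` stands in for whatever codec is actually configured. CPython's `zlib` module releases the GIL while it decompresses, so it serves as an example of a codec that can benefit from threads.

```python
import asyncio
import zlib
from concurrent.futures import ThreadPoolExecutor


async def decode_chunks(compressed_chunks, executor):
    """Decompress chunks concurrently on worker threads.

    Each decompress call is handed to the executor, so the event loop
    stays free to keep issuing IO while the CPU work runs elsewhere.
    """
    loop = asyncio.get_running_loop()
    futures = [
        loop.run_in_executor(executor, zlib.decompress, chunk)
        for chunk in compressed_chunks
    ]
    # gather preserves the order of the inputs.
    return await asyncio.gather(*futures)


def demo():
    # Eight synthetic "chunks" of 100 kB each.
    raw = [bytes([i]) * 100_000 for i in range(8)]
    compressed = [zlib.compress(r) for r in raw]
    with ThreadPoolExecutor(max_workers=4) as pool:
        decoded = asyncio.run(decode_chunks(compressed, pool))
    return decoded == raw
```

Whether this yields a real speed-up depends on the codec releasing the GIL and on the chunks being large enough to outweigh the cost of handing work to the thread pool, so it would need benchmarking.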
-
Hi everyone! 👋🏻 Update: We're trying to find a new time for the bi-weekly meetings for this group to avoid ongoing conflicts. I've also emailed everyone from the group, but in case I missed you, please refer here. Once the results are in, I'll update the community calendar as well. Thank you everyone for your time and efforts! Appreciate it!
-
I've published some detailed performance analyses (using Intel VTune and Zarr-Benchmark) of Zarr-Python (and numpy) here: zarr-developers/zarr-benchmark#22
-
Just a quick reminder that we have a Zarr Benchmarking & Performance Zoom meeting today! Details are on the Zarr Community Calendar. And here's the agenda. Looking forward to it 🙂
-
Oooh, it turns out that the NVMe v2 standard enables SSDs to provide key-value storage (and compression) on the device. It can also support tiny chunks (down to 1 byte!). So the SSD can basically do everything we need for a Zarr Store! More details here (sorry for cross-posting):
-
Hi everyone! 👋🏻
Recently, we had meetings on July 6th and 7th, 2023 (meeting notes) led by @rabernat to discuss and decide the path forward for Zarr-Python development. After a good discussion with the attendees and gauging their interest, we decided to divide the larger group into two groups; they are:
Thank you @JackKelly and @jhamman, for stepping up! 🙏🏻
What's next? 👀
This discussion thread aims to kick off the 📈 Benchmarking and Performance 📈 working group and hold any top-level discussions related to the development work.
Here's the project board to organise and track progress: https://github.com/orgs/zarr-developers/projects/4
Thank you, everyone, for joining the meetings and sharing your insights. Please feel free to ask any questions.
@JackKelly, please take it from here.