Replies: 5 comments
-
**IO bandwidth**

For the I/O data transfer bandwidth in the plot above, the first 4 jobs peak at only about 620 MB/s (about half the bandwidth that the drive is capable of). In contrast, if we use the full
This shows that, even when reading just a single file (as each
-
If anyone's interested, here are the detailed performance metrics output by
-
I think my next step is to benchmark TensorStore. And then to start work on my Rust implementation of Zarr (starting with a bunch of performance experiments 🙂).
-
Fascinating stuff Jack! Thanks for sharing! The memory copies seem like low-hanging fruit. I know that the Arrow community is obsessed with avoiding memory copies. It would be interesting to dig into where unnecessary memory copies might be avoided in Zarr. It looks like the code is attempting to decompress directly into the target array memory when possible:
-
This should be completely eliminated for the case of contiguous (non-strided) reads: you can `readinto` a memory space or decompress directly into it, with no need to copy. numcodecs allows an `out=` argument for exactly this purpose; is it not being used?
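To illustrate the `readinto` half of this, here is a minimal stdlib-only sketch (numcodecs itself is not used here; the temp file and sizes are purely illustrative). The destination buffer is allocated once up front and filled in place, avoiding the intermediate `bytes` object (and the extra copy) that a plain `f.read()` would create:

```python
import os
import tempfile

# 4 KiB of sample data standing in for a chunk on disk.
payload = bytes(range(256)) * 16

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

buf = bytearray(len(payload))          # destination allocated up front
with open(path, "rb", buffering=0) as f:
    n = f.readinto(memoryview(buf))    # fills buf in place, no extra copy

os.unlink(path)
print(n, bytes(buf) == payload)
```

For the decompression half, the analogous numcodecs call is `codec.decode(compressed, out=target_view)`, which (for codecs that support it) writes the decompressed bytes straight into the target array's memory instead of returning a freshly allocated buffer.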
-
Here are some detailed performance profiles of Zarr-Python, NumPy, and `fio`.

In all cases, we read the entirety of a 1 gigabyte dataset from local SSD (using my very modest and aging Intel NUC, with a PCIe v3 SSD, capable of only ~2 GB/s). The 1 GB dataset is a repeated (monochrome) image of a tarte tatin 🙂.
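For reference, a minimal stdlib sketch of how a sequential-read bandwidth figure like the ones below can be measured (the real benchmarks use `fio` and VTune; the file here is scaled down to 16 MiB and its name is made up):

```python
import os
import tempfile
import time

# Hypothetical mini-benchmark: write a file, read it back sequentially,
# and report the effective read bandwidth. (16 MiB here; the real
# benchmark reads a 1 GB dataset.)
SIZE = 16 * 1024 * 1024

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(SIZE))
    path = f.name

start = time.perf_counter()
with open(path, "rb") as f:
    data = f.read()
elapsed = time.perf_counter() - start

os.unlink(path)
print(f"read {len(data) / 1e6:.0f} MB in {elapsed * 1e3:.1f} ms "
      f"({len(data) / elapsed / 1e6:.0f} MB/s)")
```

Note that with a warm page cache this mostly measures memory bandwidth rather than the SSD; `fio`'s `direct=1` option bypasses the cache, which plain Python file I/O cannot easily do portably.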
TL;DR: These analyses add support for what we already knew about Zarr-Python (including `io_uring`).

Details:
The image below shows the Intel VTune "Input and Output" analysis of Zarr-Benchmark running a total of 6 different jobs. Each job is run 3 times (so we can see the variation between runs). The 6 jobs and dataset are defined here. In summary, they include `numpy.load` to read an entire 1 GB `.npy` file into RAM.

The rows in the VTune screenshot show: