object_store: Using io_uring?
#4631
Comments
I would probably want to see some numbers and go from there. It isn't immediately obvious to me that io_uring would be beneficial for reading immutable chunks of data from disk, especially if the workload is doing any non-trivial computation alongside. The major argument I've heard is for systems doing mixed IO, or with custom buffer pooling, neither of which object store is
OK, cool, that's good to know. Thank you for your quick reply. No worries at all if [...] Just to make sure, please let me give a little more detail about what I'd ultimately like to do. First, some context: Zarr has been around for a while. As you probably know, the main idea behind Zarr is very simple: we take a large multi-dimensional array and save it to disk as multi-dimensional, compressed chunks. The user can request an arbitrary slice of the overall array, and Zarr will load the appropriate chunks, decompress them, and merge them into a single [...] We're now exploring ways to use multiple CPU cores in parallel to load, decompress, and copy each decompressed Zarr chunk into a "final" array, as fast as possible. (Many Zarr users would benefit if Zarr could max out the hardware.) If we were to implement our own IO backend using [...] Would you say that [...]
I would suggest first getting something simple working with tokio::spawn, or some other threadpool abstraction, and the existing APIs, and then go from there. I would recommend against reaching for solutions like io_uring until you have confirmed that simpler solutions are insufficient. From what I understand of your use-case, I'm not sure io_uring would yield tangible benefits.
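As a rough illustration of the "simple threadpool first" suggestion, here is a minimal sketch that reads several byte ranges of one local file on a handful of scoped threads, using only the Rust standard library. It is not object_store's implementation and does not use tokio; the helper name read_ranges_parallel, its signature, and the fixed split of ranges across threads are all assumptions made for this example. It relies on positional reads (pread via FileExt::read_exact_at), so it is Unix-only.

```rust
use std::fs::File;
use std::io;
use std::ops::Range;
use std::os::unix::fs::FileExt; // positional reads (pread); Unix-only
use std::thread;

/// Hypothetical helper (not part of object_store): read each byte range of
/// `file` on up to `n_threads` scoped threads. Buffers are returned in the
/// same order as `ranges`.
fn read_ranges_parallel(
    file: &File,
    ranges: &[Range<u64>],
    n_threads: usize,
) -> io::Result<Vec<Vec<u8>>> {
    let mut out: Vec<Vec<u8>> = vec![Vec::new(); ranges.len()];
    if ranges.is_empty() {
        return Ok(out);
    }
    // Split the ranges into contiguous groups, one group per thread.
    let n = n_threads.max(1);
    let per_thread = (ranges.len() + n - 1) / n;

    thread::scope(|s| -> io::Result<()> {
        let mut handles = Vec::new();
        let groups = ranges.chunks(per_thread).zip(out.chunks_mut(per_thread));
        for (range_group, out_group) in groups {
            handles.push(s.spawn(move || -> io::Result<()> {
                for (r, buf) in range_group.iter().zip(out_group.iter_mut()) {
                    // One blocking pread-style call per range.
                    buf.resize(r.end.saturating_sub(r.start) as usize, 0);
                    file.read_exact_at(buf, r.start)?;
                }
                Ok(())
            }));
        }
        // Surface the first IO error, if any thread hit one.
        for h in handles {
            h.join().expect("reader thread panicked")?;
        }
        Ok(())
    })?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // Write a small throwaway file so the example is self-contained.
    let path = std::env::temp_dir().join("read_ranges_parallel_demo.bin");
    std::fs::write(&path, (0u8..=255).collect::<Vec<u8>>())?;
    let file = File::open(&path)?;

    let chunks = read_ranges_parallel(&file, &[0..4, 10..13, 250..256], 2)?;
    for (i, chunk) in chunks.iter().enumerate() {
        println!("range {i}: {} bytes", chunk.len());
    }
    Ok(())
}
```

A baseline like this issues one system call per range, which is the cost io_uring aims to reduce. Measuring it first shows whether that cost actually dominates once decompression is running alongside.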
Just a quick update... I am hoping to provide some benchmarks within a few months. More details here: JackKelly/light-speed-io#27
I've got no horse in this race, but AnyBlob and their paper are worth a look: https://github.com/durner/AnyBlob They're using io_uring to accelerate cloud object downloads.
Which part is this question about
object_store's code.
Describe your question
For Zarr, we may want to read on the order of 1 million parts of files per second (from a single machine). It's possible that the only way to achieve this performance will be to use io_uring to send many IO operations to the Linux kernel using just a single system call.
Would object_store ever consider implementing an async io_uring backend for get_ranges? (I may be able to write the PR, with some hand-holding!)
Additional context
io_uring is a newish feature of the Linux kernel that allows for requesting many IO operations, including local file operations and network operations, with a single system call and without any memory copying. Some database folks seem pretty excited about io_uring. Some benchmarks show that io_uring can deliver almost 20x more IOPS for random reads than the previous approach.
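To put the "many IO operations per system call" point in numbers, here is a back-of-the-envelope sketch. It assumes a simplified model in which every submission call carries a full batch of reads, and it ignores the cost of reaping completions. The helper name submit_syscalls_per_sec and the batch sizes are made up for illustration; they are not measurements.

```rust
/// Hypothetical helper: number of submission system calls per second needed
/// to issue `ops_per_sec` reads when each call submits up to `batch_size`
/// operations. A batch size of 1 models one blocking read call per operation.
fn submit_syscalls_per_sec(ops_per_sec: u64, batch_size: u64) -> u64 {
    assert!(batch_size > 0, "batch_size must be at least 1");
    // Ceiling division: a partially filled batch still costs one call.
    (ops_per_sec + batch_size - 1) / batch_size
}

fn main() {
    // Target from the question above: on the order of 1 million reads/s.
    let target: u64 = 1_000_000;
    for batch in [1u64, 32, 256] {
        println!(
            "batch size {batch:>3}: {} submission syscalls per second",
            submit_syscalls_per_sec(target, batch)
        );
    }
}
```

Under this model, batching cuts the number of submission calls roughly in proportion to the batch size. Whether that turns into higher throughput depends on the storage device and on how much CPU time the rest of the workload needs, which is why benchmarks were requested in the discussion.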