# Tokio-uring design proposal #1

## Conversation
```rust
        spawn(async move { ... });
    }
})
```

---
It'd probably be best for there to be a clear model for how we might move an IO object from one thread to another, should such advanced load balancing become required. For example, a webserver might find that a bunch of high bandwidth clients ended up on a single thread, overloading that thread.
I'm actually of the opinion that an explicit runtime handle might be the way to deal with this. Consider something like this sketch:
```rust
struct LocalRuntime { /* ... */ } // !Sync

fn spawn_new_group<R, F: Future<Output = R> + Send>(
    f: impl FnOnce(&LocalRuntime) -> F,
) -> GroupHandle<R>;

impl LocalRuntime {
    fn spawn<R, F: Future<Output = R> + Send>(
        f: impl FnOnce(&LocalRuntime) -> F,
    ) -> SpawnHandle<R>;
}

impl TcpStream {
    fn bind_to_runtime<'a>(self, runtime: &'a LocalRuntime) -> LocalTcpStream<'a>;
}
```
Here we're spawning a bunch of tasks in a group, which lives on a single runtime. The tokio runtime can migrate this group if needed (perhaps when instructed by the application); this might involve tracking the uring tasks issued by this group, and either forwarding results cross-thread or cancelling and reissuing those tasks during migration. Within the group, we can avoid synchronization overhead most of the time (i.e., when we're not actively in the middle of a migration), since we know that our IO object handles are not shared between threads.
---
Such a system could be implemented as an additional crate or at the application level. I am inclined to punt for now and explore balancing strategies in a "real app".
```rust
impl File {
    async fn read_at(&self, buf: buf::Slice, pos: u64) -> BufResult<usize>;

    async fn write_at(&self, buf: buf::Slice, pos: u64) -> BufResult<usize>;
}
```

---
Similarly, it should support writing from shared buffers, such as `Bytes` or `Arc<[u8]>`. I think using `AsRef<[u8]>` would actually be the best option here, since it's the most flexible.
---
`Bytes` is possible, but probably better supported as an `IoBuf` backend. Supporting a trait is possible but adds complexity/overhead, as the trait would probably need to be boxed into a trait object and stored internally while the operation is in-flight.
```rust
impl TcpStream {
    async fn close(self) { ... }
}
```

---
How does this differ from or relate to `AsyncWrite`'s `shutdown`? Could these two APIs be unified somehow?
---
I don't think this is an "instead" question.

For glommio, I always provide shutdown and/or close functions.

However, there are situations in which it is simply not possible to call them. One example is traits that contain consuming functions, but there are also situations, for instance, where an LSM compaction finishes operating on a file stream while there are still readers (you can have logic to wait for the readers, it is just too complex).

The best way is indeed to provide a close function and steer users towards using it, but you'll have to choose between one of the mechanisms for close-on-drop.

I heavily prefer async-close-on-drop. The "too many files" issue is not a big problem in practice, because queueing into the ring is always synchronous, and ultimately you have control over when to dispatch.
It's even possible to keep a counter of files that are asked-to-close-but-not-yet-closed, and force an `io_uring_submit` dispatch when the number becomes too big. Calling the system call itself is non-blocking.
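That counter can be sketched with plain bookkeeping. All names here (`CloseTracker`, `on_drop`, the threshold) are hypothetical, and `submit` merely stands in for a non-blocking `io_uring_submit` call:

```rust
/// Hypothetical tracker for handles that were dropped but whose close
/// operation has not yet been submitted to the ring.
struct CloseTracker {
    pending: usize,
    threshold: usize,
    submits: usize,
}

impl CloseTracker {
    fn new(threshold: usize) -> Self {
        CloseTracker { pending: 0, threshold, submits: 0 }
    }

    /// Called from the drop handler of an I/O resource.
    fn on_drop(&mut self) {
        self.pending += 1;
        if self.pending >= self.threshold {
            self.submit();
        }
    }

    /// Stand-in for a non-blocking io_uring_submit call that flushes
    /// all queued close SQEs.
    fn submit(&mut self) {
        self.submits += 1;
        self.pending = 0;
    }
}

fn main() {
    let mut t = CloseTracker::new(3);
    for _ in 0..7 {
        t.on_drop();
    }
    assert_eq!(t.submits, 2); // forced at 3 and 6 queued closes
    assert_eq!(t.pending, 1);
}
```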
---
Sorry, I'm not quite sure what you're talking about here; both `shutdown` and `close` are asynchronous.

Either way, after reading more into the difference between shutdown and close, I finally understand why they are different: `shutdown` only shuts down the write end of the `TcpStream` and doesn't take ownership, whereas `close` is shutdown followed by actually closing the fd. I still think it's a bit confusing to have two `AsyncDrop`-like mechanisms, but I don't see an obvious way around it.
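For reference, the same distinction exists in blocking std networking, which makes it easy to demonstrate: `shutdown(Shutdown::Write)` half-closes the stream (the peer sees EOF) while the handle stays readable, and dropping the stream is what finally releases the fd. This is an illustration with std types, not the proposed tokio-uring API:

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let client = std::thread::spawn(move || -> std::io::Result<Vec<u8>> {
        let mut s = TcpStream::connect(addr)?;
        s.write_all(b"ping")?;
        s.shutdown(Shutdown::Write)?; // half-close: read end stays open
        let mut reply = Vec::new();
        s.read_to_end(&mut reply)?; // we can still receive the reply
        Ok(reply)
    });

    let (mut conn, _) = listener.accept()?;
    let mut req = Vec::new();
    conn.read_to_end(&mut req)?; // EOF arrives after the client's shutdown
    assert_eq!(req, b"ping");
    conn.write_all(b"pong")?;
    drop(conn); // full close: the fd is released here

    assert_eq!(client.join().unwrap()?, b"pong");
    Ok(())
}
```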
```rust
assert_eq!(slice.capacity(), 100);
```

> A trait argument for reading and writing may be possible as a future

---
I think we also need `'static` constraints, like https://github.com/quininer/ritsu/blob/master/src/actions/io.rs#L16
```rust
// ensures there are a minimum number of workers
// in the runtime that are flagged as with capacity
// to avoid total starvation.
current_worker::wait_for_capacity().await;
```

---
I like this.
---
Does this imply that "worker" is a new noun in this API? If so, is `spawn_on_each_thread` more like "spawn on each worker"? Or should this be `current_thread::wait_for_capacity` instead?

But also, if the implication is that the only way to saturate your runtime is to always be accepting connections on all threads that are dedicated to it, should `spawn_on_each_thread` take a `FnMut() -> impl Future` instead to remove the `loop`? The `loop` seems error-prone, because if it ever exited (due to panic, etc.) you'd have an idle thread spawned by the runtime, whereas with a function the runtime could restart threads.
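The restart argument can be illustrated without an async runtime. In this hypothetical sketch, `supervise` plays the role of the runtime: because it holds a task *factory* rather than a single long-lived looping task, it can mint a replacement after a panic:

```rust
use std::panic;

/// Runs `runs` tasks minted by `factory`, counting how many panicked
/// and could therefore be replaced. All names are illustrative.
fn supervise(mut factory: impl FnMut() -> Box<dyn FnOnce()>, runs: usize) -> usize {
    let mut restarts = 0;
    for _ in 0..runs {
        let task = factory();
        if panic::catch_unwind(panic::AssertUnwindSafe(task)).is_err() {
            restarts += 1; // the factory lets us mint a fresh task
        }
    }
    restarts
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // silence the simulated panic
    let mut next = 0;
    let restarts = supervise(
        || {
            next += 1;
            let n = next;
            Box::new(move || {
                if n == 2 {
                    panic!("simulated accept-task panic");
                }
            })
        },
        3,
    );
    assert_eq!(restarts, 1); // one task died, two ran to completion
}
```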
---
It could be `current_thread`; this snippet is just to illustrate the concept, not to be included within tokio-uring. Either way, we can explore the options outside of the crate.
```rust
///
/// This is implemented as a new type to implement std::ops::Try once
/// the trait is stabilized.
type BufResult<T> = (std::io::Result<T>, buf::Slice);
```

---
This seems a bit awkward because users cannot directly use `?`.

Maybe we need a custom error type that allows the `fd` and `buffer` to be taken out of it, or let the user pass `&mut Option<Buf>`, like https://github.com/quininer/ritsu/blob/master/src/actions/io.rs#L18
---
The newtype could implement a method to convert into a `Result` and drop the buffer on errors, I suppose. `let buf = file.read_(0, buf).await.result()?;` or something.
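A minimal sketch of that conversion, assuming a simplified `BufResult` newtype over `(io::Result<T>, Vec<u8>)` rather than the crate's actual buffer types:

```rust
use std::io;

/// Hypothetical newtype pairing an operation result with its buffer.
/// `result()` collapses it into a plain `io::Result`, keeping the
/// buffer on success and dropping it on error.
struct BufResult<T>(io::Result<T>, Vec<u8>);

impl<T> BufResult<T> {
    fn result(self) -> io::Result<(T, Vec<u8>)> {
        match self.0 {
            Ok(v) => Ok((v, self.1)),
            Err(e) => Err(e), // buffer is dropped here
        }
    }
}

fn main() {
    let ok = BufResult(Ok(5usize), vec![0u8; 5]);
    let (n, buf) = ok.result().unwrap();
    assert_eq!(n, 5);
    assert_eq!(buf.len(), 5);

    let err: BufResult<usize> =
        BufResult(Err(io::Error::new(io::ErrorKind::Other, "boom")), vec![]);
    assert!(err.result().is_err());
}
```

With `?`, this reads as `let (n, buf) = file_op_result.result()?;`, which is the ergonomics the comment is after.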
---
The caller needs to get the buffer back on success. It is awkward, but also isn't intended to be the primary read / write API. The buf stream API (described later) will be more common.
```rust
my_tcp_stream.close().await;
```

> The resource must still tolerate the caller dropping it without being explicitly

---
In this case, tokio-uring will close the resource in the background, avoiding blocking the runtime.
We can be optimistic that closing will not cause congestion, because we also do this under epoll.
> The drop handler will move ownership of the resource handle to the runtime and submit cancellation requests for any in-flight operation. Once all existing in-flight operations complete, the runtime will submit a close operation.

This sounds quite complicated, and maybe it's easier to use reference counting.
---
"Move ownership" is probably a poor choice of words here. It is a logical transfer of ownership. The value is already stored in the runtime's internal storage. The implementation changes the operation state to `Ignored`, which signals to the runtime that it owns that state now.
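A rough sketch of that transition, with all names (`OpState`, `Ignored`, the callbacks) assumed for illustration rather than taken from the implementation:

```rust
/// Hypothetical per-operation state kept in the runtime's slab.
enum OpState {
    InFlight,
    Ignored,        // handle was dropped; runtime owns the state
    Completed(i32), // CQE result, waiting for a task to consume it
}

/// Called from the handle's drop impl: flip an in-flight op to Ignored.
fn on_handle_drop(state: &mut OpState) {
    if let OpState::InFlight = state {
        *state = OpState::Ignored;
    }
}

/// Called when the CQE arrives. Returns true when the runtime should
/// free the state itself because no task is waiting on the result.
fn on_completion(state: &mut OpState, result: i32) -> bool {
    match state {
        OpState::Ignored => true,
        _ => {
            *state = OpState::Completed(result);
            false
        }
    }
}

fn main() {
    let mut op = OpState::InFlight;
    on_handle_drop(&mut op);
    assert!(on_completion(&mut op, 0)); // runtime cleans up the orphan

    let mut live = OpState::InFlight;
    assert!(!on_completion(&mut live, 4)); // a task will consume this
}
```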
> socket, to the kernel using the submission queue. The kernel then performs the operation. On completion, the kernel returns the operation results via the completion queue and notifies the process. The `io_uring_enter` syscall flushes the submission queue and acquires any pending completion events. Upon request,

---
What does it mean for a specific syscall to "acquire a completion event"? (I.e., the syscall itself cannot own responsibility for handling that event, right? So is it handing off that responsibility to whoever in user space invoked the syscall?)
---
(Or, I guess since the syscall is allowed to block the thread, it can have that responsibility? But below you say it waits for a minimum number of completion events, not all of them... so I'm still confused, I guess.)
---
"Acquire" here is at the memory-ordering level (think of an atomic load with `Acquire` ordering).
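A self-contained illustration of that pairing, with plain atomics standing in for the shared completion-queue tail (in the real ring, the kernel plays the producer role):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// The producer writes a "completion entry", then publishes the new
// tail with Release. The consumer loads the tail with Acquire, which
// guarantees the entry written before the store is visible to it.
fn main() {
    let tail = Arc::new(AtomicU32::new(0));
    let entries = Arc::new([AtomicU32::new(0), AtomicU32::new(0)]);

    let (t, e) = (Arc::clone(&tail), Arc::clone(&entries));
    let producer = thread::spawn(move || {
        e[0].store(42, Ordering::Relaxed); // write the "completion"
        t.store(1, Ordering::Release); // then publish the new tail
    });
    producer.join().unwrap();

    // Acquire pairs with the Release store: once tail == 1 is observed,
    // the entry write is guaranteed to be visible.
    if tail.load(Ordering::Acquire) == 1 {
        assert_eq!(entries[0].load(Ordering::Relaxed), 42);
    }
}
```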
> This read function takes ownership of the buffer; however, any pointer to the buffer obtained from a value becomes invalid when the value moves. Storing the buffer value at a stable location while the operation is in-flight should be sufficient to satisfy safety.

---
It's technically safe to return a new slice or pointer on each call to `as_mut()`, as bonkers as that may be, so it might need something like `unsafe trait TrustedAsMut: AsMut<[u8]> {}`.
---
If we hold ownership of `T` and guarantee to call `as_mut` only once, I think this can be trusted.
---
This is correct, but I am pretty sure that as long as `as_mut()` is called only once and the value is kept at a stable location, it should be fine... but that is also why I am punting 😆
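The marker-trait idea can be sketched as follows. `TrustedAsMut` is the name proposed in the comment above; `first_byte_ptr` is a hypothetical helper used only to observe that the slice is stable across calls:

```rust
/// Unsafe marker trait promising that `as_mut` exposes the same stable
/// slice on every call (no fresh allocations per call), which a uring
/// driver could rely on while an operation is in flight.
unsafe trait TrustedAsMut: AsMut<[u8]> {}

// `Vec<u8>` upholds the contract: `as_mut` always exposes the same
// heap allocation, until the Vec itself is moved or reallocated.
unsafe impl TrustedAsMut for Vec<u8> {}

fn first_byte_ptr<T: TrustedAsMut>(buf: &mut T) -> *mut u8 {
    buf.as_mut().as_mut_ptr()
}

fn main() {
    let mut buf = vec![0u8; 64];
    let a = first_byte_ptr(&mut buf);
    let b = first_byte_ptr(&mut buf);
    assert_eq!(a, b); // stable across calls, as the trait promises
}
```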
```rust
    kind: Kind,
}

enum Kind {
```

---
Is this just for concept purposes? Do you plan to actually use a vtable design instead, kind of like in `bytes`? Otherwise accessing the slice will require a `match` every single time, which means the performance downsides of SBO.
---
Might eventually, but I am not worrying about it now. It could be changed to avoid the branch on deref.
---
I mean, eventually, we might "just" use bytes::Bytes.
---

Overall there's a lot here that's similar to ringbahn, which is specifically designed to be externally extensible in a number of ways.

My impression is that extensibility in this manner has never been a priority for tokio, so I'm not surprised to see the differences. It's certainly easier if you don't do this. The other big difference is that ringbahn's provided IO handles use `AsyncBufRead` instead of passing ownership in and out. I think any good implementation should support both (ringbahn doesn't provide the ownership-passing API, but I believe it would be straightforward to implement on top of its existing APIs).
> for the duration of their lifecycle. Each runtime thread will own a dedicated submission and completion queue pair, and operations are submitted using the submission queue associated with the current thread. Operation completion futures will not implement `Send`, guaranteeing that they remain on the thread

---
This would have quite a large impact on the ecosystem. Is there a possibility to make the futures `Send`, but only incur a significant performance penalty if polled on another thread?

For example, the source of the `!Send` bound appears to be the necessity to access the driver state when polling:
https://github.com/tokio-rs/tokio-uring/blob/master/src/driver/op.rs#L15

I could imagine a solution where the `Op` stores an `Arc<MagicCell<driver::Inner>>`. The idea behind `MagicCell` is that it would perform a runtime `Sync` check (i.e., check that the accessing thread is the same as the thread it was created on) before allowing access to the interior.

If this check fails, the `Op` would do something like this pseudocode:

```rust
let (tx, rx) = Proxy::new();
let original_op = mem::replace(self, rx);
original_thread.send(original_op.and_then(tx));
```

As long as tasks are fairly sticky, the overhead would only be incurred for in-progress ops when a task is actually moved. Of course, there's the question of whether the overhead of the `Arc` and `MagicCell` would outweigh the benefits.
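The runtime thread check behind the hypothetical `MagicCell` can be sketched with std alone. The proxy/forwarding fallback is elided; `with` simply returns `None` on a foreign thread, which is where a real `Op` would switch to the forwarding path:

```rust
use std::cell::UnsafeCell;
use std::sync::Arc;
use std::thread::{self, ThreadId};

/// Hypothetical `MagicCell`: nominally `Sync`, but every access is
/// checked at runtime against the creating thread's id.
struct MagicCell<T> {
    owner: ThreadId,
    value: UnsafeCell<T>,
}

// Safety (sketch only): `with` refuses access from any thread other
// than the creator, so the interior is never aliased across threads.
unsafe impl<T: Send> Sync for MagicCell<T> {}

impl<T> MagicCell<T> {
    fn new(value: T) -> Self {
        MagicCell {
            owner: thread::current().id(),
            value: UnsafeCell::new(value),
        }
    }

    /// Returns `None` from a foreign thread; a real version must also
    /// guard against reentrant access on the owning thread.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        if thread::current().id() == self.owner {
            Some(f(unsafe { &mut *self.value.get() }))
        } else {
            None
        }
    }
}

fn main() {
    let cell = Arc::new(MagicCell::new(0u32));
    assert_eq!(cell.with(|v| { *v += 1; *v }), Some(1));

    let remote = Arc::clone(&cell);
    let denied = thread::spawn(move || remote.with(|v| *v)).join().unwrap();
    assert_eq!(denied, None); // foreign-thread access was rejected
}
```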
---
It is possible, but part of the goal for tokio-uring is to provide a synchronization-free option.

Related to the "ecosystem": there already are `!Send` options out there (e.g. glommio), so it already is an issue to deal with.
---

How do you see this being used? Is the idea that application authors would directly build on top of tokio-uring? If so, would it make sense to split the new public API (and associated execution model) from the underlying uring-based implementation, so that applications are still building against a platform-independent interface (even if the only back-end is Linux-specific right now)? AIUI, io_uring is basically identical to IO completion ports on Windows, with the innovation that operations can be submitted without a system call, so it's plausible that there would be more than one "backend" that would benefit from this execution model. It would also be desirable to have a fallback implementation based on mio for platforms that cannot support io_uring or another backend directly.
---
One quick nit: io_uring is similar to IOCP in that they're both completion-based, but it has one major difference that makes it really hard to abstract over both: the threading model. io_uring naturally lends itself to a ring-per-thread model where each thread is self-contained and manages its own I/O. For Rust, this means thread-local runtimes, little synchronization, and lots of `!Send` types.

One way to reconcile this difference is to always use a runtime-per-thread model, so on Windows there would be an IOCP instance for each thread (I don't know enough about IOCP to say whether this is a bad idea). Alternatively, there could be one global io_uring instance protected by a mutex, or an io_uring instance for each thread where submission queues are shared with a mutex but completion queues are thread-local. I really don't know which solution is best; they all have disadvantages when run on platforms that don't natively support the model. I think in an ideal world, io_uring would support multiple submission/completion queues on a single ...
> Internally, byte streams submit read operations using the default buffer pool. Additional methods exist to take and place buffers, supporting zero-copy piping between two byte streams.

```rust
my_tcp_stream.fill_buf().await?;
let buf: IoBuf = my_tcp_stream.take_read_buf();

// Mutate `buf` if needed here.

my_other_stream.place_write_buf(buf);
my_other_stream.flush().await?;
```

---
Is there a plan to support `IORING_OP_SPLICE` or `IORING_OP_TEE`?
---
Yes, via inherent methods on the relevant types. It should fit into the model proposed here.
```rust
// Read the first 1024 bytes of the file, when `std::ops::Try` is
// stable, the `?` can be applied directly on the `BufResult` type.
let BufResult(res, buf) = file.read_at(0, buf.slice(0..1024)).await;
```

---
Is there any reason why we don't use `Index` here rather than having a separate slice function? We could do `buf[0..1024]`, and both should panic when the start or end index is out of the capacity.

I wonder if there is a way to avoid all the bounds checking when passing the buffer over and back, because the size is usually very large but at the same time can still be expanded at runtime (`Vec`).
---
It's probably because `Index` is required to return a reference, but `.slice` has to return an owned value.
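A sketch of why this is so: `Index::index` must return `&Output`, while a hypothetical `slice` method consumes the buffer and returns an owned view that can be carried into an in-flight operation (the `IoBuf`/`Slice` types below are simplified stand-ins, not the crate's definitions):

```rust
use std::ops::Range;

struct IoBuf {
    data: Vec<u8>,
}

/// Owned view: carries the buffer itself plus the selected range.
struct Slice {
    buf: IoBuf,
    range: Range<usize>,
}

impl IoBuf {
    fn with_capacity(cap: usize) -> Self {
        IoBuf { data: vec![0u8; cap] }
    }

    // Takes `self` by value and returns an owned value; `Index` cannot
    // express this signature, since it only hands out references.
    fn slice(self, range: Range<usize>) -> Slice {
        assert!(range.end <= self.data.len()); // same panic behavior
        Slice { buf: self, range }
    }
}

impl Slice {
    fn len(&self) -> usize {
        self.range.end - self.range.start
    }

    fn as_bytes(&self) -> &[u8] {
        &self.buf.data[self.range.clone()]
    }
}

fn main() {
    let slice = IoBuf::with_capacity(4096).slice(0..1024);
    assert_eq!(slice.len(), 1024);
    assert_eq!(slice.as_bytes().len(), 1024);
}
```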
---

I feel like keeping the io_uring and IOCP runtimes separate would be the best choice. There's no question about a thread-per-core architecture being a better fit for io_uring, and while IOCP can be used that way too (pretty efficiently, afaik, if you create the completion port with a concurrency value of one), it would still be a bunch of additional code to maintain and logic to generalise. Note that IOCP is also typically used with the threadpool provided by the Windows kernel itself, as illustrated by the .NET TPL (even though that will change in .NET 6).

The way I see it, "vanilla" tokio already does an amazing job at being an efficient and portable runtime. If you're willing to pay the complexity cost of using a thread-per-core architecture and dealing with completion-based IO for the extra performance, you probably already know where that code is going to run. Separate, targeted runtimes seem like the obvious choice in that scenario, especially since depending on io_uring itself isn't particularly portable (a portable runtime without support for BSD and Darwin, and for 90% of Linux machines out there (just throwing random numbers here), doesn't sound like the best way to build portable stuff).
```rust
enum State {
    // An in-flight read operation, `None` when reading into
    // a kernel-owned buffer pool
    Read { buffer: Option<buf::Slice> },
```

---
Is it better to have two different `State` variants, with a generic for the kernel-owned buffer pool and `Vec` (if they are always the same)? Will one state always be `None` and the other always `Some` for `Read`?
---
It might be, but before making that tweak I would like to measure the impact. It doesn't materially change the design though so we can explore it in the future.
```rust
}

/// The caller allocates a buffer
let buf = buf::IoBuf::with_capacity(4096);
```

---
How about letting `IoBuf::with_capacity` return a `Result` so that it can support fallible allocations with having to provide separate `try_*` methods?
---
I assume you mean "without having to provide separate `try_*` methods"? I would default to following `std`'s lead here w/ `Vec`. The question is whether or not supporting fallible allocation should be the primary use case.
```rust
// ensures there are a minimum number of workers
// in the runtime that are flagged as with capacity
// to avoid total starvation.
current_worker::wait_for_capacity().await;
```
```rust
let socket = listener.accept().await;

spawn(async move { ... });
```

---
The `socket: TcpStream` is not itself a uring "operation", afaict, so it could be `Send`. Does that imply that load balancing could still happen between accepting connections and servicing them?

Load balancing of tasks to threads would then happen during `spawn`. You would `spawn` N tasks to `accept` (having all threads `accept` seems like overkill?), and upon `accept` they would `spawn` new tasks (which would likely go to other threads) to service the connections. The runtime would need to decide which thread to target each time it spawned a task, which could be tricky (long-lived cheap tasks vs. short-lived expensive tasks, etc.).

From an API perspective, the task would still be `!Send`, but the `spawn` function would require `Send` for its argument(s) in order to move the inputs to the task onto its permanent home thread.

This question is particularly important, I think, because the effect of `current_worker::wait_for_capacity` seems to be that users end up creating their own thread pools (i.e. "a pool of threads fed by a queue") with the assistance of the `spawn_on_each_thread` function. In the accept case, the `accept` call is what waits on your queue; but other use cases that spawn work recursively would need to wait on an explicit queue of work after their `wait_for_capacity` call (if they actually wanted to balance across all available threads, rather than having all spawned recursion pinned once it enters the runtime).

It's not clear that any extra flexibility provided by this API justifies having applications implement their own load-balancing around `spawn` calls.
---
Ah, sorry. Just noticed that this is addressed at the end in https://github.com/tokio-rs/tokio-uring/blob/design-doc/DESIGN.md#load-balancing-spawn-function . The idea that you might have load-balanced vs non-load-balanced spawn functions is interesting.
> This read function takes ownership of the buffer; however, any pointer to the buffer obtained from a value becomes invalid when the value moves. Storing the buffer value at a stable location while the operation is in-flight should be sufficient to satisfy safety.

---
`as_mut()` creates a borrow, so it's already not possible to move the value until `read` returns; the problem is ensuring the lifetime of `T` is at least as long as the event's completion (i.e. the hazard is `Drop`).
> If the drop handler must process closing a resource in the background, it will notify the developer by emitting a warning message using [`tracing`]. In the future, it may be possible for Rust to provide a `#[must_not_drop]` attribute.

---
This doesn't seem compatible with panicking, and is needlessly unergonomic for the actual benefit it brings. Consider a TCP proxy that spawns many tasks, each of which splits two TcpStreams into two halves and copies between them concurrently.
In the happy case, you join on the (read A -> write B) and (read B -> write A) tasks then drop the sockets. There are no outstanding operations after joining, so the sockets may close immediately.
In an error case such as the A socket being reset, the (read A) task returns immediately with an error that indicates the (read B) task is doomed to fail too. There are two distinct possibilities:
1. we propagate the error from (read A) and 'gracefully' terminate the proxying task, awaiting on `close()` etc. before the task is finally dropped
2. we panic from (read A), and all the tasks and sockets are necessarily dropped
The runtime must still handle the panicking case; the only difference in (1) is that we keep an application task running for the duration. It's not obvious what accounting benefit this brings, as surely the `wait_for_capacity` function suggested earlier would depend on resource usage rather than task count, which may vary significantly between application types.

This comes at the cost of forcing every function that takes ownership of an I/O object either to await on `close()` (and therefore be async), or to use some wrapper type written to handle this with a custom `Drop` that spawns the `close()`, or (more likely) to be rewritten using mutable references only.
---
You are correct re: panicking, which means drop must always be safe and correct, though not necessarily ideal. The idea of `#[must_not_drop]` is that it would be a lint and not enforced.

Just like with `#[must_use]` lints, if the developer means to drop and let cleanup happen in the background, they can silence the warning (via some TBD mechanism).
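The `#[must_use]` analogy is easy to demonstrate with today's Rust: the attribute warns when the value is dropped unused, and a deliberate `let _ =` silences it, which is the behavior envisioned for a future `#[must_not_drop]` (the `Handle` type here is purely illustrative):

```rust
#[must_use = "close this handle explicitly, or silence the lint deliberately"]
struct Handle(i32);

fn make() -> Handle {
    Handle(3)
}

fn main() {
    let _ = make(); // `let _ =` deliberately opts out of the lint
    let h = make();
    assert_eq!(h.0, 3);
}
```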
```rust
// Checked-out from a pool
match my_buffer_pool.checkout() {
    Ok(io_buf) => ...,
    Err(e) => panic!("buffer pool empty"),
```

---
If "the buffer pool is empty" is the only possible error here, I'd prefer `Option` over `Result`, since it doesn't really feel like an error condition.
---
Perhaps; there are conflicting patterns in std. I don't have a huge opinion there.
---
My general observation has been that `Option` should be used if the alternative case (a) carries no information, (b) is common rather than exceptional, and (c) is unlikely to later be propagated as an `Err`. I think all of those apply here, and therefore that `Option` is the better choice. The exception would be if you anticipate that `checkout()` can fail for different reasons as well. But even in that case, I'd prefer `Result<Option<Pool>, E>`.
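A sketch of a pool following that guideline, with an `Option`-returning `checkout` (the `BufferPool` type and its methods are hypothetical, not the crate's API):

```rust
/// Hypothetical fixed-size pool of reusable byte buffers.
struct BufferPool {
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    fn new(count: usize, cap: usize) -> Self {
        BufferPool {
            free: (0..count).map(|_| vec![0u8; cap]).collect(),
        }
    }

    /// `None` simply means "nothing free right now": common, carries
    /// no extra information, and not really an error.
    fn checkout(&mut self) -> Option<Vec<u8>> {
        self.free.pop()
    }

    fn check_in(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new(1, 4096);
    let buf = pool.checkout().expect("one buffer available");
    assert!(pool.checkout().is_none()); // empty pool, not an error
    pool.check_in(buf);
    assert!(pool.checkout().is_some());
}
```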
> thread-pool. The current API would remain, and the implementation would use io-uring when supported by the operating system. The tokio-uring APIs may form the basis for a Tokio 2.0 release, though this cannot happen until 2024 at the earliest. As an intermittent step, the tokio-uring crate could explore

---
s/intermittent/interim/
:)
> operation explicitly. The application also can create its own set of io-uring queues using the io-uring crate directly.
>
> ### Load-balancing spawn function

---
Regarding load balancing: if clients do client-side load balancing, which features should tokio-uring support on the server side? We should mention this.
> The queue's single-producer characteristic optimizes for a single thread to own a given submission queue. Supporting a multi-threaded runtime requires either synchronizing pushing onto the submission queue or creating a queue pair per thread. The tokio-uring runtime uses an isolated thread-per-core model,

---
Does the 'thread-per-core' model here imply CPU affinity as well? I wonder whether `sched_setaffinity()` is planned, as I could not see related implementation code yet.
---
Thread-per-core in this case refers to its use of a single, current-thread runtime, where, if users want more worker threads, they can spawn more runtimes in parallel.
Co-authored-by: Daniil Bondarev <[email protected]>
Co-authored-by: Graham Scheaffer <[email protected]>
---
Approving to get this in. We can have further discussion in new PRs to make changes.
---
The RFC proposes a new asynchronous Rust runtime backed by io-uring as a new crate: tokio-uring. The API aims to be as close as possible to idiomatic Tokio, but will deviate when necessary to provide full access to io-uring's capabilities. It will also be compatible with existing Tokio libraries. The runtime will use an isolated thread-per-core model, and many types will be `!Send`. The repo contains a proof-of-concept implementation.