okio-async module for Kotlin coroutines based asyncio #814
Comments
I'm interested to hear thoughts on this approach. My rationale is that those using multiplatform Kotlin to interact with okio should probably also reach for coroutines for async capabilities. Therefore it makes sense to break this out into its own module, which can only really work with Kotlin. However, I do foresee a bit of code duplication. Particularly, things like Additionally, the tracking approach above doesn't include other inherently async I/O functions. For example,
Additionally, rather than pooling our own coroutines and interacting with libc, there are a lot of pre-existing libraries out there which solve part of this for us. For example,
Let's back up a bit... what's the goal? Supporting coroutines callers? Supporting 10K concurrent socket connections? Every time I look into async I/O I find myself comparing it against blocking I/O for API and performance, and I find blocking I/O wins.
My 2c as an unrelated observer: Kotlin coroutines and JVM+MPP IO are an unsolved problem. Consider this a problem statement by someone who just happened to be walking by. At the moment I end up with a lot of utility methods like
I'm not even clear this is correct or optimal. I'd love this solved for me in a Kotlin coroutines-first way by Okio for common file, socket and stream/writer operations.
One challenge with designing coroutines APIs is giving callers control over the boundaries. Consider this set of functions:
The Adding suspending functions in Okio would potentially lead users to write needlessly slow code, because they’d introduce lots of invisible dispatcher switches. For example, imagine a suspending Moshi that parsed JSON by just making all the functions in BufferedSource suspending. Though the code would be fully coroutines-friendly, it would be a performance disaster.
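To make the dispatcher-switch concern concrete, here is a minimal sketch, assuming kotlinx.coroutines is on the classpath. The names `readByteSuspending`, `countNewlinesFineGrained` and `countNewlinesCoarse` are hypothetical and are not Okio APIs, and a plain `java.io.InputStream` stands in for `BufferedSource`. Both functions return the same result; the difference is how often a suspending call (and a possible dispatcher switch) sits in the inner loop.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical fine-grained wrapper: every single byte goes through a
// suspending call that may hop to the IO dispatcher and back.
suspend fun InputStream.readByteSuspending(): Int =
    withContext(Dispatchers.IO) { read() }

suspend fun countNewlinesFineGrained(input: InputStream): Int {
    var count = 0
    while (true) {
        val b = input.readByteSuspending() // suspension point inside the hot loop
        if (b == -1) return count
        if (b == '\n'.code) count++
    }
}

// Coarse-grained alternative: one switch for the whole operation, then an
// ordinary blocking loop with no suspension points inside it.
suspend fun countNewlinesCoarse(input: InputStream): Int =
    withContext(Dispatchers.IO) {
        var count = 0
        while (true) {
            val b = input.read()
            if (b == -1) break
            if (b == '\n'.code) count++
        }
        count
    }

fun main(): Unit = runBlocking {
    val data = "a\nb\nc\n".toByteArray()
    println(countNewlinesFineGrained(ByteArrayInputStream(data))) // 3
    println(countNewlinesCoarse(ByteArrayInputStream(data)))      // 3
}
```

The sketch makes no claim about how large the gap is in practice; it only shows where the per-byte overhead would come from if every read were suspending.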
Ah sorry. I thought the desire for non-blocking I/O in okio was already set in stone.
I think this question can also be answered by asking "why should non-blocking I/O be used at all?" I've done some research, and it's true: in terms of throughput, blocking I/O wins, especially when using the thread-per-connection model. The numbers I saw quoted put event-based I/O at about 75% of the throughput of blocking I/O.
Why is this? Tools like epoll tell the user when I/O events are ready to be consumed. Consuming these events and figuring out when and where to resume execution of, say, a coroutine is scheduling. What are operating systems really, really good at? Scheduling. It's one of their core responsibilities. So the effect is that I/O scheduling moves from kernel space to user space. The thread-per-connection model lets the (Linux) kernel perform scheduling instead of an inferior scheduler like the one found in kotlinx.coroutines.
This isn't the whole story though. There are other confounding factors that explain why blocking I/O throughput beats event-based I/O. For example, event-based I/O requires more CPU time to process events than blocking I/O. In a completely single-threaded program, non-blocking I/O spends most of its time bouncing between events that need to be processed and less time actually processing them.
So, again: why use non-blocking I/O at all? It's a trade-off. Non-blocking I/O requires less memory (because fewer threads are used) but requires more CPU time and means dealing with inferior schedulers. Using non-blocking I/O on memory-constrained devices like a low-end Android phone or a Raspberry Pi could prove beneficial. To avoid the single-threaded non-blocking I/O problem, it's a well-established pattern to have a single thread that observes the availability of I/O events and one (or more) worker threads that process those events.
In a concrete example, I am working on an implementation of the BitTorrent protocol (albeit slowly). The thread-per-connection model is less suitable because of the protocol's nature. A single torrent can have hundreds of peers. Spawning (or pooling) a thread for each of those connections would have a huge memory footprint. If each thread were to take 1-2MB, with 200 connections per torrent you're looking at 200-400MB of memory to download a file. That also doesn't include the memory needed to do in-memory transformations of the data, like SHA1/SHA256 block verifications. Because these trade-offs exist with I/O I think it makes sense to offer
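The memory estimate above is simple multiplication, sketched here for reference. The 1-2MB per-thread figure and the 200 peers are the numbers quoted in the comment; actual thread stack sizes depend on the platform and JVM settings, and `stackMemoryMb` is a hypothetical helper name.

```kotlin
// Rough thread-stack cost of a thread-per-connection model.
// Inputs are the figures quoted in the comment, not measurements.
fun stackMemoryMb(connections: Int, mbPerThread: Int): Int =
    connections * mbPerThread

fun main() {
    val peers = 200
    val low = stackMemoryMb(peers, 1)   // 1 MB per thread
    val high = stackMemoryMb(peers, 2)  // 2 MB per thread
    println("Thread stacks for $peers peers: $low-$high MB")
    // prints "Thread stacks for 200 peers: 200-400 MB"
}
```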
@yschimke if your code assumes that context-switching to the IO dispatcher is free, then you should just write blocking code.
That seems like a harsh interpretation. By default it does what you were hoping it would. But if it's on an incompatible dispatcher it will switch, and if it needs to, it will cause another thread to come alive. Seems nicer than ForkJoinPool.managedBlock.
I found this article interesting on the topic of non-blocking I/O. Specifically the part about `io_uring`: https://www.scylladb.com/2020/05/05/how-io_uring-and-ebpf-will-revolutionize-programming-in-linux/
Related, and also making the rounds: https://itnext.io/modern-storage-is-plenty-fast-it-is-the-apis-that-are-bad-6a68319fbc1a
So there are two trends that converge:
What's amazing is when we have all of this stuff, developers will write obvious blocking code and the platform & libraries will transform it into fast IO events. Where does Okio fit?! My ideal is it's just Loom, and the VM has code to map blocking I/O calls to io_uring work automatically. Without Loom, we do suspend functions and try to get the same effect via the compiler virtualizing threads rather than the runtime virtualizing threads. The catch I'm hung up on is that suspend functions aren't as fast to call as blocking functions. So a Moshi or Wire that used a new suspending Okio would perform terribly.
Can you elaborate?
I've been thinking about this comment over the past few days.
My understanding of the situation is this. Kotlin coroutine So, this is inherently slower because it might turn non-branchy code into branchy code. Code which doesn't need to block and suspend written in a suspend function wouldn't have an impact though, I don't think? If we have a suspend fun that doesn't call any other suspend functions, then I don't think that there would be any finite state machine inserted at compile time. Furthermore, Loom (I think) achieves non-blocking I/O that looks like blocking I/O via their own continuation implementation. Although I only looked at this briefly, the logic seems to be the same. A finite state machine is generated that manages the state of the continuation. Are there additional performance concerns that
Loom is JDK 17 at best; otherwise, the next LTS is effectively 23. So it's years away from working with various other frameworks when you don't control the whole deployment, e.g. app servers or hosted setups. Android never? Meanwhile, Kotlin coroutines are supported on old Android and JDKs. So in reality, some solution that bridges between these two worlds is what users will adopt.
Here’s a gist that compares three equivalent functions:
This function is typical of the inner-loop stuff that Okio is best at. It is written to a different standard than conventional code. In particular, we’re trying to avoid allocations, function calls, polymorphic calls, and even member field access. The more work we can do on the CPU and the CPU alone, the happier we are. Calling from this function to a suspending function 2x per byte processed is likely to have a significant cost! Plus there’s the cost of boxing every byte and the result. I think coroutines are great! But unfortunately just adding
So it seems that most of these problems are derived from the continuation itself.
I'm curious to know what your thoughts are on some of the differences between kotlinx.coroutines and Loom and how they quell some of your concerns. As far as I'm aware, Loom continuations will be storing the stack frame as member variables as well. The JVM managing this will have to instantiate new continuations as well. As for the other two, I'm not so sure.
While interpreting those buffers which are awaiting data...what would the caller do? We could have a non-suspend function that checks to see if data is available. The other option seems to be blocking.
I expect Loom will be significantly faster than Kotlin coroutines because Loom can use the JVM’s existing callstack, whereas Kotlin coroutines need a separate mechanism to track the callstack.
Going back to the BitTorrent motivating use case, what about a design that separates the CPU-bound parsing work that uses BufferedSource from the I/O-bound work that uses Sockets? A sketch:
This is a base primitive that we can implement with NIO, Native, and even blocking potentially. Callers don’t get high-granularity suspending (because that’s a performance trap), but protocol implementers get a way to manage many sockets on a small number of threads. Protocol implementations would slice their protocol into frames, and write a small bit of tricky code to make sure an entire frame is ready before processing it. Heck, we could use this in OkHttp. We might even be able to borrow some code from SelectRunner, which was where I originally smashed into the difficulties of integrating coroutines with Okio.
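The "make sure an entire frame is ready before processing it" step could look roughly like the following. This is a sketch under stated assumptions, not the API proposed in this thread: `FrameAssembler` is a hypothetical name, and the 4-byte big-endian length prefix is an illustrative framing chosen for the example (BitTorrent and HTTP/2 each define their own). It uses only `java.nio.ByteBuffer`.

```kotlin
import java.nio.ByteBuffer

/**
 * Hypothetical helper: accumulates bytes as they arrive and releases only
 * complete frames. Assumes each frame is a 4-byte big-endian length followed
 * by that many payload bytes. A chunk larger than the remaining capacity
 * throws BufferOverflowException; a real implementation would grow or reject.
 */
class FrameAssembler(capacity: Int = 64 * 1024) {
    private val pending: ByteBuffer = ByteBuffer.allocate(capacity)

    /** Appends newly received bytes and returns every frame that is now complete. */
    fun offer(chunk: ByteArray): List<ByteArray> {
        pending.put(chunk)          // buffer stays in write mode between calls
        pending.flip()              // switch to read mode to inspect what we have
        val frames = mutableListOf<ByteArray>()
        while (pending.remaining() >= 4) {
            pending.mark()
            val length = pending.getInt()
            require(length >= 0) { "negative frame length: $length" }
            if (pending.remaining() < length) {
                pending.reset()     // partial frame: rewind to its length prefix
                break
            }
            val payload = ByteArray(length)
            pending.get(payload)
            frames += payload
        }
        pending.compact()           // keep unread bytes, return to write mode
        return frames
    }
}

fun main() {
    val assembler = FrameAssembler()
    // One 5-byte frame ("hello") arriving split across two socket reads.
    val first = byteArrayOf(0, 0, 0, 5) + "he".toByteArray()
    val second = "llo".toByteArray()
    println(assembler.offer(first).size)                          // 0
    println(assembler.offer(second).map { it.decodeToString() })  // [hello]
}
```

With this shape, the event loop only hands a frame to the CPU-bound parser once it is whole, so the parser itself never needs to suspend.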
@swankjesse really nice analysis. Makes sense. Now stepping back, is the idea that we would target solving both these problems by having an efficient, infra-optimised version internally, while at the key user abstraction (HTTP Request for OkHttp) we'd make sure bridging to coroutines works nicely and simply?
What are the key user abstractions for File IO? File.readLines? socket.readMoshiObject? What is required in the filesystem abstraction to make that work nicely? Anything?
For files my first instinct is to treat random access and beginning-to-end as separate APIs. Random access is much less frequently needed in my experience. It came up in LeakCanary and .dx merging. I'd like to defer designing this. For beginning-to-end, programs alternate between two complementary tasks: I/O syscalls to move data between memory and disk, and computational work to encode or decode that data. Our goal is to saturate disk and CPU and to maximize throughput by limiting context switching. I expect our opportunities are:
I don't think there's much need for accessing N files concurrently with fewer than N threads. Probably the best upside for async is an event-driven API like the above to avoid context switching. As with the sockets example, the events should be on big boundaries (entire file?!) and not per byte.
Thinking about
We build this, then we see the performance consequences of using it in OkHttp’s Http2Connection? In theory we can shrink the number of reader threads from 1 per Socket to 1 per ConnectionPool.
I like the AsyncDispatcher approach. I'm interested to play with it and see where complexities arise in a highly concurrent protocol like BitTorrent.
Something to think about: although Linux-specific, io_uring allows for async I/O on both sockets and files, with the option to poll instead of making heavy kernel system calls. It's possible to perform I/O using io_uring with very few kernel-to-user-space context switches.
For JVM: I think this class could usefully bridge from coroutine-safe code to regular blocking Java IO.
Rather than duplicating all the readX and writeX methods, switch to Dispatchers.IO and allow safe calls, potentially with a NonCancellable context thrown in for good measure?
I would like to know whether there has been any new progress on the okio-async module.
@s949492225 what's your use case?
@swankjesse Just read or write files without blocking the thread
Got it. Would you consider moving your I/O operation to another thread using Dispatchers.IO? That way control flow switches threads once (very efficient) and not once per byte (very inefficient).
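For the file use case, that suggestion might look like the sketch below. It assumes kotlinx.coroutines is available; `readAllLines` is a hypothetical helper name, and plain `java.io` is used for the blocking read so the example does not depend on any particular Okio version.

```kotlin
import java.io.File
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: the caller's thread is not blocked, because the whole
// blocking read runs on Dispatchers.IO behind a single withContext call.
suspend fun readAllLines(file: File): List<String> =
    withContext(Dispatchers.IO) {
        file.bufferedReader().use { it.readLines() }
    }

fun main(): Unit = runBlocking {
    val file = File.createTempFile("okio-async-sketch", ".txt")
    try {
        file.writeText("one\ntwo\n")
        println(readAllLines(file)) // [one, two]
    } finally {
        file.delete()
    }
}
```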
@swankjesse
Blocking the IO thread is the best we can do. All async frameworks keep an extra IO thread (or multiple) just to do blocking select() calls. We're a year or two out from Loom, which will dramatically lower the scalability cost of blocking a thread. When that lands, Okio’s blocking model will be both the simplest and most efficient.
@swankjesse I see. Thank you
No action planned here. Eager to see what kotlinx.io does! Kotlin/kotlinx-io#163
I'm aware that the okio team has made attempts at integrating coroutines but have ultimately dropped the issue in hopes that Loom would solve for this. Seeing that okio is becoming increasingly multiplatform, Loom would only solve part of the problem.
I'm proposing that this issue tracks progress on a multiplatform okio artifact, `okio-async`, which implements non-blocking and asynchronous I/O for the following platforms.
(`epoll`, `io_uring`)
There has been prior work done on this. In particular, I think that okiox by @bnorm will be a good starting resource.
The following is a tracked list of functionalities which should be made available.
Tracking
`AsyncBuffer`, which serves the same purpose as `Buffer`; however, it implements `AsyncSource` as well as `AsyncSink`.