RFC 0007: Event loop refactor #7

straight-shoota · 2024-05-02T19:06:00Z

This PR has been merged. The RFC text is available at: https://github.com/crystal-lang/rfcs/blob/main/text/0007-event_loop-refactor.md

This RFC is very much work in progress and has a lot of unresolved questions which I intend to discuss here.

Preview: https://github.com/crystal-lang/rfcs/blob/rfc/0005/text/0007-event_loop-refactor.md

crysbot · 2024-05-03T12:49:07Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/rfc-0007-event-loop-refactor/6812/1

yxhuvud · 2024-05-03T16:44:37Z

text/0007-even_loop-refactor.md

+
+Should events from the Crystal runtime be part of the event loop as well?
+
+- Fiber: sleep


Yes, under the assumption that it is sleep with argument and not endless sleep. At least for io_uring the easiest implementation is to just emit the appropriate op (ie timeout) and it will just work in a way that is extremely straightforward (also note the even simpler implementation of Fiber#yield a few lines beneath it. That is the territory of the scheduler, but event loop and schedulers are not necessarily all that separate, in practice..).

yxhuvud · 2024-05-03T16:50:25Z

text/0007-even_loop-refactor.md

+- **Generic API**: Independent of a specific event loop design
+- **Black Box**: No interference with internals of the event loop
+- **Pluggable**: It must be possible to compile multiple event loop implementations and choose at runtime.
+  Only one type of implementation will typically be active (it would be feasible to have different event loop implementations in different execution contexts, if there's a use case for that).


(Feasible, but problematic). Some default values may be wanted to be different when using file descriptors in different event loops, like for example if sockets are opened in nonblocking mode or not. This is also quite problematic for the pluggable-at-runtime switching. :(

To properly solve this, I suppose it would require the opening of file descriptors to be passed to the event loop as well.

Deferring it to the event loop help a little but doesn't really solve it as it means fds cannot be passed around. So for example, it would not work to open the socket in one context, and then start to listen on it and spawn handlers in another. And if descriptors are not setup properly, the error symptoms would be weird. Not having that confusion and that footgun around is worth more than what it would fix.

My preference for linux in the really long term would be to keep the default to nonblocking until there is no longer any support for any linux version that doesn't have a sufficiently modern io_uring, and then switch the default once we have an implementation we are happy with. The latter can be implemented using poll ops until the switch (similar to the uring branch) - less neat and does more work than necessary, but it would work. But at some time it would be nice to switch as the polling adds a little overhead both to code and performance.

(FWIW, note that there is still cases where wait_readable and wait_writable is useful even with io_uring. perhaps their fate needs to be a separate discussion as they don't provide value/make sense on certain other platforms)

Yeah, I don't think it makes sense to move open fds between different event loop implementations. That's just going to be chaos. And I'm not sure it would even be very useful anyway.
However I could potentially see use cases for using one kind of event loop implementation for a part of an application, and another one in another part. Not saying that this definitely makes sense, but it might.

Yeah, I don't think it makes sense to move open fds between different event loop implementations.

So what you are saying is that each event loop need to open their own copy of std_* and keep track of them separately? And that it shouldn't be possible to safely pass file descriptors around without restrictions? That seems a lot more chaotic to me. The gains for adding restrictions like that on the user seems very tiny.

At least on linux/mac. No idea what would be the best choice on windows.

edit: I am not against allowing having loop-private fds, but I don't think it make sense as a default choice.

edit2: It could also make sense to have it unspecified. Allowing globality where it make sense (due to the fds actually being global like the linux default), or local (when they are not global, eg handles on windows or privately registered fds which is something uring can do).

As a use case I was imagining a set of file descriptors being handled by a separate event loop from the rest of the application. This could be a set of sockets for some specific communication purpose. There doesn't need to be any standard IO on such an event loop. Or any other interference with file descriptors that are not part of that set.

If I understand correctly we want to have multiple event loop implementations compiled together (i.e. have io_uring + epoll on Linux) but only one implementation will be activated after a runtime check for the whole program.

Couldn't the event loop implementation tell whether it wants O_NONBLOCK for example?

yxhuvud · 2024-05-03T16:53:22Z

text/0007-even_loop-refactor.md

+
+### Optional event loop features
+
+Some activities are managed on the event loop on one platform but not on others. Example would be `Process#wait` which goes through IOCP on Windows but on Unix it’s part of signal handling. (Note: Perhaps we could try to get that on the event loop on Unix as well? **🤔** But there are other examples of system differences)


linux has signalfd which could help, but I am not aware if mac/bsd have any good solutions.

BSD and Darwin have kqueue that can receive signals (and many other things).
Recent Linux kernels have pidfd that look even more lovely than signalfd.

It's good to know we have the option to put signals on the event loop. It's a different question whether we'd want that. It could be useful as it would help simplify the implementation, I imagine?

But this is just an example. The real question here is how to handle events that need to be on the event loop on one system and we cannot put them on on another one. Not sure if there's any currently relevant use case except signals, but it's something to ponder for future extensibility.

yxhuvud · 2024-05-03T17:00:17Z

text/0007-even_loop-refactor.md

+
+### Bulk events without fibers
+
+For some applications it might be useful to interact with the event loop directly, being able to push operations in bulk without having to spawn (and wait) a fiber for each one.


One could also think of the reverse. For example io_uring supports multishot accept, ie it is not necessary to rearm the op after getting one without emitting more events, so a usage of it could either spawn new fibers or reuse a set of existing fibers with each trigger.

text/0007-even_loop-refactor.md

crysbot · 2024-05-07T16:41:43Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/curious-about-the-eventloop-updates/6825/3

straight-shoota · 2024-05-07T17:17:22Z

text/0007-event_loop-refactor.md

+  # Opens a connection on *socket* to the target *address* and continues fiber
+  # when the connection has been established.
+  # Returns `IO::Error` but does not raise.
+  abstract def connect(socket : ::Socket, address : ::Socket::Addrinfo | ::Socket::Address, timeout : ::Time::Span?) : IO::Error?


The address type could perhaps be abstracted. We need to support both types, though they both boil down to the same interface: #to_unsafe returning a pointer to a LibC::Sockaddr and #size returning its size in bytes.

So it could be an option to define an interface for that and use that as type restriction. But we cannot actually make it strictly type-safe because the return type of #to_unsafe is Void* and we cannot enforce a pointer to LibC::Sockaddr (so Array would implement this interface for example).

Alternatively, we could remove the type restriction entirely because any type that fulfills the implicit interface should work.

I think it's better to be explicit, either with ::Socket::Addrinfo | ::Socket::Address or an abstract module interface.
Introducing a new model would also solve the issue that the Socket name space is not in core lib (see https://github.com/crystal-lang/rfcs/blob/rfc/0005/text/0007-event_loop-refactor.md#socket).

It's actually hard to stub out ::Socket::Addrinfo and ::Socket::Address because that would require defining the Socket namespace, which is a class that inherits IO and that's so much complexity we could just include socket.cr.

I've considered introducing a module which could be included in both types. But that's actually not a great solution for when we return an address.
So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Or to leave it as the union?

Those types are not in core lib. Their existence depends on require "socket". So leaving the union won't work as is. Trying to see if we can stub it out (or replace with an internal type) is one possible way to deal with that.

But I think a better approach is described in #7 (comment)

dsisnero · 2024-05-15T21:35:43Z

Here's a good link on CancelToken instead of looking for Timeouts or even signals - use a CancelToken and have those as subclasses of it https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

straight-shoota · 2024-05-27T10:17:04Z

text/0007-event_loop-refactor.md

+#### Socket
+
+One instance of this problem shows already in the core features: The event loop interface has type restrictions of the `Socket` namespace in abstract defs, but `Socket` is not in the core lib.
+
+Options:
+
+- Omit those abstract defs (dilutes the interface, so not ideal)
+- Split `EventLoop` interface and add parts of it only with `require “socket”`
+- Add stub declarations for the involved types (`Socket::Handle` and `Socket::Address` - `Socket` itself is only used as parameter type which is technically okay for abstract methods)


I considered option 3 (stubbing the affected types in core lib) as maybe the most approachable solution, but I don't think it is.
We technically don't need to stub Socket because it's only used as type restriction in the parameters which are not validated by the abstract method checker. However, I believe this should eventually be covered as well, which would break this solution in the future. And I think it's generally better to be strict about abstract method implementations, even if the compiler won't complain.
Either way, I don't think it makes much sense to stub that many types (2 or 3) for the event loop API. It would probably be easier to just use the actual types directly.

Considering that we may need to handle optional event loop features eventually (see previous section), it might be a good exercise to start with sockets.

Here's how this could work:

Define a module Crystal::System::EventLoop::Socket

This module is empty by default.

module Crystal::System::EventLoop::Socket # empty end

In crystal/system/socket.cr, add abstract defs to Crystal::System::EventLoop::Socket (this could probably be in a different file).

module Crystal::System::EventLoop::Socket abstract def close(socket : ::Socket) : Nil # etc. end

All current event loop implementations implement the socket interface, so they can unconditionally include Crystal::System::EventLoop::Socket. There's no need to hide the implementation when sockets are not used.
We can even include it in Crystal::System::EventLoop (and pull it out when we ever get an event loop implementation that doesn't implement sockets).

class Crystal::LibEvent::EventLoop include Crystal::EventLoop::Socket def close(socket : ::Socket) : Nil # ... end # etc. end

In a program where sockets are not used, the abstract defs are not checked because the module is empty. The implementation methods are never called, so their type restrictions don't matter.

In a program where sockets are used, the module has abstract defs and their implementation will be checked.

I know I was an advocate for this style, but now that I see it I'm having a little itch. How would it look like with a split interface for the EventLoop? Say, EventLoop::SocketInterface, EventLoop::FileInterface, ... and then let each EL implementation include the module interfaces they comply to.

Yeah, that's basically what this is.
However, there's the problem that at the event loop implementation we can't know whether sockets are used in the program (i.e. if Socket is defined) or not. The event loop gets required as part of the prelude while require "socket" will only be in user code.

So the trick here is to have the socket interface empty by default and only fill it with abstract defs if sockets are used.
That makes it possible for the event loop implementation to always include the socket interface. We get abstract def checks if sockets are used.

text/0007-event_loop-refactor.md

beta-ziliani · 2024-05-27T19:20:32Z

text/0007-event_loop-refactor.md

+  # Opens a connection on *socket* to the target *address* and continues fiber
+  # when the connection has been established.
+  # Returns `IO::Error` but does not raise.
+  abstract def connect(socket : ::Socket, address : ::Socket::Addrinfo | ::Socket::Address, timeout : ::Time::Span?) : IO::Error?


So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Or to leave it as the union?

text/0007-event_loop-refactor.md

beta-ziliani · 2024-05-27T19:58:26Z

text/0007-event_loop-refactor.md

+#### Socket
+
+One instance of this problem shows already in the core features: The event loop interface has type restrictions of the `Socket` namespace in abstract defs, but `Socket` is not in the core lib.
+
+Options:
+
+- Omit those abstract defs (dilutes the interface, so not ideal)
+- Split `EventLoop` interface and add parts of it only with `require “socket”`
+- Add stub declarations for the involved types (`Socket::Handle` and `Socket::Address` - `Socket` itself is only used as parameter type which is technically okay for abstract methods)


I know I was an advocate for this style, but now that I see it I'm having a little itch. How would it look like with a split interface for the EventLoop? Say, EventLoop::SocketInterface, EventLoop::FileInterface, ... and then let each EL implementation include the module interfaces they comply to.

yxhuvud · 2024-06-25T19:12:21Z

Based on comment in the read_evented issue, I'm noting that support for wait_readable (and probably wait_writable, though that one is less used) would be nice to provide on the systems and implementations where it is possible. It is a fairly common ideom in C libraries on Linux, and it ties in extremely well with an event loop that is built on epoll, kevent or io_uring. Examples of usage: v4l2.cr, wayland_client. I don't know enough about the former to say if it actually needs to wait on readability or if the fixes in 1.12 was sufficient for that use case, but the latter would have to be rewritten from scratch without using system libraries if the ability to wait on readability wasn't there.

As these cannot be implemented on windows, but is at the same time very convenient to have access to where it is possible, I wonder if there should be something written about them in the rfc.

ysbaddaden · 2024-06-25T19:51:59Z

Damn, those are valid use cases.

straight-shoota · 2024-06-25T21:38:09Z

Obviously the event loop implementations for libevent, kqueue, epoll, io_uring etc. will always require something like wait_readable/wait_writable as building blocks. So that functionality must be available on supporting platforms.

But it probably fits best as a feature of the event loop implementation, not the file descriptor model. The former is platform-specific, while the latter is (supposed to be) platform-independent.
It would then be possible to directly interface with the event loop to implement the features that you mention.

ysbaddaden · 2024-06-27T12:52:23Z

@straight-shoota that's what I was thinking: an interface for the event loop to implement. Windows would raise NotImplementedError (until we can figure something, like running a WSAPoll in its own thread?)

ysbaddaden · 2024-07-23T18:44:37Z

After implementing the Epoll (and Kqueue) event loops:

I wish sleep was a mere Crystal::EventLoop.sleep(duration : Time::Span). The implementation would be much simpler and direct: create a timer event on the stack, suspend the fiber, done.
Crystal::EventLoop::Event is missing a #delete : Nil method (called when cancelling a select timeout).

crysbot · 2024-09-24T17:05:54Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/new-event-loop-unix-call-for-reviews-tests/7207/1

crysbot · 2024-10-10T12:30:54Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/new-event-loop-unix-call-for-reviews-tests/7207/16

crysbot · 2024-10-18T08:39:29Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1

straight-shoota · 2024-11-12T16:15:44Z

I have updated the RFC text to reflect the implemented status. So I think it's ready to be merged.

There are a number of Unresolved questions but I don't think they need to be resolved in the scope of this RFC.
We need to figure out what to do with them. Some might turn into their own actionable items. Others may just stay here for reference.

beta-ziliani

The last bit can be refined now to explain or at least comment what we ended up doing

text/0007-event_loop-refactor.md

beta-ziliani · 2024-11-13T12:31:53Z

There are a number of Unresolved questions but I don't think they need to be resolved in the scope of this RFC.
We need to figure out what to do with them. Some might turn into their own actionable items. Others may just stay here for reference.

We can place them under "future possibilities". In particular, the select part, which likely requires its own rfc

straight-shoota self-assigned this May 2, 2024

Event loop refactor

9a93501

straight-shoota force-pushed the rfc/0005 branch from fe631ed to 9a93501 Compare May 2, 2024 19:07

straight-shoota marked this pull request as draft May 2, 2024 19:07

straight-shoota changed the title ~~Event loop refactor~~ RFC #0007: Event loop refactor May 2, 2024

straight-shoota mentioned this pull request May 3, 2024

RFC: Refactor Crystal::EventLoop to disconnect it from LibEvent crystal-lang/crystal#10766

Closed

yxhuvud reviewed May 3, 2024

View reviewed changes

Blacksmoke16 reviewed May 7, 2024

View reviewed changes

text/0007-even_loop-refactor.md Outdated Show resolved Hide resolved

straight-shoota added 3 commits May 7, 2024 18:44

Fix filename

04a7c37

Fill guide-level explanation

c70c896

Add abstract interface description

ba1e1a4

straight-shoota commented May 7, 2024

View reviewed changes

straight-shoota mentioned this pull request May 24, 2024

Extract #system_read and #system_write for FileDescriptor and Socket crystal-lang/crystal#14626

Merged

straight-shoota added 2 commits May 27, 2024 12:15

Fix associate *Socket* subsection with *Optional event loop features*

3e40217

fixup

3bff4d1

straight-shoota commented May 27, 2024

View reviewed changes

text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved

beta-ziliani reviewed May 27, 2024

View reviewed changes

straight-shoota mentioned this pull request May 28, 2024

Drop Crystal::System::Socket#system_send crystal-lang/crystal#14637

Merged

straight-shoota added 5 commits May 28, 2024 15:03

Resolve *read and write behaviour*

9afc76f

Resolve *Timeout and Resume events*

9ab3a8e

Fix typo

0647def

Drop EventLoop#send, #receive

113e401

Separate EventLoop interface into modules

b1978fa

ysbaddaden mentioned this pull request Jul 16, 2024

Epoll event loop (linux) crystal-lang/crystal#14814

Closed

20 tasks

straight-shoota changed the title ~~RFC #0007: Event loop refactor~~ RFC 0007: Event loop refactor Nov 12, 2024

straight-shoota added 4 commits November 12, 2024 17:05

Update implemented API

8eee22e

Resolve Socket question

6334e16

cleanup

526b3d8

Add reference to RFC 0009

53a1861

straight-shoota marked this pull request as ready for review November 12, 2024 16:13

beta-ziliani reviewed Nov 13, 2024

View reviewed changes

ysbaddaden approved these changes Nov 14, 2024

View reviewed changes

Fix write/read docs

1061a51

straight-shoota mentioned this pull request Nov 14, 2024

Fix EventLoop docs for Socket read, write crystal-lang/crystal#15194

Merged

straight-shoota added 4 commits November 14, 2024 20:49

Drop unresolved questions

246c87f

Turn _Type for sizes_ into a future possibility

f74db24

Turn more open questions into future possibilities

70942fa

Drop *Rationale and alternatives*

b7da5cc

straight-shoota added the rfc label Nov 14, 2024

Add *Rationale and alternatives*

252f546

beta-ziliani approved these changes Nov 15, 2024

View reviewed changes

straight-shoota merged commit 0d3fb2a into main Nov 15, 2024

straight-shoota deleted the rfc/0005 branch November 15, 2024 22:56


		Should events from the Crystal runtime be part of the event loop as well?

		- Fiber: sleep


		### Optional event loop features

		Some activities are managed on the event loop on one platform but not on others. Example would be `Process#wait` which goes through IOCP on Windows but on Unix it’s part of signal handling. (Note: Perhaps we could try to get that on the event loop on Unix as well? 🤔 But there are other examples of system differences)


		### Bulk events without fibers

		For some applications it might be useful to interact with the event loop directly, being able to push operations in bulk without having to spawn (and wait) a fiber for each one.

RFC 0007: Event loop refactor #7

RFC 0007: Event loop refactor #7

Conversation

straight-shoota commented May 2, 2024 • edited Loading

crysbot commented May 3, 2024

yxhuvud May 3, 2024 • edited Loading

Choose a reason for hiding this comment

yxhuvud May 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yxhuvud May 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crysbot commented May 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dsisnero commented May 15, 2024

straight-shoota May 27, 2024 • edited Loading

Choose a reason for hiding this comment

straight-shoota May 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yxhuvud commented Jun 25, 2024

ysbaddaden commented Jun 25, 2024

straight-shoota commented Jun 25, 2024

ysbaddaden commented Jun 27, 2024 • edited Loading

ysbaddaden commented Jul 23, 2024

crysbot commented Sep 24, 2024

crysbot commented Oct 10, 2024

crysbot commented Oct 18, 2024

straight-shoota commented Nov 12, 2024

beta-ziliani left a comment

Choose a reason for hiding this comment

beta-ziliani commented Nov 13, 2024

straight-shoota commented May 2, 2024 •

edited

Loading

yxhuvud May 3, 2024 •

edited

Loading

yxhuvud May 3, 2024 •

edited

Loading

yxhuvud May 21, 2024 •

edited

Loading

straight-shoota May 27, 2024 •

edited

Loading

straight-shoota May 27, 2024 •

edited

Loading

ysbaddaden commented Jun 27, 2024 •

edited

Loading