Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 0007: Event loop refactor #7

Merged
merged 27 commits into from
Nov 15, 2024
Merged

RFC 0007: Event loop refactor #7

merged 27 commits into from
Nov 15, 2024

Conversation

straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented May 2, 2024

This PR has been merged. The RFC text is available at: https://github.com/crystal-lang/rfcs/blob/main/text/0007-event_loop-refactor.md


This RFC is very much work in progress and has a lot of unresolved questions which I intend to discuss here.

Preview: https://github.com/crystal-lang/rfcs/blob/rfc/0005/text/0007-event_loop-refactor.md

@straight-shoota straight-shoota self-assigned this May 2, 2024
@straight-shoota straight-shoota marked this pull request as draft May 2, 2024 19:07
@straight-shoota straight-shoota changed the title Event loop refactor RFC #0007: Event loop refactor May 2, 2024
@crysbot
Copy link

crysbot commented May 3, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/rfc-0007-event-loop-refactor/6812/1


Should events from the Crystal runtime be part of the event loop as well?

- Fiber: sleep
Copy link

@yxhuvud yxhuvud May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, under the assumption that it is sleep with argument and not endless sleep. At least for io_uring the easiest implementation is to just emit the appropriate op (ie timeout) and it will just work in a way that is extremely straightforward (also note the even simpler implementation of Fiber#yield a few lines beneath it. That is the territory of the scheduler, but event loop and schedulers are not necessarily all that separate, in practice..).

- **Generic API**: Independent of a specific event loop design
- **Black Box**: No interference with internals of the event loop
- **Pluggable**: It must be possible to compile multiple event loop implementations and choose at runtime.
Only one type of implementation will typically be active (it would be feasible to have different event loop implementations in different execution contexts, if there's a use case for that).
Copy link

@yxhuvud yxhuvud May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Feasible, but problematic). Some default values may be wanted to be different when using file descriptors in different event loops, like for example if sockets are opened in nonblocking mode or not. This is also quite problematic for the pluggable-at-runtime switching. :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To properly solve this, I suppose it would require the opening of file descriptors to be passed to the event loop as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring it to the event loop help a little but doesn't really solve it as it means fds cannot be passed around. So for example, it would not work to open the socket in one context, and then start to listen on it and spawn handlers in another. And if descriptors are not setup properly, the error symptoms would be weird. Not having that confusion and that footgun around is worth more than what it would fix.

My preference for linux in the really long term would be to keep the default to nonblocking until there is no longer any support for any linux version that doesn't have a sufficiently modern io_uring, and then switch the default once we have an implementation we are happy with. The latter can be implemented using poll ops until the switch (similar to the uring branch) - less neat and does more work than necessary, but it would work. But at some time it would be nice to switch as the polling adds a little overhead both to code and performance.

(FWIW, note that there is still cases where wait_readable and wait_writable is useful even with io_uring. perhaps their fate needs to be a separate discussion as they don't provide value/make sense on certain other platforms)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think it makes sense to move open fds between different event loop implementations. That's just going to be chaos. And I'm not sure it would even be very useful anyway.
However I could potentially see use cases for using one kind of event loop implementation for a part of an application, and another one in another part. Not saying that this definitely makes sense, but it might.

Copy link

@yxhuvud yxhuvud May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think it makes sense to move open fds between different event loop implementations.

So what you are saying is that each event loop need to open their own copy of std_* and keep track of them separately? And that it shouldn't be possible to safely pass file descriptors around without restrictions? That seems a lot more chaotic to me. The gains for adding restrictions like that on the user seems very tiny.

At least on linux/mac. No idea what would be the best choice on windows.

edit: I am not against allowing having loop-private fds, but I don't think it make sense as a default choice.

edit2: It could also make sense to have it unspecified. Allowing globality where it make sense (due to the fds actually being global like the linux default), or local (when they are not global, eg handles on windows or privately registered fds which is something uring can do).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a use case I was imagining a set of file descriptors being handled by a separate event loop from the rest of the application. This could be a set of sockets for some specific communication purpose. There doesn't need to be any standard IO on such an event loop. Or any other interference with file descriptors that are not part of that set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly we want to have multiple event loop implementations compiled together (i.e. have io_uring + epoll on Linux) but only one implementation will be activated after a runtime check for the whole program.

Couldn't the event loop implementation tell whether it wants O_NONBLOCK for example?


### Optional event loop features

Some activities are managed on the event loop on one platform but not on others. Example would be `Process#wait` which goes through IOCP on Windows but on Unix it’s part of signal handling. (Note: Perhaps we could try to get that on the event loop on Unix as well? **🤔** But there are other examples of system differences)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linux has signalfd which could help, but I am not aware if mac/bsd have any good solutions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BSD and Darwin have kqueue that can receive signals (and many other things).
Recent Linux kernels have pidfd that look even more lovely than signalfd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to know we have the option to put signals on the event loop. It's a different question whether we'd want that. It could be useful as it would help simplify the implementation, I imagine?

But this is just an example. The real question here is how to handle events that need to be on the event loop on one system and we cannot put them on on another one. Not sure if there's any currently relevant use case except signals, but it's something to ponder for future extensibility.


### Bulk events without fibers

For some applications it might be useful to interact with the event loop directly, being able to push operations in bulk without having to spawn (and wait) a fiber for each one.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could also think of the reverse. For example io_uring supports multishot accept, ie it is not necessary to rearm the op after getting one without emitting more events, so a usage of it could either spawn new fibers or reuse a set of existing fibers with each trigger.

@crysbot
Copy link

crysbot commented May 7, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/curious-about-the-eventloop-updates/6825/3

# Opens a connection on *socket* to the target *address* and continues fiber
# when the connection has been established.
# Returns `IO::Error` but does not raise.
abstract def connect(socket : ::Socket, address : ::Socket::Addrinfo | ::Socket::Address, timeout : ::Time::Span?) : IO::Error?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The address type could perhaps be abstracted. We need to support both types, though they both boil down to the same interface: #to_unsafe returning a pointer to a LibC::Sockaddr and #size returning its size in bytes.

So it could be an option to define an interface for that and use that as type restriction. But we cannot actually make it strictly type-safe because the return type of #to_unsafe is Void* and we cannot enforce a pointer to LibC::Sockaddr (so Array would implement this interface for example).

Alternatively, we could remove the type restriction entirely because any type that fulfills the implicit interface should work.

I think it's better to be explicit, either with ::Socket::Addrinfo | ::Socket::Address or an abstract module interface.
Introducing a new model would also solve the issue that the Socket name space is not in core lib (see https://github.com/crystal-lang/rfcs/blob/rfc/0005/text/0007-event_loop-refactor.md#socket).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually hard to stub out ::Socket::Addrinfo and ::Socket::Address because that would require defining the Socket namespace, which is a class that inherits IO and that's so much complexity we could just include socket.cr.

I've considered introducing a module which could be included in both types. But that's actually not a great solution for when we return an address.
So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Or to leave it as the union?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those types are not in core lib. Their existence depends on require "socket". So leaving the union won't work as is. Trying to see if we can stub it out (or replace with an internal type) is one possible way to deal with that.

But I think a better approach is described in #7 (comment)

@dsisnero
Copy link

Here's a good link on CancelToken instead of looking for Timeouts or even signals - use a CancelToken and have those as subclasses of it https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

Comment on lines 186 to 194
#### Socket

One instance of this problem shows already in the core features: The event loop interface has type restrictions of the `Socket` namespace in abstract defs, but `Socket` is not in the core lib.

Options:

- Omit those abstract defs (dilutes the interface, so not ideal)
- Split `EventLoop` interface and add parts of it only with `require “socket”`
- Add stub declarations for the involved types (`Socket::Handle` and `Socket::Address` - `Socket` itself is only used as parameter type which is technically okay for abstract methods)
Copy link
Member Author

@straight-shoota straight-shoota May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered option 3 (stubbing the affected types in core lib) as maybe the most approachable solution, but I don't think it is.
We technically don't need to stub Socket because it's only used as type restriction in the parameters which are not validated by the abstract method checker. However, I believe this should eventually be covered as well, which would break this solution in the future. And I think it's generally better to be strict about abstract method implementations, even if the compiler won't complain.
Either way, I don't think it makes much sense to stub that many types (2 or 3) for the event loop API. It would probably be easier to just use the actual types directly.

Copy link
Member Author

@straight-shoota straight-shoota May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that we may need to handle optional event loop features eventually (see previous section), it might be a good exercise to start with sockets.

Here's how this could work:

  • Define a module Crystal::System::EventLoop::Socket
  • This module is empty by default.
module Crystal::System::EventLoop::Socket
  # empty
end
  • In crystal/system/socket.cr, add abstract defs to Crystal::System::EventLoop::Socket (this could probably be in a different file).
module Crystal::System::EventLoop::Socket
  abstract def close(socket : ::Socket) : Nil

  # etc.
end
  • All current event loop implementations implement the socket interface, so they can unconditionally include Crystal::System::EventLoop::Socket. There's no need to hide the implementation when sockets are not used.
    We can even include it in Crystal::System::EventLoop (and pull it out when we ever get an event loop implementation that doesn't implement sockets).
class Crystal::LibEvent::EventLoop
  include Crystal::EventLoop::Socket

  def close(socket : ::Socket) : Nil
    # ...
  end

  # etc.
end
  • In a program where sockets are not used, the abstract defs are not checked because the module is empty. The implementation methods are never called, so their type restrictions don't matter.
  • In a program where sockets are used, the module has abstract defs and their implementation will be checked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I was an advocate for this style, but now that I see it I'm having a little itch. How would it look like with a split interface for the EventLoop? Say, EventLoop::SocketInterface, EventLoop::FileInterface, ... and then let each EL implementation include the module interfaces they comply to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's basically what this is.
However, there's the problem that at the event loop implementation we can't know whether sockets are used in the program (i.e. if Socket is defined) or not. The event loop gets required as part of the prelude while require "socket" will only be in user code.

So the trick here is to have the socket interface empty by default and only fill it with abstract defs if sockets are used.
That makes it possible for the event loop implementation to always include the socket interface. We get abstract def checks if sockets are used.

# Opens a connection on *socket* to the target *address* and continues fiber
# when the connection has been established.
# Returns `IO::Error` but does not raise.
abstract def connect(socket : ::Socket, address : ::Socket::Addrinfo | ::Socket::Address, timeout : ::Time::Span?) : IO::Error?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Or to leave it as the union?

text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved
text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved
Comment on lines 186 to 194
#### Socket

One instance of this problem shows already in the core features: The event loop interface has type restrictions of the `Socket` namespace in abstract defs, but `Socket` is not in the core lib.

Options:

- Omit those abstract defs (dilutes the interface, so not ideal)
- Split `EventLoop` interface and add parts of it only with `require “socket”`
- Add stub declarations for the involved types (`Socket::Handle` and `Socket::Address` - `Socket` itself is only used as parameter type which is technically okay for abstract methods)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I was an advocate for this style, but now that I see it I'm having a little itch. How would it look like with a split interface for the EventLoop? Say, EventLoop::SocketInterface, EventLoop::FileInterface, ... and then let each EL implementation include the module interfaces they comply to.

@yxhuvud
Copy link

yxhuvud commented Jun 25, 2024

Based on comment in the read_evented issue, I'm noting that support for wait_readable (and probably wait_writable, though that one is less used) would be nice to provide on the systems and implementations where it is possible. It is a fairly common ideom in C libraries on Linux, and it ties in extremely well with an event loop that is built on epoll, kevent or io_uring. Examples of usage: v4l2.cr, wayland_client. I don't know enough about the former to say if it actually needs to wait on readability or if the fixes in 1.12 was sufficient for that use case, but the latter would have to be rewritten from scratch without using system libraries if the ability to wait on readability wasn't there.

As these cannot be implemented on windows, but is at the same time very convenient to have access to where it is possible, I wonder if there should be something written about them in the rfc.

@ysbaddaden
Copy link
Collaborator

Damn, those are valid use cases.

@straight-shoota
Copy link
Member Author

Obviously the event loop implementations for libevent, kqueue, epoll, io_uring etc. will always require something like wait_readable/wait_writable as building blocks. So that functionality must be available on supporting platforms.

But it probably fits best as a feature of the event loop implementation, not the file descriptor model. The former is platform-specific, while the latter is (supposed to be) platform-independent.
It would then be possible to directly interface with the event loop to implement the features that you mention.

@ysbaddaden
Copy link
Collaborator

ysbaddaden commented Jun 27, 2024

@straight-shoota that's what I was thinking: an interface for the event loop to implement. Windows would raise NotImplementedError (until we can figure something, like running a WSAPoll in its own thread?)

@ysbaddaden
Copy link
Collaborator

After implementing the Epoll (and Kqueue) event loops:

  1. I wish sleep was a mere Crystal::EventLoop.sleep(duration : Time::Span). The implementation would be much simpler and direct: create a timer event on the stack, suspend the fiber, done.

  2. Crystal::EventLoop::Event is missing a #delete : Nil method (called when cancelling a select timeout).

@crysbot
Copy link

crysbot commented Sep 24, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/new-event-loop-unix-call-for-reviews-tests/7207/1

@crysbot
Copy link

crysbot commented Oct 10, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/new-event-loop-unix-call-for-reviews-tests/7207/16

@crysbot
Copy link

crysbot commented Oct 18, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1

@straight-shoota straight-shoota changed the title RFC #0007: Event loop refactor RFC 0007: Event loop refactor Nov 12, 2024
@straight-shoota straight-shoota marked this pull request as ready for review November 12, 2024 16:13
@straight-shoota
Copy link
Member Author

I have updated the RFC text to reflect the implemented status. So I think it's ready to be merged.

There are a number of Unresolved questions but I don't think they need to be resolved in the scope of this RFC.
We need to figure out what to do with them. Some might turn into their own actionable items. Others may just stay here for reference.

Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last bit can be refined now to explain or at least comment what we ended up doing

text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved
text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved
text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved
text/0007-event_loop-refactor.md Outdated Show resolved Hide resolved
text/0007-event_loop-refactor.md Show resolved Hide resolved
@beta-ziliani
Copy link
Member

There are a number of Unresolved questions but I don't think they need to be resolved in the scope of this RFC.
We need to figure out what to do with them. Some might turn into their own actionable items. Others may just stay here for reference.

We can place them under "future possibilities". In particular, the select part, which likely requires its own rfc

@straight-shoota straight-shoota merged commit 0d3fb2a into main Nov 15, 2024
@straight-shoota straight-shoota deleted the rfc/0005 branch November 15, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants