Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 0007: Event loop refactor #7

Merged
merged 27 commits into from
Nov 15, 2024
Merged
Changes from 4 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
9a93501
Event loop refactor
straight-shoota May 2, 2024
04a7c37
Fix filename
straight-shoota May 7, 2024
c70c896
Fill guide-level explanation
straight-shoota May 7, 2024
ba1e1a4
Add abstract interface description
straight-shoota May 7, 2024
3e40217
Fix associate *Socket* subsection with *Optional event loop features*
straight-shoota May 27, 2024
3bff4d1
fixup
straight-shoota May 27, 2024
9afc76f
Resolve *`read` and `write` behaviour*
straight-shoota May 28, 2024
9ab3a8e
Resolve *Timeout and Resume events*
straight-shoota May 28, 2024
0647def
Fix typo
straight-shoota May 28, 2024
113e401
Drop `EventLoop#send`, `#receive`
straight-shoota May 28, 2024
b1978fa
Separate `EventLoop` interface into modules
straight-shoota May 28, 2024
a42930c
typo
straight-shoota May 28, 2024
5db0edb
Add `run` and `interrupt` for completeness
straight-shoota May 28, 2024
a08f935
fixup
straight-shoota May 29, 2024
5b7f663
Improve API docs for `EventLoop::Socket`
straight-shoota May 29, 2024
8b9b09b
Update text/0007-event_loop-refactor.md
straight-shoota May 30, 2024
04462e8
Apply suggestions from code review
straight-shoota May 31, 2024
8eee22e
Update implemented API
straight-shoota Nov 12, 2024
6334e16
Resolve `Socket` question
straight-shoota Nov 12, 2024
526b3d8
cleanup
straight-shoota Nov 12, 2024
53a1861
Add reference to RFC 0009
straight-shoota Nov 12, 2024
1061a51
Fix `write`/`read` docs
straight-shoota Nov 14, 2024
246c87f
Drop unresolved questions
straight-shoota Nov 14, 2024
f74db24
Turn _Type for sizes_ into a future possibility
straight-shoota Nov 14, 2024
70942fa
Turn more open questions into future possibilities
straight-shoota Nov 14, 2024
b7da5cc
Drop *Rationale and alternatives*
straight-shoota Nov 14, 2024
252f546
Add *Rationale and alternatives*
straight-shoota Nov 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
252 changes: 252 additions & 0 deletions text/0007-event_loop-refactor.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
- Feature Name: `event_loop-refactor`
- Start Date: 2023-05-02
- RFC PR: [crystal-lang/rfcs#0007](https://github.com/crystal-lang/rfcs/pull/0007)
- Issue: [crystal-lang/crystal#10766](https://github.com/crystal-lang/crystal/issues/10766)

# Summary

Refactor the event loop interface to be decoupled from `libevent` and adapts to
the needs of multithreaded execution contexts ([RFC #0002](https://github.com/crystal-lang/rfcs/pull/0002)).

# Motivation

The event loop is the central piece for enabling non-blocking operations and other event-based activities.

The original implementation of `Crystal::EventLoop` etc. is heavily based on `libevent` which was the first backing driver. Now we have an event loop based on IOCP for Windows and a version for the stripped-down abilities on WASI. Another implementation based on `io_uring` has already been proposed ([#10768](https://github.com/crystal-lang/crystal/pull/10768)).

All these implementations have quite different designs and requirements. Shoehorning them into an API intended for `libevent` isn’t ideal. Hence the interface should be refactored to be more generic.

Another concern is that `io_uring` is generally available on Linux, but requires a modern kernel. A Crystal program should be able to fall back to an `epoll` based selector if `io_uring` is unavailable.

There’s an ongoing refactoring effort for enhanced multi-threading features, which includes a refactoring of the scheduler ([RFC #0002](https://github.com/crystal-lang/rfcs/pull/2)).
This context also asks for some changes to the event loop interface to allow more capable and efficient scheduling implementations.
This has also brought up the idea to use system event loops directly instead of `libevent` (which is essentially a cross-platform wrapper around them).
This would also allow to take control of the abstraction and open possibilities for further optimization in Crystal resulting in more performance.

# Guide-level explanation

`Crystal::EventLoop` exposes a generic and compact interface for actions that are to be performed on Crystals event loop.
This interface is used in the implementation of asynchronous IO and other evented operations.

Compared to the previous event loop implementation, actions are entirely self-contained instead of scattered between the event loop and the system implementations of `Socket` and `Crystal::System::FileDescriptor`.

For example, the previous, platform-specific implementation of `Socket#unbuffered_read` for Unix systems:
```cr
private def unbuffered_read(slice : Bytes) : Int32
evented_read(slice, "Error reading socket") do
LibC.recv(fd, slice, slice.size, 0).to_i32
end
end
```

The new, portable implementation of `Socket#unbuffered_read`:
```cr
private def unbuffered_read(slice : Bytes) : Int32
event_loop.read(self, slice)
end
```

The new API initially supports adaptations of the existing backends `libevent` and `IOCP`.
The next steps are additional implementations based on `io_uring`, as well system selectors (`kqueue` and `epoll`), skipping libevent to improve performance.

Further extensions can widen the scope of the event loop.

`Crystal::Eventloop` is an internal API, thus changing it does not break backwards compatibility.

## Terminology

- **Event loop**: an abstraction to wait on specific events, such as timers (e.g. wait until a certain amount of time has passed) or IO (e.g. wait for an socket to be readable or writable).

- **Crystal event loop:** Component of the Crystal runtime that subscribes to and dispatches events from the OS and other sources to enqueue a waiting fiber when a desired event (such as IO availability) has occured. It’s an interface between Crystal’s scheduler and the system’s event loop backend.

- **System selector:** The system implementation of an event loop which the runtime component depends on to receive events from the operating system. Example implementations: `epoll`, `kqueue`, `IOCP`, `libevent` (a cross-platform wrapper of the former), `io_uring`

- **Scheduler:** manages fibers’ executions inside the program (controlled by Crystal), unlike threads that are scheduled by the operating system (outside of the accessibility of the program).

## General Design Principles

- **Generic API**: Independent of a specific event loop design
- **Black Box**: No interference with internals of the event loop
- **Pluggable**: It must be possible to compile multiple event loop implementations and choose at runtime.
Only one type of implementation will typically be active (it would be feasible to have different event loop implementations in different execution contexts, if there's a use case for that).

# Reference-level explanation

The new `EventLoop` defines the following interface for issuing operations on the event loop:

```cr
module EventLoop
beta-ziliani marked this conversation as resolved.
Show resolved Hide resolved
# Reads at least one byte from the file descriptor into *slice* and continues
# fiber when the read is complete.
# Returns the number of bytes read.
abstract def read(file : Crystal::System::FileDescriptor, slice : Bytes) : Int32

# Reads at least one byte from the socket into *slice* and continues fiber
# when the read is complete.
# Returns the number of bytes read.
abstract def read(socket : ::Socket, slice : Bytes) : Int32

# Writes at least one byte from *slice* to the file descriptor and continues
# fiber when the write is complete.
# Returns the number of bytes written.
abstract def write(file : Crystal::System::FileDescriptor, slice : Bytes) : Int32

# Writes at least one byte from *slice* to the socket and continues fiber
# when the write is complete.
# Returns the number of bytes written.
abstract def write(file : ::Socket, slice : Bytes) : Int32

# Accepts an incoming TCP connection on the socket and continues fiber when a
# connection is available.
# Returns a handle to the socket for the new connection.
abstract def accept(socket : ::Socket) : ::Socket::Handle?

# Opens a connection on *socket* to the target *address* and continues fiber
# when the connection has been established.
# Returns `IO::Error` but does not raise.
abstract def connect(socket : ::Socket, address : ::Socket::Addrinfo | ::Socket::Address, timeout : ::Time::Span?) : IO::Error?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The address type could perhaps be abstracted. We need to support both types, though they both boil down to the same interface: #to_unsafe returning a pointer to a LibC::Sockaddr and #size returning its size in bytes.

So it could be an option to define an interface for that and use that as type restriction. But we cannot actually make it strictly type-safe because the return type of #to_unsafe is Void* and we cannot enforce a pointer to LibC::Sockaddr (so Array would implement this interface for example).

Alternatively, we could remove the type restriction entirely because any type that fulfills the implicit interface should work.

I think it's better to be explicit, either with ::Socket::Addrinfo | ::Socket::Address or an abstract module interface.
Introducing a new model would also solve the issue that the Socket name space is not in core lib (see https://github.com/crystal-lang/rfcs/blob/rfc/0005/text/0007-event_loop-refactor.md#socket).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually hard to stub out ::Socket::Addrinfo and ::Socket::Address because that would require defining the Socket namespace, which is a class that inherits IO and that's so much complexity we could just include socket.cr.

I've considered introducing a module which could be included in both types. But that's actually not a great solution for when we return an address.
So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think it's probably best to introduce an internal struct type which holds a reference to LibC::Sockaddr and its size.

Or to leave it as the union?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those types are not in core lib. Their existence depends on require "socket". So leaving the union won't work as is. Trying to see if we can stub it out (or replace with an internal type) is one possible way to deal with that.

But I think a better approach is described in #7 (comment)


# Writes at least one byte from *slice* to the socket and continues fiber when
# the write is complete.
# Returns the number of bytes written.
abstract def send(socket : ::Socket, slice : Bytes) : Int32

# Writes at least one byte from *slice* to the socket with a target *address* (UDP)
# and continues fiber when the write is complete.
# Returns the number of bytes written.
abstract def send_to(socket : ::Socket, slice : Bytes, address : ::Socket::Address) : Int32

# Receives on the socket into *slice* and continues fiber when the package is
# completed
# Returns the number of bytes received.
abstract def receive(socket : ::Socket, slice : Bytes) : Int32
straight-shoota marked this conversation as resolved.
Show resolved Hide resolved

# Receives on the socket into *slice* and continues fiber when the package is
# completed.
# Returns a tuple containing the number of bytes received and the source address
# of the packet (UDP).
abstract def receive_from(socket : ::Socket, slice : Bytes) : Tuple(Int32, ::Socket::Address)

# Closes the *resource*.
abstract def close(resource) : Nil
end
```

# Drawbacks

None.

# Rationale and alternatives

TBD
beta-ziliani marked this conversation as resolved.
Show resolved Hide resolved

# Prior art

## References

- <https://tinyclouds.org/iocp_links>
- Zig: [Proposal: Event loop redesign (ziglang/zig#8224)](https://github.com/ziglang/zig/issues/8224)
- [\[RFC\] Fiber preemption, blocking calls and other concurrency issues (crystal-lang/crystal#1454)](https://github.com/crystal-lang/crystal/issues/1454)

# Unresolved questions

- Distinction between generic system APIs and event loop-specifics is not always clear.
beta-ziliani marked this conversation as resolved.
Show resolved Hide resolved

- Should the implementation of the Crystal event loop be in the same type as the bindings for the system selector? (i.e. are `#read(Socket, Bytes)` and `#create_resume_event(Fiber)` in the same place?)
beta-ziliani marked this conversation as resolved.
Show resolved Hide resolved

- How many event loops are there gonna be in relation to execution contexts? ([ysbaddaden/execution_context#7](https://github.com/ysbaddaden/execution_context/issues/7))

### What’s the scope of the Crystal event loop?

This is a list of operations which we expect to go through the event loop:

- File descriptor: read, write
- Socket: read, write, accept, connect, send, receive
- Process: wait (Windows)
-
Potential extensions:

- DNS resolution? ([Async DNS resolution #13619](https://github.com/crystal-lang/crystal/issues/13619))
- OS signals?
- File system events?
- OS event (eventfd) ?
- User events

Should events from the Crystal runtime be part of the event loop as well?

- Fiber: sleep
- Select actions: send, receive, timeout

### Optional event loop features

Some activities are managed on the event loop on one platform but not on others. Example would be `Process#wait` which goes through IOCP on Windows but on Unix it’s part of signal handling. (Note: Perhaps we could try to get that on the event loop on Unix as well? **🤔** But there are other examples of system differences)

Do we require these optional methods to be present in all event loop implementations, i.e. they’re part of the global interface? Some impls would then just raise “Not implemented”. Alternatively, we could keep them out of the main interface and check for availability via `event_loop.responds_to?`. or a sub interface (`is_a?(EventLoop::Process)`). Or…?
beta-ziliani marked this conversation as resolved.
Show resolved Hide resolved

### `read` and `write` behaviour

The behaviour for buffered/evented `read` and `write` differs:

- `read` is expected to read up to `slice.size` bytes and return the number of bytes that were actually read

- `write` is expected to write _all_ of `slice`. It returns `Nil`.

This makes sense for a high-level API because you certainly want to write the entire thing. If you need to fill a buffer, you can use `read_fully`, but often you can keep working on whatever bunch is available right away.\
I’m not sure if the event loop implementation of `write` should do the same. I’d suggest to change it to work the same as `read`, and return the number of bytes written. The intermediary implementation of `unbuffered_write` can then make sure to iterate until the entire slice is written down. This keeps the event loop implementation minimal and more versatile.

### Type for sizes

Currently, the return type of `unbuffered_read` is unspecified and there’s a bit of a mess. Technically, it can only be `Int32` because that’s the size of `Slice`. We could use the same in the event loop API. However, in order to be future proof for bigger sizes (<https://github.com/crystal-lang/crystal/issues/4011>), we could design the API with `SizeT` instead. Considering it’s a low-level system API, this should be fine and makes a lot of sense.\
Currently, the only possible values would still be the positive range of `Int32`, so there would be no conversion risk.

#### Socket

One instance of this problem shows already in the core features: The event loop interface has type restrictions of the `Socket` namespace in abstract defs, but `Socket` is not in the core lib.

Options:

- Omit those abstract defs (dilutes the interface, so not ideal)
- Split `EventLoop` interface and add parts of it only with `require “socket”`
- Add stub declarations for the involved types (`Socket::Handle` and `Socket::Address` - `Socket` itself is only used as parameter type which is technically okay for abstract methods)

### Blocking event loop

There should be a basic implementation with blocking syscalls when non-blocking operation is unavailable. This would currently be used for WASI, for example, and allows IO to work, although not as efficiently.

When evented operations are not available (for example file system operations or DNS resolution), the scheduler could automatically fall back to execute the operation in a separate thread.

This works well with the libevent implementation because if the lib calls never return `EAGAIN`, the code works entirely without libevent. The WASI implementation raises when trying to create a resume event, but that never happens when IO operations are all blocking (`~O_NONBLOCK`). For that to happen, we’d need to explicitly set the IO to blocking.

Alternative idea: If polling is unavailable, we could consider sleeping the fiber for some time and let it retry again **🤷**

### Timeout and Resume events

Should we put them entirely into the scheduler / event loop driver? We’re already doing that in IOCP. Managing these events entirely in Crystal would allow us to drop `Crystal::System::Event` which is only really needed for libevent.

### `#connect` timeout

`Socket#connect` is the only method with a timeout parameter. This seems weird. All other timeouts are configured on the IO instance.\
It probably makes sense to standardize, introduce a new property \`Socket#connect\_timeout\`, and deprecate the def parameter.

### Bulk events without fibers

For some applications it might be useful to interact with the event loop directly, being able to push operations in bulk without having to spawn (and wait) a fiber for each one.

- [Fiber usage in high IO application](https://forum.crystal-lang.org/t/fiber-usage-in-high-io-application/6689)

### **#Integration with** `select`

`select` actions are implemented completely independent of the event loop, yet they operate in a similar domain (wait for something to happen). These actions are special in that they ensure atomicity. When waiting on multiple actions simultaneously, it’s guaranteed that only one of them executes. This probably won’t be exactly possible with most event loop features.

It would certainly be nice if we could use more actions with `select` and maybe putting everything on the event loop is actually the best path forward with that?

```cr
select
when data = @socket.gets
end
```

# Future possibilities

* Alternative event loop implementations based directly on the system selectors
instead of `libevent`.