EventLoop: direct epoll/kqueue integration #14959
Conversation
We can't call EPOLL_CTL_MOD with EPOLLEXCLUSIVE. Let's disable it for now and see later if we can replace it with a pair of EPOLL_CTL_DEL and EPOLL_CTL_ADD.
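A minimal sketch of that workaround, assuming a hand-rolled binding (`LibEpoll` and `rearm_exclusive` are hypothetical names; the constants mirror `<sys/epoll.h>` on Linux):

```crystal
# Hypothetical minimal binding; values mirror <sys/epoll.h> on Linux.
lib LibEpoll
  EPOLL_CTL_ADD = 1
  EPOLL_CTL_DEL = 2
  EPOLL_CTL_MOD = 3

  EPOLLIN        = 0x001_u32
  EPOLLEXCLUSIVE = 0x10000000_u32 # 1 << 28

  @[Packed] # epoll_event is packed on x86-64
  struct EpollEvent
    events : UInt32
    data : UInt64
  end

  fun epoll_ctl(epfd : Int32, op : Int32, fd : Int32, event : EpollEvent*) : Int32
end

# The kernel rejects EPOLL_CTL_MOD on a fd registered with EPOLLEXCLUSIVE
# (EINVAL), so re-registering takes a DEL followed by an ADD.
def rearm_exclusive(epfd : Int32, fd : Int32, events : UInt32) : Nil
  LibEpoll.epoll_ctl(epfd, LibEpoll::EPOLL_CTL_DEL, fd, nil)
  event = LibEpoll::EpollEvent.new(
    events: events | LibEpoll::EPOLLEXCLUSIVE,
    data: fd.to_u64!)
  LibEpoll.epoll_ctl(epfd, LibEpoll::EPOLL_CTL_ADD, fd, pointerof(event))
end
```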
Process.run sometimes hangs forever after fork and before exec, because it tries to close a fd that requires taking a lock, but another thread may have already acquired the lock. Since `fork` only duplicates the current thread (the others are not duplicated), the forked process was left waiting for a mutex that would never be unlocked.
That required allocating a Node for the interrupt event, which ain't a bad idea.
Extracts the generic parts of the event loop into an intermediary class between Crystal::EventLoop and Crystal::Epoll::EventLoop so we can reuse it to implement the event loop on other similar syscalls (poll, kqueue).
Sometimes we only want a pair of fds, and not IO::FileDescriptor objects.
For some reason specs fail with a fiber failing to raise an exception because `pthread_mutex_unlock` failed with EPERM while trying to dequeue the `Fiber#resume_event` from the event loop. Re-creating the thread mutex after fork seems to fix the issue.
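A hedged sketch of the idea behind that fix (`EventLoopState` and `after_fork` are hypothetical names; `Thread::Mutex` is the stdlib's pthread mutex wrapper):

```crystal
# Sketch: after fork the child owns a single thread, and a pthread mutex
# locked by another (now nonexistent) thread can never be unlocked; unlock
# attempts fail with EPERM. Re-creating the mutex in the child resets it.
class EventLoopState
  def initialize
    @mutex = Thread::Mutex.new
  end

  # to be called in the child process right after fork
  def after_fork : Nil
    @mutex = Thread::Mutex.new
  end
end
```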
This allows keeping a file descriptor in the evloop for its whole lifetime (from open to close) instead of adding it every time it would block and removing it as soon as it unblocks. This brings over 20% performance improvement on a simple HTTP/1.1 server (with keepalive). Among the advantages: this allows removing the global mutex around handling IO events and instead having an almost never contended lock around the reader and writer waiting lists for each IO. We don't even have to keep a global list of events (epoll and kqueue will do it). The drawback is that the preview MT scheduler isn't compatible with this scheme.
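A simplified sketch of the per-IO state this enables; the names are modeled on the PR but are not the actual implementation, and `Crystal::SpinLock`/`Crystal::Scheduler.enqueue` are stdlib internals:

```crystal
# Each fd gets its own descriptor with an almost-never-contended lock
# around its reader/writer waiting lists, replacing the global mutex.
class PollDescriptorSketch
  @lock = Crystal::SpinLock.new
  @readers = Deque(Fiber).new
  @writers = Deque(Fiber).new

  # called when a read would block: park the fiber in the readers list
  def wait_readable(fiber : Fiber) : Nil
    @lock.sync { @readers << fiber }
  end

  # called from the run loop when epoll/kqueue reports the fd readable
  def ready_readable : Nil
    if fiber = @lock.sync { @readers.shift? }
      Crystal::Scheduler.enqueue(fiber)
    end
  end
end
```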
This should be renamed as Crystal::Evented::Arena, since it's not a generic generational arena (memory region). It takes advantage of the fact that the OS kernel handles the fd number (it's guaranteed unique) and always reuses closed fds instead of growing (until it's needed). An actual generational arena would keep a list of free indexes.

Note: the goal of the arena is to (see the sketch below):
- avoid repeated allocations;
- avoid polluting the IO object with the PollDescriptor (which doesn't exist in other evloops);
- avoid saving raw pointers into kernel data structures;
- safely detect allocation issues instead of segfaulting because of raw pointers.
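A minimal sketch of that idea (`ArenaSketch` is hypothetical, simplified, and not thread-safe): the fd is the slot index, and a generation counter turns a stale reference into a nil instead of a segfault.

```crystal
class ArenaSketch(T)
  record Index, index : Int32, generation : UInt32

  def initialize(capacity : Int32)
    @slots = Array(T?).new(capacity, nil)
    @generations = Array(UInt32).new(capacity, 0_u32)
  end

  # Allocate the slot for an fd (unique among open fds, courtesy of the
  # kernel) and hand out a versioned index.
  def allocate_at(fd : Int32, value : T) : Index
    @slots[fd] = value
    Index.new(fd, @generations[fd])
  end

  # Returns nil when the slot was freed (and possibly reused) since
  # `index` was handed out: a stale index is detected, not followed.
  def get?(index : Index) : T?
    return nil unless @generations[index.index] == index.generation
    @slots[index.index]
  end

  def free(index : Index) : Nil
    return unless @generations[index.index] == index.generation
    @slots[index.index] = nil
    @generations[index.index] &+= 1
  end
end

# Usage sketch:
arena = ArenaSketch(String).new(1024)
idx = arena.allocate_at(7, "poll descriptor for fd 7")
arena.get?(idx) # => "poll descriptor for fd 7"
arena.free(idx)
arena.get?(idx) # => nil (generation mismatch: stale index detected)
```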
Integrates the epoll (Linux) and kqueue (*BSD, macOS) syscalls to handle the event loop on UNIX platforms.
Benefits
Instead of adding a `fd` to the poll structure when the `fd` blocks and removing it when it's ready for read or write (then repeat), we now add it once and keep it there until we close the `fd`. This is the ideal scenario for epoll and kqueue.

Unlike the previous attempts to integrate epoll & kqueue directly, which followed libevent's logic, didn't bring any performance improvement, and required a big lock (contended with MT) to keep a list of events, this change allows up to a +20% performance boost in an ideal scenario (http/server with long lived connections) and only requires fine grained locks for MT (usually uncontended).

To nobody's surprise: this is how Go's netpoll works.
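A sketch of the add-once lifecycle under stated assumptions (hand-rolled `LibEpoll` binding, hypothetical `on_open`/`on_close` helpers; the actual integration lives in `Crystal::System::Epoll`):

```crystal
# Hypothetical minimal binding; values mirror <sys/epoll.h> on Linux.
lib LibEpoll
  EPOLL_CTL_ADD = 1
  EPOLL_CTL_DEL = 2

  EPOLLIN  = 0x001_u32
  EPOLLOUT = 0x004_u32
  EPOLLET  = 0x80000000_u32 # edge-triggered, pairs well with add-once

  @[Packed] # epoll_event is packed on x86-64
  struct EpollEvent
    events : UInt32
    data : UInt64
  end

  fun epoll_ctl(epfd : Int32, op : Int32, fd : Int32, event : EpollEvent*) : Int32
end

# Register once when the IO is opened...
def on_open(epfd : Int32, fd : Int32) : Nil
  event = LibEpoll::EpollEvent.new(
    events: LibEpoll::EPOLLIN | LibEpoll::EPOLLOUT | LibEpoll::EPOLLET,
    data: fd.to_u64!)
  LibEpoll.epoll_ctl(epfd, LibEpoll::EPOLL_CTL_ADD, fd, pointerof(event))
end

# ...and deregister only when it is closed: no add/del churn per blocking read.
def on_close(epfd : Int32, fd : Int32) : Nil
  LibEpoll.epoll_ctl(epfd, LibEpoll::EPOLL_CTL_DEL, fd, nil)
end
```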
Notes
The evloop supports `preview_mt`, still with one evloop instance per thread (scheduler requirement). Execution contexts (RFC #2) will have one evloop instance per context.

We transfer the `fd` from one evloop to another when it would block; that evloop becomes the sole "owner" of the `fd`. The transfer is automatic, there is nothing to do. This leads to a caveat: we can't have multiple fibers waiting for the same `fd` in different evloops (aka threads). Trying to transfer the `fd` will raise if there already is any waiting fiber. This is because an IO read/write can have a timeout, which is registered in the current evloop's timers, and timers aren't transferred. This also allows for future enhancements (e.g. evloop enqueues are always local).

This can be an issue for `preview_mt`, for example with multiple fibers waiting for connections on a server socket; this shall be mitigated by execution contexts from RFC #2, which will share an evloop instance per context: just don't share a `fd` across multiple contexts.

If you experience any issue, you can always recompile with the `-Devloop_libevent` compile-time flag to return to the regular libevent-based event loop instead of the shiny new one.
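For example, assuming your entry point is `app.cr`:

```console
$ crystal build -Devloop_libevent app.cr
```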
Review

The branch kept the whole history of commits from the previous epoll and kqueue branches and has far too many commits. Maybe a couple of them could be extracted on their own.
- Each syscall is abstracted in its own little struct: `Crystal::System::Epoll`, `Crystal::System::TimerFD`, etc. They could be simplified (possibly some dead code).
- The `Crystal::Evented` namespace (`src/crystal/system/unix/evented`) contains the base implementation that the system specific `Crystal::Epoll::EventLoop` (`src/crystal/system/unix/epoll`) and `Crystal::Kqueue::EventLoop` (`src/crystal/system/unix/kqueue`) are built on.
- `Crystal::Evented::Timers` is a basic data structure to keep a list of timers (one instance per evloop); it could be optimized (in follow-up pull requests).
- `Crystal::Evented::Event` holds the event, be it IO, sleep, select timeout, or IO with timeout, while `FiberEvent` wraps an `Event` for sleeps and select timeouts.
- `Crystal::Evented::PollDescriptor`s are allocated in a generational arena and keep the list of readers and writers (events/fibers waiting on IO).
- The run loop first waits on epoll/kqueue, canceling IO timeouts as it resumes fibers, then proceeds to process timers.
- The epoll/kqueue call doesn't wait until the next ready timer (it could without MT and with `preview_mt`, but can't for execution contexts). I instead rely on timerfd on Linux and EVFILT_TIMER on BSD to interrupt a blocking evloop wait (see the sketch after this list). This also allows circumventing the 1ms precision of `epoll_wait` on Linux.
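A hedged sketch of the timerfd mechanism on Linux (`LibTimerFD` is a hypothetical hand-rolled binding; field sizes assume a 64-bit target):

```crystal
lib LibTimerFD
  CLOCK_MONOTONIC = 1

  struct Timespec
    tv_sec : Int64  # time_t on 64-bit Linux
    tv_nsec : Int64 # long on 64-bit Linux
  end

  struct ITimerSpec
    it_interval : Timespec # zero => one-shot timer
    it_value : Timespec    # delay until expiration
  end

  fun timerfd_create(clock_id : Int32, flags : Int32) : Int32
  fun timerfd_settime(fd : Int32, flags : Int32, new_value : ITimerSpec*, old_value : ITimerSpec*) : Int32
end

# Create the timer fd once and register it in the epoll set like any
# other fd (registration not shown).
timer_fd = LibTimerFD.timerfd_create(LibTimerFD::CLOCK_MONOTONIC, 0)

# Arm it for the next timer's deadline, e.g. 1.5ms from now: epoll_wait
# then wakes with sub-millisecond precision instead of its 1ms timeout.
deadline = LibTimerFD::Timespec.new(tv_sec: 0, tv_nsec: 1_500_000)
spec = LibTimerFD::ITimerSpec.new(it_value: deadline)
LibTimerFD.timerfd_settime(timer_fd, 0, pointerof(spec), nil)
```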
References

Supersedes both #14814 and #14829.