-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Epoll event loop (linux) #14814
Epoll event loop (linux) #14814
Conversation
Urgh, supporting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I trust you have read https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/ (and part 2: https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/? ) The answer to multitheaded issiues may be found in there.
raise "BUG: #{node.fd} is ready for reading but no registered reader for #{node.fd}!\n" if readable | ||
raise "BUG: #{node.fd} is ready for writing but no registered writer for #{node.fd}!\n" if writable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it sufficent to do normal raise here? I imagine it can be a bit weird in the multithreaded scenario if things raise here. As this is called from the scheduler itself I imagine most places are not really build to handle this well.
Will it bubble up properly and take down the whole thing (as at least I think it should)? Or could we get strange situations with some crashed threads but the program as a whole remains (potentially waiting for the crashed stuff).
I suppose that goes for other exceptions in this vicinity too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not meant to stay, and definitely not safe to handle.
All the raised "BUG:" are meant to loudly detect errors. If they raise, there's a 99% probability that I made a mistake, though I discovered the double event dequeue in Fiber#cancel_timeout
thanks to one of these.
I took care to use EPOLLEXCLUSIVE (available since 2017). I'm not sure we need EPOLLONESHOT since we register the fd once then dequeue readers/writers one by one. Now, there may still be races, especially in a MT environment, but there's little we can do without blocking the other readers/writers (there's no telling if a reader/writer will keep reading/writing). I didn't know the |
The close quirk needs some experimental verification:
The one thing to take care of would be to resume the cached event/fibers from all epoll instances (tricky). I also found an issue with EPOLLEXCLUSIVE: we can't modify, so either we don't use it or we must del/add when changing the set of events. |
Could be they fixed it at some point, there has been a bunch of years since that post was written after all.. |
Another man page, a longer answer:
|
The I'm trying to think of an efficient way to iterate the event loops on (something lighter than |
We can't call EPOLL_CTL_MOD with EPOLLEXCLUSIVE. Let's disable it for now and see later if we can replace it with a pair of EPOLL_CTL_DEL and EPOLL_CTL_ADD.
Perhaps it is possible to have a lookup table fd -> loop (or probably - a bit less contentious in mt scenario - have a list in the FileDescriptor (well no, but you perhaps better understand what I mean compared to other choices of words) that keeps track of what epolls uses it). But it is an icky problem :( |
Process.run sometimes hang forever after fork and before exec, because it tries to close a fd that requires to lock, but another thread may have already acquired the lock, while `fork` only duplicates the current thread (the other ones are not, and the forked process was left waiting for a mutex to be unlocked, which would never happen.
That required to allocate a Node for the interrupt event, which ain't a bad idea.
|
||
byte = 1_u8 | ||
LibC.write(@pipe[1], pointerof(byte), 1) | ||
# the atomic makes sure we only write once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is writing more than once a problem when using eventfd? Could be an alternative to simply write multiple times and not have the atomic variable at all. The eventfd will just return the sum of values that has been written to it, after all (unless using the semaphore flag).
(BTW, this method is to interrupt the blocking wait, right? Sometimes the term notify
is used for that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. It's not needed as for the pipe (we can write 0xfffffffffffffffe times until it blocks) but it helps avoid pointless write
syscalls, so ⚖️ ?
It looks like it's starting to work... except for the interpreter that now hangs forever 😭 |
Using a release compiler (with libevent), the interpreter segfaults 😭 Building a compiler with epoll (as CI does) in non release, the interpreter enters a kind of infinite loop (100% CPU, silent
It's creating the eventloop instance (epoll, eventfd, timerfd) then writes a colored Any pointers to debug the interpreter itself? |
I can't get tracing to work in the interpreted code, so I hacked myself into |
I only checked whether |
CI is finally green and the implementation can be reviewed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan for rolling out this? Does it need some sort of opt-in flag for a version or two so that people can try and find issues? Or is it a 2.0 thing?
size = LibC.recvfrom(socket.fd, slice, slice.size, 0, sockaddr, pointerof(addrlen)) | ||
if size == -1 | ||
if Errno.value == Errno::EAGAIN | ||
wait_readable(socket.fd, socket.@read_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one potential issue here is the following potentially chain of confusing behavior:
-> recvfrom -> EAGAIN
-> waiting a while, say timeout half time span
-> recvfrom -> EAGAIN
.. etc, potentially waiting forever, never timing out.
But treating the timeout as a deadline is perhaps a different issue than this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.I'm 99% sure this problem is already in the libevent event loop. That doesn't mean we should reproduce it here.
Either opt-in with a flag... or we're bold and make it the default + a flag to opt-out. That will depend on how stable/performant it will get until the next release. |
2fc2f99
to
4d4c068
Compare
getter type : Type | ||
getter fd : Int32 | ||
|
||
property! time : Time::Span? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: IOCP uses wake_at
and it's much better name than time
!
Moving back to draft. There are some fixes in #14829 and... I got great ideas from the Go implementation that would allow to skip most of the synchronizations and take full advantage of epoll/kqueue (see ysbaddaden/execution_context#30). |
Superseded by #14959 |
Implements a custom event loop on top of
epoll
for Linux (and Solaris?), following RFC #7. It's not particularly optimized yet (e.g. it still allocates), but seems to be working.The EventQueue and Timers types aren't the best solutions (performance wise) but they work and abstract their own logic (keep a list of events per fd / keep a sorted list of timers). We should be able to improve (e.g. linked lists -> ordered linked lists -> skiplists, ...)
ISSUES
(not needed with one timerfd per eventloop, see below);#after_fork
must close & nullify everyFiber#timerfd
& update pending events to use the new fdmust reset mutex after fork before exec (preview mt);must "close" fd in all eventloops (preview mt);LibC bindings foraarch-linux-musl
are wrong;investigate segfault runs on CI;support:preview_mt
(broken CI);can't modify a registered fd with EPOLLEXCLUSIVE set(disabled for now);BUG: a timerfd file descriptor errored or got closed!
on CI;epoll_ctl(EPOLL_CTL_MOD): No such file or directory (RuntimeError)
on CI;Unhandled exception: pthread_mutex_unlock: Operation not permitted (RuntimeError)
on CI and againinfinite loop in interpreter on CI (primitives spec);OPTIMIZE
epoll_event.data
should point to Node instead of fd, so we could skip searches;eventfd
instead ofpipe2
to interrupt the loop;timerfd
per eventloop should be enough (wait until next timer => we can't enqueue another timer on that thread anymore => for MT a parallel enqueue will interruptepoll_wait
); we still need atimerfd
becauseepoll_wait
only has millisecond precision; we also need a list of timers to know the next timer, and which timers are ready.TODO
Fiber#cancel_timeout
(distinct PR);:ev_epoll
flag instead of forcing it for Linux (not yet: I want CI test runs);