Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linux: don't delay EPOLL_CTL_DEL operations #4328

Merged
merged 4 commits into from
Mar 21, 2024
Merged

Conversation

bnoordhuis
Copy link
Member

Perform EPOLL_CTL_DEL immediately instead of going through io_uring's submit queue, otherwise the file descriptor may be closed by the time the kernel starts the operation.


I still need to steal adopt Santi's test case but I ran out of time for the day.

Copy link
Member

@santigimeno santigimeno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

However, will this still leave open a race window if a user adds an event for only a short period of time or changes the set of events, if the following sequence occurs, or is that prevented by uv__epoll_ctl_flush:

  • user calls uv__epoll_ctl_prep(EPOLL_CTL_ADD)
  • user calls uv_close()
  • libuv calls epoll_ctl(EPOLL_CTL_DEL) # which fails, since EPOLL_CTL_ADD hasn't completed yet
  • kernel processes uv__epoll_ctl_prep(EPOLL_CTL_ADD)
  • libuv calls close() # which triggers this kernel issue with fd that are in epoll

If this is prevented by uv__epoll_ctl_flush, then would an alternative approach that preserves some of this batching to move add a uv__epoll_ctl_flush just before the close-callback phase of the event loop, but only if there are both pending flush and pending close calls? (and assuming it is still careful around a uv_poll_stop call, which is documented to run synchronously with respect to subsequent close calls)

@bnoordhuis
Copy link
Member Author

I don't think that can happen. Before io_uring, we basically called epoll_ctl(EPOLL_CTL_ADD) in a loop at the start of uv__io_poll() right before calling epoll_pwait().

Now, we use io_uring as a kind of vectored epoll_ctl but everything still happens at the start of uv__io_poll().

I could nop out the sqe in uv__platform_invalidate_fd() but that's a) another linear scan and we already have one of those, and b) dead code, I'm fairly sure.

I'll push a commit that shows what I mean but I observe that our test suite never hits that code path.

@bnoordhuis
Copy link
Member Author

Also, given the above and given that after this commit EPOLL_CTL_DEL no longer goes through io_uring, there's probably room to simplify src/unix/linux.c.

@oscardssmith
Copy link

is this ready to merge?

Perform EPOLL_CTL_DEL immediately instead of going through
io_uring's submit queue, otherwise the file descriptor may
be closed by the time the kernel starts the operation.

Fixes: libuv#4323
@bnoordhuis
Copy link
Member Author

Incorporated Santi's test case and simplified the logic in linux.c, PTAL.

@santigimeno
Copy link
Member

santigimeno commented Mar 19, 2024

It seems the new test is failing to build on macos though

@bnoordhuis bnoordhuis merged commit 3ecce91 into libuv:v1.x Mar 21, 2024
37 checks passed
@bnoordhuis bnoordhuis deleted the fix4323 branch March 21, 2024 08:23
Keno pushed a commit to JuliaLang/libuv that referenced this pull request Mar 23, 2024
Perform EPOLL_CTL_DEL immediately instead of going through
io_uring's submit queue, otherwise the file descriptor may
be closed by the time the kernel starts the operation.

Fixes: libuv#4323
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants