Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: Don't close already closed fds and bail on close(2) failures #3058

Merged
merged 2 commits into from
Jul 9, 2021

Conversation

rata
Copy link
Member

@rata rata commented Jul 2, 2021

Hi!

We've found these bugs while rebasing #2576. The bug is that we are closing an fd that is already closed, and as we ignore close(2) failures we never realized.

It was not a big deal in runc master, it seems, as that close fails and is otherwise ignored. However, when working on PR #2576 we are opening a new fd, sometimes between the two close(2) calls, the fd number is reused (allocates the lower available fd) and the second close(2) was closing this new open fd that reused the number.

I've create two patches: one to remove the bogus close, and another one to check failures on close(2) so this bug does not happen again (as long as we continue to check for close failures :)). I've also verified that if we check for close failures on the close I remove, it indeed always returns an error on master.

You can find a more in detail explanation in the commit messages :)

cc @alban

This was closed in the child[1], before calling clone_parent (so runc
INIT will have this fd closed too), there is no point closing it again.

This was not causing issues because we ignore the return code of
close(2) and no one was opening a new fd between both calls to close.
However, with the new patches that I'm working on (PR opencontainers#2576), this
problem is no longer inocuos: we do open a new fd in that PR, sometimes
that fd is allocated between the two close(2) calls and, as the lowest
fd is allocated to the new fd, sometimes the second close ends up
incorrectly closing this new fd.

Before it was not a problem in practice, but it was incorrect
nevertheless.

This seems to be long standing bug, present since at least 2018
(a54316b), when SYNC_GRANDCHILD was introduced.

[1]: https://github.com/opencontainers/runc/blob/5547b5774f71f75a088e7432fa961778750a0fbd/libcontainer/nsenter/nsexec.c#L888

Co-authored-by: Alban Crequy <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
cyphar
cyphar previously requested changes Jul 2, 2021
libcontainer/nsenter/nsexec.c Outdated Show resolved Hide resolved
@rata rata force-pushed the rata/nsexec-close-bug branch from 57eaa87 to 6289bab Compare July 2, 2021 14:39
@rata
Copy link
Member Author

rata commented Jul 2, 2021

@cyphar fixed, PTAL

@rata rata force-pushed the rata/nsexec-close-bug branch from 6289bab to 42c9054 Compare July 2, 2021 14:41
Don't ignore close(2) return code, rather bail if there is any
unexpected failures. By checking the close return code we make sure we
don't introduce the same bug (closing an already closed fd) I've fixed
in the previous patch.

As a side note, we are not handling in this patch when close(2) returns
EINTR and the go runtime, since go 1.14, sends SIGURG to preempt
goroutines. This should not happen here though, as nsenter is guaranteed
to be executed before the go runtime starts.

Signed-off-by: Rodrigo Campos <[email protected]>
@rata rata force-pushed the rata/nsexec-close-bug branch from 42c9054 to 83776dd Compare July 2, 2021 14:43
@rata rata requested a review from cyphar July 2, 2021 14:45
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin merged commit 497e403 into opencontainers:master Jul 9, 2021
@kolyshkin
Copy link
Contributor

As this is not a bug (without #2576) I think we don't need to backport it to 1.0.

@rata rata deleted the rata/nsexec-close-bug branch July 13, 2021 11:26
@rata
Copy link
Member Author

rata commented Jul 13, 2021

@kolyshkin agree, it was there for 3 years and doesn't seem to hurt anyone.

It should be safe to backport just the removal of the extra close "libcontainer: Don't close fds already closed" (7d479e6).

But as you think it is best, backporting it won't fix any visible regressions AFAIK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants