-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix set nofile rlimit error #4265
Fix set nofile rlimit error #4265
Conversation
@ls-ggg PTAL, could you please help to test whether this PR fix the issue #4195 or not? Thanks very much. This is my test with @kolyshkin 's script in #4195 (comment)
|
The ci test in centos7 is always fail after re-test:
I'll take a look tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny, I read your comment and wrote the very same code yesterday, still testing it since it needs about 10 minutes to repro the issue.
tests/integration/exec.bats
Outdated
@test "runc exec with RLIMIT_NOFILE" { | ||
update_config '.process.capabilities.bounding = ["CAP_SYS_RESOURCE"]' | ||
update_config '.process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 65536, "soft": 65536}]' | ||
|
||
runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox | ||
[ "$status" -eq 0 ] | ||
|
||
# issue: https://github.com/opencontainers/runc/issues/4195 | ||
runc exec test_busybox /bin/sh -c "ulimit -n" | ||
[[ "${output}" == "65536" ]] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @kolyshkin said in #4266, there isn't much point making a CI test for this -- because this is a race we would need to run this for many iterations to confirm whether the race still exists. Running it once just wastes CI time because even if the bug was re-introduced we would never catch it in a single CI run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll remove it later, at this time, I used it to test some other issues.
e5fa99b
to
a996139
Compare
tests/integration/exec.bats
Outdated
|
||
update_config '.process.args = ["/bin/sh", "-c", "ulimit -n"]' | ||
runc run test_ulimit | ||
[[ "${output}" == "65536" ]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intrestingly, I don't know how this PR impacts runc run
with RLIMIT_NOFILE
in centos7
:
not ok 91 runc exec with RLIMIT_NOFILE
# (in test file tests/integration/exec.bats, line 489)
# `[[ "${output}" == "65536" ]]' failed
#
# runc run -d --console-socket /tmp/bats-run-PQBrqc/runc.wbkYSh/tty/sock test_busybox (status=0):
#
# runc run test_ulimit (status=0):
# 1024
As it will not fail in #4267 .
@kolyshkin @cyphar @AkihiroSuda Do you know why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lifubang first, there's no such code (of running a second container with modified .process.args
) in #4267.
Second, you forgot to check the exit code (here and below).
Third, I think you don't have to add CAP_SYS_RESOURCE as this is the capability of the container itself, and rlimits are set by runc which already have it.
Finally, it looks like you just found a new bug -- RLIMIT_NOFILE
is not being set in CentOS 7 for runc run
. Can confirm this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lifubang the issue is, on CentOS 7 the hard limit for NOFILE is 4096. For some reason yet unknown to me, raising the limit is not possible for runc via prlimit(2) syscall, and it returns no error. Maybe it's a bug in CentOS 7 kernel (related to namespaces I guess).
Two ways to workaround it:
- Use lower limits (soft <= 1024, hard <= 4096) in the test code.
- Add something like this to the test:
function setup() {
+ prlimit --nofile=99999:99999 -p $$
setup_busybox
}
Surely, I won't recommend item 2, but it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same in Ubuntu 20.04 LTS (kernel 5.15) and Fedora 40 (kernel 6.8) -- if the current hard limit is equal or lower than the one we're trying to set using prlimit syscall, it returns no error but the limit is not changed.
To repro this locally:
$ sudo
# ulimit -H -n 65536
# bats tests/integration/ulimit.bats
root@kir-tp1:/home/kir/git/runc# bats tests/integration/ulimit.bats
ulimit.bats
✗ runc run with RLIMIT_NOFILE
(in test file tests/integration/ulimit.bats, line 19)
`[[ "${output}" == "65536" ]]' failed
runc spec (status=0):
runc run test_ulimit (status=0):
1024
✓ runc exec with RLIMIT_NOFILE
✗ runc run+exec two containers with RLIMIT_NOFILE
(in test file tests/integration/ulimit.bats, line 45)
`[[ "${output}" == "65536" ]]' failed
runc spec (status=0):
runc run -d --console-socket /tmp/bats-run-TSa3k7/runc.jz4uuS/tty/sock test_busybox (status=0):
runc run test_ulimit (status=0):
1024
3 tests, 2 failures
Here's the test code (tests/integration/ulimit.bats
):
#!/usr/bin/env bats
load helpers
function setup() {
setup_busybox
}
function teardown() {
teardown_bundle
}
@test "runc run with RLIMIT_NOFILE" {
update_config '.process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 65536, "soft": 65536}]'
update_config '.process.args = ["/bin/sh", "-c", "ulimit -n"]'
runc run test_ulimit
[ "$status" -eq 0 ]
[[ "${output}" == "65536" ]]
}
@test "runc exec with RLIMIT_NOFILE" {
update_config ' .process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 2000, "soft": 1000}]'
runc run -d --console-socket "$CONSOLE_SOCKET" nofile
[ "$status" -eq 0 ]
for ((i = 0; i < 100; i++)); do
runc exec nofile sh -c 'ulimit -n'
echo "[$i] $output"
[[ "${output}" == "1000" ]]
done
}
@test "runc run+exec two containers with RLIMIT_NOFILE" {
update_config '.process.capabilities.bounding = ["CAP_SYS_RESOURCE"]'
update_config '.process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 65536, "soft": 65536}]'
runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
[ "$status" -eq 0 ]
update_config '.process.args = ["/bin/sh", "-c", "ulimit -n"]'
runc run test_ulimit
[ "$status" -eq 0 ]
[[ "${output}" == "65536" ]]
# issue: https://github.com/opencontainers/runc/issues/4195
runc exec test_busybox /bin/sh -c "ulimit -n"
[ "$status" -eq 0 ]
[[ "${output}" == "65536" ]]
}
If you set hard ulimit -n to 65537, it works again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same in Ubuntu 20.04 LTS (kernel 5.15) and Fedora 40 (kernel 6.8) -- if the current hard limit is equal or lower than the one we're trying to set using prlimit syscall, it returns no error but the limit is not changed.
Thanks your corner case.
This is an issue which has no relate to the kernel version. This is another bug introduced by https://go-review.googlesource.com/c/go/+/476097 , but not a get/set race.
For example:
Consider Nofile limit: [cur, max], if syscall.rlimit.init
get the value of [1024, 65536], the value of origRlimitNofile is [1024, 65536], and the nofile limit of runc init process is [65536, 65536], after runc create/exec process set it to [65537, 65537] or [65536, 65536], it will success, and when run init do unix.Exec
, it will restore nofile limit to [1024, 65536], it will success.
But if runc create/exec set it to [65535, 65535], it will fail when restoring to [1024, 65536], and the error of this failure has been ignored by 'syscall.Exec`, so the nofile rliimit remains [65535, 65535].
Please see #4267 .
So I think the solution of #4237 is correct, but need some refactors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another addition: we do have a test case for setting RLIMIT_NOFILE during runc exec, but the test case is masking the issue because it sets Cur and Max to the same value (and so Go runtime does not do any tricks with saving/restoring NOFILE).
With this change, the test fails:
diff --git a/libcontainer/integration/exec_test.go b/libcontainer/integration/exec_test.go
index 9dd056ad..f4b8b839 100644
--- a/libcontainer/integration/exec_test.go
+++ b/libcontainer/integration/exec_test.go
@@ -140,7 +140,7 @@ func testRlimit(t *testing.T, userns bool) {
// the Setrlimit call happens early enough that we still have permissions to raise the limit.
ok(t, unix.Setrlimit(unix.RLIMIT_NOFILE, &unix.Rlimit{
Max: 1024,
- Cur: 1024,
+ Cur: 512,
}))
out := runContainerOk(t, config, "/bin/sh", "-c", "ulimit -n")
With the above change, the test now fails like this:
[kir@kir-tp1 runc]$ go1.20.14 test -exec sudo -v ./libcontainer/integration -run Rlimit
=== RUN TestRlimit
exec_test.go:148: expected rlimit to be 1025, got 512
--- FAIL: TestRlimit (0.11s)
=== RUN TestUsernsRlimit
exec_test.go:148: expected rlimit to be 1025, got 512
--- FAIL: TestUsernsRlimit (0.10s)
=== RUN TestExecInUsernsRlimit
execin_test.go:129: expected rlimit to be 1026, got 512
--- FAIL: TestExecInUsernsRlimit (0.14s)
=== RUN TestExecInRlimit
execin_test.go:129: expected rlimit to be 1026, got 512
--- FAIL: TestExecInRlimit (0.15s)
FAIL
FAIL github.com/opencontainers/runc/libcontainer/integration 0.542s
FAIL
d59fb11
to
f39d3ab
Compare
f39d3ab
to
db890b6
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
db890b6
to
56399ff
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
56399ff
to
775b1e5
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
775b1e5
to
291c4af
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
291c4af
to
2cd053a
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
2cd053a
to
b7efb3b
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
b7efb3b
to
be7e416
Compare
ee4686b
to
ce39ae6
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
ce39ae6
to
6c95ade
Compare
Issue: opencontainers#4195 Since https://go-review.googlesource.com/c/go/+/476097, there is a get/set race between runc exec and syscall.rlimit.init, so we need to call setupRlimits after syscall.rlimit.init() completed. Signed-off-by: lifubang <[email protected]>
As reported in issue opencontainers#4195, the new version(since 1.19) of go runtime will cache rlimit-nofile. Before executing execve, the rlimit-nofile of the process will be restored with the cache. In runc, this will cause the rlimit-nofile set by the parent process for the container to become invalid. It can be solved by clearing the cache. Signed-off-by: ls-ggg <[email protected]> (cherry picked from commit f9f8abf) Signed-off-by: lifubang <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
6c95ade
to
a83367c
Compare
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
a83367c
to
4ea0bf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
Fix: #4195
Close: #4237
1. Fix a get/set race between
runc exec
andsyscall.rlimit.init()
As @ls-ggg has given the detailed steps to reproduce the issue #4195 , and has given the core reason is that the go runtime will cache the value of rlimit nofile. [1]
When we are running
runc exec
with nofile limit, runc set it in the parent process, it may cause a race with go runtime init. The race condition is in the time when runc parent process set the container child process's nofile limit just after go runtime init has fetched the nofile limit. [2]So, we should set nofile limit after
syscall.rlimit.init()
completed.2. Fix an edge case caused by nofile rlimit cache in go stdlib
As @kolyshkin have found an edge case for set nofile rlimit by runc, which is also caused by nofile rlimit cache in go stdlib. Although we have a way to resolve the above get/set race, but if the hard value of nofile rlimit configured in
config.json
is bigger than this value ofrunc create/run/exec
, it will also be incorrect restored bysyscall.Exec
inrunc init
. So we need to clear the nofile rlimit cache before we start the container initial process if we need to set nofile rlimit for the container.[1] golang/go@f5eef58#diff-ec665e9789f8cf5cd1828ad7fa9f0ff4ebc1f5b5dd0fc82a296da5c07da7ece6
[2] https://github.com/golang/go/blob/f5eef58e4381259cbd84b3f2074c79607fb5c821/src/syscall/rlimit.go#L34-L35