-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set up ulimits for rootless containers. #6004
Conversation
if !nofileSet { | ||
if isRootless { | ||
var rlimit unix.Rlimit | ||
if err := unix.Getrlimit(unix.RLIMIT_NOFILE, &rlimit); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can hard-error on these; I can see cases where this won't work, but the rest of Podman will function fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have --ulimit host
that tells Podman to not set any limit block in the OCI configuration file. When there are no limit sets the OCI runtime won't apply any limit to the container so the host settings are propagated to the container.
Do we have to make that the default (AFAICS it is not yet supported by Buildah)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would figure it was the default. Where does the 1024,1024 come from that we have now?
I also found it curious that
$ podman unshare grep open /proc/self/limits
Max open files 524288 524288 files
Why is the default to set the soft limit== harlimit in a user namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do that automatically to avoid the max number of open files
error when binding too many ports: 9126b45.
The issue with rootless is that we cannot set higher limits that the user currently has, so we can't default to the values used by root as they could not be allowed in the current session.
At the same time, we risk of setting higher limits than a root container would have if defaulting to --ulimit host
.
We'd probably have to check each default value and lower it when it is higher than what the user is allowed to use, but we can adress this part later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should we merge this change or drop it?
☔ The latest upstream changes (presumably #6000) made this pull request unmergeable. Please resolve the merge conflicts. |
pkg/spec/spec.go
Outdated
} | ||
g.AddProcessRlimits("RLIMIT_NPROC", rlimit.Cur, rlimit.Max) | ||
} else { | ||
g.AddProcessRlimits("RLIMIT_NPROC", kernelMax, kernelMax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps consider a small DRY principle change?
if !nprocSet{
var ( cur uint64 = kernelMax, max uint64 kernel Max )
if isRootless {
var rlimit unix.Rlimit
if err := unit.Getrlimit(unix.RLIMIT_NPROC, &rlimit); err != nil {
return errors.Wrapf(err, "failed to return RLIMIT_NPROC ulimit")
}
cur = rlimit.Cur, max = rlimit.Max
}
g.AddProcessRlimits("RLIMIT_NPROC", cur, max)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this DRY would apply to number of files as well.
pkg/spec/spec.go
Outdated
} | ||
g.AddProcessRlimits("RLIMIT_NOFILE", rlimit.Cur, rlimit.Max) | ||
} else { | ||
g.AddProcessRlimits("RLIMIT_NOFILE", kernelMax, kernelMax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a hard-coded constant of 1048576
used for kernelMax
for both number of files and number of processes? Is there a proper default that could be determined from the environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know, I was trying to fix rootless case. I would actually think we should just use the current environment in both cases. But not sure. I am slightly worried about root case being able to override default MAX if the process has CAP_SYS_RESOURCE.
man capabilities
...
CAP_SYS_RESOURCE
* Use reserved space on ext2 filesystems;
* make ioctl(2) calls controlling ext3 journaling;
* override disk quota limits;
* increase resource limits (see setrlimit(2));
* override RLIMIT_NPROC resource limit;
* override maximum number of consoles on console allocation;
* override maximum number of keymaps;
* allow more than 64hz interrupts from the real-time clock;
* raise msg_qbytes limit for a System V message queue above the limit in
/proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
* allow the RLIMIT_NOFILE resource limit on the number of "in-flight" file descrip‐
tors to be bypassed when passing file descriptors to another process via a UNIX
domain socket (see unix(7));
* override the /proc/sys/fs/pipe-size-max limit when setting the capacity of a pipe
using the F_SETPIPE_SZ fcntl(2) command.
* use F_SETPIPE_SZ to increase the capacity of a pipe above the limit specified by
/proc/sys/fs/pipe-max-size;
* override /proc/sys/fs/mqueue/queues_max limit when creating POSIX message queues
(see mq_overview(7));
* employ the prctl(2) PR_SET_MM operation;
* set /proc/[pid]/oom_score_adj to a value lower than the value last set by a
process with CAP_SYS_RESOURCE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is just the default used by Docker.
Also, shouldn't these changes be done in pkg/specgen
instead of pkg/spec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is being done in both places. pkg/spec is still used by varlink.
pkg/specgen/generate/oci.go
Outdated
logrus.Warnf("failed to get RLIMIT_NPROC ulimit %q", err) | ||
} | ||
} else { | ||
g.AddProcessRlimits("RLIMIT_NPROC", kernelMax, kernelMax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the same DRY principle can be applied here.
Currently we are setting the maximum limits for rootful podman containers, no reason not to set them by default for rootless users as well Signed-off-by: Daniel J Walsh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, looks good!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: portante, rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
lgtm |
/lgtm |
Currently we are setting the maximum limits for rootful podman containers,
no reason not to set them by default for rootless users as well
Signed-off-by: Daniel J Walsh [email protected]