Allow mounting of /proc/sys/kernel/ns_last_pid #3451
The CAP_CHECKPOINT_RESTORE Linux capability provides the ability to update /proc/sys/kernel/ns_last_pid. However, this file is under /proc, and by default both K8s and CRI-O specify that /proc/sys should be mounted read-only, so even with the capability specified, a process will not be able to write to ns_last_pid.
To get around this, a pod author can specify a volume mount and a hostPath to bind-mount /proc/sys/kernel/ns_last_pid. However, runc rejects mounts under /proc unless the destination is listed in validProcMounts.
This commit adds /proc/sys/kernel/ns_last_pid to the validProcMounts string array to enable a pod author to mount ns_last_pid as read-write. The default remains unchanged: unless explicitly requested as a volume mount, ns_last_pid will remain read-only regardless of whether CAP_CHECKPOINT_RESTORE is specified.
Signed-off-by: Irwin D'Souza <[email protected]>
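The allowlist check described in the commit message can be sketched roughly as follows. This is a simplified illustration, not runc's actual code: the function name `checkProcMount` and the shortened `allowedProcMounts` list are assumptions made for the example, and the real validProcMounts array contains more entries and the real check handles more cases.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedProcMounts is an illustrative, shortened allowlist (hypothetical name).
// It stands in for the validProcMounts array mentioned in the commit message.
var allowedProcMounts = []string{
	"/proc/cpuinfo",
	"/proc/meminfo",
	"/proc/sys/kernel/ns_last_pid", // the entry this change adds
}

// checkProcMount returns an error if dest lies under /proc
// and is not one of the allowlisted destinations.
func checkProcMount(dest string) error {
	dest = filepath.Clean(dest)
	if !strings.HasPrefix(dest, "/proc/") {
		// Not under /proc: this sketch has no opinion on the mount.
		return nil
	}
	for _, allowed := range allowedProcMounts {
		if dest == allowed {
			return nil
		}
	}
	return fmt.Errorf("%q cannot be mounted because it is inside /proc", dest)
}

func main() {
	for _, dest := range []string{
		"/proc/sys/kernel/ns_last_pid",
		"/proc/sys/kernel/hostname",
		"/data",
	} {
		fmt.Printf("%s -> %v\n", dest, checkProcMount(dest))
	}
}
```

With the new entry in the list, a bind mount targeting /proc/sys/kernel/ns_last_pid passes the check, while any other path under /proc/sys is still rejected.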
@dsouzai looks like your idea is to use criu from inside the container. Do you think it is feasible to do so?
@kolyshkin see cri-o/cri-o#5776 for a discussion about possible use cases.
@kolyshkin Yeah, we have a prototype where we
In K8s, I'm able to successfully run the restore container with the runc change in this PR and the following pod spec:
LGTM
@opencontainers/runc-maintainers PTAL
@AkihiroSuda PTAL
LGTM
LGTM
@dsouzai sent a mail on oci-dev asking when this will be in a release -- given that the patch is very simple and a bugfix, maybe we should backport it to release-1.1?
The CAP_CHECKPOINT_RESTORE Linux capability provides the ability to update /proc/sys/kernel/ns_last_pid. However, because this "file" is under /proc, and by default both K8s and CRI-O specify that /proc/sys should be mounted as Read-Only, even with the capability specified, a process will not be able to write to ns_last_pid.
To get around this, a pod author can specify a volume mount and a host path to bind-mount /proc/sys/kernel/ns_last_pid. However, runc does not allow specifying mounts under /proc.
This PR adds /proc/sys/kernel/ns_last_pid to the validProcMounts string array to enable a pod author to mount ns_last_pid as read-write. The default remains unchanged; unless explicitly requested as a volume mount, ns_last_pid will remain read-only regardless of whether or not CAP_CHECKPOINT_RESTORE is specified.