-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
entrypoint cleanup + non-systemd-host fix #2767
Changes from 4 commits
4173bb9
225da1a
cb8af45
0ffcf8d
66fa962
4175f82
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
[Unit] | ||
Description=slice used to run Kubernetes / Kubelet | ||
Before=slices.target | ||
|
||
[Slice] | ||
MemoryAccounting=true | ||
CPUAccounting=true |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,43 +55,43 @@ validate_userns() { | |
} | ||
|
||
overlayfs_preferrable() { | ||
if [[ -z "$userns" ]]; then | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. tabs => double space (like the rest of the file) |
||
# If we are outside userns, we can always assume overlayfs is preferrable | ||
return 0 | ||
fi | ||
|
||
# Debian 10 and 11 supports overlayfs in userns with a "permit_mount_in_userns" kernel patch, | ||
# but known to be unstable, so we avoid using it https://github.com/moby/moby/issues/42302 | ||
if [[ -e "/sys/module/overlay/parameters/permit_mounts_in_userns" ]]; then | ||
echo "INFO: UserNS: kernel seems supporting overlayfs with permit_mounts_in_userns, but avoiding due to instability." | ||
return 1 | ||
fi | ||
|
||
# Check overlayfs availability, by attempting to mount it. | ||
# | ||
# Overlayfs inside userns is known to be available for the following environments: | ||
# - Kernel >= 5.11 (but 5.11 and 5.12 have issues on SELinux hosts. Fixed in 5.13.) | ||
# - Ubuntu kernel | ||
# - Debian kernel (but avoided due to instability, see the /sys/module/overlay/... check above) | ||
# - Sysbox | ||
tmp=$(mktemp -d) | ||
mkdir -p "${tmp}/l" "${tmp}/u" "${tmp}/w" "${tmp}/m" | ||
if ! mount -t overlay -o lowerdir="${tmp}/l,upperdir=${tmp}/u,workdir=${tmp}/w" overlay "${tmp}/m"; then | ||
echo "INFO: UserNS: kernel does not seem to support overlayfs." | ||
rm -rf "${tmp}" | ||
return 1 | ||
fi | ||
umount "${tmp}/m" | ||
rm -rf "${tmp}" | ||
|
||
# Detect whether SELinux is Enforcing (or Permitted) by grepping /proc/self/attr/current . | ||
# Note that we cannot use `getenforce` command here because /sys/fs/selinux is typically not mounted for containers. | ||
if grep -q "_t:" "/proc/self/attr/current"; then | ||
# When the kernel is before v5.13 and SELinux is enforced, fuse-overlayfs might be safer, so we print a warning (but not an error). | ||
# https://github.com/torvalds/linux/commit/7fa2e79a6bb924fa4b2de5766dab31f0f47b5ab6 | ||
echo "WARN: UserNS: SELinux might be Enforcing. If you see an error related to overlayfs, try setting \`KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER=fuse-overlayfs\` ." >&2 | ||
fi | ||
return 0 | ||
if [[ -z "$userns" ]]; then | ||
# If we are outside userns, we can always assume overlayfs is preferrable | ||
return 0 | ||
fi | ||
|
||
# Debian 10 and 11 supports overlayfs in userns with a "permit_mount_in_userns" kernel patch, | ||
# but known to be unstable, so we avoid using it https://github.com/moby/moby/issues/42302 | ||
if [[ -e "/sys/module/overlay/parameters/permit_mounts_in_userns" ]]; then | ||
echo "INFO: UserNS: kernel seems supporting overlayfs with permit_mounts_in_userns, but avoiding due to instability." | ||
return 1 | ||
fi | ||
|
||
# Check overlayfs availability, by attempting to mount it. | ||
# | ||
# Overlayfs inside userns is known to be available for the following environments: | ||
# - Kernel >= 5.11 (but 5.11 and 5.12 have issues on SELinux hosts. Fixed in 5.13.) | ||
# - Ubuntu kernel | ||
# - Debian kernel (but avoided due to instability, see the /sys/module/overlay/... check above) | ||
# - Sysbox | ||
tmp=$(mktemp -d) | ||
mkdir -p "${tmp}/l" "${tmp}/u" "${tmp}/w" "${tmp}/m" | ||
if ! mount -t overlay -o lowerdir="${tmp}/l,upperdir=${tmp}/u,workdir=${tmp}/w" overlay "${tmp}/m"; then | ||
echo "INFO: UserNS: kernel does not seem to support overlayfs." | ||
rm -rf "${tmp}" | ||
return 1 | ||
fi | ||
umount "${tmp}/m" | ||
rm -rf "${tmp}" | ||
|
||
# Detect whether SELinux is Enforcing (or Permitted) by grepping /proc/self/attr/current . | ||
# Note that we cannot use `getenforce` command here because /sys/fs/selinux is typically not mounted for containers. | ||
if grep -q "_t:" "/proc/self/attr/current"; then | ||
# When the kernel is before v5.13 and SELinux is enforced, fuse-overlayfs might be safer, so we print a warning (but not an error). | ||
# https://github.com/torvalds/linux/commit/7fa2e79a6bb924fa4b2de5766dab31f0f47b5ab6 | ||
echo "WARN: UserNS: SELinux might be Enforcing. If you see an error related to overlayfs, try setting \`KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER=fuse-overlayfs\` ." >&2 | ||
fi | ||
return 0 | ||
} | ||
|
||
configure_containerd() { | ||
|
@@ -208,6 +208,8 @@ fix_cgroup() { | |
return | ||
fi | ||
echo 'INFO: detected cgroup v1' | ||
# We're looking for the cgroup-path for the cpu controller for the | ||
# current process. this tells us what cgroup-path the container is in. | ||
local current_cgroup | ||
current_cgroup=$(grep -E '^[^:]*:([^:]*,)?cpu(,[^,:]*)?:.*' /proc/self/cgroup | cut -d: -f3) | ||
if [ "$current_cgroup" = "/" ]; then | ||
|
@@ -225,16 +227,14 @@ fix_cgroup() { | |
# See: https://d2iq.com/blog/running-kind-inside-a-kubernetes-cluster-for-continuous-integration | ||
# Capture initial state before modifying | ||
# | ||
# Basically we're looking for the cgroup-path for the cpu controller for the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. things got re-arranged, and this comment got out of place. |
||
# current process. this tells us what cgroup-path the container is in. | ||
# Then we collect the subsystems that are active on this path. | ||
# Then we collect the subsystems that are active on our current process. | ||
# We assume the cpu controller is in use on all node containers, | ||
# and other controllers use the same sub-path. | ||
# | ||
# See: https://man7.org/linux/man-pages/man7/cgroups.7.html | ||
echo 'INFO: fix cgroup mounts for all subsystems' | ||
local cgroup_subsystems | ||
cgroup_subsystems=$(findmnt -lun -o source,target -t cgroup | grep "${current_cgroup}" | awk '{print $2}') | ||
cgroup_subsystems=$(findmnt -lun -o source,target -t cgroup | grep -F "${current_cgroup}" | awk '{print $2}') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. -F / --fixed-strings means match fixed strings, not patterns / regexes. |
||
# Unmount the cgroup subsystems that are not known to runtime used to | ||
# run the container we are in. Those subsystems are not properly scoped | ||
# (i.e. the root cgroup is exposed, rather than something like docker/xxxx). | ||
|
@@ -245,7 +245,7 @@ fix_cgroup() { | |
# | ||
# See https://github.com/kubernetes/kubernetes/issues/109182 | ||
local unsupported_cgroups | ||
unsupported_cgroups=$(findmnt -lun -o source,target -t cgroup | grep_allow_nomatch -v "${current_cgroup}" | awk '{print $2}') | ||
unsupported_cgroups=$(findmnt -lun -o source,target -t cgroup | grep_allow_nomatch -v -F "${current_cgroup}" | awk '{print $2}') | ||
if [ -n "$unsupported_cgroups" ]; then | ||
local mnt | ||
echo "$unsupported_cgroups" | | ||
|
@@ -298,9 +298,15 @@ fix_cgroup() { | |
mount --make-rprivate /sys/fs/cgroup | ||
echo "${cgroup_subsystems}" | | ||
while IFS= read -r subsystem; do | ||
mount_kubelet_cgroup_root "/kubelet" "${subsystem}" | ||
mount_kubelet_cgroup_root "/kubelet.slice" "${subsystem}" | ||
mount_kubelet_cgroup_root /kubelet "${subsystem}" | ||
mount_kubelet_cgroup_root /kubelet.slice "${subsystem}" | ||
done | ||
# workaround for hosts not running systemd | ||
# we only do this for kubelet.slice because it's not relevant when not using | ||
# the systemd cgroup driver | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. functional change here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @AkihiroSuda @kolyshkin can you please chime in here, I'm really in deep waters here and I think that a another pair of eyes will be nice for this change ... the cgroup glue keeps growing ... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't work, wishful thinking :-) I need to get a v1 host going again and take a peek at what exactly systemd is up to with the pseudo-subsystem. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I setup colima on an old personal machine to debug, this and a real slice works 0ffcf8d. |
||
if [[ ! "${cgroup_subsystems}" = */sys/fs/cgroup/systemd* ]]; then | ||
mount_kubelet_cgroup_root /kubelet.slice /sys/fs/cgroup/systemd | ||
fi | ||
} | ||
|
||
fix_machine_id() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these parameters ***Accounting needed twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes these parameters are for the slice. group vs specific unit.
https://www.freedesktop.org/software/systemd/man/systemd.slice.html