Skip to content

Commit

Permalink
Ignore kernel memory settings
Browse files Browse the repository at this point in the history
This is somewhat radical approach to deal with kernel memory.

Per-cgroup kernel memory limiting was always problematic. A few
examples:

 - older kernels had bugs and were even oopsing sometimes (best example
   is RHEL7 kernel);
 - kernel is unable to reclaim the kernel memory so once the limit is
   hit a cgroup is toasted;
 - some kernel memory allocations don't allow failing.

In addition to that,

 - users don't have a clue about how to set kernel memory limits
   (as the concept is much more complicated than e.g. [user] memory);
 - different kernels might have different kernel memory usage,
   which is sort of unexpected;
 - cgroup v2 do not have a [dedicated] kmem limit knob, and thus
   runc silently ignores kernel memory limits for v2;
 - kernel v5.4 made cgroup v1 kmem.limit obsoleted (see
   torvalds/linux@0158115f702b).

In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL, let's ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).

This should result in less bugs and better user experience.

The only bad side effect from it might be that stat can show kernel
memory usage as 0 (since the accounting is not enabled).

[v2: add a warning in specconv that limits are ignored]

Signed-off-by: Kir Kolyshkin <[email protected]>
  • Loading branch information
kolyshkin committed Apr 12, 2021
1 parent 59ad417 commit 52390d6
Show file tree
Hide file tree
Showing 15 changed files with 16 additions and 318 deletions.
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,18 +61,18 @@ sudo make install
with some of them enabled by default (see `BUILDTAGS` in top-level `Makefile`).

To change build tags from the default, set the `BUILDTAGS` variable for make,
e.g.
e.g. to disable seccomp:

```bash
make BUILDTAGS='seccomp'
make BUILDTAGS=""
```

| Build Tag | Feature | Enabled by default | Dependency |
|-----------|------------------------------------|--------------------|------------|
| seccomp | Syscall filtering | yes | libseccomp |
| nokmem | disable kernel memory accounting | no | <none> |

The following build tags were used earlier, but are now obsoleted:
- **nokmem** (since runc v1.0.0-rc94 kernel memory settings are ignored)
- **apparmor** (since runc v1.0.0-rc93 the feature is always enabled)
- **selinux** (since runc v1.0.0-rc93 the feature is always enabled)

Expand Down
2 changes: 0 additions & 2 deletions contrib/completions/bash/runc
Original file line number Diff line number Diff line change
Expand Up @@ -737,8 +737,6 @@ _runc_update() {
--cpu-share
--cpuset-cpus
--cpuset-mems
--kernel-memory
--kernel-memory-tcp
--memory
--memory-reservation
--memory-swap
Expand Down
56 changes: 0 additions & 56 deletions libcontainer/cgroups/fs/kmem.go

This file was deleted.

15 changes: 0 additions & 15 deletions libcontainer/cgroups/fs/kmem_disabled.go

This file was deleted.

38 changes: 1 addition & 37 deletions libcontainer/cgroups/fs/memory.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,31 +33,6 @@ func (s *MemoryGroup) Name() string {
}

func (s *MemoryGroup) Apply(path string, d *cgroupData) (err error) {
if path == "" {
return nil
}
if memoryAssigned(d.config) {
if _, err := os.Stat(path); os.IsNotExist(err) {
if err := os.MkdirAll(path, 0755); err != nil {
return err
}
// Only enable kernel memory accouting when this cgroup
// is created by libcontainer, otherwise we might get
// error when people use `cgroupsPath` to join an existed
// cgroup whose kernel memory is not initialized.
if err := EnableKernelMemoryAccounting(path); err != nil {
return err
}
}
}
defer func() {
if err != nil {
os.RemoveAll(path)
}
}()

// We need to join memory cgroup after set memory limits, because
// kmem.limit_in_bytes can only be set when the cgroup is empty.
return join(path, d.pid)
}

Expand Down Expand Up @@ -140,23 +115,14 @@ func (s *MemoryGroup) Set(path string, cgroup *configs.Cgroup) error {
return err
}

if cgroup.Resources.KernelMemory != 0 {
if err := setKernelMemory(path, cgroup.Resources.KernelMemory); err != nil {
return err
}
}
// ignore KernelMemory and KernelMemoryTCP

if cgroup.Resources.MemoryReservation != 0 {
if err := fscommon.WriteFile(path, "memory.soft_limit_in_bytes", strconv.FormatInt(cgroup.Resources.MemoryReservation, 10)); err != nil {
return err
}
}

if cgroup.Resources.KernelMemoryTCP != 0 {
if err := fscommon.WriteFile(path, "memory.kmem.tcp.limit_in_bytes", strconv.FormatInt(cgroup.Resources.KernelMemoryTCP, 10)); err != nil {
return err
}
}
if cgroup.Resources.OomKillDisable {
if err := fscommon.WriteFile(path, "memory.oom_control", "1"); err != nil {
return err
Expand Down Expand Up @@ -238,8 +204,6 @@ func memoryAssigned(cgroup *configs.Cgroup) bool {
return cgroup.Resources.Memory != 0 ||
cgroup.Resources.MemoryReservation != 0 ||
cgroup.Resources.MemorySwap > 0 ||
cgroup.Resources.KernelMemory > 0 ||
cgroup.Resources.KernelMemoryTCP > 0 ||
cgroup.Resources.OomKillDisable ||
(cgroup.Resources.MemorySwappiness != nil && int64(*cgroup.Resources.MemorySwappiness) != -1)
}
Expand Down
56 changes: 0 additions & 56 deletions libcontainer/cgroups/fs/memory_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -190,62 +190,6 @@ func TestMemorySetSwapSmallerThanMemory(t *testing.T) {
}
}

func TestMemorySetKernelMemory(t *testing.T) {
helper := NewCgroupTestUtil("memory", t)
defer helper.cleanup()

const (
kernelMemoryBefore = 314572800 // 300M
kernelMemoryAfter = 524288000 // 500M
)

helper.writeFileContents(map[string]string{
"memory.kmem.limit_in_bytes": strconv.Itoa(kernelMemoryBefore),
})

helper.CgroupData.config.Resources.KernelMemory = kernelMemoryAfter
memory := &MemoryGroup{}
if err := memory.Set(helper.CgroupPath, helper.CgroupData.config); err != nil {
t.Fatal(err)
}

value, err := fscommon.GetCgroupParamUint(helper.CgroupPath, "memory.kmem.limit_in_bytes")
if err != nil {
t.Fatalf("Failed to parse memory.kmem.limit_in_bytes - %s", err)
}
if value != kernelMemoryAfter {
t.Fatal("Got the wrong value, set memory.kmem.limit_in_bytes failed.")
}
}

func TestMemorySetKernelMemoryTCP(t *testing.T) {
helper := NewCgroupTestUtil("memory", t)
defer helper.cleanup()

const (
kernelMemoryTCPBefore = 314572800 // 300M
kernelMemoryTCPAfter = 524288000 // 500M
)

helper.writeFileContents(map[string]string{
"memory.kmem.tcp.limit_in_bytes": strconv.Itoa(kernelMemoryTCPBefore),
})

helper.CgroupData.config.Resources.KernelMemoryTCP = kernelMemoryTCPAfter
memory := &MemoryGroup{}
if err := memory.Set(helper.CgroupPath, helper.CgroupData.config); err != nil {
t.Fatal(err)
}

value, err := fscommon.GetCgroupParamUint(helper.CgroupPath, "memory.kmem.tcp.limit_in_bytes")
if err != nil {
t.Fatalf("Failed to parse memory.kmem.tcp.limit_in_bytes - %s", err)
}
if value != kernelMemoryTCPAfter {
t.Fatal("Got the wrong value, set memory.kmem.tcp.limit_in_bytes failed.")
}
}

func TestMemorySetMemorySwappinessDefault(t *testing.T) {
helper := NewCgroupTestUtil("memory", t)
defer helper.cleanup()
Expand Down
33 changes: 0 additions & 33 deletions libcontainer/cgroups/systemd/v1.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@ import (
systemdDbus "github.com/coreos/go-systemd/v22/dbus"
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/cgroups/fs"
"github.com/opencontainers/runc/libcontainer/cgroups/fscommon"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/sirupsen/logrus"
)
Expand Down Expand Up @@ -170,14 +169,6 @@ func (m *legacyManager) Apply(pid int) error {
}
properties = append(properties, c.SystemdProps...)

// We have to set kernel memory here, as we can't change it once
// processes have been attached to the cgroup.
if c.Resources.KernelMemory != 0 {
if err := enableKmem(c); err != nil {
return err
}
}

if err := startUnit(dbusConnection, unitName, properties); err != nil {
return err
}
Expand Down Expand Up @@ -404,30 +395,6 @@ func (m *legacyManager) Set(container *configs.Config) error {
return nil
}

func enableKmem(c *configs.Cgroup) error {
path, err := getSubsystemPath(c, "memory")
if err != nil {
if cgroups.IsNotFound(err) {
return nil
}
return err
}

if err := os.MkdirAll(path, 0755); err != nil {
return err
}
// do not try to enable the kernel memory if we already have
// tasks in the cgroup.
content, err := fscommon.ReadFile(path, "tasks")
if err != nil {
return err
}
if len(content) > 0 {
return nil
}
return fs.EnableKernelMemoryAccounting(path)
}

func (m *legacyManager) GetPaths() map[string]string {
m.mu.Lock()
defer m.mu.Unlock()
Expand Down
6 changes: 0 additions & 6 deletions libcontainer/configs/cgroup_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -54,12 +54,6 @@ type Resources struct {
// Total memory usage (memory + swap); set `-1` to enable unlimited swap
MemorySwap int64 `json:"memory_swap"`

// Kernel memory limit (in bytes)
KernelMemory int64 `json:"kernel_memory"`

// Kernel memory limit for TCP use (in bytes)
KernelMemoryTCP int64 `json:"kernel_memory_tcp"`

// CPU shares (relative weight vs. other containers)
CpuShares uint64 `json:"cpu_shares"`

Expand Down
35 changes: 0 additions & 35 deletions libcontainer/integration/exec_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -667,41 +667,6 @@ func testPids(t *testing.T, systemd bool) {
// As such, we don't test that case. YMMV.
}

func TestRunWithKernelMemory(t *testing.T) {
testRunWithKernelMemory(t, false)
}

func TestRunWithKernelMemorySystemd(t *testing.T) {
if !systemd.IsRunningSystemd() {
t.Skip("Systemd is unsupported")
}
testRunWithKernelMemory(t, true)
}

func testRunWithKernelMemory(t *testing.T, systemd bool) {
if testing.Short() {
return
}
if cgroups.IsCgroup2UnifiedMode() {
t.Skip("cgroup v2 does not support kernel memory limit")
}

rootfs, err := newRootfs()
ok(t, err)
defer remove(rootfs)

config := newTemplateConfig(&tParam{
rootfs: rootfs,
systemd: systemd,
})
config.Cgroups.Resources.KernelMemory = 52428800

_, _, err = runContainer(config, "", "ps")
if err != nil {
t.Fatalf("runContainer failed with kernel memory limit: %v", err)
}
}

func TestCgroupResourcesUnifiedErrorOnV1(t *testing.T) {
testCgroupResourcesUnifiedErrorOnV1(t, false)
}
Expand Down
8 changes: 3 additions & 5 deletions libcontainer/specconv/spec_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ import (
"github.com/opencontainers/runc/libcontainer/seccomp"
libcontainerUtils "github.com/opencontainers/runc/libcontainer/utils"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"

"golang.org/x/sys/unix"
)
Expand Down Expand Up @@ -510,11 +511,8 @@ func CreateCgroupConfig(opts *CreateOpts, defaultDevs []*devices.Device) (*confi
if r.Memory.Swap != nil {
c.Resources.MemorySwap = *r.Memory.Swap
}
if r.Memory.Kernel != nil {
c.Resources.KernelMemory = *r.Memory.Kernel
}
if r.Memory.KernelTCP != nil {
c.Resources.KernelMemoryTCP = *r.Memory.KernelTCP
if r.Memory.Kernel != nil || r.Memory.KernelTCP != nil {
logrus.Warn("Kernel memory settings are ignored and will be removed")
}
if r.Memory.Swappiness != nil {
c.Resources.MemorySwappiness = r.Memory.Swappiness
Expand Down
6 changes: 0 additions & 6 deletions libcontainer/specconv/spec_linux_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -276,12 +276,6 @@ func TestLinuxCgroupWithMemoryResource(t *testing.T) {
if cgroup.Resources.MemorySwap != swap {
t.Errorf("Expected to have %d as swap, got %d", swap, cgroup.Resources.MemorySwap)
}
if cgroup.Resources.KernelMemory != kernel {
t.Errorf("Expected to have %d as Kernel Memory, got %d", kernel, cgroup.Resources.KernelMemory)
}
if cgroup.Resources.KernelMemoryTCP != kernelTCP {
t.Errorf("Expected to have %d as TCP Kernel Memory, got %d", kernelTCP, cgroup.Resources.KernelMemoryTCP)
}
if cgroup.Resources.MemorySwappiness != swappinessPtr {
t.Errorf("Expected to have %d as memory swappiness, got %d", swappinessPtr, cgroup.Resources.MemorySwappiness)
}
Expand Down
2 changes: 0 additions & 2 deletions man/runc-update.8.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,8 +45,6 @@ other options are ignored.
--cpu-share value CPU shares (relative weight vs. other containers)
--cpuset-cpus value CPU(s) to use
--cpuset-mems value Memory node(s) to use
--kernel-memory value Kernel memory limit (in bytes)
--kernel-memory-tcp value Kernel memory limit (in bytes) for tcp buffer
--memory value Memory limit (in bytes)
--memory-reservation value Memory reservation or soft_limit (in bytes)
--memory-swap value Total memory usage (memory + swap); set '-1' to enable unlimited swap
Expand Down
Loading

0 comments on commit 52390d6

Please sign in to comment.