Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2: Fix inotify fd leak when cgroup is deleted #212

Merged
merged 4 commits into from
Feb 1, 2022

Conversation

jepio
Copy link
Contributor

@jepio jepio commented Dec 3, 2021

This resolves the leak of inotify fd by additionally subscribing to cgroup.events notifications, so that we can handle container exits and cgroup deletion. This is best tested in the context of k8s, the same manifest as containerd/containerd#6323 can be used - increase the replica count and watch the inotify instances get exhausted.

See issue: containerd/containerd#5670

v2/manager.go Outdated Show resolved Hide resolved
@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch from 18751d0 to fc0bc23 Compare December 3, 2021 15:57
v2/manager.go Outdated Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch from fc0bc23 to 2ae9ba1 Compare December 3, 2021 16:04
@jepio
Copy link
Contributor Author

jepio commented Dec 7, 2021

Hi @AkihiroSuda, can this be merged? Does this repository need to be tagged so that it can be updated in containerd/containerd?

@AkihiroSuda AkihiroSuda requested review from fuweid and ktock December 7, 2021 09:32
@AkihiroSuda
Copy link
Member

Hi @AkihiroSuda, can this be merged?

Needs another LGTM from the committers.

Does this repository need to be tagged so that it can be updated in containerd/containerd?

Yes, we will probably create a tag right after merging this.

Copy link
Member

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

v2/manager.go Show resolved Hide resolved
v2/manager.go Show resolved Hide resolved
v2/manager.go Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch from 2ae9ba1 to beb8965 Compare December 8, 2021 10:02
Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some tests for this behavior change so we make sure it does not regress?

@jbehling
Copy link

Any word on when we can get this merged? This is blocking us from rolling out cgroups v2

@jepio
Copy link
Contributor Author

jepio commented Jan 26, 2022

@invidian you're welcome to add tests, I'm currently working on other things and don't have the time.

Is there any one else that wants to review this? I was hoping to get this into containerd 1.6.0 - containerd/containerd#5670 is marked as a priority 1 bug.

@invidian
Copy link

@invidian you're welcome to add tests, I'm currently working on other things and don't have the time.

I can have a look, but no promises and I guess it shouldn't be blocking this PR.

Copy link

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick draft of test which could be written, though it relies on go.uber.org/goleak for detecting leaking goroutines, maybe that could be improved. This currently fails on main branch , but passes on this PR. But this PR modifies much more so perhaps more tests should be added.

While reading the code and writing tests for it I also spotted some potentially breaking changes, so mentioned them in the comments.

package v2

import (
  "fmt"
  "io/fs"
  "os"
  "path/filepath"
  "strings"
  "testing"
  "time"

  "go.uber.org/goleak"
)

func Test_EventWatch_stops_watching_memory_events_when_cgroup_is_removed(t *testing.T) {
  skipWhenNotRunAsRoot(t)
  checkCgroupMode(t)

  group := "/cpu-test-cg"
  groupPath := fmt.Sprintf("%s-%d", group, os.Getpid())

  c, err := NewManager(defaultCgroup2Path, groupPath, &Resources{})
  if err != nil {
    t.Fatal("failed to init new cgroup manager: ", err)
  }

  evCh, errCh := c.EventChan()

  data := `low 0
high 0
max 0
oom 0
oom_kill 0
`

  ticker := time.NewTicker(100 * time.Millisecond)
  done := make(chan bool)

  go func() {
    for {
      select {
      case <-done:
        return
      case <-ticker.C:
        err := os.WriteFile(filepath.Join(defaultCgroup2Path, groupPath, "memory.events"), []byte(data), 0o444)
        if !strings.HasSuffix(err.Error(), fs.ErrInvalid.Error()) {
          t.Logf("Unexpected error writing event to cgroup: %v", err)
          t.Fail()
        }
      }
    }
  }()

  select {
  case ev := <-evCh:
    t.Logf("event received: %v", ev)
  case err := <-errCh:
    if err != nil {
      t.Fatalf("Received error event: %v", err)
    }
  }

  t.Cleanup(func() {
    if err := os.Remove(filepath.Join(defaultCgroup2Path, groupPath)); err != nil {
      t.Fatalf("Removing cgroup path failed: %v", err)
    }

    ticker.Stop()
    done <- true

    goleak.VerifyNone(t)
  })
}

v2/manager.go Outdated
Comment on lines 571 to 590
if wd < 0 {
if err != nil {
syscall.Close(fd)
return 0, 0, fmt.Errorf("failed to add inotify watch for %q", fpath)
return 0, 0, fmt.Errorf("failed to add inotify watch for %q: %w", fpath, err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is unrelated I think and could be done at least in a separate commit. Tests shouldn't be hard either, I think you just need to pass it an empty string (so it has NULL as first byte).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm touching the if condition for this branch and harmonizing the error handling with the other InotifyAddWatch so this change is related. It's not worth breaking out into a separate commit.

v2/manager.go Outdated Show resolved Hide resolved
v2/manager.go Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
v2/manager.go Show resolved Hide resolved
v2/manager.go Show resolved Hide resolved
v2/manager.go Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
@fuweid fuweid added this to the 1.6 milestone Jan 27, 2022
When running on cgroup2, we currently leak the inotify instance (and goroutine
blocked on read) used to monitor 'memory.events' on any container exit. This is
highly problematic when containers are automatically restarted because we will
exhaust either the fd limit or system-wide inotify instance limit.

When a process exits, there is no memory event and even when the cgroup is
deleted, the inotify read is also not unblocked. This is not the case when
containerd is running on cgroup (v1) because that uses a different mechanism
for notification and detects cgroup deletion.

Fulfill the contract on cgroup2 by additionally monitoring cgroup.events for
process exit. When the last process exits the kernel signals an event on
'cgroup.events'. For robustness we check both 'cgroup.events' and
'memory.events' on any notification and also handle ENOENT/ENODEV errors from
read/open of 'memory.events'.

We signal exit up the stack by closing the error channel. Strangely, the error
channel was not previously being returned to the caller.

Signed-off-by: Jeremi Piotrowski <[email protected]>
@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch 2 times, most recently from da4c41a to 64c0aa5 Compare January 27, 2022 15:53
Copy link

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments @jepio!

Vagrantfile Outdated Show resolved Hide resolved
v2/manager_test.go Outdated Show resolved Hide resolved
v2/manager_test.go Show resolved Hide resolved
v2/manager.go Outdated Show resolved Hide resolved
@dmcgowan
Copy link
Member

The vendor directory got pulled into one of the commits.

jepio added 2 commits January 31, 2022 14:29
This makes waitForEvents() more shorter and more readable.

Signed-off-by: Jeremi Piotrowski <[email protected]>
@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch 2 times, most recently from 1742764 to 964104f Compare January 31, 2022 13:51
@jepio
Copy link
Contributor Author

jepio commented Jan 31, 2022

@dmcgowan got rid of the vendor directory again, I believe that should cover everything.

@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch from 964104f to 8df5eac Compare January 31, 2022 15:32
Copy link

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

v2/manager_test.go Outdated Show resolved Hide resolved
Copy link
Member

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor nitpick

@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch from 8df5eac to 22d6547 Compare February 1, 2022 11:04
This test case demonstrates that now the oom monitor goroutine
terminates when a cgroup is removed. Systemd is responsible for cgroup
removal so a "cat" process is spawned in a scope, with stdin attached to
a pipe. When the pipe is closed the process terminates, systemd notices
and removes the cgroup, which results in the test case finishing.

Signed-off-by: Jeremi Piotrowski <[email protected]>
@jepio jepio force-pushed the jepio/cgroup2-fix-inotify-leak branch from 22d6547 to a7d6888 Compare February 1, 2022 11:05
Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 1df7813 into containerd:main Feb 1, 2022
@jepio jepio deleted the jepio/cgroup2-fix-inotify-leak branch February 17, 2022 13:04
kzys added a commit to kzys/containerd that referenced this pull request May 19, 2022
This commit brings the resource leak fix below.
containerd/cgroups#212

The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due
to conflicts.

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/containerd that referenced this pull request May 19, 2022
This commit brings the resource leak fix below.
containerd/cgroups#212

The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due
to conflicts.

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/containerd that referenced this pull request May 19, 2022
This commit brings the resource leak fix below.
containerd/cgroups#212

The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due
to conflicts.

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/containerd that referenced this pull request May 19, 2022
This commit brings the resource leak fix below.
containerd/cgroups#212

- The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due
  to conflicts.
- ErrCgroupDeleted check is removed since LoadManager won't return
  the error (see containerd#5739) and the variable was removed in 1.0.3.

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/containerd that referenced this pull request May 19, 2022
This commit brings the resource leak fix below.
containerd/cgroups#212

- The original upgrade PR was containerd#6498. I didn't cherry-pick the commit due
  to conflicts.
- ErrCgroupDeleted check is removed since LoadManager won't return
  the error (see containerd#5739) and the variable was removed in 1.0.3.

Signed-off-by: Kazuyoshi Kato <[email protected]>
yashsingh74 pushed a commit to yashsingh74/containerd-main that referenced this pull request Jul 31, 2022
This commit brings the resource leak fix below.
containerd/cgroups#212

- The original upgrade PR was containerd#6498. I didn't cherry-pick the commit due
  to conflicts.
- ErrCgroupDeleted check is removed since LoadManager won't return
  the error (see containerd#5739) and the variable was removed in 1.0.3.

Signed-off-by: Kazuyoshi Kato <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants