-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v2: Fix inotify fd leak when cgroup is deleted #212
Conversation
18751d0
to
fc0bc23
Compare
fc0bc23
to
2ae9ba1
Compare
Hi @AkihiroSuda, can this be merged? Does this repository need to be tagged so that it can be updated in containerd/containerd? |
Needs another LGTM from the committers.
Yes, we will probably create a tag right after merging this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
2ae9ba1
to
beb8965
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some tests for this behavior change so we make sure it does not regress?
Any word on when we can get this merged? This is blocking us from rolling out cgroups v2 |
@invidian you're welcome to add tests, I'm currently working on other things and don't have the time. Is there any one else that wants to review this? I was hoping to get this into containerd 1.6.0 - containerd/containerd#5670 is marked as a priority 1 bug. |
I can have a look, but no promises and I guess it shouldn't be blocking this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick draft of test which could be written, though it relies on go.uber.org/goleak
for detecting leaking goroutines, maybe that could be improved. This currently fails on main
branch , but passes on this PR. But this PR modifies much more so perhaps more tests should be added.
While reading the code and writing tests for it I also spotted some potentially breaking changes, so mentioned them in the comments.
package v2
import (
"fmt"
"io/fs"
"os"
"path/filepath"
"strings"
"testing"
"time"
"go.uber.org/goleak"
)
func Test_EventWatch_stops_watching_memory_events_when_cgroup_is_removed(t *testing.T) {
skipWhenNotRunAsRoot(t)
checkCgroupMode(t)
group := "/cpu-test-cg"
groupPath := fmt.Sprintf("%s-%d", group, os.Getpid())
c, err := NewManager(defaultCgroup2Path, groupPath, &Resources{})
if err != nil {
t.Fatal("failed to init new cgroup manager: ", err)
}
evCh, errCh := c.EventChan()
data := `low 0
high 0
max 0
oom 0
oom_kill 0
`
ticker := time.NewTicker(100 * time.Millisecond)
done := make(chan bool)
go func() {
for {
select {
case <-done:
return
case <-ticker.C:
err := os.WriteFile(filepath.Join(defaultCgroup2Path, groupPath, "memory.events"), []byte(data), 0o444)
if !strings.HasSuffix(err.Error(), fs.ErrInvalid.Error()) {
t.Logf("Unexpected error writing event to cgroup: %v", err)
t.Fail()
}
}
}
}()
select {
case ev := <-evCh:
t.Logf("event received: %v", ev)
case err := <-errCh:
if err != nil {
t.Fatalf("Received error event: %v", err)
}
}
t.Cleanup(func() {
if err := os.Remove(filepath.Join(defaultCgroup2Path, groupPath)); err != nil {
t.Fatalf("Removing cgroup path failed: %v", err)
}
ticker.Stop()
done <- true
goleak.VerifyNone(t)
})
}
v2/manager.go
Outdated
if wd < 0 { | ||
if err != nil { | ||
syscall.Close(fd) | ||
return 0, 0, fmt.Errorf("failed to add inotify watch for %q", fpath) | ||
return 0, 0, fmt.Errorf("failed to add inotify watch for %q: %w", fpath, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unrelated I think and could be done at least in a separate commit. Tests shouldn't be hard either, I think you just need to pass it an empty string (so it has NULL as first byte).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm touching the if
condition for this branch and harmonizing the error handling with the other InotifyAddWatch
so this change is related. It's not worth breaking out into a separate commit.
When running on cgroup2, we currently leak the inotify instance (and goroutine blocked on read) used to monitor 'memory.events' on any container exit. This is highly problematic when containers are automatically restarted because we will exhaust either the fd limit or system-wide inotify instance limit. When a process exits, there is no memory event and even when the cgroup is deleted, the inotify read is also not unblocked. This is not the case when containerd is running on cgroup (v1) because that uses a different mechanism for notification and detects cgroup deletion. Fulfill the contract on cgroup2 by additionally monitoring cgroup.events for process exit. When the last process exits the kernel signals an event on 'cgroup.events'. For robustness we check both 'cgroup.events' and 'memory.events' on any notification and also handle ENOENT/ENODEV errors from read/open of 'memory.events'. We signal exit up the stack by closing the error channel. Strangely, the error channel was not previously being returned to the caller. Signed-off-by: Jeremi Piotrowski <[email protected]>
da4c41a
to
64c0aa5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments @jepio!
The |
This makes waitForEvents() more shorter and more readable. Signed-off-by: Jeremi Piotrowski <[email protected]>
…entation Signed-off-by: Jeremi Piotrowski <[email protected]>
1742764
to
964104f
Compare
@dmcgowan got rid of the vendor directory again, I believe that should cover everything. |
964104f
to
8df5eac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor nitpick
8df5eac
to
22d6547
Compare
This test case demonstrates that now the oom monitor goroutine terminates when a cgroup is removed. Systemd is responsible for cgroup removal so a "cat" process is spawned in a scope, with stdin attached to a pipe. When the pipe is closed the process terminates, systemd notices and removes the cgroup, which results in the test case finishing. Signed-off-by: Jeremi Piotrowski <[email protected]>
22d6547
to
a7d6888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit brings the resource leak fix below. containerd/cgroups#212 The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due to conflicts. Signed-off-by: Kazuyoshi Kato <[email protected]>
This commit brings the resource leak fix below. containerd/cgroups#212 The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due to conflicts. Signed-off-by: Kazuyoshi Kato <[email protected]>
This commit brings the resource leak fix below. containerd/cgroups#212 The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due to conflicts. Signed-off-by: Kazuyoshi Kato <[email protected]>
This commit brings the resource leak fix below. containerd/cgroups#212 - The upgrade PR in main was containerd#6498. I didn't cherry-pick the commit due to conflicts. - ErrCgroupDeleted check is removed since LoadManager won't return the error (see containerd#5739) and the variable was removed in 1.0.3. Signed-off-by: Kazuyoshi Kato <[email protected]>
This commit brings the resource leak fix below. containerd/cgroups#212 - The original upgrade PR was containerd#6498. I didn't cherry-pick the commit due to conflicts. - ErrCgroupDeleted check is removed since LoadManager won't return the error (see containerd#5739) and the variable was removed in 1.0.3. Signed-off-by: Kazuyoshi Kato <[email protected]>
This commit brings the resource leak fix below. containerd/cgroups#212 - The original upgrade PR was containerd#6498. I didn't cherry-pick the commit due to conflicts. - ErrCgroupDeleted check is removed since LoadManager won't return the error (see containerd#5739) and the variable was removed in 1.0.3. Signed-off-by: Kazuyoshi Kato <[email protected]>
This resolves the leak of inotify fd by additionally subscribing to
cgroup.events
notifications, so that we can handle container exits and cgroup deletion. This is best tested in the context of k8s, the same manifest as containerd/containerd#6323 can be used - increase the replica count and watch the inotify instances get exhausted.See issue: containerd/containerd#5670