
cAdvisor leaks journalctl (and thus inotify instances) #1725

Closed
mtaufen opened this issue Aug 22, 2017 · 2 comments

Comments

@mtaufen
Contributor

mtaufen commented Aug 22, 2017

When the Kubernetes kubelet exits, the journalctl process started here:

cmd := exec.Command("journalctl", "-k", "-f")
is left running. Each of these orphaned journalctl processes consumes four inotify instances. Since the Kubelet restarts on every change made via our dynamic config feature, enough config changes can put the Kubelet into a crash loop: the Kubelet crashes if it cannot start cAdvisor, which happens once we hit the inotify instance cap. The cap is typically low, e.g. 128, so this can happen pretty easily.

I think we can try setting cmd.SysProcAttr.Pdeathsig to SIGKILL to fix this.
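
A minimal sketch of that suggestion, assuming Linux (this is illustrative, not the actual cAdvisor code):

```go
package main

import (
	"log"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("journalctl", "-k", "-f")
	// Pdeathsig is Linux-only: the kernel delivers the given signal to
	// the child when the parent dies, so journalctl cannot outlive the
	// process that started it and pin inotify instances.
	// Caveat: it fires when the parent *thread* exits, a known gotcha
	// in multi-threaded Go programs.
	cmd.SysProcAttr = &syscall.SysProcAttr{Pdeathsig: syscall.SIGKILL}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// ... consume cmd.Stdout here; if this process dies for any reason,
	// the kernel reaps journalctl via SIGKILL.
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}
```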

@dashpole
Collaborator

xref: kubernetes/kubernetes#34965

mtaufen added a commit to mtaufen/cadvisor that referenced this issue Aug 23, 2017
This fixes the journalctl leak that occurs when a process that uses
cadvisor exits. See issues google#1725 and
kubernetes/kubernetes#34965.
@dashpole
Collaborator

closed via #1729

mtaufen added a commit to mtaufen/kubernetes that referenced this issue Sep 7, 2017
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.

Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
google/cadvisor#1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.

This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.

Related issue: kubernetes#50217
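
One way to surface this kind of leak outside the e2e framework is to watch the process table across restarts. A hypothetical, self-contained sketch (not the test's actual code) that counts running journalctl processes on Linux; a count that climbs with each Kubelet restart points at the orphan leak described above:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// countJournalctl scans /proc and counts processes whose comm name is
// "journalctl". Across repeated Kubelet restarts this count should stay
// flat; a steady climb indicates leaked orphans holding inotify instances.
func countJournalctl() (int, error) {
	matches, err := filepath.Glob("/proc/[0-9]*/comm")
	if err != nil {
		return 0, err
	}
	n := 0
	for _, m := range matches {
		data, err := os.ReadFile(m)
		if err != nil {
			continue // process exited between Glob and read; skip it
		}
		if strings.TrimSpace(string(data)) == "journalctl" {
			n++
		}
	}
	return n, nil
}

func main() {
	n, err := countJournalctl()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("journalctl processes: %d\n", n)
}
```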
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Sep 8, 2017
Automatic merge from submit-queue (batch tested with PRs 52047, 52063, 51528)

Improve dynamic kubelet config e2e node test and fix bugs

I had been sitting on this until the cAdvisor fix merged in #51751, as these tests fail without that fix.

**Release note**:

```release-note
NONE
```