Repeated OOM'ing (perhaps due to a large number of namespaces) #493

Closed
directionless opened this issue Jul 13, 2018 · 17 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@directionless

directionless commented Jul 13, 2018

/kind bug

What happened:

I'm running kube-state-metrics as part of kube-prometheus, but it is repeatedly being OOMKilled.

I suspect this is because of the large number of namespaces we have. Some bits of information:

$ kubectl get ns | wc -l
     238

$ kubectl get nodes | wc -l
      47

$ kubectl get pods --all-namespaces | wc -l
    4008

$ kubectl get secrets --all-namespaces | wc -l
    8313

The resource requests and limits are: { "cpu": "188m", "memory": "5290Mi" }. (Unfortunately, I'm having trouble capturing resource utilization before the OOM.)
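
(For reference, a rough way to capture utilization before the next kill, assuming metrics-server is available and the pod carries an app=kube-state-metrics label in the monitoring namespace; both names are guesses for a kube-prometheus install:)

kubectl -n monitoring top pod -l app=kube-state-metrics
# after a kill, the previous container's termination reason is recorded on the pod
kubectl -n monitoring get pod -l app=kube-state-metrics \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'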

What you expected to happen:

Not OOM

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.4-gke.2", GitCommit:"eb2e43842aaa21d6f0bb65d6adf5a84bbdc62eaf", GitTreeState:"clean", BuildDate:"2018-06-15T21:48:39Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kube-state-metrics image version
"quay.io/coreos/kube-state-metrics:v1.3.1"
@directionless
Author

This might be an instance of #461. I upped the CPU limit.

[screenshot: resource usage graphs, 2018-07-13 21:38]

@brancz
Member

brancz commented Jul 16, 2018

Could you try removing the addon-resizer and dropping all resource limits and requests? I have a feeling that the resource recommendations we currently have are off; they come from the scalability tests of around a year ago.
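
(Roughly, that would look something like the following, assuming the deployment is named kube-state-metrics in the monitoring namespace and the main container is first in the pod spec; adjust names to your install:)

# drop the requests/limits from the kube-state-metrics container entirely
kubectl -n monitoring patch deployment kube-state-metrics --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources"}]'
# the addon-resizer sidecar itself would need to be removed from the deployment spec,
# e.g. via kubectl -n monitoring edit deployment kube-state-metrics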

@directionless
Author

directionless commented Jul 16, 2018

I went the other way and increased the limits until kube-state-metrics stopped hitting them. As the graph above suggests, once I push the CPU allocation a bit higher, memory usage stops growing and resource usage stabilizes.

But now Prometheus reports context deadline exceeded when trying to scrape it.

Some additional counts:

$ kubectl get pods --all-namespaces -a | wc -l
    4732
$ kubectl get jobs --all-namespaces -a | wc -l
    4651
$ kubectl get cronjobs --all-namespaces -a | wc -l
No resources found.
       0

@brancz
Member

brancz commented Jul 17, 2018

You can try increasing your scrape interval to up to 2 minutes. The scrape interval configures the timeout, which is what you are seeing. Up to 2 minutes is generally an acceptable upper bound for scrape intervals.

@brancz
Member

brancz commented Jul 17, 2018

Sorry, that was incorrect. I meant that it's generally safe to bump the timeout up to the scrape interval, and the scrape interval itself is safe up to 2 minutes.
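
(Before bumping either value, it can help to measure how long a full scrape actually takes; a rough sketch, assuming the usual kube-state-metrics port 8080 and substituting a real pod name:)

kubectl -n monitoring port-forward kube-state-metrics-xxxxx 8080:8080 &   # hypothetical pod name
time curl -s http://localhost:8080/metrics > /dev/null   # wall-clock time vs. the configured timeout

The knobs themselves are scrape_interval/scrape_timeout in a plain Prometheus config, or interval/scrapeTimeout on a ServiceMonitor endpoint when using the Prometheus Operator.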

@directionless
Author

I upped the resource allocations and the scrape timeouts, and let it sit a couple of days. I think something is up.

Scrape time is usually 10-20s, but it has roughly hourly spikes to over 100s. CPU usage looks pretty constant. Memory usage shows a step function over time.

I don't know much about Go's pprof tooling, but if I look at /debug/pprof, the heap number grows over time. I don't know where that shows up in the graphed debug metrics, and I'm somewhat guessing with pprof.

I don't know if it's useful, but here's a collection of heap dumps, collected with:

mkdir -p /tmp/pprofs
for arg in inuse_space inuse_objects alloc_space alloc_objects; do
  echo "$arg"   # progress marker on stdout
  go tool pprof -"$arg" -top http://localhost:6060/debug/pprof/heap > /tmp/pprofs/"$arg".txt
done

alloc_objects.txt
alloc_space.txt
inuse_objects.txt
inuse_space.txt
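
(If it helps to compare snapshots over time, keeping the raw profiles alongside the -top output works too; a sketch against the same pprof endpoint:)

snap=/tmp/pprofs/heap-$(date +%s).pb.gz
curl -s http://localhost:6060/debug/pprof/heap > "$snap"
go tool pprof -top "$snap"                                  # same -top view, from the saved file
# later: go tool pprof -top -base older.pb.gz newer.pb.gz   # shows only the growth between two snapshots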

@brancz
Member

brancz commented Jul 20, 2018

Thanks a lot for those numbers and profiles! From the profiles it looks like the majority comes from just producing the Prometheus metric output, which is arguably not as efficient as it should be. We should investigate higher-performance solutions, especially since kube-state-metrics only requires a small subset of the library's features. cc @mxinden

The second-largest usage seems to be JSON parsing, which will go away in v1.4.0, as protobuf will be used for communication by default.

@jakewarr8

I had the same issue with OOM crash loops, and removing the addon-resizer fixed it. We have 100+ namespaces.

@mrsiano

mrsiano commented Oct 3, 2018

@directionless how many namespaces are we talking about?
/cc @mrsiano

@directionless
Author

@mrsiano 200-1000 per cluster.

Some conversation elsewhere suggested it might be more related to my pod/node ratio. I was running at close to 100 pods/node, and folks commented that a lot of the tuning expects more like 30 pods/node.
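
(For reference, a quick shell one-off to compute that ratio from counts like the ones above:)

pods=$(kubectl get pods --all-namespaces --no-headers | wc -l)
nodes=$(kubectl get nodes --no-headers | wc -l)
echo "$(( pods / nodes )) pods per node"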

I've since shut down this bit of Prometheus, so I can't easily test it.

@mxinden
Contributor

mxinden commented Oct 5, 2018

For anyone interested, I am currently working on a performance optimization. The current effort and container images can be found in #534. Feedback is very welcome.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 13, 2019
@brancz
Member

brancz commented Feb 18, 2019

@directionless we released some pretty significant improvements with v1.5.0; have you had a chance to try out that release yet? 🙂

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wrossmann

For anyone else who lands here investigating a similar issue: a large aggregate number of any/all resources tracked by this exporter can cause it to use a fair bit of memory. The simplest way to check is to query for counts of metrics like kube_*; if that's fallen out of your history, you can also bump up the memory limit on the exporter and then query it directly.
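
(A rough sketch of the direct query, assuming a port-forward to the exporter on its usual port 8080; inside Prometheus, something like topk(20, count by (__name__) ({__name__=~"kube_.*"})) gives a similar breakdown:)

# count time series per metric family straight from the exporter's output
curl -s http://localhost:8080/metrics | grep -v '^#' | cut -d'{' -f1 | sort | uniq -c | sort -rn | head -20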

In my case I learned that Helm doesn't necessarily clean up old release revisions, and I had 3600+ ConfigMaps cluttering up the cluster.

Once you get your house in order you can restart the exporter to check that its memory usage is within reason, and then bump the limit back down.
