cloud: Respect Kubernetes resource limits #51471
Conversation
Detecting the number of available CPU cores and memory limits from within a container can be inaccurate. This change passes the CPU and memory limits used by the Kubernetes scheduler into the cockroach process. In the absence of a limits block (e.g. when using BestEffort QoS in a dev/staging environment), the scheduler will substitute the maximum value that is appropriate for the node on which the container is scheduled. This change can be applied retroactively to existing deployments and is not specific to a particular version of CockroachDB. The cockroach start command is broken onto multiple lines to improve readability. See also: #34988. Release note: None
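The conversation below confirms that the injected CPU value ends up in GOMAXPROCS. As a quick illustration of why that is enough on the Go side, here is a minimal sketch (not code from this PR) showing that the Go runtime honors a GOMAXPROCS environment variable at process start:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

func main() {
	// The Go runtime reads GOMAXPROCS from the environment at startup, so a
	// value injected by the Kubernetes scheduler caps the number of OS threads
	// executing Go code without any changes to the binary itself.
	// Running `GOMAXPROCS=4 go run main.go` on a 16-core machine prints 4.
	fmt.Println("GOMAXPROCS env var:  ", os.Getenv("GOMAXPROCS"))
	fmt.Println("effective GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```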
I am generally OK with this approach - it brings us closer to a better situation.
However, if I understand correctly (Andrew, correct me if I'm wrong), setting GOMAXPROCS correctly is only half of the solution. There's another part: CockroachDB internally inspects its OS environment to determine the number of cores available; this is used in the UI to display CPU utilization and other things. This inspection is independent/orthogonal from GOMAXPROCS; we also know that the code is currently wrong/problematic with k8s/docker.
NB: this inspection is currently performed via the upstream dependency go-sigar. If there's something wrong, it's likely in there. We can evaluate the other dependency gopsutil and check whether it has better code to detect the available cores under orchestration.
One question I have relates to using requests vs limits. Say a user sets a request for 4 cores but no limit. As I understand it, this is valid and will prevent the host from setting the cgroup that limits the total number of cores which may be utilized. IIRC this is done by some of our customers and may have been a recommendation at some point. Should the expression perhaps look at requests instead of limits?
However, if I understand correctly (Andrew, correct me if I'm wrong), setting GOMAXPROCS correctly is only half of the solution. There's another part: CockroachDB internally inspects its OS environment to determine the number of cores available; this is used in the UI to display CPU utilization and other things. This inspection is independent/orthogonal from GOMAXPROCS; we also know that the code is currently wrong/problematic with k8s/docker.
NB: this inspection is currently performed via the upstream dependency go-sigar. If there's something wrong, it's likely in there. We can evaluate the other dependency gopsutil and check whether it has better code to detect the available cores under orchestration.
You are correct, this does not touch the number of cores we believe the host to have. The CPU usage normalization will be off and so will the core count displayed in the admin UI.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)
I looked at using requests originally. If you don't specify a requests section, you get 1 core and 0 memory for the injected values.
What value of limits do you get if you don't specify a limits section? Perhaps we want to do something like take the max?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)
If no limits are specified, you get the maximum number of cores and amount of memory that the particular kubelet is configured to provide.
Hmm, I'm torn about the right thing to do here. Imagine (for whatever ghastly reason), somebody was running lots of cockroach instances (like our upcoming sql pods) in a k8s cluster. Imagine each instance requests 1 vCPU but has no limits set. Say each of the VMs is actually a 72-core VM. Do we want the GOMAXPROCS to be 1 or 72?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)
I'm convinced. Thanks for walking through the thought experiment with me.
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)
cloud/kubernetes/cockroachdb-statefulset-secure.yaml, line 214 at r1 (raw file):
Quoted 9 lines of code…

```yaml
# limits:
# NOTE: Unless you have enabled the non-default Static CPU Management Policy
# and are using an integer number of CPUs, we don't recommend setting a CPU limit.
# See:
# https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
# https://github.com/kubernetes/kubernetes/issues/51135
# cpu: "16"
# memory: "8Gi"
```
Not for now, but my sense is that we should really provide more RAM per core in our recommendations...
I agree with the "let's be pedantic" part; unfortunately, what's implemented currently is unlikely to be what the operator has requested :) We have two concrete problems: setting GOMAXPROCS to the configured limit, and the core count that CockroachDB detects internally and reports in the UI.
We can't really say "we're addressing the problems" without covering ground on both points simultaneously.
I don't think anybody is claiming that this at all touches the second point. The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the node. The only place that is aware of the constrained core count is the cgroups.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)
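As a concrete illustration of that point, here is a minimal sketch (assuming a cgroup v1 host with the conventional /sys/fs/cgroup mount; not code from this PR) contrasting the core count the process sees with the CFS quota recorded in the cgroup files:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// readInt parses a single integer out of a cgroup control file.
func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	// With only a CFS quota applied, the process still sees every core on the node.
	fmt.Println("cores visible to the process:", runtime.NumCPU())

	// The cgroup is the only place the container's CPU constraint shows up:
	// quota / period = number of usable cores (quota is -1 when unlimited).
	quota, errQ := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	period, errP := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if errQ != nil || errP != nil || quota <= 0 || period <= 0 {
		fmt.Println("no CPU quota set (or not a cgroup v1 host)")
		return
	}
	fmt.Printf("cgroup CPU limit: %.2f cores\n", float64(quota)/float64(period))
}
```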
OK, then we can defer that conversation to later. @piyush-singh can you keep track of this? Still, coming back to GOMAXPROCS: I think the config should always set a limit, and have GOMAXPROCS set to that limit. Then if the user sees GOMAXPROCS set to an extremely large value, they will find out on their own that they didn't set a limit properly, and get the idea to set one. Regarding the other issue (which we're not looking at today):
We have cgroup-specific code in some places. If that's what it takes, we can add more.
The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the node. The only place that is aware of the constrained core count is the cgroups.
We have cgroup-specific code in some places. If that's what it takes, we can add more.
Absolutely. I've dropped the ball here. This library https://github.com/uber-go/automaxprocs has logic to read the cgroups and I began to look at adopting it in #44880 but never followed through.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)
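For reference, the usual way that library is wired in is a blank import whose init adjusts GOMAXPROCS from the cgroup quota. A minimal sketch (illustrative only, not what #44880 does):

```go
package main

import (
	"fmt"
	"runtime"

	// The blank import reads the container's CPU quota from the cgroups and
	// lowers GOMAXPROCS to match, as a side effect of package initialization.
	_ "go.uber.org/automaxprocs"
)

func main() {
	fmt.Println("GOMAXPROCS after cgroup-aware adjustment:", runtime.GOMAXPROCS(0))
}
```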
What I mean to say is that with that library as a dependency, we can use it also to determine the core count (though the interface to that library leaves much to be desired). Not that the linked PR above does any such thing. I'll create an issue there to expose the value it reads.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)
uber-go/automaxprocs#29 (comment)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)
I now see that GOMAXPROCS is set to the actual container limit. This aligns with my first point (I had misunderstood earlier). Thanks!
bors r+
bors retry
Build failed
bors retry
Build succeeded
56329: cloud: update resource requests/limits in StatefulSet configs r=taroface a=taroface

Our various StatefulSet configs had a placeholder `resources.requests` block with guidelines that became outdated with #51471. The guidelines are now updated. Thanks to @tim-o for pointing this out.

Release note: none

Co-authored-by: taroface <[email protected]>