
cloud: Respect Kubernetes resource limits #51471

Merged
merged 1 commit into from
Jul 16, 2020

Conversation

bobvawter
Contributor

Detecting the number of available CPU cores and memory limits from within a
container can be inaccurate. This change passes the CPU and memory limits used
by the Kubernetes scheduler into the cockroach process.

In the absence of a limits block (e.g., when using the BestEffort QoS class in a dev/staging environment), the scheduler will substitute the maximum value that is appropriate for the node on which the container is scheduled.

This change can be applied retroactively to existing deployments and is not
specific to a particular version of CockroachDB.

The cockroach start command is broken onto multiple lines to improve
readability.

See also: #34988

Release note: None
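The mechanism the commit message describes maps onto the Kubernetes Downward API: a container can have the scheduler's view of its own limits injected as environment variables, and when no limits block is present the node's allocatable capacity is substituted. A minimal sketch of that pattern (the env var names here are illustrative, not necessarily the exact ones used by this PR):

```yaml
# Sketch: inject the scheduler's view of this container's limits.
# With no limits block set, Kubernetes substitutes the node's
# allocatable capacity instead.
env:
- name: GOMAXPROCS            # consumed directly by the Go runtime
  valueFrom:
    resourceFieldRef:
      resource: limits.cpu
      divisor: "1"            # whole cores, rounded up
- name: MEMORY_LIMIT_MIB      # illustrative name; read by the start script
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
      divisor: "1Mi"
```

A start script can then size flags such as --cache and --max-sql-memory from the injected memory value instead of letting the process infer whole-node figures from inside the container.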

@cockroach-teamcity
Member

This change is Reviewable

@knz knz left a comment

I am generally OK with this approach - it brings us closer to a better situation.

However, if I understand correctly (Andrew, correct me if I'm wrong), setting GOMAXPROCS correctly is only half of the solution. There's another part: CockroachDB internally inspects its OS environment to determine the number of cores available; this is used in the UI to display CPU utilization and other things. This inspection is independent of/orthogonal to GOMAXPROCS; we also know that the code is currently wrong/problematic with k8s/docker.

NB: this inspection is currently performed via the upstream dependency go-sigar. If there's something wrong, it's likely in there. We can evaluate the other dependency gopsutil and check whether it has better code to detect available cores under orchestration.
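For reference, the container-level truth being discussed here lives in the cgroup filesystem rather than procfs. A minimal, stdlib-only sketch of deriving the effective core count from the CFS quota (cgroup v1 paths; the helper names are illustrative, not CockroachDB's or gosigar's):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// effectiveCores converts a cgroup v1 CFS quota/period pair into a core
// count. A quota of -1 means "unlimited"; we return 0 so the caller can
// fall back to runtime.NumCPU().
func effectiveCores(quotaUS, periodUS int64) float64 {
	if quotaUS <= 0 || periodUS <= 0 {
		return 0
	}
	return float64(quotaUS) / float64(periodUS)
}

// readInt64 reads a single integer from a cgroup file such as
// /sys/fs/cgroup/cpu/cpu.cfs_quota_us.
func readInt64(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	// A container limited to 4 CPUs typically shows quota 400000 with
	// the default 100000us period.
	fmt.Println(effectiveCores(400000, 100000)) // 4
	fmt.Println(effectiveCores(-1, 100000))     // 0: no limit configured
	_ = readInt64                               // would be used against /sys/fs/cgroup in a container
}
```

The same division is what cgroup-aware libraries perform; the point is that neither GOMAXPROCS nor /proc/cpuinfo carries this information.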

@ajwerner ajwerner left a comment

One question I have relates to using requests vs limits. Say a user sets a request for 4 cores but no limit. As I understand it, this is valid and will prevent the host from setting the cgroup which limits the total number of cores which may be utilized. IIRC this is done by some of our customers and may have been a recommendation at some point. Should the expression perhaps look at requests instead of limits?

> However, if I understand correctly (Andrew, correct me if I'm wrong), setting GOMAXPROCS correctly is only half of the solution. There's another part: CockroachDB internally inspects its OS environment to determine the number of cores available; this is used in the UI to display CPU utilization and other things. This inspection is independent of/orthogonal to GOMAXPROCS; we also know that the code is currently wrong/problematic with k8s/docker.
>
> NB: this inspection is currently performed via the upstream dependency go-sigar. If there's something wrong, it's likely in there. We can evaluate the other dependency gopsutil and check whether it has better code to detect available cores under orchestration.

You are correct, this does not touch the number of cores we believe the host to have. The CPU usage normalization will be off and so will the core count displayed in the admin UI.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)

@bobvawter
Contributor Author

I looked at using requests originally. If you don't specify a requests section, you get 1 core and 0 memory for the injected values.

@ajwerner ajwerner left a comment

What value of limits do you get if you don't specify a limits section? Perhaps we want to do something like take the max?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)

@bobvawter
Contributor Author

If no limits are specified, you get the maximum number of cores and amount of memory that the particular kubelet is configured to provide.

@ajwerner ajwerner left a comment

Hmm, I'm torn about the right thing to do here. Imagine (for whatever ghastly reason) somebody was running lots of cockroach instances (like our upcoming SQL pods) in a k8s cluster. Imagine each instance requests 1 vCPU but has no limits set, and each of the VMs is actually a 72-core VM. Do we want GOMAXPROCS to be 1 or 72?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)

@bobvawter
Contributor Author

The "no limits" case is pretty much the default behavior that you get today running cockroach on kubernetes. At least with this patch, the node-info page is going to be very explicit about what your nodes are doing. My gut says "let's be pedantic and do exactly what the operator has requested".

[Screenshot: Admin UI node-info page, 2020-07-15]

@ajwerner ajwerner left a comment

I'm convinced. Thanks for walking through the thought experiment with me.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)


cloud/kubernetes/cockroachdb-statefulset-secure.yaml, line 214 at r1 (raw file):


        #   limits:
            # NOTE: Unless you have enabled the non-default Static CPU Management Policy
            # and are using an integer number of CPUs, we don't recommend setting a CPU limit.
            # See:
            #   https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
            #   https://github.com/kubernetes/kubernetes/issues/51135
            #   cpu: "16"
            #   memory: "8Gi"

Not for now, but my sense is that we should really recommend more RAM per core...

@knz
Contributor

knz commented Jul 15, 2020

> My gut says "let's be pedantic and do exactly what the operator has requested".

I agree with the "let's be pedantic" part, unfortunately what's implemented currently is unlikely to be what the operator has requested :)

We have two concrete problems:

  • Go does not restrict its CPU usage to the configured value of GOMAXPROCS. The setting only restricts the amount of parallelism for goroutines running Go code; any in-flight syscall invoked by a goroutine spills over into an OS thread that is spawned in excess of GOMAXPROCS. Therefore, the effective parallelism in a running CockroachDB process is not bounded by GOMAXPROCS -- it is bounded by the number of cores/vCPUs the container is constrained to.

    In light of this, if an operator says "I want this node to use 4 cores", we must ensure that the container places an OS-level limit on CPU usage to 4 cores/vcpus, regardless of GOMAXPROCS. Of course, it's important/useful to have GOMAXPROCS align with that limit, but if there's no limit we are not doing what the operator wants.

  • as I pointed out earlier, we are creating an absolute UX disaster if the Admin UI and other metrics do not do reflection properly on the number of cores/vcpus. Reflection does not use GOMAXPROCS. It uses the gosigar upstream dep. This is (AFAIK) broken when CockroachDB runs inside a container. As long as this is broken, we are failing the human operator: regardless of actual resource usage by CockroachDB, they will be utterly confused (and unhappy) with the output they see in metrics / UI.

    We need to either fix the dependency or switch to another mechanism that does this correctly (maybe gopsutil does).

We can't really say "we're addressing the problems" without covering ground on both points simultaneously.

@ajwerner ajwerner left a comment

I don't think anybody is claiming that this touches the second point at all. The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the node. The only place that is aware of the constrained core count is the cgroups.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@knz
Contributor

knz commented Jul 15, 2020

> I don't think anybody is claiming that this at all touches the second point.

OK, then we can defer that conversation to later. @piyush-singh, can you keep track of this?

Still, coming back to GOMAXPROCS: I think the config should always set a limit, and have GOMAXPROCS set to that limit. Then if the user sees GOMAXPROCS set to an extremely large value, they will find out on their own that they didn't set a limit properly, and get the idea to set one.
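Concretely, the convention proposed above (always set a limit, with GOMAXPROCS derived from it) would look something like this in the container spec; the numbers are illustrative, not a recommendation from this PR:

```yaml
resources:
  requests:
    cpu: "4"
    memory: "8Gi"
  # Always set limits, so the injected GOMAXPROCS reflects a real bound
  # rather than the node's full allocatable capacity.
  limits:
    cpu: "4"
    memory: "8Gi"
```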

Regarding the other issue (which we're not looking at today):

> The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the nodes. The only place that is aware of the constrained core count are the cgroups.

We have cgroup-specific code in some places. If that's what it takes, we can add more.
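As an illustration of what such cgroup-specific code looks like on the memory side, here is a sketch of interpreting a cgroup v1 memory.limit_in_bytes value. The sentinel handling below is an assumption of mine (unlimited cgroups report a huge page-rounded value near maxInt64), not a quote of CockroachDB's actual code:

```go
package main

import "fmt"

// memLimitBytes interprets the value of a cgroup v1
// memory.limit_in_bytes file. Unlimited cgroups report a very large
// sentinel, so anything at or above 2^62 is treated as "no limit".
func memLimitBytes(v uint64) (limit uint64, limited bool) {
	const noLimitSentinel = uint64(1) << 62
	if v >= noLimitSentinel {
		return 0, false
	}
	return v, true
}

func main() {
	// A container with an 8Gi memory limit.
	if l, ok := memLimitBytes(8 << 30); ok {
		fmt.Printf("limit: %d GiB\n", l>>30) // limit: 8 GiB
	}
	// A typical "unlimited" sentinel (maxInt64 rounded down to a page).
	_, ok := memLimitBytes(9223372036854771712)
	fmt.Println(ok) // false
}
```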

@ajwerner ajwerner left a comment

> The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the nodes. The only place that is aware of the constrained core count are the cgroups.
>
> We have cgroup-specific code in some places. If that's what it takes, we can add more.

Absolutely. I've dropped the ball here. This library https://github.com/uber-go/automaxprocs has logic to read the cgroups and I began to look at adopting it in #44880 but never followed through.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@ajwerner ajwerner left a comment

What I mean to say is that with that library as a dependency, we can use it also to determine the core count (though the interface to that library leaves much to be desired). Not that the linked PR above does any such thing. I'll create an issue there to expose the value it reads.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@ajwerner ajwerner left a comment

uber-go/automaxprocs#29 (comment)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@knz knz left a comment

I now see that GOMAXPROCS is set to the actual container limit. This aligns with my first point (I had misunderstood earlier). Thanks!

@bobvawter
Contributor Author

bors r+

@bobvawter
Contributor Author

bors retry

@craig
Contributor

craig bot commented Jul 16, 2020

Build failed

@bobvawter
Contributor Author

bors retry

@craig
Contributor

craig bot commented Jul 16, 2020

Build succeeded

@craig craig bot merged commit 3ef8a89 into cockroachdb:master Jul 16, 2020
@bobvawter bobvawter deleted the k8s_memory branch July 16, 2020 16:24
craig bot pushed a commit that referenced this pull request Dec 2, 2020
56329: cloud: update resource requests/limits in StatefulSet configs r=taroface a=taroface

Our various StatefulSet configs had a placeholder `resources.requests` block with guidelines that became outdated with #51471. The guidelines are now updated.

Thanks to @tim-o for pointing this out.

Release note: none

Co-authored-by: taroface <[email protected]>