Kubernetes / containers: Advise adjusting GOMAXPROCS based on cpu requests/limits #9001
Assigning @taroface for now, feel free to re-assign as appropriate.
Discussed offline - @itsbilal to fish out how this is computed in CC, as an example best practice.
Had a conversation with CC engineers, seems like I was mistaken on CC adjusting for this. cockroachdb/cockroach#57390 is what they expect Cockroach to do in this situation, so I've gone ahead and implemented it. The advice outlined in this issue would still be necessary for all users of containers, especially those that use cpu weights/shares.
Having GOMAXPROCS default to the Go default (numCPUs) is inefficient if there's a lower CPU limit set for the cgroup the cockroach process is in: the Go runtime could schedule multiple OS-level threads, not all of which would get to run concurrently. This change checks whether the GOMAXPROCS env variable was set, denoting an overridden value. If it isn't set, and we're inside a cgroup, GOMAXPROCS is lowered to the maximum CPU count allowed by the cgroup's CPU limit.
This is an implementation of part of the advice given in cockroachdb/docs#9001. The CPU requests/shares part of that equation cannot be implemented in Cockroach, so the docs issue remains unchanged.
Release note (general change): Run fewer threads in parallel if running inside a container with a CPU limit.
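The mechanism described above can be sketched roughly as follows. This is an illustrative reconstruction, not CockroachDB's actual code: the function name `maxProcsForQuota` is hypothetical, and the cgroup file paths mentioned in the comments apply to cgroup v1 only.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// maxProcsForQuota returns the GOMAXPROCS value implied by a cgroup CPU
// quota: floor(quota/period) CPUs, clamped to [1, numCPU]. A quota <= 0
// means "no limit" (cgroup v1 reports -1 in that case).
func maxProcsForQuota(quotaUS, periodUS int64, numCPU int) int {
	if quotaUS <= 0 || periodUS <= 0 {
		return numCPU // no cgroup limit; keep the Go default
	}
	n := int(quotaUS / periodUS)
	if n < 1 {
		n = 1
	}
	if n > numCPU {
		n = numCPU
	}
	return n
}

func main() {
	// Respect an explicit override, as the change described above does.
	if _, ok := os.LookupEnv("GOMAXPROCS"); ok {
		return
	}
	// Hypothetical values; a real implementation would read these from the
	// cgroup filesystem, e.g. /sys/fs/cgroup/cpu/cpu.cfs_quota_us and
	// cpu.cfs_period_us under cgroup v1.
	quotaUS, periodUS := int64(200000), int64(100000) // a 2-CPU limit
	runtime.GOMAXPROCS(maxProcsForQuota(quotaUS, periodUS, runtime.NumCPU()))
	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
}
```

Note that the explicit GOMAXPROCS environment variable check preserves the usual override path: Go itself honors the variable at startup, so the adjustment only applies when the operator hasn't chosen a value.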
Linked PR: cockroachdb/cockroach#57390 (cli: Set GOMAXPROCS on start if in a CPU-limited cgroup), merged in a batch with several unrelated PRs.
Bilal Akhtar (itsbilal) commented:
Currently, our docs on using Cockroach with Kubernetes mention CPU requests and limits:
https://www.cockroachlabs.com/docs/v20.2/kubernetes-performance#resource-requests
The same knobs also exist anywhere Cockroach runs in cgroup-driven clusters (e.g., Docker), but for simplicity I'll use k8s in this issue as an umbrella term for all container-based setups with CPU resource limiting.
Our docs make little reference to the need to adjust the GOMAXPROCS environment variable based on either setting. The Cockroach process has visibility into CPU limits when it reports CPU usage percentages (since cockroachdb/cockroach#56461); however, it doesn't adjust GOMAXPROCS on its own based on CPU limits. Furthermore, the cockroach process cannot calculate the percentage of all CPU shares in the parent cgroup that it has access to, as it doesn't have visibility into the shares given to other sibling cgroup controllers / k8s pods.
If GOMAXPROCS is higher than the CPU limit, some goroutines will be scheduled for execution by the Go runtime but will then have to block waiting for one of the (fewer) available CPUs, resulting in degraded concurrent performance.
Similarly for CPU requests: say there are 8 CPUs on a machine and 4 pods that all request the same amount of CPU. Each pod would get 25% of the CPUs (so ~2 cores, or core-equivalent runtime) when all pods are maxing out CPU utilization. If such a workload persists for a while and GOMAXPROCS is greater than 2, performance can be worse than if GOMAXPROCS were set to exactly 2.
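The arithmetic in that example can be sketched as a toy model (the function name is hypothetical, it assumes all pods saturate equal cpu.shares, and it ignores the finer behavior of the kernel's CFS scheduler):

```go
package main

import "fmt"

// effectiveCPUs estimates the CPU time a pod receives when every pod on the
// node is saturating its share: the pod's fraction of the total cpu.shares,
// multiplied by the node's CPU count.
func effectiveCPUs(nodeCPUs int, myShares, totalShares int64) float64 {
	return float64(nodeCPUs) * float64(myShares) / float64(totalShares)
}

func main() {
	// 8 CPUs, 4 pods with equal shares: each gets ~2 core-equivalents.
	fmt.Println(effectiveCPUs(8, 1024, 4*1024)) // prints 2
}
```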
The ideal GOMAXPROCS setting is going to vary based on the workload, but it's probably a good idea to advise setting it to something like this:
The 2 * is there to allow for some burst utilization, in cases where cpu requests are much lower than cpu limits.
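The issue text does not reproduce the exact formula, but the reference to the `2 *` factor suggests a heuristic along these lines. This is an assumed reconstruction, not the verbatim recommendation: allow GOMAXPROCS up to twice the CPU request for burst headroom, capped at the CPU limit.

```go
package main

import (
	"fmt"
	"math"
)

// suggestedGOMAXPROCS sketches the heuristic alluded to above: allow up to
// twice the CPU request (burst headroom), but never exceed the CPU limit.
// A limit of 0 means "no limit set". Assumed reconstruction, not the exact
// formula from the original issue.
func suggestedGOMAXPROCS(requestCPUs, limitCPUs float64) int {
	v := 2 * requestCPUs
	if limitCPUs > 0 && v > limitCPUs {
		v = limitCPUs
	}
	n := int(math.Ceil(v))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(suggestedGOMAXPROCS(2, 8)) // request 2, limit 8: prints 4
	fmt.Println(suggestedGOMAXPROCS(2, 3)) // the limit caps it: prints 3
}
```

Rounding up and clamping to at least 1 keeps the value sane for fractional requests (e.g., a 250m CPU request still yields GOMAXPROCS=1).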
Jira Issue: DOC-834