server,ui: CPU count metric is incorrect when using containers/orchestration/k8s #34988

Closed
robert-s-lee opened this issue Feb 15, 2019 · 8 comments
Labels
A-kv-server Relating to the KV-level RPC server A-webui-general Issues on the DB Console that span multiple areas or don't have another clear category. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation


@robert-s-lee
Contributor

Describe the problem

Using Kubernetes on a server with 64 virtual CPUs and the following resource limits:

resources:
  limits:
    cpu: "4"
    memory: 8Gi
  requests:
    cpu: "4"
    memory: 8Gi

Admin UI reports 64 CPUs instead of 4 CPUs.
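For context, a minimal Go sketch of how the two numbers can differ. It assumes a Linux node with cgroup v2 mounted at `/sys/fs/cgroup` (nodes of this era typically ran cgroup v1, where the same values live in `cpu.cfs_quota_us` and `cpu.cfs_period_us`) and the default CPU manager policy. A CPU limit of `"4"` is normally enforced as a CFS quota of 400000 microseconds per 100000-microsecond period, which `cpu.max` shows as `400000 100000`, while `runtime.NumCPU()` still counts every host CPU the process may be scheduled on. The helper `cpuLimitFromCPUMax` is hypothetical and is not CockroachDB or gosigar code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// cpuLimitFromCPUMax (hypothetical helper) parses the contents of a cgroup v2
// "cpu.max" file, which is "<quota> <period>" in microseconds, or
// "max <period>" when no CPU limit is set. It returns the limit in CPUs and
// whether a limit is present.
func cpuLimitFromCPUMax(content string) (float64, bool, error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, false, errors.New("unexpected cpu.max format")
	}
	if fields[0] == "max" {
		return 0, false, nil // no quota: every schedulable host CPU may be used
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil || period <= 0 {
		return 0, false, errors.New("invalid cpu.max period")
	}
	return quota / period, true, nil
}

func main() {
	// NumCPU counts logical CPUs the process may run on; a CFS quota does
	// not change this number.
	fmt.Println("runtime.NumCPU():", runtime.NumCPU())

	// Assumption: cgroup v2 is mounted at /sys/fs/cgroup.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("cpu.max not readable:", err)
		return
	}
	limit, ok, err := cpuLimitFromCPUMax(string(data))
	switch {
	case err != nil:
		fmt.Println("parse error:", err)
	case !ok:
		fmt.Println("no cgroup CPU limit")
	default:
		fmt.Printf("cgroup CPU limit: %.2f CPUs\n", limit)
	}
}
```

On the node described above, the first line would be expected to show 64 while `cpu.max` implies 4.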

To Reproduce

Using:
CockroachDB 2.1.4
Kubernetes 1.13.2

Expected behavior

Is this expected behavior?

@gigatexal

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1beta1","kind":"StatefulSet","metadata":{"annotations":{},"name":"cockroachdb","namespace":"alex-narayan"},"spec":{"podManagementPolicy":"Parallel","replicas":3,"serviceName":"cockroachdb","template":{"metadata":{"labels":{"app":"cockroachdb"}},"spec":{"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["cockroachdb"]}]},"topologyKey":"kubernetes.io/hostname"},"weight":100}]}},"containers":[{"command":["/bin/bash","-ecx","exec /cockroach/cockroach start --logtostderr --insecure --advertise-host $(hostname -f) --http-addr 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%"],"env":[{"name":"COCKROACH_CHANNEL","value":"kubernetes-insecure"}],"image":"gigatexal/cockroachdb:12feb2019","imagePullPolicy":"IfNotPresent","livenessProbe":{"httpGet":{"path":"/health","port":"http"},"initialDelaySeconds":30,"periodSeconds":5},"name":"cockroachdb","ports":[{"containerPort":26257,"name":"grpc"},{"containerPort":8080,"name":"http"}],"readinessProbe":{"failureThreshold":2,"httpGet":{"path":"/health?ready=1","port":"http"},"initialDelaySeconds":10,"periodSeconds":5},"resources":{"limits":{"memory":"8Gi"},"requests":{"cpu":"16","memory":"8Gi"}},"volumeMounts":[{"mountPath":"/cockroach/cockroach-data","name":"datadir"}]}],"terminationGracePeriodSeconds":60,"volumes":[{"name":"datadir","persistentVolumeClaim":{"claimName":"datadir"}}]}},"updateStrategy":{"type":"RollingUpdate"},"volumeClaimTemplates":[{"metadata":{"name":"datadir"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"100Gi"}}}}]}}
  creationTimestamp: "2019-02-12T13:39:06Z"
  generation: 4
  labels:
    app: cockroachdb
  name: cockroachdb
  namespace: alex-narayan
  resourceVersion: "56871940"
  selfLink: /apis/apps/v1/namespaces/alex-narayan/statefulsets/cockroachdb
  uid: 91b2f662-2ecb-11e9-ba0f-0cc47ab04802
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: cockroachdb
  serviceName: cockroachdb
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: cockroachdb
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cockroachdb
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - command:
        - /bin/bash
        - -ecx
        - exec /cockroach/cockroach start --logtostderr --insecure --advertise-host
          $(hostname -f) --http-addr 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb
          --cache 25% --max-sql-memory 25%
        env:
        - name: COCKROACH_CHANNEL
          value: kubernetes-insecure
        image: gigatexal/cockroachdb:15feb2019
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5

@piyush-singh piyush-singh self-assigned this Feb 15, 2019
@piyush-singh piyush-singh added the A-webui-general Issues on the DB Console that span multiple areas or don't have another clear category. label Feb 15, 2019
@piyush-singh

cc @vilterp and @celiala - we haven't yet extensively tested these metrics in containerized environments; this is a good real-world use case to check whether we are handling this correctly.

@awoods187 awoods187 added the C-investigation Further steps needed to qualify. C-label will change. label Mar 6, 2019
@knz
Contributor

knz commented Jun 23, 2020

@bdarnell reports separately:

We've paid more attention to memory than to CPU because memory is used to derive the cache size. But it's unsurprising that the library we're using here is not container-aware for CPU, since it isn't for memory either.

It's likely a bug in our upstream elastic/gosigar dependency. This is more likely a KV/storage problem than a UI problem.

@lunevalex @petermattis how do you propose to triage this?

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-server Relating to the KV-level RPC server and removed C-investigation Further steps needed to qualify. C-label will change. labels Jun 23, 2020
@knz
Contributor

knz commented Jun 23, 2020

@piyush-singh @johnrk I also guess you'd need to review the priority of this together now.

@knz knz changed the title ui: verify and document CPU count metric w/ Kubernetes server,ui: CPU count metric is incorrect when using containers/orchestration/k8s Jun 23, 2020
@ajwerner
Contributor

It's not just that the metric is incorrect; GOMAXPROCS is also set incorrectly. Earlier in the year I made a weak attempt to adopt automaxprocs that I should revive. A mismatch between the GOMAXPROCS value and the number of cores as constrained by cgroups can have disastrous consequences for performance.

See #44880 and cockroachdb/docs#5922
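The comment refers to automaxprocs (`go.uber.org/automaxprocs`), which sets GOMAXPROCS from the container's CPU quota. The Go sketch below is not that library's code; it only illustrates the idea, assuming the quota has already been read (for example from the cgroup CPU controller) and expressed in CPUs. `procsForQuota` is a hypothetical helper, and the 4-CPU quota in `main` is taken from the limits in the original report.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// procsForQuota (hypothetical helper) turns a CPU quota expressed in CPUs
// (e.g. 4.0 or 2.5) into a GOMAXPROCS value: round down so the number of Ps
// does not exceed the quota, never go below 1, and never exceed the number
// of CPUs the process can see.
func procsForQuota(quotaCPUs float64, hostCPUs int) int {
	procs := int(math.Floor(quotaCPUs))
	if procs < 1 {
		procs = 1
	}
	if procs > hostCPUs {
		procs = hostCPUs
	}
	return procs
}

func main() {
	// Assumed quota: 4 CPUs, matching limits.cpu "4" in the report.
	const quota = 4.0
	procs := procsForQuota(quota, runtime.NumCPU())
	prev := runtime.GOMAXPROCS(procs) // returns the previous setting
	fmt.Printf("GOMAXPROCS %d -> %d\n", prev, procs)
}
```

Rounding down is a design choice in this sketch: with a 2.5-CPU quota, 3 Ps could let the Go scheduler run more threads at once than the quota sustains, which tends to show up as CFS throttling.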

@florence-crl

florence-crl commented Jun 30, 2020

Clarification after discussing with @ajwerner:
If a user running k8s on VMs with 16 CPUs allocates 4 CPUs to cockroachdb and then sets GOMAXPROCS to 4, you should still expect the AdminUI to report 16 CPUs. Setting GOMAXPROCS to 4 is for cockroachdb performance, not for reporting.

craig bot pushed a commit that referenced this issue Jul 16, 2020
50981: cli: add support for userfile upload CLI command r=adityamaru a=adityamaru

This change adds a CLI command allowing users to upload local files to
the user-scoped file table ExternalStorage. The command
`userfile upload` uses the existing COPY protocol (similar to `nodelocal upload`)
to upload files and write them to the UserFileTableSystem. The
UserFileTableSystem is backed by two SQL tables which are currently
always created with `defaultdb.public.user` as the prefix of the
qualified name. In the future we may allow users to specify the
`db.schema` they wish to store their tables in.

The command takes a source and destination path. The former is used to
find the file contents locally, the latter is used to reference the file
and its related metadata/payload in the SQL tables.

Known limitations:
- All destination paths must start with `/`; this helps us
  disambiguate the file path from the `db.schema` name once we allow
  users to specify that in the future.

- Destination paths must not contain `..`. Since the
  UserFileTableSystem is not a "real" file system, storing SQL rows with
  filenames such as /test/../test.csv seems strange. We will work on
  enforcing a better naming scheme.

Informs: #47211

Release note (cli change): Adds a userfile upload command that can be
used to upload a file to the user scoped blob storage: `userfile upload
source/file /destination/of/file`

51392: build/deploy: add GEOS libraries to CRDB Docker builds r=jlinder a=otan

Now that we have the GEOS libraries being built, it's time we copy them
into the right place in the Docker container such that users can import
geospatial features out of the box.

The bless release script will also copy these files over.

Release note (general change): The Docker container that ships with
CockroachDB now includes the GEOS library needed for geospatial
functionality in `/usr/local/lib/cockroach` (which is the default
location of where the cockroach binary looks for the GEOS libraries).

51443: opt: improve geo func costing r=otan a=mjibson

Release note: None

51444: builtins: implement ST_Disjoint r=rytaft a=otan

Resolves #48919.

Release note (sql change): Implements the ST_Disjoint builtin for
geometry types.

51471: cloud: Respect Kubernetes resource limits r=bobvawter a=bobvawter

Detecting the number of available CPU cores and memory limits from within a
container can be inaccurate. This change passes the CPU and memory limits used
by the Kubernetes scheduler into the cockroach process.

In the absence of a limits block (e.g. when using BestEffort QoS in a
dev/staging environment), the scheduler will substitute the maximum value that
is appropriate for the node on which the container is scheduled.

This change can be applied retroactively to existing deployments and is not
specific to a particular version of CockroachDB.

The cockroach start command is broken onto multiple lines to improve
readability.

See also: #34988

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Bob Vawter <[email protected]>
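One way to pass such values into a container is the Kubernetes downward API: an `env` entry whose `valueFrom.resourceFieldRef` names `limits.cpu` with `divisor: 1m` exposes the CPU limit as an integer count of millicores, and when no limit is set the kubelet exposes the node's allocatable CPU instead, which matches the fallback described in #51471. Whether #51471 used exactly this mechanism is an assumption, as is the variable name `COCKROACH_CPU_LIMIT_MILLIS`; the Go sketch below only shows how a process could turn such a value into a whole-core count.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// coresFromMillicores converts a millicore count such as "4000" or "1500"
// into whole cores, rounding partial cores up.
func coresFromMillicores(s string) (int64, error) {
	milli, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
	if err != nil {
		return 0, err
	}
	if milli <= 0 {
		return 0, errors.New("CPU limit must be positive")
	}
	return (milli + 999) / 1000, nil
}

func main() {
	// Hypothetical variable, assumed to be populated from limits.cpu
	// with divisor 1m via the downward API.
	raw, ok := os.LookupEnv("COCKROACH_CPU_LIMIT_MILLIS")
	if !ok {
		fmt.Println("COCKROACH_CPU_LIMIT_MILLIS not set")
		return
	}
	cores, err := coresFromMillicores(raw)
	if err != nil {
		fmt.Println("invalid CPU limit:", err)
		return
	}
	fmt.Println("CPU limit in whole cores:", cores)
}
```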
@knz
Contributor

knz commented Jan 18, 2021

@itsbilal we have this issue in the backlog; did your latest changes fix this?

@itsbilal
Contributor

itsbilal commented Apr 8, 2021

Yes, partly. CPU limits are now accounted for in charts and metrics. CPU requests are harder to account for in Cockroach (if it is possible at all), and cockroachdb/docs#9001 explains why.

@itsbilal itsbilal closed this as completed Apr 8, 2021
10 participants