CPU Spike #1591

Closed
bitva77 opened this issue May 5, 2023 · 2 comments · Fixed by #1596
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

bitva77 commented May 5, 2023

/kind bug

What happened?

Upgraded the EBS CSI add-on in EKS, and CPU usage of the node DaemonSet pods spiked.

[image attachment]

What you expected to happen?

literally no change to happen

How to reproduce it (as minimally and precisely as possible)?

eksctl update addon --name aws-ebs-csi-driver --version latest \
  --cluster ${CLUSTER_NAME} \
  --service-account-role-arn arn:aws:iam::${IHSM_ARN}:role/AmazonEKS_EBS_CSI_DriverRole_${CLUSTER_NAME} \
  --force

To roll back to a non-spiking version:

eksctl update addon --name aws-ebs-csi-driver --version v1.17.0-eksbuild.1 \
  --cluster ${CLUSTER_NAME} \
  --service-account-role-arn arn:aws:iam::${IHSM_ARN}:role/AmazonEKS_EBS_CSI_DriverRole_${CLUSTER_NAME} \
  --force

k8s-ci-robot added the kind/bug label on May 5, 2023

rmak-cpi commented May 5, 2023

Just to add some more information on this, I am actually getting CPU throttling alerts for the node-driver-registrar container.

torredil (Member) commented May 8, 2023

Hey @bitva77 @rmak-cpi, thanks for reporting this. Previously, the node-driver-registrar container had 0 CPU usage, because the container sits mostly idle until it receives a message:

Name:         ebs-csi-node-s28pf
Namespace:    kube-system
Labels:       app=ebs-csi-node
              app.kubernetes.io/name=aws-ebs-csi-driver
              controller-revision-hash=959db968d
              pod-template-generation=1
Annotations:  <none>
API Version:  metrics.k8s.io/v1beta1
Containers:
  Name:  ebs-plugin
  Usage:
    Cpu:     163999n
    Memory:  8984Ki
  Name:      liveness-probe
  Usage:
    Cpu:     198178n
    Memory:  3640Ki
  Name:      node-driver-registrar
  Usage:
    Cpu:     0
    Memory:  3492Ki
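
For reading the numbers above: the `n` suffix on the CPU values is the Kubernetes quantity suffix for nano, so these are nanocores, and 1 millicore is 1,000,000n. A minimal shell sketch for converting them, assuming `awk` is available; the helper name `nanocores_to_millicores` is made up for illustration and is not part of kubectl or the driver:

```shell
# Hypothetical helper (not part of kubectl or the driver): convert a CPU
# quantity in nanocores, e.g. "163999n", to millicores.
nanocores_to_millicores() {
  # Strip the trailing "n" suffix if present; a bare "0" passes through.
  awk -v n="${1%n}" 'BEGIN { printf "%.3f\n", n / 1000000 }'
}

nanocores_to_millicores 163999n   # ebs-plugin            -> 0.164
nanocores_to_millicores 198178n   # liveness-probe        -> 0.198
nanocores_to_millicores 0         # node-driver-registrar -> 0.000
```

In other words, all three containers were using well under 1m of CPU in this sample.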

The increase in CPU usage is due to a recently introduced change, #1570, which adds a livenessProbe check to the node-driver-registrar container. We could make this probe configurable by adding an option to enable or disable it, or to specify custom initialDelaySeconds and timeoutSeconds values.
