
Keep getting CPUThrottlingHigh alert on the gc pod #1724

Closed
budimanjojo opened this issue May 28, 2024 · 5 comments · Fixed by #1728
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@budimanjojo
Contributor

What happened:
After updating to v0.16.0, I keep getting CPUThrottlingHigh alert on the garbage collection pod like this:

CPUThrottlingHigh (Info)
Description: 35.71% throttling of CPU in namespace kube-system for container gc in pod node-feature-discovery-gc-696b644f9-2rwql.

What you expected to happen: Everything should keep running as it did before. I have fairly default Helm values:

master:
  extraLabelNs:
    - gpu.intel.com

How to reproduce it (as minimally and precisely as possible): Use the latest v0.16.0 with the values above.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.30.0
  • Cloud provider or hardware configuration: baremetal
  • OS (e.g. cat /etc/os-release): Talos Linux
  • Kernel (e.g. uname -a): 6.6.29-talos
  • Install tools: Helm
  • Network plugin and version (if this is a network-related bug):
  • Others:
@budimanjojo added the kind/bug label on May 28, 2024
@budimanjojo
Contributor Author

Maybe the CPU limit set on this line is too low.

Or maybe there's a bug in the garbage collection logic that makes it consume too many resources.
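
For context, the gc resources block in the chart's values.yaml looks roughly like the sketch below; the numbers here are illustrative, not the chart's actual defaults. A CPU limit in the low tens of millicores is easy to exceed during a garbage collection pass, which is exactly the condition that fires CPUThrottlingHigh:

gc:
  resources:
    limits:
      cpu: 20m        # illustrative value: a limit this tight is easily hit during a GC cycle
      memory: 1Gi     # illustrative value
    requests:
      cpu: 10m        # illustrative value
      memory: 128Mi   # illustrative value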

@marquiz
Contributor

marquiz commented May 28, 2024

Thanks @budimanjojo for reporting this. How big is your cluster (ca. how many nodes)?

In retrospect, setting the CPU limits might not have been such a good idea. We might want to remove them (and cut a patch release) 🧐

The most immediate fix for you would probably be to remove the CPU limit, i.e. do the Helm install with --set gc.resources.limits.cpu=null
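
For anyone managing the release through a values file instead of --set, an equivalent override would look something like this (a sketch assuming the chart's gc.resources layout; Helm treats a null value in an override as removing that key, so only the CPU limit is dropped and the other resource settings keep their chart defaults):

gc:
  resources:
    limits:
      cpu: null   # drops the CPU limit; memory limit and requests keep their chart defaults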

@budimanjojo
Contributor Author

Hi @marquiz!
I have a 3-node cluster, so it's a pretty small one.

Yeah, I agree with having no CPU limits set by default, at least for the gc pod. Should I open a PR, or should I just wait?

@marquiz
Contributor

marquiz commented May 29, 2024

I have a 3-node cluster, so it's a pretty small one.

OK, not a huge one, then. 😅 Looks like we need to investigate that a bit further 🤔

Yeah, I agree with having no CPU limits set by default, at least for the gc pod. Should I open a PR, or should I just wait?

Please do, more contributors -> better 😊 Let's remove the CPU limits for all daemons. Also, we need to update the tables of parameters in docs/deployment/helm.md accordingly (for the new defaults).
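
Until a release with the new defaults is out, the same null-override approach can be applied to every daemon from a single values file. A sketch, assuming each daemon (master, worker, topologyUpdater, gc) exposes the same resources layout in the chart:

master:
  resources:
    limits:
      cpu: null   # remove the CPU limit for each daemon; other settings keep chart defaults
worker:
  resources:
    limits:
      cpu: null
topologyUpdater:
  resources:
    limits:
      cpu: null
gc:
  resources:
    limits:
      cpu: null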

@budimanjojo
Contributor Author

@marquiz I just created the PR, please take a look. Per your recommendation, I removed the CPU limits for all daemons instead of just the garbage collection pod.
