Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

func NumCPU() fails to return correct cpus when using cgroups2 #9665

Closed
sdelrio opened this issue Feb 23, 2023 · 19 comments · Fixed by #9816
Closed

func NumCPU() fails to return correct cpus when using cgroups2 #9665

sdelrio opened this issue Feb 23, 2023 · 19 comments · Fixed by #9816
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sdelrio
Copy link

sdelrio commented Feb 23, 2023

What happened:

Installing ingress-nginx helm chart into k8s 1.24 with base OS ubuntu 22 (cgroups v2) doensn't generate correct worker_process based on k8s resources, instead it get max cpu cores of the node.

  • values.yaml resourfes part
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1
              memory: 512Mi

On previous ubuntu 20 (cgroups v1) the /etc/nginx/nginx.conf file into the container got configured to

worker_processes 1;

On ubuntu 22 base os (groups v2) the /etc/nginx/nginx.conf file into the container gets configured all the posible cores in the node (64 in my case),

worker_process 64;

What you expected to happen:

  • /etc/nginx/nginx.conf:
 worker_process 1;

I think the way is calculated only works for cgroup v1 and not for cgroup v2 returning max cpus when not finding cgrup v1 files:

https://github.com/kubernetes/ingress-nginx/blob/main/pkg/util/runtime/cpu_linux.go#L38-L54

func NumCPU() uses cgroup v1 paths to calculate:

  • /sys/fs/cgroup/cpu/cpu.cfs_quota_us
  • /sys/fs/cgroup/cpu/cpu.cfs_period_us
    In cgroup v2, this was replaced by one file containing the previous 2 values from v1
  • /sys/fs/cgroup/cpu.max

NGINX Ingress controller version ):

NGINX Ingress controller
  Release:       v1.6.4
  Build:         69e8833858fb6bda12a44990f1d5eaa7b13f4b75
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6
-------------------------------------------------------------------------------

Kubernetes version: v1.24.6

Environment:

  • Cloud provider or hardware configuration: OnPremise
  • OS: Ubuntu 22.04
  • Kernel (e.g. uname -a): Linux worker-4 5.15.0-60-generic
  • Install tools:
    • kubespray v2.10 k8s
  • Basic cluster related info:
    • kubectl version 1.24.6

How to reproduce this issue:
nginx controller being executed in any node with cgroups 2 and limited by 1 CPU

@sdelrio sdelrio added the kind/bug Categorizes issue or PR as related to a bug. label Feb 23, 2023
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Feb 23, 2023
@bmv126
Copy link

bmv126 commented Feb 24, 2023

From gke 1.26 cgroupsv2 will be made default. So this is an important fix that needs to be done.

@longwuyuan
Copy link
Contributor

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Feb 24, 2023
@nickorlow
Copy link
Contributor

Hi maintainers (cc @longwuyuan )!

I'm a CS student at the University of Texas at Austin. I'm taking a class focusing on open-source software and virtualization at the moment. One part of the course is making open source contributions. I'd like to work on this issue.

The class requires that we ask the maintainers and get an OK to work on the issue before it can be counted for credit in the class. Would it be possible for me and my partner in the class to work on this issue?

@longwuyuan
Copy link
Contributor

@nickorlow you were misinformed. OSS projects don't require seeking permission to work on issues. OSS projects welcome improvements.

@nickorlow
Copy link
Contributor

Hi! Thanks for your response @longwuyuan ! The requirement to ask the maintainers is a requirement of the class we're in.

@longwuyuan
Copy link
Contributor

No such thing but wait for comments from others i guess. Surprised you did not get info on normally how things work in communities. This is dev guide https://kubernetes.github.io/ingress-nginx/developer-guide/getting-started/

@longwuyuan
Copy link
Contributor

Assuming you are not aware, helpful things to do is talk in kubernetes.slack.com (register at slack.k8s.io) and submit a PR for your improvement code. Others review and comment and next steps are based on reviews.

@nickorlow
Copy link
Contributor

Thanks for the information! I'll ask around on the slack.

The syllabus for my class just says that we need a maintainer to comment on the issue saying that we can work in it.

@strongjz
Copy link
Member

@nickorlow I responded in slack and assigned this task to you.

@bmv126
Copy link

bmv126 commented Mar 22, 2023

As the default configuration causes all the cores to be used by the container on a cluster using cgroups2.
I feel this needs to be added in known issues section, so that users are aware of this.

@bmv126
Copy link

bmv126 commented Mar 27, 2023

@nickorlow
Are you planning to provide PR on this issue.

@nickorlow
Copy link
Contributor

Yes, should push something in the next few days

@AlenversFr
Copy link

We have currently 400+ ingress controller on newly upgraded 1.25 Azure AKS clusters waiting for a solution.
Azure brings Ubuntu 22 with cgroup v2 with AKS 1.25 !

@bmv126
Copy link

bmv126 commented Mar 30, 2023

WA is to set it in configmap
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#worker-processes

@nickorlow
Copy link
Contributor

nickorlow commented Mar 30, 2023

I'm mostly done. The documentation for cgroups 2 states that the file format for the cpu.max file is $MAX $QUOTA, however “max” for $MAX indicates no limit. In this case should we just return runtime.NumCPU()?

@nickorlow
Copy link
Contributor

@bmv126
Copy link

bmv126 commented Mar 31, 2023

Use cgroups2 if available or else fall back to cgroups1.
If both of them cannot be determined use runtime.NumCPU.

@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 30, 2024
@tao12345666333
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 30, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in [SIG Network] Ingress NGINX Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging a pull request may close this issue.

9 participants