
Worker Processes (auto) doesn't honor cgroup allocations #2948

Closed
Stono opened this issue Aug 16, 2018 · 19 comments · Fixed by #2990

Comments

@Stono (Contributor) commented Aug 16, 2018

Hey,
I've noticed that when using the default value of https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/configmap.md#worker-processes, it defaults to the underlying node's vCPU count rather than the container's allocation.

It might be better to calculate the value from the cgroup CPU allocation and, if that isn't set, fall back to the node CPU count.

@aledbf (Member) commented Aug 17, 2018

@Stono I found this nginxinc/docker-nginx#31 (comment)
Can you check if $(cat /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu | wc -w) returns the correct value?

I am working on a more idiomatic fix.

@Stono (Contributor, Author) commented Aug 23, 2018

Hey
Sorry for the delay, it's been a mad few days.

Interestingly:

www-data@ingress-nginx-internal-controller-5b597b4589-95dbl:/etc/nginx$ cat /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu | wc -w
12

That's returning the node value.

I'm setting the limits like this:

          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi

Strangely the memory cgroup is fine:

www-data@ingress-nginx-internal-controller-5b597b4589-95dbl:/etc/nginx$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes
1073741824

I think maybe you're looking for this:

www-data@ingress-nginx-internal-controller-5b597b4589-95dbl:/etc/nginx$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
50000

@aledbf (Member) commented Aug 23, 2018

@Stono please post the output of cat /sys/fs/cgroup/cpu/cpu.cfs_period_us

@Stono (Contributor, Author) commented Aug 23, 2018

pretty useless really:

www-data@ingress-nginx-internal-controller-844fc8dcf5-845k6:/etc/nginx$ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000

/sys/fs/cgroup/cpu/cpu.cfs_quota_us is the one that reflects the CPU limit from the resource spec: the millicore value × 100.
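
That lines up with how a CPU limit is turned into a CFS quota: quota_us = limit_millicores × period_us / 1000, so 500m with the default 100000 µs period gives 50000. A minimal sketch of the arithmetic in Go (quotaForLimit is an illustrative helper, not part of any API):

package main

import "fmt"

// quotaForLimit is an illustrative helper, not part of Kubernetes or the
// controller: the runtime derives cpu.cfs_quota_us from the CPU limit as
// quota_us = limitMillicores * periodUs / 1000.
func quotaForLimit(limitMillicores, periodUs int64) int64 {
	return limitMillicores * periodUs / 1000
}

func main() {
	// A 500m limit with the default 100000µs period -> 50000,
	// matching the value read from cpu.cfs_quota_us above.
	fmt.Println(quotaForLimit(500, 100000))
}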

@aledbf (Member) commented Aug 23, 2018

round_up(/sys/fs/cgroup/cpu/cpu.cfs_period_us / /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
100000 / 50000 = 2 (cpus)

This should be used only if /sys/fs/cgroup/cpu/cpu.cfs_period_us > -1

@Stono (Contributor, Author) commented Aug 23, 2018 via email

@aledbf (Member) commented Aug 23, 2018

@Stono actually it's right https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt

docker run -ti --rm --cpu-quota=1000000 --cpu-period=500000 -v $PWD:/app 89c251c1a806 /app/main
Period: 500000
Quota: 1000000
Cores: 2
package main

import (
	"fmt"
	"io/ioutil"
	"math"
	"path/filepath"
	"runtime"
	"strconv"
	"strings"

	libcontainercgroups "github.com/opencontainers/runc/libcontainer/cgroups"
)

func main() {
	fmt.Printf("Cores: %v\n", numCPU())
}

// numCPU returns the CPU count the worker calculation should use: the cgroup
// CFS quota divided by the period (rounded up) when a CPU limit is set,
// otherwise the number of CPUs visible to the Go runtime.
func numCPU() int {
	cpus := runtime.NumCPU()

	cgroupPath, err := libcontainercgroups.FindCgroupMountpoint("cpu")
	if err != nil {
		return cpus
	}

	cpuQuota := readCgroupFileToInt64(cgroupPath, "cpu.cfs_quota_us")
	cpuPeriod := readCgroupFileToInt64(cgroupPath, "cpu.cfs_period_us")

	fmt.Printf("Period: %v\n", cpuPeriod)
	fmt.Printf("Quota: %v\n", cpuQuota)

	// A quota of -1 means "no limit", so fall back to the runtime's CPU count.
	if cpuQuota == -1 || cpuPeriod == -1 {
		return cpus
	}

	return int(math.Ceil(float64(cpuQuota) / float64(cpuPeriod)))
}

func readCgroupFileToInt64(cgroupPath, cgroupFile string) int64 {
	contents, err := ioutil.ReadFile(filepath.Join(cgroupPath, cgroupFile))
	if err != nil {
		return -1
	}
	strValue := strings.TrimSpace(string(contents))
	if value, err := strconv.ParseInt(strValue, 10, 64); err == nil {
		return value
	} else {
		return -1
	}
}
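
(For the 500m pod above, quota 50000 and period 100000, this program would print Cores: 1; with no limit set, cpu.cfs_quota_us reads -1 and it falls back to runtime.NumCPU(), i.e. the node's CPU count.)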

@Stono (Contributor, Author) commented Aug 24, 2018

Hmm, interesting. I'm still not sure we've got this right 😂 but I'm certainly no expert.

I'll see if there's anyone at work who can wrap their head around this.

@Stono (Contributor, Author) commented Aug 25, 2018

@aledbf so I've been going off https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu:

Limits and requests for CPU resources are measured in cpu units. One cpu, in Kubernetes, is equivalent to: 1 GCP Core

So taking 1000m (millicores) to equal 1 core: my limit above is 500m, or 0.5 cores, but by your calculation it comes out as 2 cores?

@aledbf (Member) commented Aug 25, 2018

@Stono first, cgroup CPU limits != CPU cores, because cgroup limits are time-based. In this case (NGINX) we just need a value better suited than the real number of cores.
I've been looking for existing calculations related to this, and this one from OpenJDK uses the same logic I posted before:
http://hg.openjdk.java.net/jdk/jdk/file/03f2bfdcb636/src/hotspot/os/linux/osContainer_linux.cpp#l499

@Stono (Contributor, Author) commented Aug 25, 2018

Fair enough; like I say, I'm definitely no expert here :) I'm purely going off the Kubernetes docs, which tell me that specifying "1" for my CPU limit translates into "1" core.

I'll continue to hard-code worker-processes in the ConfigMap to match my limit setting in Kubernetes.

@aledbf (Member) commented Aug 25, 2018

I'm purely going off the Kubernetes docs which tell me that specify "1" for my cpu limit translates into "1" core.

Just in case: I cannot find any helper in the k8s codebase that converts limits to cores.

@mmack commented Aug 27, 2018

Guys, our problem with "auto" being the number of CPUs in the system is this:

Imagine a deployment with a 2 CPU and 1 GB of RAM limit deployed to a) a Kubernetes node with 64 CPUs and b) a node with 16 CPUs.

On the big machine nginx spawns 64 worker processes, which in total exceeds the configured limit of 1 GB RAM. On the node with 16 CPUs it works...

I would love "auto" to mean the configured CPU limit, or the number of CPUs on the node if the limit exceeds it (see the sketch below).

Does this make sense to you guys?
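
If I follow the proposal, it amounts to clamping the quota-derived worker count by the node's CPU count. A minimal sketch of that heuristic (workerCount is an illustrative name, not the actual controller code):

package main

import (
	"fmt"
	"math"
	"runtime"
)

// workerCount clamps the cgroup-derived CPU count by the node's CPU count:
// use the configured limit, but never more workers than the node has CPUs.
// cpuQuota and cpuPeriod come from cpu.cfs_quota_us and cpu.cfs_period_us.
func workerCount(cpuQuota, cpuPeriod int64) int {
	nodeCPUs := runtime.NumCPU()
	if cpuQuota <= 0 || cpuPeriod <= 0 {
		return nodeCPUs // no limit configured: fall back to the node
	}
	fromLimit := int(math.Ceil(float64(cpuQuota) / float64(cpuPeriod)))
	if fromLimit > nodeCPUs {
		return nodeCPUs
	}
	return fromLimit
}

func main() {
	// A 2-CPU limit (quota 200000, period 100000) -> 2 workers,
	// or fewer on a node with fewer than 2 CPUs.
	fmt.Println(workerCount(200000, 100000))
}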

@Stono (Contributor, Author) commented Aug 27, 2018

Not sure what you mean @mmack; @aledbf's PR will attempt to discern a good value for the number of worker processes based on your cgroup limits, and if those aren't set, it'll default to the node's number of CPUs. With the changes he's put in, a 2-CPU, 1 GB instance of nginx on a 64-vCPU host won't spawn 64 worker processes, provided you set an appropriate limit (once this is merged).

I still have my reservations about the way it's being calculated, however, so in my case I'm opting to set CPU limits and then use worker-processes: "N" in my ConfigMap to control the number of worker processes.

Either way, once this PR is merged it'll be a lot better.

@dougnd commented Oct 12, 2018

For future readers trying to follow the logic (as I just did): the code change looks correct (to me) and does

return int(math.Ceil(float64(cpuQuota) / float64(cpuPeriod)))

Whereas a comment above had a typo and said:

round_up(/sys/fs/cgroup/cpu/cpu.cfs_period_us / /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
100000 / 50000 = 2 (cpus)

Instead it probably should have been

round_up(/sys/fs/cgroup/cpu/cpu.cfs_quota_us / /sys/fs/cgroup/cpu/cpu.cfs_period_us)
round_up(50000 / 100000) = round_up(0.5) = 1 (cpu)

@discordianfish (Contributor) commented:

I think the worker processes should be based on the resource requests, not the cgroup limits.
For latency-critical components like nginx-ingress, the CPU limit needs to be set to something that is never reached; otherwise the process gets throttled and some requests can be delayed by seconds.

@aledbf Thoughts? Should I file a new issue for that?
Alternatively, you could tweak the worker calculation to account for some headroom.

@aledbf (Member) commented Jun 19, 2019

For latency critical components like nginx-ingress, the cpu limit needs to be set to something that is never reached,

Agree. That's why you should not set any limits when you start using the ingress controller, at least for a day, so you can see exactly the resource utilization you get with your apps.

resource requests, not the cgroup limits.

What do you mean?

@discordianfish (Contributor) commented Jun 20, 2019

resource requests, not the cgroup limits.

What do you mean?

Right now this runs as many workers as the limit of CPUs it can use. If all workers are utilized 100%, every bit of additional CPU it uses (due to the Go process etc.) would cause some throttling.

So instead it would be nice if it used the resource requests to calculate the number of workers; that's something you set to a value you ideally want to utilize. Does that make sense?
But I guess it's easy enough to set it yourself using the downward API.
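
For anyone taking that downward-API route, here is a minimal sketch of sizing workers from the CPU request. The env var name CPU_REQUEST_MILLIS and wiring it up via resourceFieldRef with divisor "1m" are assumptions for illustration; the controller does not define them:

package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
)

// workersFromRequest reads the pod's CPU request in millicores from an env
// var (assumed to be populated via the downward API) and rounds up to whole
// workers; if the variable is missing it falls back to the runtime CPU count.
func workersFromRequest() int {
	millis, err := strconv.ParseInt(os.Getenv("CPU_REQUEST_MILLIS"), 10, 64)
	if err != nil || millis <= 0 {
		return runtime.NumCPU()
	}
	return int(math.Ceil(float64(millis) / 1000.0)) // 500m -> 1, 2500m -> 3
}

func main() {
	fmt.Println(workersFromRequest())
}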

@embano1 (Member) commented Jul 31, 2019

Sorry for the late pickup, but I agree with @discordianfish on (1) recommending setting requests for ingress (I talked about it here) and (2) using requests.cpu for the worker-process calculation, especially since the community, based on experiences from Google Borg, Zalando et al., is moving towards disabling limits altogether (e.g. globally on the kubelet via --cpu-cfs-quota=false). This could render PR #2990 ineffective.
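
(In terms of the code earlier in this thread: with the CFS quota disabled, cpu.cfs_quota_us reads -1, so the calculation falls back to runtime.NumCPU(), i.e. the node's CPU count, which is presumably what would make the PR ineffective.)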

pdbrito referenced this issue in pdbrito/laravel-cloud-run Sep 18, 2019
Apparently using auto doesn't work well with docker:
nginxinc/docker-nginx#31