
Worker Processes (auto) doesn't honor cgroup allocations #2948

Closed
Stono opened this issue Aug 16, 2018 · 19 comments · Fixed by #2990

Comments

@Stono (Contributor) commented Aug 16, 2018

Hey,
I've noticed that when using the default value of https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/configmap.md#worker-processes, it defaults to the underlying node's vCPU count rather than the container's allocation.

It might be better to calculate the value from the cgroup CPU allocation and, if that isn't set, fall back to the node CPU count.

@aledbf (Member) commented Aug 17, 2018

@Stono I found this nginxinc/docker-nginx#31 (comment)
Can you check if $(cat /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu | wc -w) returns the correct value?

I am working on a more idiomatic fix.

@Stono (Contributor, Author) commented Aug 23, 2018

Hey
Sorry for the delay, it's been a mad few days.

Interestingly:

www-data@ingress-nginx-internal-controller-5b597b4589-95dbl:/etc/nginx$ cat /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu | wc -w
12

That's returning the node value.

I'm setting the limits like this:

          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi

Strangely the memory cgroup is fine:

www-data@ingress-nginx-internal-controller-5b597b4589-95dbl:/etc/nginx$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes
1073741824

I think maybe you're looking for this:

www-data@ingress-nginx-internal-controller-5b597b4589-95dbl:/etc/nginx$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
50000

@aledbf (Member) commented Aug 23, 2018

@Stono please post the output of cat /sys/fs/cgroup/cpu/cpu.cfs_period_us

@Stono (Contributor, Author) commented Aug 23, 2018

pretty useless really:

www-data@ingress-nginx-internal-controller-844fc8dcf5-845k6:/etc/nginx$ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000

/sys/fs/cgroup/cpu/cpu.cfs_quota_us is the one that reflects the CPU limit from the resource spec: the millicore value × 100.
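
That lines up with how a CPU limit is turned into a CFS quota: quota_us = limit_millicores × period_us / 1000, so 500m with the default 100000 µs period gives 50000. A minimal sketch of the arithmetic in Go (quotaForLimit is an illustrative helper, not part of any API):

package main

import "fmt"

// quotaForLimit is an illustrative helper, not part of Kubernetes or the
// controller: the runtime derives cpu.cfs_quota_us from the CPU limit as
// quota_us = limitMillicores * periodUs / 1000.
func quotaForLimit(limitMillicores, periodUs int64) int64 {
	return limitMillicores * periodUs / 1000
}

func main() {
	// A 500m limit with the default 100000µs period -> 50000,
	// matching the value read from cpu.cfs_quota_us above.
	fmt.Println(quotaForLimit(500, 100000))
}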

@aledbf (Member) commented Aug 23, 2018

round_up(/sys/fs/cgroup/cpu/cpu.cfs_period_us / /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
100000 / 50000 = 2 (cpus)

This should be used only if /sys/fs/cgroup/cpu/cpu.cfs_period_us > -1

@Stono (Contributor, Author) commented Aug 23, 2018 via email

@aledbf (Member) commented Aug 23, 2018

@Stono actually it's right https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt

docker run -ti --rm --cpu-quota=1000000 --cpu-period=500000 -v $PWD:/app 89c251c1a806 /app/main
Period: 500000
Quota: 1000000
Cores: 2
package main

import (
	"fmt"
	"io/ioutil"
	"math"
	"path/filepath"
	"runtime"
	"strconv"
	"strings"

	libcontainercgroups "github.com/opencontainers/runc/libcontainer/cgroups"
)

func main() {
	fmt.Printf("Cores: %v\n", numCPU())
}

// numCPU returns the CPU count the worker calculation should use: the cgroup
// CFS quota divided by the period (rounded up) when a CPU limit is set,
// otherwise the number of CPUs visible to the Go runtime.
func numCPU() int {
	cpus := runtime.NumCPU()

	cgroupPath, err := libcontainercgroups.FindCgroupMountpoint("cpu")
	if err != nil {
		return cpus
	}

	cpuQuota := readCgroupFileToInt64(cgroupPath, "cpu.cfs_quota_us")
	cpuPeriod := readCgroupFileToInt64(cgroupPath, "cpu.cfs_period_us")

	fmt.Printf("Period: %v\n", cpuPeriod)
	fmt.Printf("Quota: %v\n", cpuQuota)

	// A quota of -1 means "no limit", so fall back to the runtime's CPU count.
	if cpuQuota == -1 || cpuPeriod == -1 {
		return cpus
	}

	return int(math.Ceil(float64(cpuQuota) / float64(cpuPeriod)))
}

func readCgroupFileToInt64(cgroupPath, cgroupFile string) int64 {
	contents, err := ioutil.ReadFile(filepath.Join(cgroupPath, cgroupFile))
	if err != nil {
		return -1
	}
	strValue := strings.TrimSpace(string(contents))
	if value, err := strconv.ParseInt(strValue, 10, 64); err == nil {
		return value
	} else {
		return -1
	}
}
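
(For the 500m pod above, quota 50000 and period 100000, this program would print Cores: 1; with no limit set, cpu.cfs_quota_us reads -1 and it falls back to runtime.NumCPU(), i.e. the node's CPU count.)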

@Stono (Contributor, Author) commented Aug 24, 2018

Hmm, interesting. I'm still not sure we've got this right 😂 but I'm certainly no expert.

I'll see if there's anyone at work who can wrap their head around this.

@Stono (Contributor, Author) commented Aug 25, 2018

@aledbf so I've been going off https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu:

Limits and requests for CPU resources are measured in cpu units. One cpu, in Kubernetes, is equivalent to: 1 GCP Core

So taking 1000m (millicores) to equal 1 core: my limit above is 500m, or 0.5 cores, but by your calculation it comes out as 2 cores?

@aledbf (Member) commented Aug 25, 2018

@Stono first, cgroup CPU limits != CPU cores, because cgroup limits are time-based. In this case (NGINX) we just need a value better suited than the real number of cores.
I've been looking for existing calculations related to this, and this one from OpenJDK uses the same logic I posted before:
http://hg.openjdk.java.net/jdk/jdk/file/03f2bfdcb636/src/hotspot/os/linux/osContainer_linux.cpp#l499

@Stono (Contributor, Author) commented Aug 25, 2018

Fair enough; like I say, I'm definitely no expert here :) I'm purely going off the Kubernetes docs, which tell me that specifying "1" for my CPU limit translates into "1" core.

I'll continue to hard-code worker-processes in the ConfigMap to match my limit setting in Kubernetes.

@aledbf (Member) commented Aug 25, 2018

I'm purely going off the Kubernetes docs which tell me that specify "1" for my cpu limit translates into "1" core.

Just in case: I cannot find any helper in the k8s codebase that converts limits to cores.

@mmack commented Aug 27, 2018

Guys, our problem with "auto" being the number of CPUs in the system is this:

Imagine a deployment with a 2 CPU and 1 GB of RAM limit deployed to a) a Kubernetes node with 64 CPUs and b) a node with 16 CPUs.

On the big machine nginx spawns 64 worker processes, which in total exceeds the configured limit of 1 GB RAM. On the node with 16 CPUs it works...

I would love "auto" to mean the configured CPU limit, or the number of CPUs on the node if the limit exceeds it (see the sketch below).

Does this make sense to you guys?
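
If I follow the proposal, it amounts to clamping the quota-derived worker count by the node's CPU count. A minimal sketch of that heuristic (workerCount is an illustrative name, not the actual controller code):

package main

import (
	"fmt"
	"math"
	"runtime"
)

// workerCount clamps the cgroup-derived CPU count by the node's CPU count:
// use the configured limit, but never more workers than the node has CPUs.
// cpuQuota and cpuPeriod come from cpu.cfs_quota_us and cpu.cfs_period_us.
func workerCount(cpuQuota, cpuPeriod int64) int {
	nodeCPUs := runtime.NumCPU()
	if cpuQuota <= 0 || cpuPeriod <= 0 {
		return nodeCPUs // no limit configured: fall back to the node
	}
	fromLimit := int(math.Ceil(float64(cpuQuota) / float64(cpuPeriod)))
	if fromLimit > nodeCPUs {
		return nodeCPUs
	}
	return fromLimit
}

func main() {
	// A 2-CPU limit (quota 200000, period 100000) -> 2 workers,
	// or fewer on a node with fewer than 2 CPUs.
	fmt.Println(workerCount(200000, 100000))
}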

@Stono (Contributor, Author) commented Aug 27, 2018

Not sure what you mean @mmack; @aledbf's PR will attempt to discern a good value for the number of worker processes based on your cgroup limits, and if those aren't set, it'll default to the node's number of CPUs. With the changes he's put in, a 2-CPU, 1 GB instance of nginx on a 64-vCPU host won't spawn 64 worker processes, provided you set an appropriate limit (once this is merged).

I still have my reservations about the way it's being calculated, however, so in my case I'm opting to set CPU limits and then use worker-processes: "N" in my ConfigMap to control the number of worker processes.

Either way, once this PR is merged it'll be a lot better.

@dougnd commented Oct 12, 2018

For future readers trying to follow the logic (as I just did): the code change looks correct (to me) and does

return int(math.Ceil(float64(cpuQuota) / float64(cpuPeriod)))

Whereas a comment above had a typo and said:

round_up(/sys/fs/cgroup/cpu/cpu.cfs_period_us / /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
100000 / 50000 = 2 (cpus)

Instead it probably should have been

round_up(/sys/fs/cgroup/cpu/cpu.cfs_quota_us / /sys/fs/cgroup/cpu/cpu.cfs_period_us)
round_up(50000 / 100000) = round_up(0.5) = 1 (cpu)

@discordianfish (Contributor) commented:

I think the worker processes should be based on the resource requests, not the cgroup limits.
For latency-critical components like nginx-ingress, the CPU limit needs to be set to something that is never reached; otherwise the process gets throttled and some requests can be delayed by seconds.

@aledbf Thoughts? Should I file a new issue for that?
Alternatively, you could tweak the worker calculation to account for some headroom.

@aledbf (Member) commented Jun 19, 2019

For latency critical components like nginx-ingress, the cpu limit needs to be set to something that is never reached,

Agree. That's why you should not set any limits when you start using the ingress controller, at least for a day, so you can see exactly the resource utilization you get with your apps.

resource requests, not the cgroup limits.

What do you mean?

@discordianfish (Contributor) commented Jun 20, 2019

resource requests, not the cgroup limits.

What do you mean?

Right now this runs as many workers as the limit of CPUs it can use. If all workers are utilized 100%, every bit of additional CPU it uses (due to the Go process etc.) would cause some throttling.

So instead it would be nice if it used the resource requests to calculate the number of workers; that's something you set to a value you ideally want to utilize. Does that make sense?
But I guess it's easy enough to set it yourself using the downward API.
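
For anyone taking that downward-API route, here is a minimal sketch of sizing workers from the CPU request. The env var name CPU_REQUEST_MILLIS and wiring it up via resourceFieldRef with divisor "1m" are assumptions for illustration; the controller does not define them:

package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
)

// workersFromRequest reads the pod's CPU request in millicores from an env
// var (assumed to be populated via the downward API) and rounds up to whole
// workers; if the variable is missing it falls back to the runtime CPU count.
func workersFromRequest() int {
	millis, err := strconv.ParseInt(os.Getenv("CPU_REQUEST_MILLIS"), 10, 64)
	if err != nil || millis <= 0 {
		return runtime.NumCPU()
	}
	return int(math.Ceil(float64(millis) / 1000.0)) // 500m -> 1, 2500m -> 3
}

func main() {
	fmt.Println(workersFromRequest())
}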

@embano1 (Member) commented Jul 31, 2019

Sorry for the late pickup, but I agree with @discordianfish on (1) recommending setting requests for ingress (I talked about it here) and (2) using requests.cpu for the worker-process calculation, especially since the community, based on experiences from Google Borg, Zalando et al., is moving towards disabling limits altogether (e.g. globally on the kubelet via --cpu-cfs-quota=false). This could render PR #2990 ineffective.
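
(In terms of the code earlier in this thread: with the CFS quota disabled, cpu.cfs_quota_us reads -1, so the calculation falls back to runtime.NumCPU(), i.e. the node's CPU count, which is presumably what would make the PR ineffective.)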

pdbrito referenced this issue in pdbrito/laravel-cloud-run Sep 18, 2019
Apparently using auto doesn't work well with docker:
nginxinc/docker-nginx#31