
cloud: Respect Kubernetes resource limits #51471

Merged
merged 1 commit into from
Jul 16, 2020

Conversation

bobvawter
Contributor

Detecting the number of available CPU cores and memory limits from within a
container can be inaccurate. This change passes the CPU and memory limits used
by the Kubernetes scheduler into the cockroach process.

In the absence of a limits block (e.g., when using the BestEffort QoS class in a dev/staging environment), the scheduler will substitute the maximum value that is appropriate for the node on which the container is scheduled.

This change can be applied retroactively to existing deployments and is not
specific to a particular version of CockroachDB.

The cockroach start command is broken onto multiple lines to improve
readability.

See also: #34988

Release note: None
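The mechanism the commit message describes maps onto the Kubernetes Downward API: a container can have the scheduler's view of its own limits injected as environment variables, and when no limits block is present the node's allocatable capacity is substituted. A minimal sketch of that pattern (the env var names here are illustrative, not necessarily the exact ones used by this PR):

```yaml
# Sketch: inject the scheduler's view of this container's limits.
# With no limits block set, Kubernetes substitutes the node's
# allocatable capacity instead.
env:
- name: GOMAXPROCS            # consumed directly by the Go runtime
  valueFrom:
    resourceFieldRef:
      resource: limits.cpu
      divisor: "1"            # whole cores, rounded up
- name: MEMORY_LIMIT_MIB      # illustrative name; read by the start script
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
      divisor: "1Mi"
```

A start script can then size flags such as --cache and --max-sql-memory from the injected memory value instead of letting the process infer whole-node figures from inside the container.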

@cockroach-teamcity
Member

This change is Reviewable

@knz knz left a comment

I am generally OK with this approach - it brings us closer to a better situation.

However, if I understand correctly (Andrew, correct me if I'm wrong), setting GOMAXPROCS correctly is only half of the solution. There's another part: CockroachDB internally inspects its OS environment to determine the number of cores available; this is used in the UI to display CPU utilization and other things. This inspection is independent of/orthogonal to GOMAXPROCS; we also know that the code is currently wrong/problematic with k8s/docker.

NB: this inspection is currently performed via the upstream dependency go-sigar. If there's something wrong, it's likely in there. We can evaluate the other dependency gopsutil and check whether it has better code to detect available cores under orchestration.
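For reference, the container-level truth being discussed here lives in the cgroup filesystem rather than procfs. A minimal, stdlib-only sketch of deriving the effective core count from the CFS quota (cgroup v1 paths; the helper names are illustrative, not CockroachDB's or gosigar's):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// effectiveCores converts a cgroup v1 CFS quota/period pair into a core
// count. A quota of -1 means "unlimited"; we return 0 so the caller can
// fall back to runtime.NumCPU().
func effectiveCores(quotaUS, periodUS int64) float64 {
	if quotaUS <= 0 || periodUS <= 0 {
		return 0
	}
	return float64(quotaUS) / float64(periodUS)
}

// readInt64 reads a single integer from a cgroup file such as
// /sys/fs/cgroup/cpu/cpu.cfs_quota_us.
func readInt64(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	// A container limited to 4 CPUs typically shows quota 400000 with
	// the default 100000us period.
	fmt.Println(effectiveCores(400000, 100000)) // 4
	fmt.Println(effectiveCores(-1, 100000))     // 0: no limit configured
	_ = readInt64                               // would be used against /sys/fs/cgroup in a container
}
```

The same division is what cgroup-aware libraries perform; the point is that neither GOMAXPROCS nor /proc/cpuinfo carries this information.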

@ajwerner ajwerner left a comment

One question I have relates to using requests vs limits. Say a user sets a request for 4 cores but no limit. As I understand it, this is valid and will prevent the host from setting the cgroup which limits the total number of cores which may be utilized. IIRC this is done by some of our customers and may have been a recommendation at some point. Should the expression perhaps look at requests instead of limits?

> However, if I understand correctly (Andrew, correct me if I'm wrong), setting GOMAXPROCS correctly is only half of the solution. There's another part: CockroachDB internally inspects its OS environment to determine the number of cores available; this is used in the UI to display CPU utilization and other things. This inspection is independent of/orthogonal to GOMAXPROCS; we also know that the code is currently wrong/problematic with k8s/docker.
>
> NB: this inspection is currently performed via the upstream dependency go-sigar. If there's something wrong, it's likely in there. We can evaluate the other dependency gopsutil and check whether it has better code to detect available cores under orchestration.

You are correct, this does not touch the number of cores we believe the host to have. The CPU usage normalization will be off and so will the core count displayed in the admin UI.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)

@bobvawter
Contributor Author

I looked at using requests originally. If you don't specify a requests section, you get 1 core and 0 memory for the injected values.

@ajwerner ajwerner left a comment

What value of limits do you get if you don't specify a limits section? Perhaps we want to do something like take the max?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)

@bobvawter
Contributor Author

If no limits are specified, you get the maximum number of cores and amount of memory that the particular kubelet is configured to provide.

@ajwerner ajwerner left a comment

Hmm, I'm torn about the right thing to do here. Imagine (for whatever ghastly reason) somebody was running lots of cockroach instances (like our upcoming SQL pods) in a k8s cluster. Imagine each instance requests 1 vCPU but has no limits set, and each of the VMs is actually a 72-core VM. Do we want GOMAXPROCS to be 1 or 72?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @piyush-singh)

@bobvawter
Contributor Author

The "no limits" case is pretty much the default behavior that you get today running cockroach on kubernetes. At least with this patch, the node-info page is going to be very explicit about what your nodes are doing. My gut says "let's be pedantic and do exactly what the operator has requested".

[Screenshot: Admin UI node-info page, 2020-07-15]

@ajwerner ajwerner left a comment

I'm convinced. Thanks for walking through the thought experiment with me.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)


cloud/kubernetes/cockroachdb-statefulset-secure.yaml, line 214 at r1 (raw file):


        #   limits:
            # NOTE: Unless you have enabled the non-default Static CPU Management Policy
            # and are using an integer number of CPUs, we don't recommend setting a CPU limit.
            # See:
            #   https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
            #   https://github.com/kubernetes/kubernetes/issues/51135
            #   cpu: "16"
            #   memory: "8Gi"

Not for now, but my sense is that we should really recommend more RAM per core...

@knz
Contributor

knz commented Jul 15, 2020

> My gut says "let's be pedantic and do exactly what the operator has requested".

I agree with the "let's be pedantic" part, unfortunately what's implemented currently is unlikely to be what the operator has requested :)

We have two concrete problems:

  • Go does not restrict its CPU usage to the configured value of GOMAXPROCS. The setting only restricts the amount of parallelism for goroutines running Go code; any in-flight syscall invoked by a goroutine spills over into an OS thread that is spawned in excess of GOMAXPROCS. Therefore, the effective parallelism in a running CockroachDB process is not bounded by GOMAXPROCS -- it is bounded by the number of cores/vCPUs the container is constrained to.

    In light of this, if an operator says "I want this node to use 4 cores", we must ensure that the container places an OS-level limit on CPU usage to 4 cores/vcpus, regardless of GOMAXPROCS. Of course, it's important/useful to have GOMAXPROCS align with that limit, but if there's no limit we are not doing what the operator wants.

  • as I pointed out earlier, we are creating an absolute UX disaster if the Admin UI and other metrics do not do reflection properly on the number of cores/vcpus. Reflection does not use GOMAXPROCS. It uses the gosigar upstream dep. This is (AFAIK) broken when CockroachDB runs inside a container. As long as this is broken, we are failing the human operator: regardless of actual resource usage by CockroachDB, they will be utterly confused (and unhappy) with the output they see in metrics / UI.

    We need to either fix the dependency or switch to another mechanism that does this correctly (maybe gopsutil does).

We can't really say "we're addressing the problems" without covering ground on both points simultaneously.

@ajwerner ajwerner left a comment

I don't think anybody is claiming that this touches the second point at all. The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the node. The only place that is aware of the constrained core count is the cgroups.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@knz
Contributor

knz commented Jul 15, 2020

> I don't think anybody is claiming that this at all touches the second point.

OK, then we can defer that conversation to later. @piyush-singh, can you keep track of this?

Still, coming back to GOMAXPROCS: I think the config should always set a limit, and have GOMAXPROCS set to that limit. Then if the user sees GOMAXPROCS set to an extremely large value, they will find out on their own that they didn't set a limit properly, and get the idea to set one.
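Concretely, the convention proposed above (always set a limit, with GOMAXPROCS derived from it) would look something like this in the container spec; the numbers are illustrative, not a recommendation from this PR:

```yaml
resources:
  requests:
    cpu: "4"
    memory: "8Gi"
  # Always set limits, so the injected GOMAXPROCS reflects a real bound
  # rather than the node's full allocatable capacity.
  limits:
    cpu: "4"
    memory: "8Gi"
```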

Regarding the other issue (which we're not looking at today):

> The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the nodes. The only place that is aware of the constrained core count are the cgroups.

We have cgroup-specific code in some places. If that's what it takes, we can add more.
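As an illustration of what such cgroup-specific code looks like on the memory side, here is a sketch of interpreting a cgroup v1 memory.limit_in_bytes value. The sentinel handling below is an assumption of mine (unlimited cgroups report a huge page-rounded value near maxInt64), not a quote of CockroachDB's actual code:

```go
package main

import "fmt"

// memLimitBytes interprets the value of a cgroup v1
// memory.limit_in_bytes file. Unlimited cgroups report a very large
// sentinel, so anything at or above 2^62 is treated as "no limit".
func memLimitBytes(v uint64) (limit uint64, limited bool) {
	const noLimitSentinel = uint64(1) << 62
	if v >= noLimitSentinel {
		return 0, false
	}
	return v, true
}

func main() {
	// A container with an 8Gi memory limit.
	if l, ok := memLimitBytes(8 << 30); ok {
		fmt.Printf("limit: %d GiB\n", l>>30) // limit: 8 GiB
	}
	// A typical "unlimited" sentinel (maxInt64 rounded down to a page).
	_, ok := memLimitBytes(9223372036854771712)
	fmt.Println(ok) // false
}
```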

@ajwerner ajwerner left a comment

> The problem as I see it is that when a process gets its core count constrained in a container, procfs still reports all of the cores on the nodes. The only place that is aware of the constrained core count are the cgroups.
>
> We have cgroup-specific code in some places. If that's what it takes, we can add more.

Absolutely. I've dropped the ball here. This library https://github.com/uber-go/automaxprocs has logic to read the cgroups and I began to look at adopting it in #44880 but never followed through.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@ajwerner ajwerner left a comment

What I mean to say is that with that library as a dependency, we can use it also to determine the core count (though the interface to that library leaves much to be desired). Not that the linked PR above does any such thing. I'll create an issue there to expose the value it reads.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@ajwerner ajwerner left a comment

uber-go/automaxprocs#29 (comment)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @piyush-singh)

@knz knz left a comment

I now see that GOMAXPROCS is set to the actual container limit. This aligns with my first point (I had misunderstood earlier). Thanks!

@bobvawter
Contributor Author

bors r+

@bobvawter
Contributor Author

bors retry

@craig
Contributor

craig bot commented Jul 16, 2020

Build failed

@bobvawter
Contributor Author

bors retry

@craig
Contributor

craig bot commented Jul 16, 2020

Build succeeded

@craig craig bot merged commit 3ef8a89 into cockroachdb:master Jul 16, 2020
@bobvawter bobvawter deleted the k8s_memory branch July 16, 2020 16:24
craig bot pushed a commit that referenced this pull request Dec 2, 2020
56329: cloud: update resource requests/limits in StatefulSet configs r=taroface a=taroface

Our various StatefulSet configs had a placeholder `resources.requests` block with guidelines that became outdated with #51471. The guidelines are now updated.

Thanks to @tim-o for pointing this out.

Release note: none

Co-authored-by: taroface <[email protected]>