Investigate why installing 700+ CRDs causing degradation of performance in apiserver #47

muvaf · 2021-09-02T14:23:52Z

What problem are you facing?

Today, if you run kubectl apply -f package/crds in provider-tf-aws, your cluster gets really slow. In GKE, the kubectl command just stops after about 50 CRDs.

How could Terrajet help solve your problem?

We have some ideas around sharding the controllers and API types, allowing customers to install only a subset of them. But we haven't identified the actual problem yet. We need to know the root cause before choosing a solution, so that we can keep that problem in mind for future designs.


chlunde · 2021-09-09T19:05:06Z

I suspect kubectl behaves slowly not because the control plane responds slowly, but because it self-throttles with a hardcoded QPS, and downloads CRDs frequently (but not for every command, there is a cache).

Example output w/ provider-aws, flux, crossplane and a few other types, 191 CRDs in total:

... $ kubectl get managed
I0909 20:16:19.782205  893807 request.go:665] Waited for 1.128844238s due to client-side throttling, not priority and fairness, request: GET:https://....../apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s

References for throttling:
https://github.com/kubernetes/client-go/blob/master/rest/request.go#L584-L593
kubernetes/kubectl#773

Background about kubectl discovery cache:
https://www.reddit.com/r/kubernetes/comments/bpfi48/why_does_kubectl_get_abc_take_10x_as_long_to/enuhn5v/?utm_source=reddit&utm_medium=web2x&context=3
"Discovery burst" of 100 reqs:
https://github.com/kubernetes/cli-runtime/blob/233e5fcb7112e0ab966b6a1e941a152416372ba4/pkg/genericclioptions/config_flags.go#L371
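
To make the throttling argument concrete, here is a minimal sketch of a token-bucket limiter, which is the general technique behind client-side QPS/burst limits. It is not kubectl's or client-go's actual code. The function name and the zero-latency assumption are mine, and qps=5 is an illustrative value; only the burst of 100 comes from the link above. Note that discovery issues requests per API group/version, not per CRD, so the request count is a free parameter here.

```python
def throttled_wait_seconds(num_requests: int, qps: float, burst: int) -> float:
    """Estimate how long a token-bucket limiter delays sequential requests.

    Assumptions (illustrative, not taken from kubectl):
    - the bucket starts full with `burst` tokens,
    - it refills at `qps` tokens per second,
    - requests are issued back to back and the server answers instantly,
      so the only elapsed time is time spent waiting for a token.
    """
    if qps <= 0 or burst < 1:
        raise ValueError("qps must be positive and burst at least 1")

    tokens = float(burst)
    waited = 0.0
    for _ in range(num_requests):
        if tokens < 1.0:
            # Wait just long enough for the bucket to hold one full token.
            waited += (1.0 - tokens) / qps
            tokens = 1.0
        tokens -= 1.0
    return waited


if __name__ == "__main__":
    # Hypothetical request counts; qps=5 is an assumed value.
    for n in (50, 100, 191, 765):
        print(n, "requests ->", round(throttled_wait_seconds(n, 5.0, 100), 1), "s of waiting")
```

Under these assumptions, nothing is delayed until the burst is used up; after that, every further request costs 1/qps seconds, so the waiting time grows linearly with the number of discovery requests.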

Please also note that kubectl on AWS with aws eks update-kubeconfig is slow (hack/workaround). This is unrelated, but it also makes the EKS control plane feel sluggish.

muvaf · 2021-09-13T17:04:13Z

@chlunde but the slowness in apiserver is also experienced after the kubectl apply operation has completed.

ulucinar · 2021-09-15T19:29:09Z

I did some experiments using provider-tf-aws & provider-tf-azure with the full set of resources generated for both. provider-tf-aws has 765 CRDs and provider-tf-azure has 658 CRDs if the full set of supported resources is generated (we skip generating certain resources).

Experiment Setup # 1:

The experiments were performed on a darwin_arm64 machine with 8 CPU cores. The control plane consisted of etcd, kube-apiserver and kube-controller-manager, all running as native (darwin_arm64) binaries. The Kubernetes version is v1.21.0. Debug-level logging was enabled for etcd with the --log-level=debug command-line option, and kube-apiserver ran at log verbosity 7 with the --v=7 command-line option.

CPU utilization and physical memory consumption metrics were collected for the control plane components, and kube-apiserver was CPU-profiled during the experiments. Please note that the verbose logging enabled for etcd and kube-apiserver contributes to higher CPU utilization, which I have not quantified. However, I collected CPU utilization and memory metrics before the CRDs were registered to establish a baseline and to observe the impact of registering large numbers of CRDs on the control plane components.

Please also note that the providers are not running in these sets of experiments: we first want to focus on the impact of registering large numbers of CRDs and avoid the cost of watches from the providers. After a warm-up period of 3 min to establish a baseline, all ~765 CRDs from provider-tf-aws are registered.

Registering 765 CRDs of provider-tf-aws

CPU profiling data collected from kube-apiserver during these experiments reveals that, while hundreds of CRDs are being registered, considerable CPU time (42.74% in the data provided, which covers a period of 1 hour) is spent updating the OpenAPI spec served at the /openapi/v2 endpoint. The cost comes from expensive Swagger JSON marshaling (17.74% of CPU time in total), proto binary serialization (17.88% in total) and JSON unmarshaling by the json-iterator library (frozenConfig.Unmarshal, 18.16% in total). Please also note that, as expected, this causes high heap churn: runtime.memclrNoHeapPointers accounts for a total of 39.34% of the CPU time sampled.

The following figure shows CPU utilization for etcd, kube-apiserver and kube-controller-manager. The kubectl command that registers the 765 CRDs completes in ~1 min. However, kube-apiserver exhibits high CPU utilization for ~20 min while it publishes the OpenAPI spec. After this period of high CPU load, kube-apiserver has an increased baseline CPU utilization, as expected, because there are watches in place for the registered CRDs and other periodic tasks run in the background.

State of the Art for the Established Kubernetes Scalability Thresholds:

Unfortunately, the Kubernetes Scalability thresholds file from the sig-scalability group does not consider CRDs per cluster as a dimension, although it establishes the base guidelines for understanding issues around Kubernetes scalability. However, here the sig-api-machinery group suggests 500 as the maximum scalability target for the number of CRDs per cluster. They also note in the same document that this suggested limit of 500 CRDs is not due to API call latency SLOs but because the background OpenAPI publishing is very expensive, as we also observed in our experiments.

Summary

Further tests are needed to measure API latency, but I do not expect that a high number of registered CRDs would by itself increase latency enough to violate the Kubernetes API call latency SLOs, excluding high-saturation cases (i.e., kube-apiserver or some other control plane component being starved for CPU). An interesting question is whether we currently violate latency SLOs with the Terraform providers during synchronous operations like observing remote infrastructure via the Terraform CLI. We had better keep an eye on that.
Selectively registering CRDs and selectively starting the related controllers is certainly a more scalable approach. We should also keep an eye on the Kubernetes scalability thresholds document and probably honor the suggested maximum of 500 CRDs as a scale target in all cases (or any scalability threshold that will be established in the future).
In the next experiment, I'd like to discuss the additional overhead when even more CRDs are introduced (like installing a second provider to the cluster) to reason about whether the overhead increases linearly with the number of CRDs.
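
As a rough illustration of the selective-registration idea in the summary above, the sketch below filters a list of CRD names by glob patterns and refuses a selection that exceeds the suggested 500-CRD target. This is not how Terrajet or Crossplane implement it. The function name, the patterns and the example.org CRD names are all hypothetical; only the limit of 500 comes from the thread.

```python
from fnmatch import fnmatchcase

# Scale target suggested by sig-api-machinery, as quoted in this thread.
SUGGESTED_MAX_CRDS = 500


def select_crds(crd_names, include_patterns, limit=SUGGESTED_MAX_CRDS):
    """Return the CRD names matching any include pattern, in input order.

    Raises ValueError if the selection exceeds `limit`, so an oversized
    set is rejected before anything is registered.
    """
    selected = [
        name
        for name in crd_names
        if any(fnmatchcase(name, pattern) for pattern in include_patterns)
    ]
    if len(selected) > limit:
        raise ValueError(
            f"{len(selected)} CRDs selected, above the target of {limit}"
        )
    return selected


if __name__ == "__main__":
    # Hypothetical CRD names, for illustration only.
    all_crds = [
        "instances.ec2.example.org",
        "buckets.s3.example.org",
        "roles.iam.example.org",
    ]
    print(select_crds(all_crds, ["*.ec2.example.org", "*.s3.example.org"]))
```

Checking the count before registration keeps the decision on the client side, where it is cheap, instead of discovering the cost later through OpenAPI publishing load on kube-apiserver.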

I have some other experiments whose results I will publish in separate comments to this issue.

ulucinar · 2021-09-15T22:47:11Z

Experiment Setup # 2:

In a cluster where all provider-tf-aws CRDs are already available, additionally register the 658 CRDs of provider-tf-azure with kubectl. We would like to observe whether an issue described here for the Kubernetes Endpoints object also exists for CRDs during OpenAPI spec publishing. Again, the providers are not started.
As can be seen in the figure below, the initial registration and publishing of the 765 provider-tf-aws CRDs took ~20 min, whereas registration of the additional 658 provider-tf-azure CRDs took ~74 min, even though there was no CPU saturation:

Due to client-side throttling reported by kubectl as @chlunde describes here (and maybe due to some other factors I have not investigated), provisioning the last 658 CRDs with kubectl takes more time (~75 s) than the initial set of 765 CRDs (~1 min). Per CRD, that is roughly 145% of the initial time, i.e., an increase of about 45%.

Summary

As expected, as we increase the number of CRDs in the cluster, it becomes more expensive per CRD to compute the OpenAPI spec described in this comment. With a back-of-the-envelope calculation, kube-apiserver spent an average of 20 min / 765 = 1.57 s per provider-tf-aws CRD for publishing, but for each provider-tf-azure CRD the average time increased to 74 min / 658 = 6.75 s, which is about 4.3 times as long (a ~330% increase). I have not collected CPU profiling data for this experiment; the assumption is that kube-apiserver was again busy publishing the additional provider-tf-azure CRDs, as described here, during the observed high-utilization interval of 74 min.
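
The back-of-the-envelope numbers above can be reproduced with a few lines. The durations and CRD counts are the approximate values reported in this thread; the helper name is just for illustration.

```python
def seconds_per_crd(total_minutes: float, crd_count: int) -> float:
    """Average publishing time per CRD, in seconds."""
    return total_minutes * 60.0 / crd_count


# Approximate figures reported in the experiments above.
aws = seconds_per_crd(20, 765)    # provider-tf-aws, registered first
azure = seconds_per_crd(74, 658)  # provider-tf-azure, registered second
ratio = azure / aws

print(f"aws: {aws:.2f} s/CRD, azure: {azure:.2f} s/CRD, ratio: {ratio:.2f}x")
# prints "aws: 1.57 s/CRD, azure: 6.75 s/CRD, ratio: 4.30x"
```

If the per-CRD publishing cost were independent of how many CRDs are already registered, the ratio would stay close to 1.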

muvaf added enhancement New feature or request post-alpha labels Sep 2, 2021

negz added the performance label Sep 7, 2021

luebken assigned ulucinar Sep 9, 2021

turkenh mentioned this issue Sep 9, 2021

Installing packages with many CRDs causes reconciler to exceed context deadline crossplane/crossplane#2564

Closed

turkenh mentioned this issue Sep 10, 2021

Bump context deadline for ProviderRevision controller to 3 mins crossplane/crossplane#2570

Merged

3 tasks

muvaf mentioned this issue Sep 13, 2021

Generate only selected resources for tech preview crossplane-contrib/provider-jet-aws#18

Merged

ulucinar mentioned this issue Sep 17, 2021

Load test with 100+ CR instances #55

Closed

muvaf added alpha and removed post-alpha labels Sep 24, 2021

ulucinar closed this as completed Sep 27, 2021

ulucinar mentioned this issue Oct 7, 2021

Make scaling experiment results & resources part of a one-pager in Terrajet repo #89

Open

This was referenced Oct 21, 2021

API Server (and clients) becomes unresponsive with too many CRDs crossplane/crossplane#2649

Closed

High CPU Utilization in kube-apiserver when a large number of CRDs are created kubernetes/kubernetes#105932

Closed

ulucinar mentioned this issue Feb 11, 2022

Consolidation of CRD Scaling Issues crossplane/crossplane#2895

Closed

1 task
