[cdk_infra] Add node auto scaling to EKS clusters #1059

bryan-aguilar · 2023-01-26T18:46:22Z

During the Collector CI, multiple EKS test cases may run against a single cluster at the same time. This can overcommit node resources and leave pods unschedulable. So far we have only hit these caps because of CPU requests. The current resource requests and limits for deployments can be found by searching for `limits =` in the terraform directory. Example here. In most cases no CPU request is set, only a CPU limit of 0.2 (200m). When a container specifies a limit but no request (and no LimitRange default applies), Kubernetes uses the limit as the request, so each such container still reserves 0.2 CPU on its node.
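To make the scheduling pressure concrete, here is a minimal, dependency-free TypeScript sketch (not part of cdk_infra). It converts CPU quantities to millicores, applies the request-defaults-to-limit rule, and counts how many more test pods fit on a set of nodes. The function names and per-node free CPU values are hypothetical, and LimitRange defaults and memory are ignored.

```typescript
// Hypothetical model of a container's CPU settings, as Kubernetes quantity strings.
interface CpuSpec {
  request?: string; // e.g. "100m" or "0.1"
  limit?: string; // e.g. "200m" or "0.2"
}

// Convert a Kubernetes CPU quantity ("0.2", "200m", "1") to millicores.
function toMillicores(quantity: string): number {
  if (quantity.slice(-1) === "m") {
    return Number(quantity.slice(0, -1));
  }
  return Math.round(Number(quantity) * 1000);
}

// When only a limit is set (and no LimitRange default applies), Kubernetes
// uses the limit as the request, so a 0.2 limit still reserves 200m.
function effectiveRequestMillicores(spec: CpuSpec): number {
  if (spec.request !== undefined) return toMillicores(spec.request);
  if (spec.limit !== undefined) return toMillicores(spec.limit);
  return 0;
}

// Count how many more pods with this spec fit, given each node's free
// allocatable CPU in millicores (CPU is the only resource modeled here).
function podsThatFit(freeMillicoresPerNode: number[], spec: CpuSpec): number {
  const request = effectiveRequestMillicores(spec);
  if (request === 0) return Infinity; // no CPU request: not bounded by CPU
  return freeMillicoresPerNode.reduce(
    (total, free) => total + Math.floor(free / request),
    0,
  );
}
```

For example, with two nodes that each have 1500m of free allocatable CPU (hypothetical numbers), `podsThatFit([1500, 1500], { limit: "0.2" })` returns 14, so a fifteenth concurrent test pod would stay Pending until capacity is added.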

EKS clusters should be set up so that they do not limit how many tests can run in parallel. We also should not have to keep tweaking requests and limits based on how many test cases might run concurrently. To accommodate this, we could set up a node autoscaler that adds nodes to handle the increased test load on the clusters.
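A node autoscaler adds nodes when pods are Pending for lack of capacity. The TypeScript sketch below is a deliberately simplified model of that decision, not the Kubernetes Cluster Autoscaler's actual algorithm (which simulates the full scheduler and many more constraints). It packs pending CPU requests onto hypothetical new nodes with first-fit decreasing bin packing, then clamps the resulting node count to the node group's minimum and maximum. All names and numbers are illustrative assumptions.

```typescript
// Estimate how many new nodes are needed to place the pending pods, using
// first-fit decreasing bin packing on CPU requests (millicores) only.
function nodesNeededForPending(
  pendingRequestsMillicores: number[],
  nodeAllocatableMillicores: number,
): number {
  // Place the largest requests first; this usually packs tighter.
  const sorted = pendingRequestsMillicores.slice().sort((a, b) => b - a);
  const freeOnNewNodes: number[] = [];
  for (const request of sorted) {
    if (request > nodeAllocatableMillicores) {
      throw new Error(`pod requesting ${request}m can never fit on one node`);
    }
    // First fit: use the first new node with enough free CPU.
    let placed = false;
    for (let i = 0; i < freeOnNewNodes.length; i++) {
      if (freeOnNewNodes[i] >= request) {
        freeOnNewNodes[i] -= request;
        placed = true;
        break;
      }
    }
    // Nothing fits: open another node.
    if (!placed) {
      freeOnNewNodes.push(nodeAllocatableMillicores - request);
    }
  }
  return freeOnNewNodes.length;
}

// Clamp the desired node count to the node group's configured bounds.
function desiredNodeCount(
  currentNodes: number,
  extraNodesNeeded: number,
  minSize: number,
  maxSize: number,
): number {
  return Math.min(maxSize, Math.max(minSize, currentNodes + extraNodesNeeded));
}
```

With twelve pending pods at 200m each and a hypothetical 1900m of allocatable CPU per node, `nodesNeededForPending` returns 2 (nine pods fit per node). `desiredNodeCount(2, 2, 2, 5)` then asks for 4 nodes, and a `maxSize` of 5 caps any larger burst, which is the knob that bounds cost while still absorbing parallel test runs.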

A temporary solution would be to increase the minimum number of nodes in the managed node group. This comes with a cost tradeoff and should not be considered a long-term solution.
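The cost side of this tradeoff is simple arithmetic: nodes kept up by a higher minimum are billed whether or not tests are running. The TypeScript sketch below uses hypothetical names, a placeholder hourly price (not real AWS pricing), and 730 hours as an approximate month. It estimates that extra cost and checks that a proposed scaling configuration keeps minSize <= desiredSize <= maxSize, the ordering an EKS managed node group requires.

```typescript
// Hypothetical mirror of a managed node group's scaling settings.
interface ScalingConfig {
  minSize: number;
  desiredSize: number;
  maxSize: number;
}

// Check the minSize <= desiredSize <= maxSize ordering.
function isValidScaling(config: ScalingConfig): boolean {
  return (
    config.minSize >= 0 &&
    config.minSize <= config.desiredSize &&
    config.desiredSize <= config.maxSize
  );
}

// Extra cost of always-on nodes added by raising minSize, assuming they
// would otherwise not be running. hourlyPricePerNode is a placeholder input.
function extraMonthlyCost(
  oldMinSize: number,
  newMinSize: number,
  hourlyPricePerNode: number,
  hoursPerMonth = 730,
): number {
  return Math.max(0, newMinSize - oldMinSize) * hourlyPricePerNode * hoursPerMonth;
}
```

At a placeholder price of 0.5 per node-hour, raising the minimum from 2 to 4 nodes adds about 730 per month. A config with `minSize` 4 but `desiredSize` 2 fails the check, so `desiredSize` has to be raised along with `minSize`.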

bryan-aguilar added the EKS label (EKS related issues) on Jan 26, 2023