During Collector CI, multiple EKS test cases may run against a single cluster at the same time. This can over-utilize node resources, which leaves pods unschedulable. So far we have only seen issues from hitting capacity on CPU requests. Current resource settings for deployments can be found by searching for `limits =` in the `terraform` directory. Example here. Currently, in most cases no `request` is set, but CPU limits are set at `.2`.
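To make the CPU-request ceiling concrete, here is a rough sketch of the arithmetic. It assumes that a container with a CPU limit and no request ends up with a request equal to the limit (Kubernetes copies the limit when nothing else supplies a default request), so a `.2` limit counts as 200m against the node. The allocatable figure of 1930m is a hypothetical value for a 2 vCPU node, not something taken from our clusters, and the function name is illustrative only.

```python
def pods_per_node(allocatable_cpu_millicores: int, pod_request_millicores: int) -> int:
    """Return how many pods fit on one node when CPU requests are the only constraint."""
    if pod_request_millicores <= 0:
        raise ValueError("pod_request_millicores must be positive")
    # The scheduler places a pod only if the sum of requests stays within allocatable CPU.
    return allocatable_cpu_millicores // pod_request_millicores


# Hypothetical node: 2 vCPU with about 1930m allocatable after system reservations.
# A CPU limit of .2 with no explicit request is treated as a 200m request.
fits = pods_per_node(allocatable_cpu_millicores=1930, pod_request_millicores=200)
print(f"pods per node: {fits}")
```

Any pod beyond that count stays Pending until another node has room, which matches the scheduling failures described above.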
EKS clusters should be set up in a way that does not restrict how many tests can run in parallel. We should also not have to continually tweak requests/limits based on how many test cases may be running in parallel. To better accommodate this, we could set up a node autoscaler that can handle the increased test load on the clusters.
A temporary solution would be to increase the minimum number of nodes in the managed node group. This comes with a cost tradeoff and should not be considered a long-term solution.
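As a rough way to size that temporary minimum, the sketch below estimates how many nodes a given number of parallel tests would need if CPU requests are the only constraint. All inputs are assumptions for illustration: the number of parallel tests, the pods per test, and the 1930m allocatable CPU per node are hypothetical, and 200m corresponds to the `.2` CPU limit acting as the request.

```python
def min_nodes_needed(
    parallel_tests: int,
    pods_per_test: int,
    pod_request_millicores: int,
    allocatable_cpu_millicores: int,
) -> int:
    """Estimate the node count needed to schedule every test pod, by CPU requests only."""
    per_node = allocatable_cpu_millicores // pod_request_millicores
    if per_node == 0:
        raise ValueError("a single pod does not fit on a node of this size")
    total_pods = parallel_tests * pods_per_test
    # Ceiling division without floats: round up to whole nodes.
    return -(-total_pods // per_node)


# Hypothetical load: 5 test cases in parallel, 4 pods each, 200m requested per pod.
nodes = min_nodes_needed(
    parallel_tests=5,
    pods_per_test=4,
    pod_request_millicores=200,
    allocatable_cpu_millicores=1930,
)
print(f"minimum nodes: {nodes}")
```

The result grows with every added parallel test case, which is why a fixed minimum is only a stopgap compared with an autoscaler.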