Cluster Autoscaler CAPI provider only supports merged clusters #3196

Closed
elmiko opened this issue Jun 4, 2020 · 5 comments · Fixed by #3203
Labels
area/provider/cluster-api Issues or PRs related to Cluster API provider

Comments

@elmiko
Contributor

elmiko commented Jun 4, 2020

The CAPI project defines multiple ways to deploy the controllers and resources that define its behavior. The primary method described in the quickstart documentation involves creating a Kubernetes management cluster which hosts the CAPI resources, and then using that management cluster to create workload clusters (these are separate Kubernetes installations). This presents some challenges when a user wants to integrate a cluster autoscaler into their workload clusters.

The CAPI project does support users moving their management configuration into their workload clusters, essentially joining the two into a single cluster (see the clusterctl move command). This joining operation will allow a user to run a cluster autoscaler in their workload cluster, but it effectively gives up one of the primary use cases for CAPI (keeping the management cluster separate from the workload clusters it manages).
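For reference, the merge described above is performed with the clusterctl move command; a minimal sketch is below (the kubeconfig path and namespace are placeholders):

```sh
# Pivot the Cluster API resources from the current management cluster into the
# workload cluster identified by the target kubeconfig (placeholder path).
clusterctl move --to-kubeconfig=./workload-cluster.kubeconfig

# The move can also be restricted to the namespace hosting a single cluster.
clusterctl move --to-kubeconfig=./workload-cluster.kubeconfig --namespace=my-cluster-ns
```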

This issue is being created to help track the efforts around creating a plan to address this gap in functionality. There are several ways to approach the problem, and it is possible that a solution could be reached without changing the autoscaler, but since the gap is primarily with autoscaler integration it is being tracked here.

user story

as a CAPI user, i would like to be able to utilize cluster autoscaling in my workload clusters while retaining a separate management cluster to track and control the lifecycle of my workload clusters.

/area provider/cluster-api

@MarcusNoble
Contributor

(Following on from our chat on Slack...)

We have a need for using cluster-autoscaler with cluster-api to autoscale many target clusters. Ideally we'd prefer it if it were possible to run the cluster-autoscaler within the management cluster and have it modify the MachineSet sizes (saving us from using more resources on each of the target clusters), but we had already planned on running cluster-autoscaler on each cluster anyway, so it wouldn't be a problem if this isn't possible.

Some other points worth noting that might have an impact on functionality:

  • We use multiple nodegroups for our target clusters (one per availability zone) and use --balance-similar-node-groups (a sketch of this layout follows below)
  • Each target cluster is in a separate account to the management cluster
  • If possible it'd be nice to be able to scale to 0 (e.g. we tend to scale down non-production environments out of hours to save costs)
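For context, a per-availability-zone node group in the layout described above might look roughly like the sketch below. The annotation keys, API version, and resource names are assumptions based on the CAPI provider's documentation and may differ between releases.

```yaml
# Sketch only: one MachineDeployment per availability zone, each annotated with
# the scale range the autoscaler may use (annotation keys may vary by provider
# version); the autoscaler itself runs with --balance-similar-node-groups.
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: workload1-md-eu-west-1a
  namespace: workload1-ns
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  clusterName: workload1
  replicas: 1
  # selector and template (bootstrap config, infrastructure machine template
  # pinned to the eu-west-1a availability zone, etc.) omitted for brevity
```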

@elmiko
Contributor Author

elmiko commented Aug 21, 2020

thanks for the comments @MarcusNoble!

on the topic of scaling to zero, we do have an issue open to track that work but we are still having some discussions about the implementation. see #3150

@detiber
Member

detiber commented Aug 26, 2020

Related PRs: #3314 #3203

With the above PRs, you should be able to deploy a copy of the autoscaler for each workload cluster you have. The simplest deployment method would be to run the autoscaler Deployments in the management cluster, mounting the kubeconfig secret for each workload cluster and setting the appropriate command-line flags (the README updates should cover how to set them).
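To make that concrete, here is a rough sketch of what one such per-workload-cluster Deployment might look like. The namespace, resource names, image tag, mount paths, and flag semantics here are assumptions for illustration; the provider README updated by the PRs above is the authoritative reference for the flags.

```yaml
# Sketch only: one autoscaler Deployment per workload cluster, running in the
# management cluster. Names, image tag, and paths are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler-workload1
  namespace: workload1-ns            # namespace hosting the workload cluster's CAPI resources
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler-workload1
  template:
    metadata:
      labels:
        app: cluster-autoscaler-workload1
    spec:
      serviceAccountName: cluster-autoscaler   # needs RBAC for the CAPI resources
      containers:
      - name: cluster-autoscaler
        # pick an autoscaler release matching the workload cluster's k8s minor version
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.2
        command:
        - /cluster-autoscaler
        args:
        - --cloud-provider=clusterapi
        # kubeconfig for the workload cluster whose nodes are scaled, mounted
        # from the CAPI-generated <cluster-name>-kubeconfig Secret below
        - --kubeconfig=/mnt/workload/kubeconfig
        # the management cluster (which hosts the MachineDeployment/MachineSet
        # resources) is reached via in-cluster config here since the pod runs
        # there; a separate kubeconfig can be passed with --cloud-config instead
        volumeMounts:
        - name: workload-kubeconfig
          mountPath: /mnt/workload
          readOnly: true
      volumes:
      - name: workload-kubeconfig
        secret:
          secretName: workload1-kubeconfig     # CAPI's <cluster-name>-kubeconfig Secret
          items:
          - key: value
            path: kubeconfig
```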

@benmoss
Member

benmoss commented Sep 3, 2020

I think @MarcusNoble was raising an idea that isn't captured in @detiber's PR, which is that it makes sense to have one autoscaler on the management cluster that autoscales all the workload clusters. The problem with this is that the autoscaler is tied to the k8s version, so you'd at least need a 1.18.x autoscaler for all your 1.18.x clusters, etc.

This might still be an idea worth investigating, but it would require some substantial changes. I don't know if it makes sense for any provider other than CAPI to have this architecture, though I suppose it could be possible. Something like gcloud container clusters list and using metadata to store cluster-specific autoscaler options might be viable as an alternative to the one-autoscaler-per-cluster model of today.

@elmiko
Contributor Author

elmiko commented Sep 3, 2020

that's a good distinction to draw out Ben. imo, it sounds like a large architectural undertaking given that currently the autoscaler is assumed to be in a mostly 1:1 relationship with the cluster it manages (1 autoscaler per cluster). i know folks are able to run multiple autoscalers and that there are ways to label and segment the nodes and pods they watch. but changing the autoscaler to have the ability to watch multiple clusters sounds intimidating to me, perhaps i'm over-complicating it though.
