Run as k8s jobs rather than as a single running pod. #189
Comments
@JacobWeyer the default for the provider is to run one reconciliation at a time, but this is configurable using However, since the underlying code runs the terraform CLI, the pod will attempt to use as many CPUs as it has threads configured, so to get "true" parallel execution you would need to make sure that there are as many CPUs available as the reconcile rate you set.
Will that require us to keep a massive reservation at all times rather than allowing this to be somewhat dynamic and to autoscale?
I'm not sure what you mean by autoscale. The pod will try to use whatever CPUs it needs, if they are available. There is no way to add more pods to the deployment, since Kubernetes controllers can only run a single instance at a time. So your worker node would need to have the CPUs available for the pod to use, but when they aren't in use they would be available for other pods on the worker. You might be able to use something like Karpenter to autoscale your node group when a worker runs out of CPUs, and then scale in when the load is reduced.
I guess I'm confused why this was designed to run as a single instance instead of having the operator trigger each run as its own job, similar to how something like GitHub Actions works.
Crossplane providers are designed to be reconciling Kubernetes controllers which are responsible for maintaining the state specified in the If each CLI command were dispatched as an individual job, the jobs could take advantage of idle CPU resources on other workers, but each would still require 1 CPU to run to completion, and it would add complexity to track remote job completion so that subsequent reconciliations don't run while there is already a process running.
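To make the tracking concern above concrete, here is a minimal sketch of the bookkeeping a controller would need if runs were dispatched as separate jobs. It is not the provider's code: `InFlightTracker`, `reconcile`, and the `dispatch` callback are hypothetical names used only for illustration, and the state lives in memory, which a real controller could not rely on across restarts.

```python
import threading


class InFlightTracker:
    """Hypothetical helper: remembers which resources have a run in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_start(self, resource):
        """Mark a resource as running; return False if a run is already active."""
        with self._lock:
            if resource in self._in_flight:
                return False
            self._in_flight.add(resource)
            return True

    def finish(self, resource):
        """Called when the remote job for this resource reports completion."""
        with self._lock:
            self._in_flight.discard(resource)


def reconcile(tracker, resource, dispatch):
    """Dispatch a run unless one is already in progress for this resource."""
    if not tracker.try_start(resource):
        return "skipped"  # a run is still active; a real controller would requeue
    dispatch(resource)  # assumption: hands the work off to a remote job
    return "dispatched"
```

A second reconciliation of the same resource is skipped until `finish` is called for it. The extra complexity mentioned above comes from the parts this sketch leaves out: detecting job completion or failure, and recovering the in-flight set after the controller restarts.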
@bobh66 / @JacobWeyer can we do it in a master/slave setup, where the master holds the configuration and the slaves run as replicas that can be scaled? Would that work?
Yeah, that makes sense @bobh66. I'm still curious whether there's a more distributed batching methodology that would be beneficial, especially at scale, beyond just running more jobs in parallel on a single operator.
I'm a little wary of this idea. Mostly in that I'm wary of
What problem are you facing?
I'd like to see this operator function in a way where it would spin up workspace runs in parallel for every request (up to a max parallel limit that can be set by the user).
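As a rough illustration of what "parallel up to a user-set max limit" means, the sketch below runs placeholder workspace runs through a bounded thread pool. It is only a sketch under stated assumptions: `run_all`, `run_workspace`, and `max_parallel` are hypothetical names, and the `time.sleep` call stands in for an actual terraform CLI invocation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_all(workspaces, max_parallel):
    """Run every workspace, with at most max_parallel runs active at once.

    Returns the per-workspace results (in input order) and the peak number
    of runs that were observed executing at the same time.
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def run_workspace(name):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            time.sleep(0.05)  # placeholder for a real terraform CLI call
            return f"{name}: applied"
        finally:
            with lock:
                state["running"] -= 1

    # The pool size is the user-configurable cap on parallel runs.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_workspace, workspaces))
    return results, state["peak"]
```

For example, `run_all(["ws-0", "ws-1", "ws-2", "ws-3"], max_parallel=2)` finishes all four runs while never executing more than two at once. Whether the runs are truly parallel still depends on the CPUs available to the process.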
The intention is to use this for developer environments, load testing, integration testing and more in a very dynamic manner. The way the current operator seems to work is sequential by nature and ends up being slower as a result.
How could Official Terraform Provider help solve your problem?
By being able to run our Terraform, we can take advantage of Crossplane's flexibility combined with Helm and spin up a significant number of microservices and environments very quickly, provided they can run in parallel rather than being forced to wait for sequential execution. Sequential execution is a real bummer when we have something like RDS or DMS that can take up to 15 minutes to start up properly.