Use CreateInstances() API when scaling up in GCE cloud provider #4158
Conversation
```diff
@@ -346,6 +363,10 @@ func isInstanceNotRunningYet(gceInstance *gce.ManagedInstance) bool {
 	return gceInstance.InstanceStatus == "" || gceInstance.InstanceStatus == "PROVISIONING" || gceInstance.InstanceStatus == "STAGING"
 }
+
+func generateInstanceName(migRef GceRef) string {
+	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), rand.String(4))
+}
```
Do we need to care about collisions with existing nodes here? Looking at the rand library I can see it uses 27 different characters for generating strings. I've done a back-of-the-envelope calculation and, if I got it right, in a cluster with 5k nodes ~1% of all possible names will already be taken. For a scale-up of dozens of nodes in such a cluster a collision seems pretty likely.
What is going to happen when we use an existing name in the CreateInstances call? Is the entire call going to fail? Are we going to create the nodes that don't have a name collision? If we return an error from the cloud provider, CA will generally assume the NodeGroup is not healthy and will put it on exponential scale-up backoff, so we can't rely on CA retrying the scale-up.
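As a rough sanity check of that estimate (editorial sketch, not code from this PR): with 27 characters and a 4-character suffix there are 27^4 = 531441 possible names, and the per-name collision chance compounds over a scale-up.

```go
package main

import "fmt"

// collisionProb estimates the chance that at least one of n newly drawn
// names hits one of `taken` names already in use, assuming uniform draws
// from a space of `space` suffixes and ignoring collisions among the new
// names themselves.
func collisionProb(space, taken, n int) float64 {
	pFree := 1.0 - float64(taken)/float64(space)
	noCollision := 1.0
	for i := 0; i < n; i++ {
		noCollision *= pFree
	}
	return 1.0 - noCollision
}

func main() {
	space := 27 * 27 * 27 * 27 // 27 characters, 4-char suffix: 531441 names

	// 5k-node cluster: ~1% of the namespace is already taken.
	fmt.Printf("fraction taken: %.2f%%\n", 100*5000/float64(space))

	// Scale-up of 50 nodes in that cluster: ~38% chance of a collision.
	fmt.Printf("collision probability: %.1f%%\n", 100*collisionProb(space, 5000, 50))
}
```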
I checked that if we have a collision, the whole operation will fail.
We can either check for name collisions or use a different way of generating the instance name, e.g. use all 36 alphanumeric characters and possibly generate 5 characters instead of 4 (though this would differ from the current instance name format).
WDYT?
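A minimal sketch of that second option (hypothetical; this is not what the PR ended up doing, and it assumes the GceRef and instanceGroupNameSuffix identifiers from the diff above):

```go
import (
	"fmt"
	"math/rand"
	"strings"
)

const alphanums = "abcdefghijklmnopqrstuvwxyz0123456789"

// generateInstanceNameAlphanum draws the suffix from all 36 lowercase
// alphanumeric characters and uses 5 of them, giving 36^5 ≈ 60M possible
// names instead of 27^4 ≈ 531k.
func generateInstanceNameAlphanum(migRef GceRef) string {
	suffix := make([]byte, 5)
	for i := range suffix {
		suffix[i] = alphanums[rand.Intn(len(alphanums))]
	}
	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), string(suffix))
}
```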
This is per-MIG, so it's capped at 1k nodes and ~0.2% of names taken in a nearly full MIG. The probability of choosing 10 names without a collision in such a situation is ~98%; for 50 names it's only ~90%. If it's not super difficult to plumb the existing node names in there, I'm leaning towards this option.
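Plugging the per-MIG numbers into the collisionProb helper sketched above reproduces these figures:

```go
// Per-MIG: at most 1000 existing names out of 27^4 = 531441 (~0.19% taken).
space, taken := 531441, 1000
fmt.Printf("10 names: %.1f%% chance of no collision\n", 100*(1-collisionProb(space, taken, 10))) // ~98.1%
fmt.Printf("50 names: %.1f%% chance of no collision\n", 100*(1-collisionProb(space, taken, 50))) // ~91.0%
```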
Good point on the maximum MIG size. I think that in a cluster with thousands of nodes a scale-up of 50 nodes in one go is reasonably likely, and the consequence of failing due to a collision seems pretty bad. If we fail to scale up, we give kubernetes more time to create pending pods and we run the risk of attempting a much larger scale-up next time, possibly getting stuck for a really long time retrying a scale-up of a few hundred nodes once every 30 minutes (the maximum backoff duration).
I agree that de-duping against existing nodes seems like the way to go. There is still a theoretical risk of collision if someone else (a manual resize, some sort of node upgrade process, etc.) creates nodes after the last cache refresh, but that risk seems pretty low and a single retry should be sufficient in such a case.
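A minimal sketch of the de-duping idea under discussion (hypothetical helper name; assumes the GceRef and instanceGroupNameSuffix identifiers from the diff above, with existingNames built from the MIG's current instances):

```go
import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/util/rand"
)

// generateUniqueInstanceName retries the random suffix until it avoids
// every name already known to exist in the MIG. With ~0.2% of the
// namespace taken, a handful of attempts is virtually always enough;
// the bound guards against pathological input.
func generateUniqueInstanceName(migRef GceRef, existingNames map[string]bool) (string, error) {
	for attempt := 0; attempt < 100; attempt++ {
		name := fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), rand.String(4))
		if !existingNames[name] {
			return name, nil
		}
	}
	return "", fmt.Errorf("failed to generate a unique instance name for MIG %v", migRef.Name)
}
```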
It looks to me like we would need to fetch the names with the GetMigNodes() method from gceManagerImpl, which with this implementation requires a call to GCE.
I'm OK with doing that. Are retries already implemented, or is that something that should be added as part of this change?
Hmm, seems like we're not caching this particular call. Given how much stuff we cache, I just assumed we'd have the result somewhere :)
I think it should be fine to do an API call here. At most we're scaling up 1 MIG / zone / loop, so we're talking about ~3 extra API calls in a loop where we scale up. Compared to the 1 FetchMigInstances / MIG / loop we already do, this doesn't seem like a big increase.
We need to do a scalability test of the API change anyway, so I'd suggest starting with an API call; in the unlikely case we find it has a scalability impact, it should be pretty easy to extend cache.go to include this.
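A sketch of how the lookup might be wired up (illustrative, not the PR's actual code; it assumes GetMigNodes returns cloudprovider.Instance values whose Id is a GCE provider ID of the form gce://&lt;project&gt;/&lt;zone&gt;/&lt;name&gt;):

```go
import "strings"

// existingInstanceNames lists the MIG's current instances (an uncached
// GCE API call, ~3 extra calls per scale-up loop as discussed above)
// and collects their names for de-duping.
func (m *gceManagerImpl) existingInstanceNames(mig Mig) (map[string]bool, error) {
	instances, err := m.GetMigNodes(mig)
	if err != nil {
		return nil, err
	}
	names := make(map[string]bool, len(instances))
	for _, instance := range instances {
		// Keep only the final path segment of the provider ID.
		parts := strings.Split(instance.Id, "/")
		names[parts[len(parts)-1]] = true
	}
	return names, nil
}
```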
OK, added logic to fetch the existing instance names.
/lgtm Thanks for addressing the comments! Feel free to unhold if you don't agree with the nit.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: olagacek, towca. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm