
Use CreateInstances() API when scaling up in GCE cloud provider #4158

Merged
1 commit merged into kubernetes:master on Jun 23, 2021

Conversation

olagacek
Contributor

No description provided.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 22, 2021
@@ -346,6 +363,10 @@ func isInstanceNotRunningYet(gceInstance *gce.ManagedInstance) bool {
return gceInstance.InstanceStatus == "" || gceInstance.InstanceStatus == "PROVISIONING" || gceInstance.InstanceStatus == "STAGING"
}

func generateInstanceName(migRef GceRef) string {
	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), rand.String(4))
}
Contributor

Do we need to care about collisions with existing nodes here? Looking at the rand library, I can see it uses 27 different characters for generating strings. I've done a back-of-the-envelope calculation and, if I got it right, in a cluster with 5k nodes roughly 1% of all possible names will already be taken. For a scale-up of dozens of nodes in such a cluster, a collision seems pretty likely.

What is going to happen when we use an existing name in the CreateInstances call? Is the entire call going to fail? Are we still going to create the nodes whose names don't collide? If we return an error from the cloud provider, CA will generally assume the NodeGroup is not healthy and put it on exponential scale-up backoff, so we can't rely on CA retrying the scale-up.

Contributor Author

I checked that if we have a collision the whole operation will fail.
We can either check for name collisions or use a different way of generating the instance name, e.g. use all 36 alphanumeric characters and possibly generate 5 suffix characters instead of 4 (though this would differ from the current instance name format).
WDYT?
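For illustration only, a rough sketch of that alternative generator (helper names here are hypothetical, and this widened-alphabet approach is the proposal above, not what the PR ended up merging):

```go
// Editorial sketch of the alternative mentioned above: a 5-character suffix
// drawn from all 36 lowercase alphanumerics (36^5 ≈ 60M combinations, vs
// 27^4 = 531441 for rand.String(4)). Not the approach merged in this PR.
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

const suffixAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789" // 36 characters

func randomSuffix(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = suffixAlphabet[rand.Intn(len(suffixAlphabet))]
	}
	return string(b)
}

// generateInstanceNameAlt keeps the "<trimmed MIG name>-<suffix>" shape but
// differs from the current instance name format by using a longer suffix.
func generateInstanceNameAlt(migName, instanceGroupNameSuffix string) string {
	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migName, instanceGroupNameSuffix), randomSuffix(5))
}

func main() {
	fmt.Println(generateInstanceNameAlt("my-mig-grp", "-grp"))
}
```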

Collaborator

This is per-MIG, so it's capped at 1k nodes and ~0.2% of names taken in a nearly full MIG. The probability of choosing 10 names without collision in such a situation is ~98%; for 50 names it's only ~90%. If it's not super difficult to plumb the existing node names in there, I'm leaning towards this option.
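As a side note, a quick standalone check of these back-of-the-envelope numbers (an editorial sketch, not part of the PR), assuming rand.String(4) draws 4 characters from a 27-character alphabet:

```go
// Quick numerical check of the collision estimates discussed in this thread.
// rand.String(4) gives 27^4 = 531441 possible suffixes per MIG prefix.
package main

import (
	"fmt"
	"math"
)

func main() {
	space := math.Pow(27, 4) // 531441 possible names per MIG

	// Fraction of the namespace already taken in a nearly full (1000-node) MIG.
	existing := 1000.0
	fmt.Printf("names taken: %.2f%%\n", 100*existing/space) // ~0.19%

	// Probability that k freshly drawn names avoid the existing instances
	// and each other.
	for _, k := range []int{10, 50} {
		p := 1.0
		for i := 0; i < k; i++ {
			p *= 1 - (existing+float64(i))/space
		}
		fmt.Printf("P(no collision, %d new names): %.1f%%\n", k, 100*p) // ~98% / ~91%
	}
}
```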

Contributor
@MaciekPytel MaciekPytel Jun 23, 2021

Good point on the maximum MIG size. I think in a cluster with thousands of nodes a scale-up of 50 nodes in one go is reasonably likely, and the consequence of failing due to a collision seems pretty bad: if we fail to scale up, we give Kubernetes more time to create pending pods, so we run the risk of attempting a much larger scale-up next time, and possibly getting stuck for a really long time retrying a scale-up of a few hundred nodes once every 30 minutes (the maximum backoff duration).

I agree that de-duping against existing nodes seems like the way to go. There is still a theoretical risk of collision if someone else (manual resize, some sort of node upgrade process, etc.) creates nodes since the last cache refresh, but that risk seems pretty low and a single retry should be sufficient in such a case.
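For illustration, a minimal sketch of the de-duping idea (helper names and signatures are hypothetical; the actual change in this PR may look different), assuming the existing instance names are fetched up front, e.g. via GetMigNodes():

```go
// Editorial sketch only: generate instance names that collide neither with the
// names already in the MIG nor with each other. Not the code merged in this PR.
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/util/rand"
)

// generateInstanceName mirrors the helper from the diff above, minus the GceRef type.
func generateInstanceName(migName, instanceGroupNameSuffix string) string {
	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migName, instanceGroupNameSuffix), rand.String(4))
}

// generateUniqueInstanceNames returns count names that avoid existingNames
// (e.g. fetched with a GetMigNodes-style listing call) as well as each other.
// The attempt bound keeps a nearly exhausted namespace from looping forever.
func generateUniqueInstanceNames(migName, suffix string, count int, existingNames map[string]bool) ([]string, error) {
	const maxAttempts = 100
	taken := make(map[string]bool, len(existingNames)+count)
	for name := range existingNames {
		taken[name] = true
	}
	names := make([]string, 0, count)
	for len(names) < count {
		generated := false
		for attempt := 0; attempt < maxAttempts && !generated; attempt++ {
			candidate := generateInstanceName(migName, suffix)
			if !taken[candidate] {
				taken[candidate] = true
				names = append(names, candidate)
				generated = true
			}
		}
		if !generated {
			return nil, fmt.Errorf("could not generate a unique instance name for MIG %s", migName)
		}
	}
	return names, nil
}

func main() {
	existing := map[string]bool{"my-mig-abcd": true}
	names, err := generateUniqueInstanceNames("my-mig-grp", "-grp", 3, existing)
	fmt.Println(names, err)
}
```

A bounded retry loop like this would also cover the "someone created nodes since the last cache refresh" case mentioned above, at the cost of at most a few extra random draws.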

Contributor Author

It looks to me that we would need to fetch the names with the GetMigNodes() method from gceManagerImpl, which with this implementation requires a call to GCE.

I'm OK with doing that. Are retries already implemented, or is that something that should be added as a part of this change?

Contributor

Hmm, seems like we're not caching this particular call. Given how much stuff we cache, I just assumed we'd have the result somewhere :)

I think it should be fine to do an API call here. At most we're scaling up 1 MIG / zone / loop, so we're talking about ~3 extra API calls in a loop where we scale up. Compared to the 1 FetchMigInstances / MIG / loop we already do, this doesn't seem like a big increase.

We need to do a scalability test of the API change anyway, so I'd suggest starting with an API call; in the unlikely case we find it has a scalability impact, it should be pretty easy to extend cache.go to include this.

Contributor Author

OK, added logic to fetch existing instance names

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 23, 2021
@towca
Collaborator

towca commented Jun 23, 2021

/lgtm
/approve
/hold

Thanks for addressing the comments! Feel free to unhold if you don't agree with the nit

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olagacek, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2021
@towca
Collaborator

towca commented Jun 23, 2021

/lgtm
/unhold

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 23, 2021
@k8s-ci-robot k8s-ci-robot merged commit 07c7607 into kubernetes:master Jun 23, 2021
akirillov pushed a commit to airbnb/autoscaler that referenced this pull request Oct 27, 2022
Use CreateInstances() API when scaling up in GCE cloud provider
Labels
approved · cncf-cla: yes · lgtm · size/L