Use CreateInstances() API when scaling up in GCE cloud provider #4158
Conversation
```diff
@@ -346,6 +363,10 @@ func isInstanceNotRunningYet(gceInstance *gce.ManagedInstance) bool {
 	return gceInstance.InstanceStatus == "" || gceInstance.InstanceStatus == "PROVISIONING" || gceInstance.InstanceStatus == "STAGING"
 }
+
+func generateInstanceName(migRef GceRef) string {
+	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), rand.String(4))
+}
```
Do we need to care about collisions with existing nodes here? Looking at the rand library I can see it uses 27 different characters for generating strings. I've done a back-of-the-envelope calculation and, if I got it right, in a cluster with 5k nodes ~1% of all possible names will already be taken. For a scale-up of dozens of nodes in such a cluster a collision seems pretty likely.
What is going to happen when we use an existing name in the CreateInstances call? Is the entire call going to fail? Are we going to create the nodes that don't have a name collision? If we return an error from the cloud provider, CA will generally assume the NodeGroup is not healthy and will put it on exponential scale-up backoff, so we can't rely on CA retrying the scale-up.
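As a rough sanity check of that estimate (editorial sketch, not code from this PR): with 27 characters and a 4-character suffix there are 27^4 = 531441 possible names, and the per-name collision chance compounds over a scale-up.

```go
package main

import "fmt"

// collisionProb estimates the chance that at least one of n newly drawn
// names hits one of `taken` names already in use, assuming uniform draws
// from a space of `space` suffixes and ignoring collisions among the new
// names themselves.
func collisionProb(space, taken, n int) float64 {
	pFree := 1.0 - float64(taken)/float64(space)
	noCollision := 1.0
	for i := 0; i < n; i++ {
		noCollision *= pFree
	}
	return 1.0 - noCollision
}

func main() {
	space := 27 * 27 * 27 * 27 // 27 characters, 4-char suffix: 531441 names

	// 5k-node cluster: ~1% of the namespace is already taken.
	fmt.Printf("fraction taken: %.2f%%\n", 100*5000/float64(space))

	// Scale-up of 50 nodes in that cluster: ~38% chance of a collision.
	fmt.Printf("collision probability: %.1f%%\n", 100*collisionProb(space, 5000, 50))
}
```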
I checked that if we have a collision, the whole operation will fail.
We can either check for name collisions or use a different way of generating the instance name, e.g. use all 36 alphanumeric characters and possibly generate 5 characters instead of 4 (though this would differ from the current instance name format).
WDYT?
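A minimal sketch of that second option (hypothetical; this is not what the PR ended up doing, and it assumes the GceRef and instanceGroupNameSuffix identifiers from the diff above):

```go
import (
	"fmt"
	"math/rand"
	"strings"
)

const alphanums = "abcdefghijklmnopqrstuvwxyz0123456789"

// generateInstanceNameAlphanum draws the suffix from all 36 lowercase
// alphanumeric characters and uses 5 of them, giving 36^5 ≈ 60M possible
// names instead of 27^4 ≈ 531k.
func generateInstanceNameAlphanum(migRef GceRef) string {
	suffix := make([]byte, 5)
	for i := range suffix {
		suffix[i] = alphanums[rand.Intn(len(alphanums))]
	}
	return fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), string(suffix))
}
```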
This is per-MIG, so it's capped at 1k nodes and ~0.2% of names taken in a nearly full MIG. The probability of choosing 10 names without a collision in such a situation is ~98%; for 50 names it's only ~90%. If it's not super difficult to plumb the existing node names in there, I'm leaning towards this option.
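Plugging the per-MIG numbers into the collisionProb helper sketched above reproduces these figures:

```go
// Per-MIG: at most 1000 existing names out of 27^4 = 531441 (~0.19% taken).
space, taken := 531441, 1000
fmt.Printf("10 names: %.1f%% chance of no collision\n", 100*(1-collisionProb(space, taken, 10))) // ~98.1%
fmt.Printf("50 names: %.1f%% chance of no collision\n", 100*(1-collisionProb(space, taken, 50))) // ~91.0%
```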
Good point on the maximum MIG size. I think that in a cluster with thousands of nodes a scale-up of 50 nodes in one go is reasonably likely, and the consequence of failing due to a collision seems pretty bad. If we fail to scale up, we give kubernetes more time to create pending pods and we run the risk of attempting a much larger scale-up next time, possibly getting stuck for a really long time retrying a scale-up of a few hundred nodes once every 30 minutes (the maximum backoff duration).
I agree that de-duping against existing nodes seems like the way to go. There is still a theoretical risk of collision if someone else (a manual resize, some sort of node upgrade process, etc.) creates nodes after the last cache refresh, but that risk seems pretty low and a single retry should be sufficient in such a case.
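A minimal sketch of the de-duping idea under discussion (hypothetical helper name; assumes the GceRef and instanceGroupNameSuffix identifiers from the diff above, with existingNames built from the MIG's current instances):

```go
import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/util/rand"
)

// generateUniqueInstanceName retries the random suffix until it avoids
// every name already known to exist in the MIG. With ~0.2% of the
// namespace taken, a handful of attempts is virtually always enough;
// the bound guards against pathological input.
func generateUniqueInstanceName(migRef GceRef, existingNames map[string]bool) (string, error) {
	for attempt := 0; attempt < 100; attempt++ {
		name := fmt.Sprintf("%v-%v", strings.TrimSuffix(migRef.Name, instanceGroupNameSuffix), rand.String(4))
		if !existingNames[name] {
			return name, nil
		}
	}
	return "", fmt.Errorf("failed to generate a unique instance name for MIG %v", migRef.Name)
}
```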
It looks to me like we would need to fetch the names with the GetMigNodes() method from gceManagerImpl, which with this implementation requires a call to GCE.
I'm OK with doing that. Are retries already implemented, or is that something that should be added as part of this change?
Hmm, seems like we're not caching this particular call. Given how much stuff we cache, I just assumed we'd have the result somewhere :)
I think it should be fine to do an API call here. At most we're scaling up 1 MIG / zone / loop, so we're talking about ~3 extra API calls in a loop where we scale up. Compared to the 1 FetchMigInstances / MIG / loop we already do, this doesn't seem like a big increase.
We need to do a scalability test of the API change anyway, so I'd suggest starting with an API call; in the unlikely case we find it has a scalability impact, it should be pretty easy to extend cache.go to include this.
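A sketch of how the lookup might be wired up (illustrative, not the PR's actual code; it assumes GetMigNodes returns cloudprovider.Instance values whose Id is a GCE provider ID of the form gce://&lt;project&gt;/&lt;zone&gt;/&lt;name&gt;):

```go
import "strings"

// existingInstanceNames lists the MIG's current instances (an uncached
// GCE API call, ~3 extra calls per scale-up loop as discussed above)
// and collects their names for de-duping.
func (m *gceManagerImpl) existingInstanceNames(mig Mig) (map[string]bool, error) {
	instances, err := m.GetMigNodes(mig)
	if err != nil {
		return nil, err
	}
	names := make(map[string]bool, len(instances))
	for _, instance := range instances {
		// Keep only the final path segment of the provider ID.
		parts := strings.Split(instance.Id, "/")
		names[parts[len(parts)-1]] = true
	}
	return names, nil
}
```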
OK, added logic to fetch the existing instance names.
/lgtm Thanks for addressing the comments! Feel free to unhold if you don't agree with the nit.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: olagacek, towca. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm