-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gce: rm unnecessary GetMigInstanceTemplate locking #4621
gce: rm unnecessary GetMigInstanceTemplate locking #4621
Conversation
0a32b15
to
abe0d41
Compare
@BigDarkClown can you take a look? I think you're the most familiar with this part of the code. |
I think this change is reasonable. As you noticed the cache setters and getters are already guarded by a different lock, so there should be no issues with concurrency there. In other |
/lgtm |
@BigDarkClown: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The lock held from (and covering most of) `GetMigInstanceTemplate()` prevents `InstanceTemplates.Get()` concurrent executions, while it's not really needed as all caches access functions used here already take care of locking cache accesses (and it's reading and filling one single cache entry per call). `InstanceTemplates.Get()` isn't fast (at about 0.5s/call in my test env), running all calls sequentially slows down startup significantly: ``` forceRefresh() fetchAutoMigs() workqueue.ParallelizeUntil(..., func() { <- can launch concurrent registerMigs registerMig() GetMigTemplateNode() GetMigInstanceTemplate() <- lock held FetchMigTemplate() client.gceService.InstanceTemplates.Get() <- slow API call ``` When running this change on a cluster having 800 MIGs, startup went from 6min to 1.5min. Also tested with a `-race` build.
abe0d41
to
432db09
Compare
Rebased + re-tested after the lock was renamed |
Oops, it completely slipped my radar. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bpineau, MaciekPytel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The lock held from (and covering most of)
GetMigInstanceTemplate()
preventsInstanceTemplates.Get()
concurrent executions, while it's not really neededas all caches access functions used here already take care of locking properly
(and it's reading/writing one single cache entry per call).
InstanceTemplates.Get()
isn't fast (at about 0.5s/call in my test env),running all calls sequentially slows down startup significantly:
When running this change on a cluster having 800 MIGs, startup went from 6min to 1.5min.
Also tested with a
-race
build.Which component this PR applies to?
Cluster-autoscaler's GCE cloud provider.
What type of PR is this?
/kind cleanup
/kind feature
What this PR does / why we need it:
Remove unnecessary locking to speed-up startup time on large clusters (many MIGs).