Azure: use per-vmss vmssvm incremental cache #93107
Conversation
Hi @bpineau. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
bpineau force-pushed from 212d0ca to c795853
bpineau force-pushed from c795853 to ceef4c4
Could you please fix the error and re-run the tests?
bpineau force-pushed from 3abf8fb to 61812bf
return node, nil
}

if len(nodeName) < 6 {
Could we use getScaleSetVMInstanceID() here to check whether a node is a VMSS instance or not?
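For reference, a minimal sketch of how such a helper could work — it assumes VMSS instance names end with the instance ID encoded as a 6-character base-36 suffix, per Azure's naming convention; the exact signature in the provider code may differ:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// getScaleSetVMInstanceID extracts the numeric instance ID from a VMSS
// node name. VMSS instances are named <computerNamePrefix><suffix>, where
// the suffix is the instance ID encoded in base 36, padded to 6 characters
// (e.g. "aks-agentpool-12345678-000026" ends in 000026, i.e. instance 78).
// If the suffix does not parse, the node is not treated as a VMSS instance.
func getScaleSetVMInstanceID(machineName string) (string, error) {
	nameLength := len(machineName)
	if nameLength < 6 {
		return "", fmt.Errorf("not a vmss instance name: %q", machineName)
	}
	id, err := strconv.ParseUint(strings.ToLower(machineName[nameLength-6:]), 36, 64)
	if err != nil {
		return "", fmt.Errorf("not a vmss instance name: %q", machineName)
	}
	return fmt.Sprintf("%d", id), nil
}

func main() {
	id, err := getScaleSetVMInstanceID("aks-agentpool-12345678-000026")
	fmt.Println(id, err) // 78 <nil>
}
```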
Thanks for fixing the issue. Adding it to the v1.19 milestone since it fixes a serious bug for scenarios with a large number of VMSS.
/milestone v1.19
/retest
Azure's cloud provider VMSS VM API accesses are mediated through a cache that holds and refreshes all VMSS together. Because of that we hit the VMSSVM.List API more often than we need to: an instance's cache miss or expiration should only require a single VMSS re-list, while it is currently O(n) in the number of attached Scale Sets.

Under hard pressure (clusters with many attached VMSS that can't all be listed in one sequence of successive API calls) the controller manager might get stuck trying to re-list everything from scratch, then aborting the whole operation, then re-trying and re-triggering API rate limits, affecting the whole Subscription.

This patch replaces the global VMSS VMs cache with per-VMSS VM caches. Refreshes (VMSS VM lists) are scoped to the single relevant VMSS; under severe throttling the various caches can be refreshed incrementally.

Signed-off-by: Benjamin Pineau <[email protected]>
bpineau force-pushed from 61812bf to fcb3f1f
/retest

/retest
/lgtm
/approve
/retest
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bpineau, feiskyer

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/retest

Review the full test history for this PR. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.
/retest
@bpineau: The following test failed; say /retest to rerun it.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
Cherry pick of #93107: Azure: use per-vmss vmssvm incremental cache
What type of PR is this?
/kind bug
What this PR does / why we need it:
Azure's cloud provider VMSS VM API accesses are mediated through a cache that holds and refreshes all VMSS together.
Because of that we hit the VMSSVM.List API more often than we need to: an instance's cache miss or expiration should only require a single VMSS re-list, while it is currently O(n) in the number of attached Scale Sets.
Under hard pressure (clusters with many attached VMSS that can't all be listed in one sequence of successive API calls) the controller manager might get stuck trying to re-list everything from scratch, then aborting the whole operation due to rate limits, affecting the whole Subscription.
This patch replaces the global VMSS VMs cache with per-VMSS VM caches. Refreshes (VMSS VM lists) are scoped to the single relevant VMSS; under severe throttling the various caches can be refreshed incrementally.
Which issue(s) this PR fixes:
Fixes #93106
Special notes for your reviewer:
We are assuming VMSS nodes are named from the VMSS' computerNamePrefix + instance ID (or vmssName + instance ID when computerNamePrefix isn't specified), as described in https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-instance-ids and https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/2018-10-01/virtualmachinescalesets#virtualmachinescalesetosprofile-object. Are there special cases not covered by that doc? If so, we can probably complement that optimistic lookup (trying the VMSS matching the name prefix first, the happy path) with a fallback scan over the remaining scale sets. The non-optimal fallback path would still be an improvement, since in case of throttling we keep partial results (per-VMSS caches) and refreshes are incremental.
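The optimistic-lookup-plus-fallback idea described above could be sketched like this (all types and names are illustrative, not the provider's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// scaleSet is a toy stand-in for a cached scale set: its name, its
// optional computerNamePrefix, and its cached VM (node) names.
type scaleSet struct {
	name               string
	computerNamePrefix string
	vms                []string
}

// namePrefix returns the prefix VMSS instances are named from:
// computerNamePrefix when set, otherwise the VMSS name itself.
func namePrefix(ss scaleSet) string {
	if ss.computerNamePrefix != "" {
		return ss.computerNamePrefix
	}
	return ss.name
}

// findVMSSForNode tries the happy path first: match the node name against
// each scale set's naming prefix. If no prefix matches (a special case not
// covered by the naming doc), it falls back to scanning every scale set's
// cached VM list.
func findVMSSForNode(node string, sets []scaleSet) (string, bool) {
	// Happy path: node names are <prefix><6-char instance suffix>.
	for _, ss := range sets {
		if strings.HasPrefix(node, namePrefix(ss)) {
			return ss.name, true
		}
	}
	// Fallback: scan each scale set's (cached, incrementally refreshed) VMs.
	for _, ss := range sets {
		for _, vm := range ss.vms {
			if vm == node {
				return ss.name, true
			}
		}
	}
	return "", false
}

func main() {
	sets := []scaleSet{
		{name: "vmss-a", computerNamePrefix: "aks-pool1-", vms: []string{"aks-pool1-000000"}},
		{name: "vmss-b", vms: []string{"custom-node-01"}},
	}
	n, _ := findVMSSForNode("aks-pool1-000000", sets) // prefix match
	m, _ := findVMSSForNode("custom-node-01", sets)   // fallback scan
	fmt.Println(n, m) // vmss-a vmss-b
}
```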
Does this PR introduce a user-facing change?:
/assign @andyzhangx @feiskyer
/sig cloud-provider
/area provider/azure