feat: Azure Provider HasInstance implementation #6956
Skipping CI for Draft Pull Request.
/test all
Overall LGTM, with minor feedback and some comments on the implications of using the cache.
How was this tested? Can we add unit tests? E2E tests?
f0d3407 to ea410de
Co-authored-by: Alex Leites <[email protected]>
… didn't make any sense in the first place
Co-authored-by: Alex Leites <[email protected]>
ca72ce3 to 34a26ee
/lgtm
What type of PR is this?
/kind bug
/kind regression
What this PR does / why we need it:
CA fails to scale up or to cancel an in-progress scale-down when there are unschedulable pods. Stealing this description from the AWS provider implementation.
I think the description of #5054 (comment) explains it well:
...original intent of determining the deleted nodes was incorrect, which led to the issues reported by other users. The nodes tainted with ToBeDeleted were misidentified as Deleted instead of Ready/Unready, which caused a miscalculation of the node being included as Upcoming. This caused problems described in #3949 and #4456.
Which issue(s) this PR fixes:
Special notes for your reviewer:
This PR introduces the HasInstance method to the Azure provider for Cluster Autoscaler. The primary purpose of this method is to ascertain whether a given node has a corresponding instance in the Azure cloud provider. This implementation helps to prevent the undercount of existing VMs and addresses issues related to the taint-based overcount of deleted VMs.
• If HasInstance cannot tell whether an instance exists, it returns an error instead of false, nil. The error forces a fallback to the taint-based determination, which gives a more reliable count of existing VMs. Return values (* means the boolean value is not significant):
  • If the instance exists: return true, nil
  • If the instance does not exist: return *, ErrNotImplemented (consider using a custom error for autoscaled nodes)
  • For unimplemented cases: return *, ErrNotImplemented
  • For any other errors: return *, error
• ErrNotImplemented triggers a silent fallback, while any other error is logged for further investigation.
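The return contract above can be sketched roughly as follows. This is an illustrative, self-contained sketch, not the code in this PR: `errNotImplemented` is a local stand-in for the autoscaler's ErrNotImplemented sentinel, and `instanceCache`, `hasInstance`, `instanceExists`, and the empty-provider-ID check are hypothetical names and behavior chosen only to show the three outcomes and the caller-side fallback to the ToBeDeleted taint.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented is a local stand-in for the autoscaler's
// ErrNotImplemented sentinel (assumption made for this sketch).
var errNotImplemented = errors.New("not implemented")

// instanceCache is a hypothetical stand-in for the provider's instance
// cache, keyed by provider ID.
type instanceCache map[string]struct{}

// hasInstance sketches the return contract from the description:
//   - instance found:       true, nil
//   - instance not found:   *, errNotImplemented (silent fallback)
//   - anything else:        *, some other error (logged by the caller)
func hasInstance(cache instanceCache, providerID string) (bool, error) {
	if providerID == "" {
		// Illustrative "any other error" case; the real validation may differ.
		return false, errors.New("provider ID is empty")
	}
	if _, ok := cache[providerID]; ok {
		return true, nil
	}
	// A cache miss does not prove the VM is gone, so the answer is uncertain.
	return false, errNotImplemented
}

// instanceExists sketches a caller: trust a definite answer, otherwise fall
// back to the taint-based determination (a tainted node is treated as deleted).
func instanceExists(cache instanceCache, providerID string, hasToBeDeletedTaint bool) bool {
	exists, err := hasInstance(cache, providerID)
	if err == nil {
		return exists
	}
	if !errors.Is(err, errNotImplemented) {
		// Only unexpected errors are logged; errNotImplemented falls back silently.
		fmt.Printf("HasInstance check failed for %q: %v\n", providerID, err)
	}
	return !hasToBeDeletedTaint
}

func main() {
	cache := instanceCache{"azure://vm-a": {}}

	// Found in the cache: the taint is ignored.
	fmt.Println(instanceExists(cache, "azure://vm-a", true))
	// Not in the cache: the taint decides.
	fmt.Println(instanceExists(cache, "azure://vm-b", true))
	fmt.Println(instanceExists(cache, "azure://vm-b", false))
}
```

In this sketch a cache miss never yields false, nil, so a stale cache cannot by itself cause a VM to be counted as deleted; the taint check remains the deciding factor whenever the provider is unsure.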
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: