-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #61363, Bounded retries for cloud allocator. #61375
Fix #61363, Bounded retries for cloud allocator. #61375
Conversation
/assign @bowei |
540988e
to
1eb3349
Compare
@bowei added the check for errors.IsNotFound to skip processing. |
/ok-to-test |
/retest |
1 similar comment
/retest |
Is there a repro and test for the issue that was hit? (How do we know this fixed the problem). |
The unit test was added for the simulation of the problem. Without the retry, the unit test hangs indefinitely and test fails with a timeout. thanks. |
/approve |
@@ -67,7 +67,7 @@ type cloudCIDRAllocator struct { | |||
|
|||
// Keep a set of nodes that are currectly being processed to avoid races in CIDR allocation | |||
lock sync.Mutex | |||
nodesInProcessing sets.String | |||
nodesInProcessing map[string]int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to document
@@ -67,7 +68,7 @@ type cloudCIDRAllocator struct { | |||
|
|||
// Keep a set of nodes that are currectly being processed to avoid races in CIDR allocation | |||
lock sync.Mutex | |||
nodesInProcessing sets.String | |||
nodesInProcessing map[string]int // track number of retries per node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make this
type nodeProcessingInfo {
retries int
}
map[string]nodeProcessingInfo
so it's self documenting.
return true | ||
} | ||
|
||
func (ca *cloudCIDRAllocator) retryNodeInProcessing(nodeName string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename canRetry(nodeName string)
, this doesn't actually do the retry, it's a predicate
1eb3349
to
adc71ff
Compare
good observations. updated with the changes. thanks! |
/lgtm will need backport to 1.9, 1.8 |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bowei, satyasm The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Automatic merge from submit-queue (batch tested with PRs 60455, 61365, 61375, 61597, 61491). If you want to cherry-pick this change to another branch, please follow the instructions here. |
Automatic merge from submit-queue. Backport Cloud CIDR allocator fixes to 1.9 **What this PR does / why we need it**: **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #61363 **Special notes for your reviewer**: Manually doing a backport as cherry-pick does not work due to package renaming. See #61375 for the change that triggered the need for the backport. **Release note**: ```release-note Backport Cloud CIDR allocator fixes to 1.9 ```
…#61124-upstream-release-1.10 Automatic merge from submit-queue. Automated cherry pick of #61375 #61124 upstream release 1.10 **What this PR does / why we need it**: Backport of stability fixes on IPAM controller to 1.10. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #61363, Fixes #61124 **Special notes for your reviewer**: These are already merged on master, backport to 1.10 branch. **Release note**: ```release-note Backport of stability fixes on IPAM controller to 1.10. ```
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #61363
Special notes for your reviewer:
Changed the tracking of nodesInProcessing from a set to map[string]int so that we can count the
number of times we re-process the node and not re-queue in case updateMaxRetries exceeded.
Release note: