Incorrect polling logic in handleRebootUncordon for Scheduled Event Draining #1059
Description of changes:
I identified an issue with the polling logic in the `handleRebootUncordon` function within the IMDS mode of NTH. Currently, polling continues regardless of whether the uncordon checks and requests succeed, because the wrapped condition function always returns `false`. This causes repeated retries until the context timeout is reached, ultimately surfacing a misleading "context deadline exceeded" error in the logs.

I have modified the wrapper function to return `true` when `handleRebootUncordon` completes successfully without errors. This ensures that polling stops once all calls to the Kubernetes API have succeeded, preventing unnecessary retries and producing clearer logs, which should help avoid confusion.

For reference, this code uses the `PollUntilContextCancel` function from the `k8s.io/apimachinery` package. According to its documentation, polling continues until the condition function returns `true`, returns an error, or the context is cancelled. You can find the full documentation here.
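For illustration, here is a minimal sketch of the corrected polling shape. It assumes a hypothetical `uncordonNode` helper in place of the real `handleRebootUncordon` call (the actual NTH signatures differ); only the return values of the condition function matter here.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// uncordonNode is a hypothetical stand-in for the handleRebootUncordon call;
// assume it returns nil once the node has been successfully uncordoned.
func uncordonNode(ctx context.Context) error {
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Before the fix, the condition func below always returned (false, nil),
	// so PollUntilContextCancel kept retrying even after a successful
	// uncordon and eventually reported "context deadline exceeded".
	err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (bool, error) {
		if err := uncordonNode(ctx); err != nil {
			// Transient failure: retry on the next tick. Returning a
			// non-nil error here would stop polling immediately.
			return false, nil
		}
		// Success: returning true stops the polling loop.
		return true, nil
	})
	fmt.Println("uncordon polling finished:", err)
}
```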
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.