Drain Timeout is not respected during client side throttling #785
Labels
area/robustness: Robustness, reliability, resilience related
effort/2w: Effort for issue is around 2 weeks
kind/bug: Bug
lifecycle/stale: Nobody worked on this for 6 months (will further age)
needs/planning: Needs (more) planning with other MCM maintainers
priority/2: Priority (lower number equals higher priority)
size/m: Size of pull request is medium (see gardener-robot robot/bots/size.py)
status/closed: Issue is closed (either delivered or triaged)
How to categorize this issue?
/area robustness
/kind bug
/priority 2
What happened:
A case was seen where the drain timeout was set to 2 hours, but the drain ended up running for about 11 hours.
If drainTimeout is 2 hours, then using podEvictionInterval (20 seconds) we calculate maxEvictRetries as 360. In case of client-side throttling, a single evict call can take a very long time, so the interval between two pod eviction requests can grow beyond 100 seconds. Since we currently have no context-based cancellation for evictPodWithoutPVInternal, we rely solely on maxEvictRetries being exhausted to come out of the loop. This can happen when a pod eviction runs into a timeout. This leads to 11 hours or more of drain, and no force deletion of the machine is done.
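For illustration, here is a minimal Go sketch of the arithmetic above (the numbers are taken from this report; the variable names are illustrative, not MCM's actual code). With 360 retries and more than 100 seconds per throttled attempt, the retry count alone allows roughly 11 hours of drain instead of 2:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	drainTimeout := 2 * time.Hour
	podEvictionInterval := 20 * time.Second

	// maxEvictRetries is derived from the drain timeout and the eviction
	// interval: 2h / 20s = 360.
	maxEvictRetries := int(drainTimeout / podEvictionInterval)

	// Under client-side throttling a single evict call can take far longer
	// than podEvictionInterval; the report observed >100s between requests.
	observedPerAttempt := 110 * time.Second

	// Because the loop is bounded only by the retry count, the worst case is
	// retries * per-attempt latency, not the configured drain timeout.
	worstCase := time.Duration(maxEvictRetries) * observedPerAttempt

	fmt.Printf("maxEvictRetries=%d, worst-case drain ~%s, configured timeout %s\n",
		maxEvictRetries, worstCase, drainTimeout)
}
```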
What you expected to happen:
Drain timeout to be respected in every situation.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Context-based cancellation should be used effectively.
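Below is a minimal sketch of that direction (the helper names and signatures are hypothetical, not the actual MCM API; the real logic sits in evictPodWithoutPVInternal). The idea is that the retry loop honours a context derived from drainTimeout, so the drain stops when the deadline passes even if maxEvictRetries has not been exhausted:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// evictPod is a stand-in for a single eviction API call; in the real
// implementation the request would carry ctx so that client-side throttling
// waits are also abandoned once the context expires.
func evictPod(ctx context.Context, pod string) error {
	return errors.New("eviction not yet possible") // placeholder outcome
}

// evictWithRetries retries eviction, but every wait is bounded by ctx so the
// overall drain timeout is respected regardless of per-call latency.
func evictWithRetries(ctx context.Context, pod string, maxEvictRetries int, interval time.Duration) error {
	for i := 0; i < maxEvictRetries; i++ {
		if err := evictPod(ctx, pod); err == nil {
			return nil
		}

		// Wait before the next attempt, but bail out as soon as the drain
		// deadline passes instead of relying only on the retry count.
		select {
		case <-ctx.Done():
			return fmt.Errorf("drain timeout reached while evicting %s: %w", pod, ctx.Err())
		case <-time.After(interval):
		}
	}
	return fmt.Errorf("max eviction retries exhausted for %s", pod)
}

func main() {
	drainTimeout := 2 * time.Hour
	ctx, cancel := context.WithTimeout(context.Background(), drainTimeout)
	defer cancel()

	_ = evictWithRetries(ctx, "example-pod", 360, 20*time.Second)
}
```

With this shape, even if individual evict calls stall under throttling, the select on ctx.Done() ends the loop at the configured 2-hour mark, after which force deletion of the machine can proceed.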
Environment:
Kubernetes version (use kubectl version):