Drain Timeout is not respected during client side throttling #785
Labels
area/robustness: Robustness, reliability, resilience related
effort/2w: Effort for issue is around 2 weeks
kind/bug: Bug
lifecycle/stale: Nobody worked on this for 6 months (will further age)
needs/planning: Needs (more) planning with other MCM maintainers
priority/2: Priority (lower number equals higher priority)
size/m: Size of pull request is medium (see gardener-robot robot/bots/size.py)
status/closed: Issue is closed (either delivered or triaged)
How to categorize this issue?
/area robustness
/kind bug
/priority 2
What happened:
A case was seen where the drain timeout was set to 2 hours, but the drain ended up running for about 11 hours.
If drainTimeout is 2 hours, then using podEvictionInterval (20 seconds) we calculate maxEvictRetries as 360. In case of client-side throttling, a single evict call can take a very long time, so the interval between two pod eviction requests can grow beyond 100 seconds. Since we currently have no context-based cancellation for evictPodWithoutPVInternal, we rely solely on maxEvictRetries being exhausted to come out of the loop. This can happen when a pod eviction runs into a timeout. This leads to 11 hours or more of drain, and no force deletion of the machine is done.
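For illustration, here is a minimal Go sketch of the arithmetic above (the numbers are taken from this report; the variable names are illustrative, not MCM's actual code). With 360 retries and more than 100 seconds per throttled attempt, the retry count alone allows roughly 11 hours of drain instead of 2:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	drainTimeout := 2 * time.Hour
	podEvictionInterval := 20 * time.Second

	// maxEvictRetries is derived from the drain timeout and the eviction
	// interval: 2h / 20s = 360.
	maxEvictRetries := int(drainTimeout / podEvictionInterval)

	// Under client-side throttling a single evict call can take far longer
	// than podEvictionInterval; the report observed >100s between requests.
	observedPerAttempt := 110 * time.Second

	// Because the loop is bounded only by the retry count, the worst case is
	// retries * per-attempt latency, not the configured drain timeout.
	worstCase := time.Duration(maxEvictRetries) * observedPerAttempt

	fmt.Printf("maxEvictRetries=%d, worst-case drain ~%s, configured timeout %s\n",
		maxEvictRetries, worstCase, drainTimeout)
}
```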
What you expected to happen:
Drain timeout to be respected in every situation.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Context-based cancellation should be used effectively.
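Below is a minimal sketch of that direction (the helper names and signatures are hypothetical, not the actual MCM API; the real logic sits in evictPodWithoutPVInternal). The idea is that the retry loop honours a context derived from drainTimeout, so the drain stops when the deadline passes even if maxEvictRetries has not been exhausted:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// evictPod is a stand-in for a single eviction API call; in the real
// implementation the request would carry ctx so that client-side throttling
// waits are also abandoned once the context expires.
func evictPod(ctx context.Context, pod string) error {
	return errors.New("eviction not yet possible") // placeholder outcome
}

// evictWithRetries retries eviction, but every wait is bounded by ctx so the
// overall drain timeout is respected regardless of per-call latency.
func evictWithRetries(ctx context.Context, pod string, maxEvictRetries int, interval time.Duration) error {
	for i := 0; i < maxEvictRetries; i++ {
		if err := evictPod(ctx, pod); err == nil {
			return nil
		}

		// Wait before the next attempt, but bail out as soon as the drain
		// deadline passes instead of relying only on the retry count.
		select {
		case <-ctx.Done():
			return fmt.Errorf("drain timeout reached while evicting %s: %w", pod, ctx.Err())
		case <-time.After(interval):
		}
	}
	return fmt.Errorf("max eviction retries exhausted for %s", pod)
}

func main() {
	drainTimeout := 2 * time.Hour
	ctx, cancel := context.WithTimeout(context.Background(), drainTimeout)
	defer cancel()

	_ = evictWithRetries(ctx, "example-pod", 360, 20*time.Second)
}
```

With this shape, even if individual evict calls stall under throttling, the select on ctx.Done() ends the loop at the configured 2-hour mark, after which force deletion of the machine can proceed.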
Environment:
Kubernetes version (use kubectl version):