Drain Timeout is not respected during client side throttling #785

Closed
himanshu-kun opened this issue Feb 20, 2023 · 2 comments · Fixed by #920
Labels
area/robustness Robustness, reliability, resilience related effort/2w Effort for issue is around 2 weeks kind/bug Bug lifecycle/stale Nobody worked on this for 6 months (will further age) needs/planning Needs (more) planning with other MCM maintainers priority/2 Priority (lower number equals higher priority) size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)

Comments

himanshu-kun (Contributor) commented Feb 20, 2023

How to categorize this issue?

/area robustness
/kind bug
/priority 2

What happened:
A case was seen where the drain timeout was set to 2 hours, but the drain ended up running for 11 hours.
If `drainTimeout` is 2 hours, then with a `podEvictionInterval` of 20 sec we calculate `maxEvictRetries` as 360.
Under client-side throttling, each evict call takes a very long time, so the interval between two pod eviction requests can exceed 100 sec.
Because `evictPodWithoutPVInternal` currently has no context-based cancellation, we rely solely on exhausting `maxEvictRetries` to exit the loop. This is what happens when pod eviction keeps running into a timeout.
With 360 retries at more than 100 sec each (already over 10 hours), the drain runs for 11 hours or more, and the machine is never force-deleted.

What you expected to happen:
The drain timeout should be respected in every situation.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
Context-based cancellation should be used so that the drain timeout is enforced regardless of how long individual eviction calls take.

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
@gardener-robot gardener-robot added the priority/2 Priority (lower number equals higher priority) label Feb 20, 2023
@gardener-robot

@himanshu-kun Label area/todo does not exist.

@himanshu-kun himanshu-kun added priority/1 Priority (lower number equals higher priority) and removed priority/2 Priority (lower number equals higher priority) labels Feb 20, 2023
@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/1 Priority (lower number equals higher priority) labels Feb 20, 2023
@gardener-robot

@elankath Label area/todo does not exist.

@elankath elankath added the area/robustness Robustness, reliability, resilience related label Feb 20, 2023
@himanshu-kun himanshu-kun added priority/1 Priority (lower number equals higher priority) needs/planning Needs (more) planning with other MCM maintainers and removed priority/2 Priority (lower number equals higher priority) labels Feb 20, 2023
@elankath elankath added size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) priority/2 Priority (lower number equals higher priority) effort/2w Effort for issue is around 2 weeks and removed priority/1 Priority (lower number equals higher priority) needs/planning Needs (more) planning with other MCM maintainers labels Feb 20, 2023
@himanshu-kun himanshu-kun added the needs/planning Needs (more) planning with other MCM maintainers label Feb 24, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 3, 2023
@sssash18 sssash18 self-assigned this Jun 24, 2024
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jul 10, 2024

4 participants