"Pod is blocking scale down because it’s not backed by a controller" but the pod is a jenkins agent running a job #5434

Closed
FranAguiar opened this issue Jan 20, 2023 · 12 comments · Fixed by #6077
Labels
area/cluster-autoscaler kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@FranAguiar

Which component are you using?:

Cluster autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

This is my use case: Jenkins agents created to run jobs cause the autoscaler to raise a warning that a pod is blocking scale down because it is not backed by a controller. Those agents eventually finish their jobs and the autoscaler is then able to shut down the node. It's not a big deal, but it's annoying to see that warning from time to time and have to check whether it's real or not.

Describe the solution you'd like.:
An annotation like "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" to tell the autoscaler to wait for, or ignore, a pod without a controller and not shut down the node.
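
For illustration, a sketch of a pod using today's annotation (with "false", which currently blocks eviction) - the request is for an annotation like this to also acknowledge the missing controller instead of warning about it. The pod spec and image below are just placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent-example
  annotations:
    # Existing Cluster Autoscaler annotation; the feature request is for
    # something like this to also cover the "not backed by a controller"
    # case instead of warning about it on every scale-down attempt.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: jnlp
      image: jenkins/inbound-agent  # placeholder agent image
```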

@FranAguiar FranAguiar added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 20, 2023
@x13n
Member

x13n commented Jan 30, 2023

I'm not sure I follow. Do you think the issue is that the jenkins pod should not block node removal or is it just that the logs are annoying?

@FranAguiar
Author

Hi, correct, I mean that the log is annoying. I know the pod will finish its task eventually; I would like an option to tell the autoscaler, "I know about this pod without a controller; do not warn me about it and do not shut down the node".

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 30, 2023
@darrenbeck-te

@FranAguiar Did you find a solution to this issue?

@FranAguiar
Author

> @FranAguiar Did you find a solution to this issue?

No, we are just ignoring the warning :(

@gp187

gp187 commented Aug 1, 2023

Is there a way to find the pod that is preventing scale down?
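
One way to check, assuming a default Cluster Autoscaler install (resource and deployment names vary by setup):

```sh
# Cluster Autoscaler publishes a status ConfigMap with scale-down details:
kubectl -n kube-system describe configmap cluster-autoscaler-status

# It also emits events on nodes it cannot remove; check the Events
# section for the name of the blocking pod:
kubectl describe node <node-name>

# Or search the autoscaler's own logs for the warning from this issue:
kubectl -n kube-system logs deploy/cluster-autoscaler | grep "blocking scale down"
```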

@lauraseidler
Contributor

lauraseidler commented Aug 21, 2023

We have the same issue - I was expecting this warning to go away by annotating the pod with cluster-autoscaler.kubernetes.io/safe-to-evict: "false", but this did not help.

I think the reason is that the autoscaler checks for a controller first and only checks for the annotation at the end.

IMO, it would make sense to flip this and check for the annotation first - that way, pods can be marked as not safe to evict no matter what, and will always generate the same warning, which also makes it possible to essentially "ignore" all other warnings (in this case, we know those pods don't have a controller, and we don't want them considered for eviction at all).
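
To make the proposed reordering concrete, a simplified Go sketch - not the actual autoscaler code; the pod type and reason strings are stand-ins for the real drainability checks:

```go
package main

import "fmt"

const safeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// pod is a minimal stand-in; the real autoscaler inspects corev1.Pod
// and its OwnerReferences.
type pod struct {
	name          string
	annotations   map[string]string
	hasController bool
}

// blockReason mirrors the current ordering: the controller check runs
// first, so an explicitly annotated, controller-less pod is still
// reported as "not backed by a controller".
func blockReason(p pod) string {
	if !p.hasController {
		return "not backed by a controller"
	}
	if p.annotations[safeToEvictKey] == "false" {
		return "marked as not safe to evict"
	}
	return ""
}

// blockReasonFlipped is the proposed ordering: the explicit annotation
// wins, so deliberately annotated pods always produce the same,
// easily filtered warning.
func blockReasonFlipped(p pod) string {
	if p.annotations[safeToEvictKey] == "false" {
		return "marked as not safe to evict"
	}
	if !p.hasController {
		return "not backed by a controller"
	}
	return ""
}

func main() {
	agent := pod{
		name:        "jenkins-agent-abc",
		annotations: map[string]string{safeToEvictKey: "false"},
	}
	fmt.Println("current: ", blockReason(agent))        // not backed by a controller
	fmt.Println("proposed:", blockReasonFlipped(agent)) // marked as not safe to evict
}
```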

This would also be helpful for users of managed k8s services - e.g. we are using GKE, which flags "pod not backed by controller" as a warning, but not "pod marked as not safe to evict".

@x13n as you have commented previously - does that seem reasonable? Would a PR for that be considered?

@x13n
Member

x13n commented Aug 23, 2023

Changing the order of conditions would only replace one warning with another in that case: scale down is still blocked, it's just that now it's because the pod has the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation.

@lauraseidler
Contributor

Yes - which could then be more easily ignored. E.g. we ignore all warnings generated by this annotation, since we add it on purpose - we know the pod is not safe to evict, so we can safely ignore the warning - which would solve the "the logs are annoying" issue.

> I would like an option to tell the autoscaler, "I know about this pod without a controller; do not warn me about it and do not shut down the node"

Essentially this - yes, it would still generate a warning, but one that can be more easily ignored. It's hard to ignore all "pod doesn't have a controller" warnings, but easy to ignore all "this pod has been explicitly marked as not safe to evict" warnings.

@x13n
Member

x13n commented Aug 28, 2023

Ah, I see, thanks for clarifying. That makes sense to me - feel free to assign me a PR if you'd like to change the ordering of these conditions.

@lauraseidler
Contributor

@x13n thanks! See the linked PR :)
