Pod Scheduling Readiness #3521
/sig scheduling
/label lead-opted-in
/milestone v1.26
Hey @kerthcet 👋, 1.26 Enhancements team here! Just checking in as we approach Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022. This enhancement is targeting stage …. Here's where this enhancement currently stands:
For this KEP, we would need to:
The status of this enhancement is marked as …
Thanks @Atharva-Shinde.
Hello @Huang-Wei 👋, just a quick check-in again as we approach the 1.26 Enhancements freeze. Please plan to get PR #3522 merged before the Enhancements freeze at 18:00 PDT on Thursday 6th October 2022, i.e. tomorrow. For note, the current status of the enhancement is marked …
Thanks for the reminder. It's 99% accomplished atm, just some final comments waiting for the approver to +1.
With #3522 merged, we have this marked as …
Yes, it can, but you might be breaking a few first-party and third-party controllers that assume this label matches the nodeName, or at least that it is unique. The label is documented as well-known, so it should be treated with care: https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetesiohostname
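As a minimal illustration (not from the thread; the Pod name, image, and node value are hypothetical), selectors that target the well-known label only behave as expected if its value is unique and matches the node:

```yaml
# Hypothetical sketch: a nodeSelector on the well-known hostname label relies
# on that label being unique and matching the node's name.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod                      # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1    # assumed label value on the target node
  containers:
  - name: app
    image: nginx:1.14.2
```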
A good example of how …

```yaml
status:
  phase: Failed
  …
  message: 'Pod was rejected: Node didn''t have enough resource: cpu, requested: 400000000, used: 400038007, capacity: 159500'
  reason: OutOfcpu
  …
  containerStatuses:
  - name: nginx
    state:
      terminated:
        exitCode: 137
        reason: ContainerStatusUnknown
        message: The container could not be located when the pod was terminated
    …
    image: 'nginx:1.14.2'
    started: false
```

To some extent I do not know the current state, but I do wonder - if we are not already doing this today - whether a Pod with scheduling gates and spec.nodeName should be rejected at admission time.
@fabiand Yes, it will be rejected at admission.
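For illustration, a hedged sketch of that rejected combination (the Pod name, gate name, and node value are hypothetical, not from this issue):

```yaml
# Hypothetical manifest: combining spec.nodeName with spec.schedulingGates is
# invalid, and the API server rejects it at admission/validation time.
apiVersion: v1
kind: Pod
metadata:
  name: gated-but-assigned              # hypothetical name
spec:
  nodeName: worker-1                    # already bound to a node
  schedulingGates:
  - name: example.com/wait-for-quota    # hypothetical gate name
  containers:
  - name: app
    image: nginx:1.14.2
```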
I share that it's a general problem, but due to the special handling of … I do fear that - in your example - kubectl debug or oc debug should change and use affinity instead. The core problem is that kubelet starts to react once nodeName is set. Was it considered to change kubelet to only start acting once nodeName is set and schedulingGates is empty?
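A rough sketch of what "use affinity instead" could look like (my illustration; the Pod name, image, and node value are assumptions): the Pod stays on the normal scheduling path, so scheduling gates and quota checks still apply, while being pinned to one node via nodeAffinity on the well-known label.

```yaml
# Hypothetical sketch: pin a debug-style Pod to one node with nodeAffinity on
# kubernetes.io/hostname instead of setting spec.nodeName, so the Pod still
# goes through the scheduler (and any scheduling gates).
apiVersion: v1
kind: Pod
metadata:
  name: node-debugger-sketch            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1                  # assumed target node
  containers:
  - name: debugger
    image: busybox:1.36
    command: ["sleep", "3600"]
```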
According to the version skew policy, the change would have to be in the kubelet for 3 versions before we can relax the validation in the apiserver. I guess that could be backwards compatible if we start in 1.31 and allow scheduling_gates + nodeName in 1.34.
IMO, that falls under the special case where it might make sense to skip the scheduler or an external quota system. You probably wouldn't even want to set requests in a debug pod.
FWIW - I do wonder if debug pods should actually be guaranteed. I had a couple of cases where debug pods (as best effort) got killed quickly on busy nodes.
that seems like making the problem and confusion around use of spec.nodeName worse to me... I don't see a compelling reason to do that
TBH, I'm still trying to understand how skipping the scheduler is ever helpful (when you're not using a custom scheduler).
While this might be correct, the question to me is who makes the decision. Granting a user a knob to skip quota mechanisms feels to me like allowing a Linux user to bypass permission checks when writing to a file. In both cases the whole idea is to restrict users and force them to comply with a certain policy. Handing the user the possibility to bypass such mechanisms seems entirely contradictory to me, and de facto it makes external quota mechanisms impractical.
Are you open to discussion on that? This way we can avoid breaking backward compatibility, support external quota mechanisms, and extend scheduling gates in a consistent manner, which IMHO makes the exceptional nodeName case less exceptional.
Not to me... expecting pods which are already assigned to a node to run through scheduling gate removal phases (which couldn't do anything to affect the selected node) seems more confusing than the current behavior which simply forbids that combination. I don't think we should relax validation there and increase confusion.
I (sadly) concur - If the …
Fun fact: I created a deployment with a pod template that had … IOW: I wonder if this …
The reason … Although it might be achievable with tolerations as well.
Hey folks 👋, 1.31 Enhancements Lead here. Since this KEP graduated to GA in the v1.30 release, please update the status field in the kep.yaml file here to …
/remove-label lead-opted-in
@sreeram-venkitesh the status field of 3521-pod-scheduling-readiness/kep.yaml has been updated to …
@Huang-Wei Sorry, my mistake! It should be …
@sreeram-venkitesh feel free to close this issue.
@Huang-Wei Sorry for the confusion, we still need to update the status field to …
Sure, #4686 is gonna update it.
Ah I missed it, thanks for linking it! 🙌🏼 |
Enhancement Description
- A new schedulingGates field to the Pod spec that marks a Pod's scheduling readiness (a usage sketch follows this list).
- Alpha
  - scheduler_pending_pods metric kubernetes#113946
- Beta
- Stable
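For quick context, a minimal usage sketch of the schedulingGates field (the Pod and gate names are hypothetical, not from this issue); the Pod is reported as SchedulingGated until every gate is removed, after which the scheduler considers it:

```yaml
# Minimal sketch of spec.schedulingGates; the gate name is a hypothetical
# example. Removing all gates makes the Pod eligible for scheduling.
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                       # hypothetical name
spec:
  schedulingGates:
  - name: example.com/wait-for-capacity # hypothetical gate; remove to unblock scheduling
  containers:
  - name: app
    image: nginx:1.14.2
```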
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.