KEP: New pod restartPolicy to restart the whole pod instead of just a container #2342
Conversation
/sig node
/ok-to-test
We have a wrong suggestion in the verify test.
Do you really want to restart all containers when any container exits?
We've discussed in the past the idea of a "keystone" or "primary" container which would govern the whole pod's lifecycle. For restart-all pods, this would be the one that would trigger the teardown. For jobs, this would be the one that exits, and everything else can be assumed to be complete.
@smarterclayton another case of desire for sub-pod orchestration
- "@smarterclayton"
approvers:
- "@liggitt"
- "@derekwaynecarr"
@thockin No. I agree, it may not be desirable to restart the pod if any container exits. I like the keystone or primary container approach. It would be better than my proposal here. Is there any consensus on this approach?
I definitely see a need for "exit all containers if primary container
exits: a Job with a sidecar.
So, that would be another reason to define a primary.
…On Thu, Jul 5, 2018 at 9:16 PM Amshuman K R ***@***.***> wrote:
@thockin <https://github.com/thockin> No. I agree, it may not be
desirable to restart the pod if any container exits. I like the *keystone*
or *primary* container approach. It would be better than my proposal
here. Is there any consensus on this approach?
keps/sig-node/draft-20180702-pod-restartpolicy-to-restart-whole-pod.md
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
This KEP seems to be the relevant one for primary/sidecar container feature.
@thockin @erictune I raised this requirement of re-executing init-containers when the primary or any non-sidecar container terminates on KEP 0008. But from the response I understand that such a requirement is currently not considered there. What would you recommend? Is there any chance this requirement can be added to KEP 0008? Or should I update this KEP as an additional requirement on KEP 0008?
/kind kep
/cc @kow3ns
@@ -0,0 +1,125 @@
---
kep-number: 0
Can you please update the KEP to the latest and increment that file?
@mattfarina Thanks for the review!
Actually, I am in favour of closing this KEP if KEP 0008 can be expanded to include the case of `RestartPolicyAlways`. That is, if there are some sidecar containers and one or more primary containers and the `restartPolicy` is `Always`, then if any of the primary containers terminates, the whole pod is restarted; not just the terminated container.
If the above requirement can be included in the scope of KEP 0008 then this KEP would be redundant.
@derekwaynecarr @dchen1107 out of curiosity, where does this stand in sig-node right now?
FYI, we've added this to the agenda for Monday's SIG Apps to discuss.
But both the `OnFailure` as well as the `Always` restart policies restart the individual containers in question and not the whole pod. This is, for the most part, desirable, even optimal.

However, there are scenarios (some documented in [this issue][issue]) where the many containers in the pod (including init containers) might be interlinked or inter-dependent in such a way as to require closer co-ordination when any one of its containers is restarted.
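For context, a minimal pod spec showing where the existing `restartPolicy` field sits (the pod name and image below are illustrative, not from the KEP):

```yaml
# Today, restartPolicy governs how the kubelet restarts *individual*
# containers; none of its values ever restarts the pod as a whole.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod              # illustrative name
spec:
  restartPolicy: Always          # valid values: Always, OnFailure, Never
  containers:
  - name: app
    image: example.com/app:latest   # illustrative image
```

The KEP's proposal is about adding a value to this field that would change the restart scope from a single container to the whole pod.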
@amshuman-kr We discussed this in SIG Apps this week. The motivation for adding this was not completely understood. Could you please expand on user stories (see the template) that outline situations that this would be used for. This will help others understand the need and how this can help with it.
REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus. Any questions regarding this move should be directed to that thread and not asked on GitHub.
### Risks and Mitigations

The `restartPolicy` value `AlwaysPod` would be a new value for an existing field in the pod specification. So, the question of backward compatibility may not apply.
This would not be backwards compatible for old kubelets, so we can’t change restartPolicy in v1 of the api. Something like “restartAffects: Pod” or a lifecycle hook would be necessary. It’s really orthogonal to restart policy and could be used with OnFailure or Always.
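The `restartAffects` field suggested above does not exist in the Kubernetes API; this is only a sketch of the shape that comment describes, with an illustrative image name:

```yaml
# HYPOTHETICAL: restartAffects is not a real Kubernetes field; it sketches
# the alternative suggested in the comment above, where the restart scope
# is expressed separately from (orthogonally to) restartPolicy.
apiVersion: v1
kind: Pod
metadata:
  name: restart-scope-example     # illustrative name
spec:
  restartPolicy: OnFailure        # existing field, unchanged semantics
  restartAffects: Pod             # hypothetical: restart the whole pod,
                                  # not just the failed container
  containers:
  - name: app
    image: example.com/app:latest # illustrative image
```

Keeping the scope in a separate field would let old kubelets, which ignore unknown fields, fall back to per-container restarts instead of misinterpreting a new `restartPolicy` value.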
KEPs have moved to k/enhancements. Any questions regarding this move should be directed to that thread and not asked on GitHub.
@justaugustus: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Just to clarify: this PR was simply closed and not moved anywhere?
I have a use case where the initContainer creates a one-time-use token that the application in the regular container consumes at startup. If the regular container is restarted, it needs the initContainer to restart as well in order to generate a new token. This could solve my problem. @mattfarina How can I get this issue re-visited?
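An illustration of that one-time-token use case (all names, images, and paths here are hypothetical):

```yaml
# HYPOTHETICAL illustration: the init container writes a one-time token
# that the main container consumes at startup. If the kubelet restarts
# only the main container, the token is already spent and startup fails;
# a whole-pod restart would re-run the init container and mint a new one.
apiVersion: v1
kind: Pod
metadata:
  name: token-example              # hypothetical name
spec:
  restartPolicy: Always
  initContainers:
  - name: token-generator          # hypothetical
    image: example.com/token-gen   # hypothetical image
    volumeMounts:
    - name: token
      mountPath: /token            # writes the one-time token here
  containers:
  - name: app
    image: example.com/app         # hypothetical image
    volumeMounts:
    - name: token
      mountPath: /token            # reads the token at startup
      readOnly: true
  volumes:
  - name: token
    emptyDir: {}                   # survives container restarts, so the
                                   # stale token persists without a pod restart
```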
This happens to line up with the larger topic of pod and container lifecycle. If we are to pursue this idea we need a shepherd from sig-node who can make sure the overall lifecycle is considered.
I have a use case of a statefulset where, when one container fails, it takes hours to become healthy again, but if I restart the whole pod on failure it recovers within minutes.
Restarting the pod if a container fails would be helpful in cases where something is wrong at the Kubernetes level, not in the application. For example: kubernetes/kubernetes#105933.
But, regardless of the nature of the case, users are more likely to restart the pod than to look for how to solve the problem in the configuration or application, which is not good.
@amshuman-kr do you want to open an issue for this in k/enhancements?
@szh The discussions on container life-cycle enhancement were too fragmented and unfortunately, I didn't have the bandwidth to follow up across various semi-related discussions. Hence, I have abandoned this enhancement.
OK. I just added kubernetes/enhancements#3676 to track it.
A KEP for the PR that addresses the kubernetes issue #52345.