Re-run initContainers in a Deployment when containers exit on error #3676

Open
4 tasks
szh opened this issue Dec 6, 2022 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@szh

szh commented Dec 6, 2022

I'm copying this issue from kubernetes/kubernetes#52345 because it seems that this is the appropriate repo for it.

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened: A container in a Deployment exited on error and was restarted without the initContainer being re-run first.

What you expected to happen: When a container in a Deployment exits on error, the initContainer is re-run before the container is restarted.

How to reproduce it (as minimally and precisely as possible):

Sample spec:

kind: "Deployment"
apiVersion: "apps/v1"
metadata:
  name: "test"
  labels:
    name: "test"
spec:
  replicas: 1
  selector:
    matchLabels:
      name: "test"
  template:
    metadata:
      name: "test"
      labels:
        name: "test"
    spec:
      initContainers:
        - name: sleep
          image: debian:stretch
          imagePullPolicy: IfNotPresent
          command:
            - sleep
            - 1s
      containers:
        - name: test
          image: debian:stretch
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - -c
            - exit 1

Implementation Context:

I have an initContainer that waits for a service running in Kubernetes to detect its existence via pod annotations and send it an HTTP request. The initContainer then writes the value it receives to disk. On startup, the main container reads this value and "unwraps" it via another service, then stores the unwrapped value in memory.

The value that the initContainer writes to disk is single-use: once it has been used, it expires. If the main container ever restarts due to a fatal error, it loses the unwrapped value held in memory and, on startup, tries to unwrap the expired value again. This leads to an endless crash loop until I manually delete the pod, at which point a new pod is created, the initContainer runs, and all is well again.

I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.
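
For readers trying to picture the setup described above, here is a minimal sketch. It assumes the handoff happens through an emptyDir volume, which the description does not state. The image names, the volume name, and the mount path are hypothetical placeholders.

```yaml
# Sketch only: image names, volume name, and paths are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrapped-value-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wrapped-value-demo
  template:
    metadata:
      labels:
        app: wrapped-value-demo
    spec:
      volumes:
        - name: handoff
          emptyDir: {}   # kept across container restarts; removed with the pod
      initContainers:
        # Waits for the HTTP request and writes the wrapped value to /handoff.
        - name: receive-wrapped-value
          image: example.com/receive-wrapped-value:latest
          volumeMounts:
            - name: handoff
              mountPath: /handoff
      containers:
        # Reads the wrapped value from /handoff on startup and unwraps it.
        - name: app
          image: example.com/app:latest
          volumeMounts:
            - name: handoff
              mountPath: /handoff
              readOnly: true
```

Under this assumption, the emptyDir contents outlive a restart of the app container, so the restarted container finds the same, already-expired value on disk.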

Enhancement Description

  • One-line enhancement description (can be used as a release note):
  • Kubernetes Enhancement Proposal:
  • Discussion Link:
  • Primary contact (assignee):
  • Responsible SIGs:
  • Enhancement target (which target equals to which milestone):
    • Alpha release target (x.y):
    • Beta release target (x.y):
    • Stable release target (x.y):
  • Alpha
    • KEP (k/enhancements) update PR(s):
    • Code (k/k) update PR(s):
    • Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 6, 2022
@szh
Author

szh commented Dec 6, 2022

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 6, 2022
@thockin
Member

thockin commented Dec 13, 2022

This is a challenging use-case. How do you trigger this if your app has 2 containers? What if one of them is a sidecar that you (the pod author) don't really know about or control?

It seems to me that initContainer (as defined today) is a poor fit here - your app startup could either do this itself or you can wrap it in another tool/script that does the unwrap and then starts your app. That answer is, itself, somewhat unsatisfying because it means you can't decouple those ideas or those container images or credentials/permissions.
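
A rough sketch of the "wrap it in another tool/script" option might look like the following. The `fetch-wrapped-value` helper, the `/app/server` binary, its `--wrapped-value-file` flag, and the image name are all hypothetical; the only point is that the fetch runs inside the app container's own entrypoint, so it is repeated on every container restart.

```yaml
# Sketch only: fetch-wrapped-value, /app/server, its flag, and the image
# are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrapped-entrypoint-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wrapped-entrypoint-demo
  template:
    metadata:
      labels:
        app: wrapped-entrypoint-demo
    spec:
      containers:
        - name: app
          image: example.com/app-with-fetcher:latest
          command:
            - /bin/sh
            - -c
            - |
              set -e
              # Obtain a fresh one-time value on every (re)start.
              fetch-wrapped-value > /tmp/wrapped-value
              # Replace the shell with the app so signals reach it directly.
              exec /app/server --wrapped-value-file=/tmp/wrapped-value
```

As noted, the cost is that the fetch logic, its credentials, and the app now have to live in the same image.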

@SergeyKanzhelev since "keystone" came up in the sidecar discussion too - this is what I really meant when we started the idea. It doesn't mean "this is an app" vs "this is a sidecar" - it means "if this one goes down, everything goes down". Most pods would not use this feature at all, but those who need it KNOW they need it.

@jpbetz since you're looking at the lifecycle stuff, too.

@sftim
Contributor

sftim commented Dec 22, 2022

@thockin What you term “keystone” containers, I've heard named “essential” (e.g. in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions).

@thockin
Member

thockin commented Dec 22, 2022 via email

@jpbetz
Contributor

jpbetz commented Jan 13, 2023

> I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.

This is the direction I started thinking when I saw this issue. I agree with @thockin that the initContainers are a poor fit. initContainers are containers that initialize the pod and they do exactly that.

Say it were possible to define a Deployment with a restartPolicy=Never pod (today it can only be Always). That would make the desired pod lifecycle clear for this "initContainer initializes a one-time read value" case: if the main container fails, terminate the pod and create a new one to replace it. But it would have the major downside of requiring a new pod to be scheduled each time the main container failed. That's probably not what most people would want?
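
To illustrate the restartPolicy=Never semantics, here is a sketch using a bare Pod, where that value is accepted today. It reuses the containers from the sample spec in the issue; the pod name is made up. When the main container exits with an error, the pod ends up in the Failed phase and the container is not restarted in place. Because no controller owns this pod, nothing replaces it, which is the gap a Deployment would have to fill.

```yaml
# Sketch only: a bare Pod, not a Deployment. The name is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: test-never
spec:
  restartPolicy: Never   # a Deployment's pod template only accepts Always
  initContainers:
    - name: sleep
      image: debian:stretch
      imagePullPolicy: IfNotPresent
      command: ["sleep", "1s"]
  containers:
    - name: test
      image: debian:stretch
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh", "-c", "exit 1"]
```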

One alternative would be a sidecar that can produce a "one-time read value". Each time the main container starts, it retrieves a new "one-time read value" from the sidecar. It would then be possible to have a simple process in the main container that retrieves the "one-time read value", writes it to the appropriate location on disk and then starts the main process for the container.
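
A minimal sketch of this sidecar alternative follows. It assumes the sidecar serves a fresh value over HTTP on localhost and that the app image ships `wget`. The images, the port, the `/one-time-value` path, and `/app/server` are hypothetical.

```yaml
# Sketch only: images, port, URL path, and /app/server are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: value-sidecar-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: value-sidecar-demo
  template:
    metadata:
      labels:
        app: value-sidecar-demo
    spec:
      containers:
        # Sidecar: hands out a new one-time read value per request.
        - name: value-provider
          image: example.com/value-provider:latest
          ports:
            - containerPort: 8200
        - name: app
          image: example.com/app:latest
          command:
            - /bin/sh
            - -c
            - |
              set -e
              # Containers in a pod share a network namespace, so the
              # sidecar is reachable on localhost. Retry until it is up.
              until wget -q -O /tmp/wrapped-value http://127.0.0.1:8200/one-time-value; do
                sleep 1
              done
              exec /app/server
```

The retry loop is there because both containers are started without any guaranteed ordering, so the sidecar may not be listening yet when the app container starts.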

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 13, 2023
@Ugzuzg

Ugzuzg commented Apr 14, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2023
@SergeyKanzhelev
Member

@Ugzuzg do you plan to work on this for 1.28? I see you removed the stale lifecycle.

@bzhang-liveperson

Wondering if this can make it into 1.29?

@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2024
@bzhang-liveperson

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2024
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2024
@objnf-dev

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2024
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 25, 2024
@thockin thockin removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 10, 2024
@haircommander haircommander moved this from Draft Stage to Not for release in SIG Node 1.32 KEPs planning Sep 30, 2024
@NeckBeardPrince

+Bump

Projects
No open projects
Status: Not for release
Development

No branches or pull requests