Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Init containers should not be restarted #16698

Merged
merged 1 commit into from
Dec 9, 2022

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented Nov 30, 2022

This is causing podman to wait about 25 seconds before starting
the primary container.

Fixes: #16343

[NO NEW TESTS NEEDED]

Signed-off-by: Daniel J Walsh [email protected]

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Nov 30, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 30, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2022
@rhatdan
Copy link
Member Author

rhatdan commented Nov 30, 2022

@vrothberg Need to talk on this one.

Trying a Pod YAML file like:

apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  containers:
  - command:
    - ls
    - /dev/shm/test1
    image: docker.io/library/alpine:latest
    name: testCtr
  initContainers:
  - command:
    - touch
    - /dev/shm/test1
    image: docker.io/library/alpine:latest
    name: initCtr

You see podman waiting around 30 seconds to finish the container.

@vrothberg
Copy link
Member

We need to keep the timeout. The timeout should only kick in if conmon has forcefully been killed, which is not the case your example. I will take a look.

@vrothberg
Copy link
Member

Head's up, do not merge the below diff:

diff --git a/libpod/container_api.go b/libpod/container_api.go
index 9abe0a189e66..598bd98632dd 100644
--- a/libpod/container_api.go
+++ b/libpod/container_api.go
@@ -566,7 +566,7 @@ func (c *Container) WaitForExit(ctx context.Context, pollInterval time.Duration)
                        case <-conmonTimer.C:
                                logrus.Debugf("Exceeded conmon timeout waiting for container %s to exit", id)
                        default:
-                               if !c.ensureState(define.ContainerStateExited, define.ContainerStateConfigured) {
+                               if !c.ensureState(define.ContainerStateExited, define.ContainerStateConfigured, define.ContainerStateStopped) {
                                        return false, -1, nil
                                }
                        }

The timer is not the problem since the loop continues to check and wait for the exit code to show up etc. until the timer kicks in (it's a non-blocking wait). Usually containers should transition from stopped to exited but that doesn't seem to be the case here. And I do not know why.

We cannot merge the above as we really really really need to wait for the container to transition to exited to make sure all conmon buffers etc. are cleared - remember the Gitlab rabbit hole. So the diff above really is a symptomatic patch. What we need to figure out is: why doesn't the container transition to exited?

@vrothberg
Copy link
Member

Seems to be a unique fart for init containers.

@vrothberg
Copy link
Member

There's much more strangeness going on. Just run podman events in parallel and you'll see the testCtr being restarted all the time.

@vrothberg
Copy link
Member

vrothberg commented Dec 1, 2022

There's much more strangeness going on. Just run podman events in parallel and you'll see the testCtr being restarted all the time.

That's because we default to setting --restart=always. Not sure that makes so much sense for Podman but it's ~compatible with the kubelet.

@vrothberg
Copy link
Member

vrothberg commented Dec 1, 2022

OK, finally got it:

diff --git a/pkg/domain/infra/abi/play.go b/pkg/domain/infra/abi/play.go
index e73bf6614d60..b26f55367738 100644
--- a/pkg/domain/infra/abi/play.go
+++ b/pkg/domain/infra/abi/play.go
@@ -620,7 +620,7 @@ func (ic *ContainerEngine) playKubePod(ctx context.Context, podName string, podY
                        PodInfraID:         podInfraID,
                        PodName:            podName,
                        PodSecurityContext: podYAML.Spec.SecurityContext,
-                       RestartPolicy:      ctrRestartPolicy,
+                       RestartPolicy:      define.RestartPolicyNo,
                        SeccompPaths:       seccompPaths,
                        SecretsManager:     secretsManager,
                        UserNSIsHost:       p.Userns.IsHost(),

The init containers were restarted always. That will have caused a race with waiting for the init container to exit. I think they should never restart, should they?

This is causing podman to wait about 25 seconds before starting
the primary container.

Fixes: containers#16343

[NO NEW TESTS NEEDED]

Signed-off-by: Daniel J Walsh <[email protected]>
@rhatdan rhatdan changed the title [WIP] If conmon does not exists, we shouldn't wait. Init containers should not be restarted Dec 1, 2022
@openshift-ci openshift-ci bot added release-note-none and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Dec 1, 2022
@rhatdan
Copy link
Member Author

rhatdan commented Dec 1, 2022

@containers/podman-maintainers PTAL

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for it? It seems very easy to regress on the fix. What we can check for is that 1) the init container exits cleanly, and 2) that its restart counter is at 0 before running kube down.

@rhatdan
Copy link
Member Author

rhatdan commented Dec 3, 2022

After the init container is run, it is removed, as far as I can see, so no way to examine it.

@vrothberg
Copy link
Member

After the init container is run, it is removed, as far as I can see, so no way to examine it.

That seems like something we can test for. podman container exists $init-container should return non-0. Before, it was restarted all the time, so Podmans before this change would return 0.

@umohnani8
Copy link
Member

changes LGTM

@rhatdan rhatdan added the lgtm Indicates that a PR is ready to be merged. label Dec 9, 2022
@openshift-merge-robot openshift-merge-robot merged commit ecd33d0 into containers:main Dec 9, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 18, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

play kube: initContainer keeps restarting
4 participants