Init containers should not be restarted #16698
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@vrothberg Need to talk on this one. Trying a Pod YAML file like:
You see podman waiting around 30 seconds to finish the container. |
We need to keep the timeout. The timeout should only kick in if conmon has been forcefully killed, which is not the case in your example. I will take a look. |
Heads up, do not merge the diff below:
diff --git a/libpod/container_api.go b/libpod/container_api.go
index 9abe0a189e66..598bd98632dd 100644
--- a/libpod/container_api.go
+++ b/libpod/container_api.go
@@ -566,7 +566,7 @@ func (c *Container) WaitForExit(ctx context.Context, pollInterval time.Duration)
case <-conmonTimer.C:
logrus.Debugf("Exceeded conmon timeout waiting for container %s to exit", id)
default:
- if !c.ensureState(define.ContainerStateExited, define.ContainerStateConfigured) {
+ if !c.ensureState(define.ContainerStateExited, define.ContainerStateConfigured, define.ContainerStateStopped) {
return false, -1, nil
}
}
The timer is not the problem: the wait is non-blocking, so the loop keeps checking and waiting for the exit code to show up until the timer kicks in. Usually containers should transition from stopped to exited, but that doesn't seem to be the case here, and I do not know why.
We cannot merge the above, as we really really really need to wait for the container to transition to exited to make sure all conmon buffers etc. are cleared (remember the GitLab rabbit hole). So the diff above really is a symptomatic patch. What we need to figure out is: why doesn't the container transition to exited? |
Seems to be a unique fart for init containers. |
There's much more strangeness going on. Just run |
That's because we default to setting --restart=always. Not sure that makes so much sense for Podman but it's ~compatible with the kubelet. |
OK, finally got it:
diff --git a/pkg/domain/infra/abi/play.go b/pkg/domain/infra/abi/play.go
index e73bf6614d60..b26f55367738 100644
--- a/pkg/domain/infra/abi/play.go
+++ b/pkg/domain/infra/abi/play.go
@@ -620,7 +620,7 @@ func (ic *ContainerEngine) playKubePod(ctx context.Context, podName string, podY
PodInfraID: podInfraID,
PodName: podName,
PodSecurityContext: podYAML.Spec.SecurityContext,
- RestartPolicy: ctrRestartPolicy,
+ RestartPolicy: define.RestartPolicyNo,
SeccompPaths: seccompPaths,
SecretsManager: secretsManager,
UserNSIsHost: p.Userns.IsHost(),
The init containers were always restarted. That would have caused a race with waiting for the init container to exit. I think they should never restart, should they? |
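To make the intent of the change concrete, here is a rough sketch of the decision it encodes: init containers get a "no" restart policy regardless of what the pod asks for, while regular containers keep the pod's policy. The `ctrSpec` type, the `restartPolicyFor` helper, and the string constants are hypothetical stand-ins for illustration; they are not the types or helpers used in `play.go`.

```go
package main

import "fmt"

// Hypothetical stand-ins for the restart policy values; the real code
// uses constants from Podman's define package.
const (
	restartPolicyNo     = "no"
	restartPolicyAlways = "always"
)

// ctrSpec is a hypothetical, stripped-down container description.
type ctrSpec struct {
	Name string
	Init bool // true for init containers
}

// restartPolicyFor returns the policy a container should run with.
// Init containers must run to completion exactly once, so they never
// inherit the pod's restart policy.
func restartPolicyFor(c ctrSpec, podPolicy string) string {
	if c.Init {
		return restartPolicyNo
	}
	return podPolicy
}

func main() {
	ctrs := []ctrSpec{
		{Name: "setup", Init: true},
		{Name: "app", Init: false},
	}
	for _, c := range ctrs {
		fmt.Printf("%s: %s\n", c.Name, restartPolicyFor(c, restartPolicyAlways))
	}
}
```

With a restart policy of "always", an init container that exits is started again, which races with the code waiting for it to reach the exited state; pinning it to "no" removes that race.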
This is causing podman to wait about 25 seconds before starting the primary container. Fixes: containers#16343 [NO NEW TESTS NEEDED] Signed-off-by: Daniel J Walsh <[email protected]>
@containers/podman-maintainers PTAL |
Can you add a test for it? It seems very easy to regress on the fix. What we can check for is that 1) the init container exits cleanly, and 2) that its restart counter is at 0 before running kube down.
After the init container has run, it is removed, as far as I can see, so there is no way to examine it. |
That seems like something we can test for. |
changes LGTM |