Do not error on signalling a just-stopped container #14533

Merged

Member

@mheon mheon commented Jun 8, 2022

Previous PR #12394 tried to address this, but made a mistake: containers that have just exited do not move to the Exited state but rather the Stopped state. As such, the code would never have run: there is no way that we start `podman kill` and the container transitions to Exited while we are doing it, because that transition requires holding the container lock, which Kill already holds.

Fix the code to check Stopped as well (we could omit Exited entirely, but it's a cheap check and our state logic could change in the future). Also, return an error instead of exiting cleanly - the Kill failed, after all. ErrCtrStateInvalid is already handled by the sig-proxy logic, so there won't be issues.

Fixed a bug where Podman could print error messages when signals were forwarded to a container via `--sig-proxy` as the container process exited.

@openshift-ci
Contributor

openshift-ci bot commented Jun 8, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2022
@mheon mheon force-pushed the avoid_error_on_container_stop branch from 304ae8a to bf2c0a0 Compare June 8, 2022 14:01
-		if ctr.state.State == define.ContainerStateExited {
-			return nil
+		if ctr.ensureState(define.ContainerStateStopped, define.ContainerStateExited) {
+			return define.ErrCtrStateInvalid
Member

Does this get handled upstream, since the previous check did not return a failure?

Member Author

Yeah, this is handled by sig-proxy

@mheon
Member Author

mheon commented Jun 8, 2022

Looks like an issue pulling the system test image. #14529 might help?

@TomSweeneyRedHat
Member

LGTM
once tests are hip

Previous PR containers#12394 tried to address this, but made a mistake:
containers that have just exited do not move to the Exited state
but rather the Stopped state - as such, the code would never have
run (there is no way we start `podman kill`, and the container
transitions to Exited while we are doing it - that requires
holding the container lock, which Kill already does).

Fix the code to check Stopped as well (we could omit Exited entirely
but it's a cheap check and our state logic could change in the
future). Also, return an error, instead of exiting cleanly - the
Kill failed, after all. ErrCtrStateInvalid is already handled by
the sig-proxy logic so there won't be issues.

[NO NEW TESTS NEEDED] This fixes a race that I cannot reproduce
myself, and I have no idea how we'd repro in CI.

Signed-off-by: Matthew Heon <[email protected]>
@mheon mheon force-pushed the avoid_error_on_container_stop branch from bf2c0a0 to c77691f Compare June 9, 2022 13:11
@mheon
Member Author

mheon commented Jun 9, 2022

Rebased and force pushed

@mheon
Member Author

mheon commented Jun 9, 2022

Error: copying system image from manifest list: Source image rejected: Get "https://access.redhat.com/webassets/docker/content/sigstore/ubi8-init@sha256=f6afbab2349ef86bd4ac0d59ba4b9b5df3e176b4bdeaff643c3e6386a7414c24/signature-4": remote error: tls: internal error

@edsantiago Seen this one before?

@edsantiago
Member

edsantiago commented Jun 9, 2022

Oh yeah, known flake, mentioned in #14359 (comment) but not addressed yet

@mheon
Member Author

mheon commented Jun 9, 2022

OK, flake, good. Last time all the integration tests failed so I assumed it was an actual infrastructure issue.

@edsantiago
Member

Duh, wrong link #14359 (comment)

And yes, it's a flake, but I also think it's broken infrastructure. Hard to tell with "internal error"

@mheon
Member Author

mheon commented Jun 9, 2022

Looks like this is going to go green. @containers/podman-maintainers LGTM/hold would be appreciated, would like to get a scratchbuild of this out for testing soon.

Member

@vrothberg vrothberg left a comment

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2022
@mheon
Member Author

mheon commented Jun 9, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2022
@openshift-merge-robot openshift-merge-robot merged commit fd1d0d6 into containers:main Jun 9, 2022
@lsm5
Member

lsm5 commented Jun 24, 2022

/cherrypick v4.0-rhel

@openshift-cherrypick-robot
Collaborator

@lsm5: new pull request created: #14727

In response to this:

/cherrypick v4.0-rhel

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Labels
approved, lgtm, locked - please file new issue/PR, release-note