Add support to sig-proxy for podman-remote #15131

Merged: 1 commit merged from the closes_14707 branch into containers:main on Sep 22, 2022


@boaz0 (Collaborator) commented Jul 31, 2022

closes #14707

Does this PR introduce a user-facing change?

None

@boaz0 boaz0 requested a review from mheon July 31, 2022 20:31
Review thread on pkg/domain/infra/abi/terminal/sigproxy_linux.go (outdated, resolved)
Review thread on pkg/domain/infra/tunnel/containers.go (outdated, resolved)
Review thread on go.mod (outdated, resolved)
@boaz0 boaz0 requested a review from Luap99 August 4, 2022 19:02
@boaz0 boaz0 force-pushed the closes_14707 branch 2 times, most recently from c9d4365 to a65d611 August 4, 2022 20:13
@TomSweeneyRedHat (Member) commented:

Quick review: the code LGTM, but the tests aren't very happy.

// we terminate the proxy and let the defaults
// play out.
signal.StopCatch(sigBuffer)
if err := syscall.Kill(syscall.Getpid(), s.(syscall.Signal)); err != nil {
Review comment (Member):

Doesn't this have to happen on the server side?

Reply from @boaz0 (Collaborator Author):

I don't think so. @mheon, am I right?

@boaz0 boaz0 force-pushed the closes_14707 branch 2 times, most recently from 086a75f to c728e66 August 14, 2022 11:02
@boaz0 (Collaborator Author) commented Aug 14, 2022

@edsantiago, do you have an idea how I should test this? Should I run podman run -i nginx in the background and then run kill -9 <pid>?

I saw there are some tests in https://github.com/containers/podman/blob/main/test/e2e/run_signal_test.go, but I am not sure whether they are related to this issue. In addition, they fail when I run them in my local environment.

//cc @TomSweeneyRedHat

@edsantiago (Member) commented Aug 15, 2022

The following works (says BYE) with your PR but does not work (silence) on main:

$ bin/podman-remote run --name foo $IMAGE sh -c 'trap "echo BYE;exit 0" INT;while :;do sleep 0.1;done' &
...
$ kill -INT %1
...
$ podman logs foo
BYE

Making that into a proper test -- capturing the PID (instead of %1), using randomized strings, and cleaning up -- is left as an exercise for the reader. HTH.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2022
@boaz0 (Collaborator Author) commented Sep 5, 2022

@edsantiago I tried for too long to make this work in the system tests but it keeps failing. When I do it manually it works but running it through bats doesn't show the same results.

@edsantiago (Member) commented:

@boaz0 try this:

@test "podman sigproxy" {
    $PODMAN run -i --name foo $IMAGE sh -c 'trap "echo BYE;exit 0" INT;echo READY;while :;do sleep 0.1;done' &
    local kidpid=$!

    # Wait for container to appear
    local timeout=5
    while :;do
        sleep 0.5
        run_podman '?' container exists foo
        if [[ $status -eq 0 ]]; then
            break
        fi
        timeout=$((timeout - 1))
        if [[ $timeout -eq 0 ]]; then
            die "Timed out waiting for container to start"
        fi
    done

    wait_for_ready foo

    # Signal, and wait for container to exit
    kill -INT $kidpid
    local timeout=5
    while :;do
        sleep 0.5
        run_podman logs foo
        if [[ "$output" =~ BYE ]]; then
            break
        fi
        timeout=$((timeout - 1))
        if [[ $timeout -eq 0 ]]; then
            die "Timed out waiting for BYE from container"
        fi
    done

    run_podman rm -f -t0 foo
}

Quick reminder that SIGKILL cannot be handled by a receiving process. Please never use SIGKILL except as a last resort.

Another note: please squash your commits.

Note to Podman team: the above works fine with podman-remote but, on the rootless server, spits out a red warning:

2022-09-05T13:39:04.000572179Z: open pidfd: No such process

@boaz0 (Collaborator Author) commented Sep 6, 2022

@edsantiago you're amazing. I have no idea how you figured it out but it works.

@edsantiago (Member) commented:

@boaz0 blush, all it is is familiarity with one obscure little fraction of code. Still, thank you!

@containers/podman-maintainers PTAL; the new imports are triggering bloat checks and Mac/Windows build failures. There might be a reason why those modules aren't imported in podman-remote :-(

@mheon (Member) commented Sep 6, 2022

Code changes LGTM. Not sure what's causing the bloat... My assumption would be the terminal package, but I don't see what in there could be so big.

@vrothberg vrothberg added the bloat_approved Approve a PR in which binary file size grows by over 50k label Sep 13, 2022
@vrothberg (Member) left a review:

Thanks, @boaz0!
LGTM
@Luap99 @rhatdan PTAL

@Luap99 (Member) left a review:

LGTM

@openshift-ci bot (Contributor) commented Sep 16, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: boaz0, Luap99


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 16, 2022
@edsantiago (Member) commented:

I restarted one flake, but am now seeing an alarming number of other test failures: ubuntu, fedora, both remote, both (mostly) hangs. I'm not going to restart those: there's something real-looking about them.

@edsantiago (Member) commented:

Yeah... the other two tests failed, all in remote, all in different places, and most with timeouts. There's almost certainly something broken in here.

@boaz0 (Collaborator Author) commented Sep 20, 2022

@edsantiago, yep, that doesn't look good. I am rebasing first and will then try to reproduce these failures in my local environment.

@boaz0 boaz0 force-pushed the closes_14707 branch 2 times, most recently from a770cf6 to 3c962a3 September 20, 2022 12:47
@edsantiago (Member) commented:

Still failing in remote:

# podman-remote --url unix:/tmp/podman_tmp_VdZP run --unsetenv-all --rm quay.io/libpod/testimage:20220615 /bin/printenv
timeout: sending signal TERM to command ‘/var/tmp/go/src/github.com/containers/podman/bin/podman-remote’
timeout: sending signal KILL to command ‘/var/tmp/go/src/github.com/containers/podman/bin/podman-remote’

I have a year-and-a-half catalog of podman flakes. This error is not in my catalog. That suggests that the problem is with this PR.

@boaz0 boaz0 force-pushed the closes_14707 branch 2 times, most recently from cd9dec1 to 5ff7c88 September 20, 2022 15:52
@boaz0 (Collaborator Author) commented Sep 20, 2022

OK, I think I know what the problem is. Let's see if this fixes it.
Thanks for the help, @edsantiago.

@boaz0 (Collaborator Author) commented Sep 20, 2022

@edsantiago looks like it fixed the problem but now Windows Cross failed. 🤔 I have a better idea.

@rhatdan (Member) commented Sep 22, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 22, 2022
@openshift-merge-robot openshift-merge-robot merged commit 8bf3535 into containers:main Sep 22, 2022
@edsantiago (Member) commented:

@boaz0 thank you for your perseverance on this tricky one!

@boaz0 boaz0 deleted the closes_14707 branch September 28, 2022 05:18
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Labels: approved, lgtm, locked - please file new issue/PR, release-note-none
Projects: None yet
Development: successfully merging this pull request may close this issue:
Ctrl+C doesn't kill the container using podman run -i with podman machine
8 participants