Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: show command and output when a timeout happens #15021

Merged
merged 1 commit into from
Jul 21, 2022

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Jul 21, 2022

To make debugging easier we should see the command and its output when a
failure happens.

Does this PR introduce a user-facing change?

None

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 21, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2022
@edsantiago
Copy link
Member

That was the first thing I tried yesterday (except for the s.Command.Args, which I did not know about, thank you!)

Unfortunately, in my testing, it appears that the right-hand side is evaluated at call time, not fail time, therefore stdout and stderr are blank. I ended up solving it by using a loop (this did not seem like thread-critical code) and that actually worked in my testing but does not give anything useful in the containerized-checkpoint test timeouts. Possibly I need to kill the process so its stdout/err get flushed?

Anyhow, try testing this by running a podman command like run alpine top -b, and WaitWithTimeout(10), and see if you get any output.

To make debugging easier we should see the command and its output when a
failure happens.

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 force-pushed the e2e-timeout-output branch from 5dc2e80 to 3a016c6 Compare July 21, 2022 12:43
@Luap99
Copy link
Member Author

Luap99 commented Jul 21, 2022

@edsantiago Ok I tried this locally and now it should work, we have to change it to a function

@Luap99
Copy link
Member Author

Luap99 commented Jul 21, 2022

Test output:

  Timed out after 5.000s.
  command [/home/pholzing/go/src/github.com/containers/podman/bin/podman --storage-opt vfs.imagestore=/tmp/podman/imagecachedir --root /tmp/podman_test1969136265/root --runroot /tmp/podman_test1969136265/runroot --runtime crun --conmon /usr/bin/conmon --network-config-dir /home/pholzing/.config/cni/net.d --cgroup-manager cgroupfs --tmpdir /tmp/podman_test1969136265 --events-backend file --network-backend cni --storage-driver vfs run quay.io/libpod/alpine:latest top -b] timed out
  STDOUT: Mem: 28031604K used, 4489828K free, 5996084K shrd, 4200K buff, 19537524K cached
  CPU:   5% usr   0% sys   0% nic  93% idle   0% io   0% irq   0% sirq
  Load average: 2.06 1.76 1.64 3/1945 1
    PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
      1     0 root     R     1560   0%   6   0% top -b
  
  STDERR: 
  Expected process to exit.  It did not.

@Luap99
Copy link
Member Author

Luap99 commented Jul 21, 2022

Tests are green. I like this better than your fix, @edsantiago WDYT?

@edsantiago
Copy link
Member

This is 1E6 times better than mine, thank you! One quibble: I would prefer:

- Command %v timed out
+ Command timed out after %ds: %v

(i.e., include timeout value, and show command after, not before, to make 'Command timed out' a searchable in-page string).

I'm happy to merge this now and address those changes in my neverending #14972 if that's OK with you. Feel free to unhold but continue discussion here.

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2022
@Luap99
Copy link
Member Author

Luap99 commented Jul 21, 2022

/hold cancel

Fell free to make those changes in your PR

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2022
@openshift-merge-robot openshift-merge-robot merged commit 04ed519 into containers:main Jul 21, 2022
@Luap99 Luap99 deleted the e2e-timeout-output branch July 21, 2022 14:12
edsantiago added a commit to edsantiago/libpod that referenced this pull request Jul 22, 2022
...and enable the at-test-time confirmation, the one that
double-checks that if CI requests runc we actually use runc.
This exposed a nasty surprise in our setup: there are steps to
define $OCI_RUNTIME, but that's actually a total fakeout!
OCI_RUNTIME is used only in e2e tests, it has no effect
whatsoever on actual podman itself as invoked via command
line such as in system tests. Solution: use containers.conf

Given how fragile all this runtime stuff is, I've also added
new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need
to fix a few tests:

  - handle an error-message change (make it work in both crun and runc)
  - skip one system test, "survive service stop", that doesn't
    work with runc and I don't think we care.

...and skip a bunch, filing issues for each:

  - containers#15013 pod create --share-parent
  - containers#15014 timeout in dd
  - containers#15015 checkpoint tests time out under $CONTAINER
  - containers#15017 networking timeout with registry
  - containers#15018 restore --pod gripes about missing --pod
  - containers#15025 run --uidmap broken
  - containers#15027 pod inspect cgrouppath broken
  - ...and a bunch more ("podman pause") that probably don't
    even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because
the test is timing out and /dev/urandom does not block. (But
the test is still timing out anyway, even with this change)

Also, as part of the VM switch we are now using go 1.18 (up
from 1.17) and this broke the gitlab tests. Thanks to @Luap99
for a quick fix.

Also, slight tweak to containers#15021: include the timeout value, and
reword message so command string is at end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants