
test/e2e: check for stderr errors in cleanup() #18442

Merged: 1 commit into containers:main on Mar 4, 2024

Conversation

@Luap99 (Member) commented May 3, 2023

Note: I really do not expect this to pass; I am only submitting it to gather some data and see whether it shows us more bugs. In the long term something like this should be included. There are many code paths that only log via logrus but still exit 0, so this should catch more bugs.
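The core idea can be sketched in Go (a minimal sketch only; the helper name is hypothetical and this is not podman's actual test code). A cleanup hook in the e2e suite would fail the test whenever a command that exited 0 still wrote logrus warning or error lines to stderr:

```go
package main

import (
	"fmt"
	"strings"
)

// hasLogErrors reports whether a command's stderr contains logrus
// warning/error lines even though the command exited 0. An e2e
// cleanup hook could call this and fail the test when it returns true.
func hasLogErrors(stderr string) bool {
	for _, line := range strings.Split(stderr, "\n") {
		if strings.Contains(line, "level=warning") ||
			strings.Contains(line, "level=error") {
			return true
		}
	}
	return false
}

func main() {
	out := `time="2023-06-13T10:17:45Z" level=warning msg="lstat ...: no such file or directory"`
	fmt.Println(hasLogErrors(out)) // true
	fmt.Println(hasLogErrors(""))  // false
}
```

In a real ginkgo suite this check would live in the per-test cleanup path rather than in main.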

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none labels May 3, 2023
@openshift-ci bot (Contributor) commented May 3, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2023
@Luap99 (Member Author) commented May 3, 2023

@edsantiago WDYT? I don't expect this to work without further tweaks, but I think this is important to catch more bugs and flakes.

@edsantiago (Member)

I really like it! I think it might be too soon to put this in production (I've tried before and it has been impossible), but maybe once unlinkat/EBUSY gets fixed we can push this, thank you!

@Luap99 (Member Author) commented May 3, 2023

Looks actually doable: between 10 and 20 failures depending on which task you look at, and yes, several flakes as well.
Can you check this against your known flakes? Just to see what new failures we see here.

@edsantiago (Member)

Yow. Everything looks new. Since the hard failures seem consistent, I'm treating those as SEP and concentrating on the flakes only. The open pidfd one is nefarious: it has cropped up repeatedly since 2020, but never reliably enough to file an issue for. Here I see it in f37 root and in almost all the tests; I will file an issue for it later. IPAM/ips is a new one, will look closer at it tonight. And, sigh, unmount NS is back; I had hoped those would be gone with the ginkgo v2 system reset work.

@Luap99 (Member Author) commented May 4, 2023

And, sigh, unmount NS is back

It may be back, but this is a completely different failure: no such file or directory vs. invalid fs magic. From the other error reported there, and all the other errors in these tests, it is pretty safe to say that we are trying to Cleanup() twice.
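A double Cleanup() like this is typically fixed by making the teardown path idempotent. A minimal Go sketch of that idea (the Container type and field names here are hypothetical, not podman's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// Container sketches an idempotent cleanup: a second Cleanup() call
// must not re-run teardown, which is what produces spurious
// "no such file or directory" warnings on the second pass.
type Container struct {
	cleanupOnce sync.Once
	cleanups    int // counts how many times teardown actually ran
}

// Cleanup runs teardown at most once, no matter how often it is called.
func (c *Container) Cleanup() {
	c.cleanupOnce.Do(func() {
		// real code would unmount namespaces, remove cgroup scopes, etc.
		c.cleanups++
	})
}

func main() {
	c := &Container{}
	c.Cleanup()
	c.Cleanup() // no-op: teardown already ran
	fmt.Println(c.cleanups) // 1
}
```

An alternative is to track a "cleaned up" state flag under the container lock, which also lets the second caller return an informative error instead of silently succeeding.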

@Luap99 (Member Author) commented May 4, 2023

Filed #18460, trivial to reproduce. Still going through the logs to look for other hard failures.

@edsantiago (Member)

Much better! Only about ten flakes, all of them the pidfd one. I did not see the IPAM flake.

@github-actions bot commented Jun 4, 2023

A friendly reminder that this PR had no activity for 30 days.

@Luap99 (Member Author) commented Jun 13, 2023

OK, all remaining outstanding errors look like this:

           # podman [options] stop --all -t 0
           time="2023-06-13T10:17:45Z" level=warning msg="lstat /sys/fs/cgroup/devices/machine.slice/libpod-deb09b1224bf545821fb135679cd6d1ad2bbacd55415b32acb89f26b445ebe78.scope: no such file or directory"
           4e60fd09f05efc6ae524bd0a46e0f6494b09a45fd9ccd6e50e0ed28d2c7c9f12
           deb09b1224bf545821fb135679cd6d1ad2bbacd55415b32acb89f26b445ebe78
           # podman [options] pod rm -fa -t 0
           ca4099d1486c4a67d07d411bb0f7a7911fd53d2197829b1877c02b22212be2de
           # podman [options] rm -fa -t 0

           [FAILED] Failure recorded during attempt 1:
           stop --all -t0 error logged
           Expected
               <string>: time="2023-06-13T10:17:45Z" level=warning msg="lstat /sys/fs/cgroup/devices/machine.slice/libpod-deb09b1224bf545821fb135679cd6d1ad2bbacd55415b32acb89f26b445ebe78.scope: no such file or directory"
           to be empty

https://api.cirrus-ci.com/v1/artifact/task/6373332413579264/html/int-podman-debian-12-root-host-boltdb.log.html
(Note: the links at the bottom are broken, so you need to grep for [FAILED].)

Only Debian, but both root and rootless, so it could be something runc- or cgroup v1-specific?
@giuseppe Could you take a look? Maybe we should ignore ENOENT?
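Ignoring ENOENT here would mean treating a missing cgroup scope directory as a non-event: if the directory is already gone at stop time, there is nothing to warn about. A hedged Go sketch of that idea (the helper is hypothetical, not podman's or runc's actual code):

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// warnOnStat returns a warning message when lstat fails, but stays
// silent on ENOENT: a cgroup scope that has already been removed is
// not worth logging during stop/cleanup.
func warnOnStat(path string) string {
	if _, err := os.Lstat(path); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return "" // already removed; ignore ENOENT
		}
		return err.Error() // other errors are still reported
	}
	return ""
}

func main() {
	// A long-gone scope path produces no warning.
	fmt.Println(warnOnStat("/sys/fs/cgroup/machine.slice/libpod-gone.scope"))
}
```

errors.Is with fs.ErrNotExist matches ENOENT regardless of how many path components are missing, which is exactly the "container already cleaned up" case.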

@edsantiago (Member)

Looks like #11784 which was closed ENOTHINGWECANDOABOUTIT

@Luap99 (Member Author) commented Jun 13, 2023

Looks like #11784 which was closed ENOTHINGWECANDOABOUTIT

Thanks, except this here is not a flake; it is actually trivial to reproduce when using runc:

$ sudo podman --runtime runc run -d --name test alpine top
23b74a32da9148ab003a57ea97b53a20e0539e4139fa49ad4171b0b9d11e9ec0
$ sudo podman --runtime runc run -d --pid container:test alpine top
ed3ebef0a4c988b3aedeef9633635002c6affabcb76581b4a9d7d5d527ab2a99
$ sudo podman stop --all
WARN[0000] freezer not supported: openat2 /sys/fs/cgroup/machine.slice/libpod-ed3ebef0a4c988b3aedeef9633635002c6affabcb76581b4a9d7d5d527ab2a99.scope/cgroup.freeze: no such file or directory 
WARN[0000] lstat /sys/fs/cgroup/machine.slice/libpod-ed3ebef0a4c988b3aedeef9633635002c6affabcb76581b4a9d7d5d527ab2a99.scope: no such file or directory 
23b74a32da9148ab003a57ea97b53a20e0539e4139fa49ad4171b0b9d11e9ec0
ed3ebef0a4c988b3aedeef9633635002c6affabcb76581b4a9d7d5d527ab2a99

@Luap99 (Member Author) commented Jun 13, 2023

In fact you do not even need two containers; using --pid host alone will trigger this error:

$ sudo podman --runtime runc run -d --pid host alpine top
d57fe8041fe014d0e85dba2e1aa1e0b903599531ac2067f15528acb4ee3756b7
$ sudo podman stop --all
WARN[0000] freezer not supported: openat2 /sys/fs/cgroup/machine.slice/libpod-d57fe8041fe014d0e85dba2e1aa1e0b903599531ac2067f15528acb4ee3756b7.scope/cgroup.freeze: no such file or directory 
WARN[0000] lstat /sys/fs/cgroup/machine.slice/libpod-d57fe8041fe014d0e85dba2e1aa1e0b903599531ac2067f15528acb4ee3756b7.scope: no such file or directory 
d57fe8041fe014d0e85dba2e1aa1e0b903599531ac2067f15528acb4ee3756b7

@Luap99 (Member Author) commented Jun 13, 2023

I tested with runc from the main branch and it no longer seems to emit these warnings. So I guess the answer is to wait a bit longer until this is released and packaged in Debian.

@giuseppe (Member)

--pid host with rootless on cgroup v1 should not be supported.

There is no reliable way to terminate the container since cgroups are not used. We shouldn't allow it, just as we shouldn't allow joining another PID namespace.

@Luap99 (Member Author) commented Jun 13, 2023

Note that I ran the reproducers locally on cgroup v2, so I assume it shouldn't matter for this specific warning.
In any case this seems to be fixed upstream, so I am fine with just waiting.

@github-actions github-actions bot removed the stale-pr label Jun 14, 2023
@github-actions bot commented Aug 5, 2023

A friendly reminder that this PR had no activity for 30 days.

edsantiago added a commit to edsantiago/libpod that referenced this pull request Sep 7, 2023
With few exceptions, commands that exit 0 should not emit any
messages with level=warning or =error. Let's start enforcing
that in run_podman.

Allow one-off exceptions, typically when we're testing an
actual warning condition (usual case: "podman stop" where it
times out to SIGKILL). Exceptions are specified via:

    run_podman 0+w subcommand...
               ^^^---- or, rarely, 0+e

"0" stands for "expect exit status 0", which is the default
so it's implicit anyway. The +w / +e (or even +we) is the
new part. I have added it to tests where necessary.

And, because life is what it is, add two global exceptions:

  - Debian. Because runc has too many flakes.
  - kube. Ditto. Kube commands emit lots of nasty error
    messages (yes, level=error) that don't seem to affect
    results.

Similar to containers#18442

Signed-off-by: Ed Santiago <[email protected]>
@rhatdan (Member) commented Mar 1, 2024

Can we close this or update it?

@Luap99 (Member Author) commented Mar 1, 2024

I still want this, though. I guess I can rebase and see what blows up.

@Luap99 (Member Author) commented Mar 1, 2024

Debian still seems to print these runc warnings for some tests.

I guess one way forward is to check stderr only for crun/Fedora? @edsantiago WDYT?

@edsantiago (Member)

I'm totally fine with skipping warning checks on Debian. There is precedent, see #21191

And needless to say I would LOVE to have this enabled wherever possible.

There are many code paths which only log via logrus but still exit 0, so this
should catch more bugs. Unfortunately runc logs way too much random stuff,
so we ignore this check for runc right now.

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 marked this pull request as ready for review March 4, 2024 10:28
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2024
@edsantiago (Member)

/lgtm

thank you!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 5a4864c into containers:main Mar 4, 2024
94 checks passed
@Luap99 Luap99 deleted the e2e-noerrout branch March 4, 2024 11:48
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jun 3, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jun 3, 2024