flake: "timed out waiting for file" #5339

Closed
edsantiago opened this issue Feb 27, 2020 · 18 comments · Fixed by containers/conmon#128
Closed

flake: "timed out waiting for file" #5339

edsantiago opened this issue Feb 27, 2020 · 18 comments · Fixed by containers/conmon#128
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago (Member) commented Feb 27, 2020

Seeing this flake in CI periodically; so far, it always seems to be connected to 'podman exec':

timed out waiting for file /tmp/podman_test276144922/crio/vfs-containers/a5affd794b77bd57a0b5e950b5884175320d9c6360ba65bdad9ef72ce5b2979b/userdata/9b0a41da6f1d5d50c61f75ad57c7a531d5688446a80bccaf61852b4fad8a0451/exit/a5affd794b77bd57a0b5e950b5884175320d9c6360ba65bdad9ef72ce5b2979b: internal libpod error"

This is a placeholder so we can track the problem and gather info.

@mheon (Member) commented Feb 27, 2020

Always the same test, seemingly - two execs, one after the other.
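
For anyone trying to reproduce locally, a minimal sketch of that back-to-back exec pattern (the container name flaketest and the alpine image are placeholders, not the actual CI test):

#!/usr/bin/env bash
# Start a long-running container, then issue two execs back to back
# in a loop until one of them fails.
podman run -d --name flaketest alpine top

i=0
while podman exec flaketest true && podman exec flaketest true; do
    i=$((i + 1))
    echo "iteration $i OK"
done
echo "an exec failed after $i clean iterations"
podman rm -f flaketest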

@edsantiago (Member, author)

And another one: https://api.cirrus-ci.com/v1/task/6533719171268608/logs/system_test.log

This one is "special_testing_rootless" (I don't know if that's Fedora or Ubuntu) and, more interestingly, it's in the BATS tests instead of ginkgo. Common factor is, as @mheon pointed out, two execs in quick succession.

@edsantiago (Member, author)

Extra info: yesterday, at @cevich's suggestion, I tried switching to vfs and also some scheduler magic for compatibility with CI:

# grep ^driver /etc/containers/storage.conf
driver = "vfs"

# echo "mq-deadline" > /sys/block/vda/queue/scheduler

Tried running the networking BATS test in an infinite loop. No failures. But this is f31, and so far it's looking like all the flakes are happening on f30...?
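
The loop was roughly of this shape (assuming a podman checkout where the networking system test lives at test/system/500-networking.bats and bats is installed; adjust the path if your layout differs):

# Re-run the networking BATS test until it fails.
while bats test/system/500-networking.bats; do
    echo "pass, running again"
done
echo "hit a failure"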

@cevich (Member) commented Feb 27, 2020

Could this be related to the runc/crun issue I'm trying to fix in #5342?

(We're using crun on F30 in CI, whereas earlier we were using runc, IIRC.)

@baude (Member) commented Feb 27, 2020

@cevich yes, are we ready to merge that?

@cevich (Member) commented Feb 28, 2020

This task: https://cirrus-ci.com/task/5474270998429696

@edsantiago more data is almost never a bummer, it means we're guessing less 😄

So mostly F30, but possibly F31 too, then. That task shows:

$SCRIPT_BASE/logcollector.sh packages
conmon-2.0.10-2.fc31-x86_64
containernetworking-plugins-0.8.5-1.fc31-x86_64
containers-common-0.1.41-1.fc31-x86_64
container-selinux-2.124.0-3.fc31-noarch
criu-3.13-5.fc31-x86_64
crun-0.12.2.1-1.fc31-x86_64
golang-1.13.6-1.fc31-x86_64
package runc is not installed
podman-1.8.0-2.fc31-x86_64
skopeo-0.1.41-1.fc31-x86_64
slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31-x86_64

@cevich (Member) commented Feb 28, 2020

An example of this on F30, using the new images from #5342 (with the crun -> runc issue fixed):

https://cirrus-ci.com/task/4714742136700928

@cevich (Member) commented Feb 28, 2020

(The implication being: the problem does not appear to be affected by anything changed or fixed in that PR.)

@edsantiago (Member, author)

We are wasting an unbelievable amount of time because of this bug. I just did a pass through submitted PRs, and a number of them are in red-X state because of it. (I restarted the tasks.)

I have been unable to reproduce it in any test environment: f30, f31, overlay, vfs, --root /tmp/xx. I'm starting to think it might be something in how podman is compiled in the CI environment.

Here is a summary of the flakes and retries on one of my PRs today:

special_testing_bindings fedora-31

2020-03-04T12:18:00 integration_test

testing fedora-30 fedora-30

2020-03-04T12:37:06 integration_test
Podman healthcheck run [It] podman healthcheck good check results in healthy even in start-period
[same]
Podman healthcheck run [It] podman healthcheck single healthy result changes failed to healthy
Podman healthcheck run [It] podman healthcheck good check results in healthy even in start-period
Podman run networking [It] podman run --net container: copies hosts and resolv
[same]
[same]
2020-03-04T12:39:14 integration_test
Podman run networking [It] podman run --net container: copies hosts and resolv
[same]
[same]
Podman healthcheck run [It] podman healthcheck good check results in healthy even in start-period
[same]
Podman healthcheck run [It] podman healthcheck single healthy result changes failed to healthy
Podman healthcheck run [It] podman healthcheck good check results in healthy even in start-period
2020-03-04T13:08:40 integration_test
Podman healthcheck run [It] podman healthcheck good check results in healthy even in start-period
[same]
Podman run networking [It] podman run --net container: copies hosts and resolv
[same]
[same]
Podman network [It] podman network rm
2020-03-04T13:25:32 integration_test
Podman healthcheck run [It] podman healthcheck single healthy result changes failed to healthy
[same]
[same]
Podman run networking [It] podman run --net container: copies hosts and resolv
[same]
2020-03-04T13:27:47 integration_test
Podman network [It] podman network rm
Podman run networking [It] podman run --net container: copies hosts and resolv
[same]
[same]

@edsantiago (Member, author)

Oh wait, there's a new error now:

time="2020-03-04T15:01:37-05:00" level=error msg="container create failed (no logs from conmon): EOF"
Error: non zero exit code: -2147483649: OCI runtime error

Happening in the same place the "timed out waiting for file" error happens. It, too, is a flake (goes away on retry). Any ideas?

@rhatdan (Member) commented Mar 4, 2020

@haircommander @giuseppe Conmon? crun?

@haircommander (Collaborator)

Hm, this is from #5373, which was supposed to fix this flake. Maybe I missed a case.

@edsantiago (Member, author)

Oh, thank you. It's a huge relief to know that this is getting attention.

@haircommander (Collaborator)

Unfortunately, the change above only shifted how podman fails here. There still seems to be an issue where conmon crashes when consecutive execs happen on a system under load. That's all the detail I have right now. I'll try to give this some love in the next couple of days, but there's a lot on my plate this week 😕
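
To illustrate that scenario (not a confirmed reproducer), one way to combine artificial load with consecutive execs; the busy-loop load generation and the container name loadtest are assumptions:

#!/usr/bin/env bash
# Generate CPU load with plain bash busy-loops (one per CPU), then
# hammer a running container with pairs of back-to-back execs.
for _ in $(seq "$(nproc)"); do
    while :; do :; done &
done
trap 'kill $(jobs -p) 2>/dev/null' EXIT

podman run -d --name loadtest alpine top

for i in $(seq 1000); do
    podman exec loadtest true || { echo "exec pair $i: first exec failed"; break; }
    podman exec loadtest true || { echo "exec pair $i: second exec failed"; break; }
done

podman rm -f loadtest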

siretart pushed a commit to siretart/libpod that referenced this issue Nov 16, 2021
Versions earlier than 2.0.13 break `podman exec` due to containers#5339
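
As a quick sanity check on a host hitting this, something like the following compares the installed conmon against 2.0.13 (the version-string parsing is a best-effort assumption about conmon --version output):

# Warn if the installed conmon predates the fixed release.
ver=$(conmon --version | awk '/version/ {print $NF}')
if printf '%s\n2.0.13\n' "$ver" | sort -V | head -n1 | grep -qx '2.0.13'; then
    echo "conmon $ver is >= 2.0.13"
else
    echo "conmon $ver is older than 2.0.13; podman exec may break"
fi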