flake: "timed out waiting for file" #5339
Comments
Always the same test, seemingly - two execs, one after the other.
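For anyone trying to reproduce this locally, the "two execs, one after the other" pattern can be sketched as a small loop. This is only a rough sketch, not the actual CI test: the function name `run_exec_pairs`, the container name, and the iteration count are all hypothetical, and it assumes a container is already running.

```shell
#!/bin/sh
# Hypothetical reproducer sketch (not the CI test itself): run two execs
# back to back, repeatedly, and stop at the first failure.
#
# run_exec_pairs RUNTIME CONTAINER ITERATIONS
#   RUNTIME    command to invoke, e.g. podman
#   CONTAINER  name of an already-running container (placeholder)
#   ITERATIONS number of exec pairs to attempt
# Returns 0 if every pair succeeded, 1 on the first failing exec.
run_exec_pairs() {
    runtime="$1"
    ctr="$2"
    iterations="$3"
    i=1
    while [ "$i" -le "$iterations" ]; do
        # Two execs, one immediately after the other.
        if ! "$runtime" exec "$ctr" true || ! "$runtime" exec "$ctr" true; then
            echo "exec failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $iterations iterations without failure"
}
```

Possible usage, assuming an image that provides `top` and `true` (the image and container name here are placeholders): start a container with `podman run -d --name exectest alpine top`, then call `run_exec_pairs podman exectest 1000`. Since the flake seems tied to load, it may never trigger on an idle machine.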
And another one: https://api.cirrus-ci.com/v1/task/6533719171268608/logs/system_test.log
This one is "special_testing_rootless" (I don't know if that's Fedora or Ubuntu) and, more interestingly, it's in the BATS tests instead of ginkgo. Common factor is, as @mheon pointed out, two
and another: f30: https://api.cirrus-ci.com/v1/task/5078770545590272/logs/integration_test.log#t--podman-run---net-container--copies-hosts-and-resolv
Have we seen this on f31?
Extra info: yesterday, at @cevich's suggestion, I tried switching to

    # grep ^driver /etc/containers/storage.conf
    driver = "vfs"
    # echo "mq-deadline" > /sys/block/vda/queue/scheduler

Tried an infinite-loop of the
Could this be related to the runc/crun thing I'm trying to fix in #5342? (We're using crun in F30 in CI, whereas earlier we were using runc, IIRC.)
@cevich yes. Are we ready to merge that?
This task: https://cirrus-ci.com/task/5474270998429696
@edsantiago more data is almost never a bummer, it means we're guessing less 😄 So mostly F30 but possibly F31 then. In that task I show:
An example of this in F30 using the new images from #5342 (with crun -> runc fixed)
(implication being: the problem does not appear to be impacted by anything changed/fixed in that PR)
Oh wait, there's a new error now:
It's happening in the same place as the "timed out waiting for file" error. It, too, is a flake (goes away on retry). Any ideas?
@haircommander @giuseppe Conmon? crun?
Hm, this is from #5373, which was supposed to fix this flake. Maybe I missed a case.
Oh, thank you. It's a huge relief to know that this is getting attention.
Unfortunately, making the change above only moved where podman fails here. There's still some issue where it seems conmon is crashing when consecutive execs happen on a system under load. Those are all the details I have right now. I'll try to give this some love in the next couple of days, but there's a lot on my plate this week 😕
2.0.9 breaks `podman exec`: containers/podman#5339
2.0.9 breaks `podman exec`: containers/podman#5339 Closes #483
Versions earlier than 2.0.13 break `podman exec` due to containers#5339
Seeing this flake in CI periodically; so far, it always seems to be connected to 'podman exec':
This is a placeholder so we can track the problem and gather info.