[systests] quadlet hang, no details #18514
Is the logged timestamp from when the test ends? If so, it seems to be caused by |
I think I was able to reproduce locally.
This was the fourth try. It didn't take as long as in CI, but still 3 minutes??? A normal run is 2s. Relevant process while it took that long:
|
Run $QUADLET and all systemctl/journalctl commands using 'timeout'. Nothing should ever, ever take more than the default 2 minutes. Followup to containers#18514, in which quadlet tests are found to be taking 9-10 minutes. Signed-off-by: Ed Santiago <[email protected]>
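A minimal sketch of that idea in a bats-style helper; the run_with_timeout name and the hard-coded 120-second limit are illustrative, not the actual helper used in the podman test suite:

# Hypothetical wrapper: fail fast if a command hangs instead of eating
# the whole Cirrus timeout. 120s matches the "default 2 minutes"
# mentioned in the commit message.
run_with_timeout() {
    timeout --foreground --kill-after=10 120 "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "# TIMED OUT: $*" >&2
    fi
    return "$rc"
}

# usage in a test:
run_with_timeout systemctl start mycontainer.service
run_with_timeout journalctl -u mycontainer.service --no-pager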
Another one, f38 aarch64 root |
I tried to get a reproducer, but it looks like I was only lucky once. I haven't been able to trigger this since then. |
Another one, and once again f38 aarch64 root (the usual pattern; there's one on rawhide-amd64, but all the rest are aarch64, FWIW). This time we have fine-grained timestamps in the logs, in case that helps anyone. |
The stats so far. root only (so far), f38 and rawhide, aarch64 and regular amd64. And, yes, still happening.
|
@vrothberg To me this looks like something in the quadlet unit causes a deadlock on a single container. I was able to reproduce it once, where the cleanup process caused a deadlock, but unfortunately I killed it before I got the stack trace. @edsantiago If we hit a timeout in the cleanup commands, can you patch the tests to use the new podman system locks command? |
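Not something from the thread, but for the next time the cleanup process wedges: podman is a Go binary, and the Go runtime prints every goroutine stack when it receives SIGQUIT (assuming podman has not installed its own handler), so a trace can be grabbed before killing the process. A rough sketch; the pgrep pattern is an assumption about what the hung process looks like:

# find the stuck cleanup process spawned for the quadlet container
pid=$(pgrep -of 'podman container cleanup')
# SIGQUIT: Go dumps all goroutine stacks to the process's stderr and exits
kill -QUIT "$pid"
# for a unit-spawned process the dump usually lands in the journal
journalctl --since '-5min' --no-pager | grep -B2 -A40 'goroutine '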
For containers#18514: if we get a timeout in teardown(), run and show the output of podman system locks. For containers#18831: if we hit unmount/EINVAL, nothing will ever work again, so signal all future tests to skip. Signed-off-by: Ed Santiago <[email protected]>
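A rough sketch of that teardown change, assuming bats conventions; the TIMED_OUT flag and the surrounding names are illustrative, not the real helpers code:

# in teardown(): if anything timed out, show lock state so a stuck
# libpod lock is visible in the test log before cleanup continues
teardown() {
    if [ -n "$TIMED_OUT" ]; then      # hypothetical flag set by the timeout wrapper
        echo "# command timed out; dumping libpod lock state" >&3
        podman system locks || true   # debugging command that lists in-use locks
    fi
    # ... normal teardown continues here ...
}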
A friendly reminder that this issue had no activity for 30 days. |
Only one instance in my logs... but I'm >80% sure that I've seen it in other PRs in the past two weeks. (Quick reminder that my flake logger does not, and cannot, log every flake).
|
Still happening? |
Sigh. No new instances. |
Remco on IRC suggested that I hit this problem on openSUSE/Tumbleweed (podman 4.9.2) with this quadlet file:
[Unit]
Description=Podman container-transmission.service
Documentation=https://github.com/linuxserver/docker-transmission/blob/master/README.md
Wants=network-online.target
After=network-online.target
# -e USER= `#optional` \
# -e PASS= `#optional` \
# -e WHITELIST= `#optional` \
# -e PEERPORT= `#optional` \
# -e HOST_WHITELIST= `#optional` \
[Container]
# Image=registry.opensuse.org/home/mcepl/containers/containers/opensuse/transmission:latest
# Image=lscr.io/linuxserver/transmission:latest
Image=docker.io/linuxserver/transmission
ContainerName=transmission
# HostName=my-syncthing
Label=io.containers.autoupdate=registry
Environment=PUID=1000 PGID=100 SEED_RATIO=1.3 TZ=Europe/Prague
Volume=/home/matej/.config/transmission:/config
Volume=/home/matej/Videa/2BSeen:/downloads
# -v /path/to/watch/folder:/watch \
PublishPort=127.0.0.1:9091:9091
PublishPort=127.0.0.1:51413:51413
PublishPort=127.0.0.1:51413:51413/udp
UserNS=keep-id:uid=1000,gid=100
PodmanArgs=--hostname my-transmission --privileged
[Service]
Restart=on-failure
[Install]
WantedBy=default.target
on running |
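Not part of the report, but one way to see exactly what quadlet generates from a .container file like the one above. The generator path and the --user flag vary by distribution and by rootless vs. root, and the service name assumes the file is called transmission.container:

# print the generated service unit without installing anything
/usr/lib/systemd/system-generators/podman-system-generator --user --dryrun

# then reload and watch the unit start, with precise timestamps
systemctl --user daemon-reload
systemctl --user start transmission.service
journalctl --user -u transmission.service -o short-precise --no-pager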
Haven't seen this one in many months. |
Seen twice in system tests:
...and, game over. No more logs from there, Cirrus times out. Look at the timestamps, though: those are absurd. Nine minutes per test? This is probably the kind of thing that needs to be debugged by looking at journal logs. Or maybe by instrumenting.
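For reference, one way to do that journal digging when a hang is caught live; the time window and the unit name below are placeholders, not values from this issue:

# everything systemd/podman logged around the hang, with precise timestamps
journalctl -o short-precise --since '-15min' --no-pager

# or narrow to the quadlet-generated test unit (name is illustrative)
journalctl -u basic_localhost.service -o short-precise --no-pager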