timeout in sctp? forward expose? #14331
Comments
Is this only on fedora 35? If so, it uses CNI and not netavark. I agree that it looks like the sctp test is causing the issues, but I do not understand why: the test never actually checks the connection. It looks like it is a deadlock on the network backend lock, so all other tests start timing out when they have to use the network.
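To make that theory concrete, here is a minimal, hypothetical Go sketch (not Podman's actual locking code): a single process holding an exclusive "network backend" lock and never releasing it makes every other caller fail with a timeout at roughly the same time. The lock file path and timeout below are purely illustrative.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// withNetworkLock is a stand-in for a network backend lock: it takes an
// exclusive flock on a lock file, runs fn, and releases the lock. If some
// other process holds the lock and never lets go, every caller here reports
// a timeout instead of hanging forever.
func withNetworkLock(path string, timeout time.Duration, fn func() error) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()

	deadline := time.Now().Add(timeout)
	for {
		// Non-blocking attempt, retried until the deadline.
		if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err == nil {
			defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
			return fn()
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for network lock %s", path)
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	err := withNetworkLock("/tmp/example-network.lock", 5*time.Second, func() error {
		// ... configure the network backend (CNI / iptables) here ...
		return nil
	})
	if err != nil {
		fmt.Println(err)
	}
}
```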
Ok so this is weird. I see many @mheon Any ideas?
I don't think it's Podman deleting the DB, so the test harness seems a likely culprit - but I'm not sure exactly why it would be doing so. The cleanup code could be responsible, but that should only run after all instructions have completed. Is there an internal Ginkgo timeout that could fire the cleanup code if the test itself takes too long, maybe?
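For reference, newer Ginkgo (v2) does have a per-spec timeout mechanism; whether the suite here was on a version with it is a separate question. The sketch below is only an illustration of how such a framework-level timeout interacts with cleanup code: when the spec body exceeds `SpecTimeout`, the spec is interrupted and `AfterEach` still runs, while whatever the spec was doing (and any locks it held) may still be live. All names are illustrative.

```go
package e2e_test

import (
	"time"

	. "github.com/onsi/ginkgo/v2"
)

var _ = Describe("Podman run networking", func() {
	AfterEach(func() {
		// Cleanup runs even when the spec body was interrupted by a
		// framework timeout - the scenario being asked about above.
	})

	It("podman run forward sctp protocol", SpecTimeout(90*time.Second), func(ctx SpecContext) {
		// Spec body; if it runs past SpecTimeout, ctx is cancelled, the
		// spec is marked as timed out, and then AfterEach executes.
	})
})
```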
So far it seems to be f35 only. Here's yet another one. I'm seeing a lot of yellow
Failures seem to happen all at the same time. So I think one process is holding the network lock which then causes all other processes to be blocked and time out. I also found this message in one of the logs:
So maybe the iptables lock is blocked for some reason, thus keeping the network lock blocked? Are these the only failed logs? If so, the question is what happened on/before May 15th?
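As a hedged illustration of that iptables-lock theory (not code from Podman itself): iptables can block indefinitely waiting for the xtables lock unless it is given a wait limit, so a caller that bounds the wait at least fails fast and can release its own locks instead of hanging. The timeout value below is arbitrary.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// "-w 5" asks iptables to wait at most 5 seconds for the xtables lock
	// instead of blocking forever while another process holds it.
	out, err := exec.Command("iptables", "-w", "5", "-L", "-n").CombinedOutput()
	if err != nil {
		fmt.Printf("iptables failed (possible xtables lock contention): %v\n%s", err, out)
		return
	}
	fmt.Printf("%s", out)
}
```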
Ya, that's within the 30-day window, so the old images should still be around...
...looking back in git-history, PR #14178 merged just before mine. Logs from that job are still available. Pick one of the platforms you're interested in, and (in the Cirrus-CI WebUI) there will be a
Here's a failure in just the
From the current
From PR #14178:
From the f35 Ed just posted:
This is our number one flake right now and it's happening multiple times per PR.
Thanks @cevich. Can I just replace the image id in cirrus.yaml and run the
Sometimes 😜 It depends on whether there were script changes associated with the image update or not. The safest thing to do is check out the commit that has the image you want, and use the script from there. Also, if you just need to poke around in the VM and don't intend to execute
Thanks, I think I am fine. I was able to repro locally. Hopefully #14361 fixed the issue. We will know for sure in a couple of days.
@edsantiago Is this fixed?
The last instance I see is on May 25, so I'll treat it as fixed. Thank you!
There's a flake that seems to be having cascading flake effects. I think, but am not sure, that the problem begins with the `podman run forward sctp protocol` test. When this happens, it seems to trigger flakes in the `podman run network expose host port 80 to container port` and `podman run network expose duplicate host port results in error` tests.

I'm just going to link to the `sctp` failures here. In each of the logs below, the `sctp` failure is not alone: there are always associated `expose` failures. I don't know which causes which.

Podman run networking [It] podman run forward sctp protocol
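For context on what that flaking spec exercises, here is a rough, hypothetical sketch (not the real Ginkgo e2e test) of an SCTP port publish with `podman run`; the container name, port, and image are illustrative only.

```go
package e2e_test

import (
	"os/exec"
	"testing"
)

// TestForwardSCTPProtocol approximates the "podman run forward sctp protocol"
// scenario: publishing an SCTP port, which requires podman to take the
// network backend lock and program iptables.
func TestForwardSCTPProtocol(t *testing.T) {
	name := "sctp-forward-test" // hypothetical container name
	t.Cleanup(func() {
		// Best-effort removal of the container, even on failure.
		exec.Command("podman", "rm", "-f", name).Run()
	})

	// "-p 9999:9999/sctp" selects the SCTP protocol for the published port.
	out, err := exec.Command("podman", "run", "-d", "--name", name,
		"-p", "9999:9999/sctp",
		"quay.io/libpod/alpine:latest", "top").CombinedOutput()
	if err != nil {
		t.Fatalf("podman run failed: %v\n%s", err, out)
	}
}
```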