regression with kernel 6.9: user containers hang forever in pasta setup with podman-5.0.0~rc6 #22052
Comments
What worries me: This doesn't seem to turn up in any of "your" integration tests. Certainly at least some of them cover user podman and pasta? Do they set up some additional magic which circumvents this bug? That may give a clue what happened. Thanks! |
Well, I doubt that either podman or pasta changed. I cannot see anything related to that on the podman side, and the passt version is still the same one that we use in our CI. Of course I have no proof yet, but I would think it is far more likely to be a kernel bug, given that the version shows kernel 6.9 (unreleased). |
I reproduced with |
Confirmed. I tested it with kernel-core-6.8.0-63.fc41.x86_64 (i.e. |
Checking. @martinpitt could you tell me the upstream SHA that kernel package is sourced from? |
@sbrivio-rh I used a kernel from yesterday, commit seems to be |
running strace I see
and then nothing until I kill it with SIGINT, full log attached |
Reproduced using a procfs entry as reference, pasta properly sets up the timer:
and then also hangs on a recvfrom() on a |
Current suspicion of mine is commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families"). |
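Going only by the title of that commit, the terminating NLMSG_DONE may now arrive in the same read() as the messages it ends, rather than in a datagram of its own. A minimal sketch of a reader that tolerates both layouts follows. It is a toy parser over in-memory buffers, not pasta's code: `nlmsg()` and `read_dump()` are hypothetical helpers, and only the 16-byte `nlmsghdr` layout, the 4-byte alignment and `NLMSG_DONE = 3` come from `linux/netlink.h`.

```python
import struct

NLMSG_DONE = 3                   # message type ending a multipart dump
NLM_F_MULTI = 0x2                # flag set on each part of a multipart reply
HDR = struct.Struct("=IHHII")    # nlmsghdr: len, type, flags, seq, pid


def nlmsg(msg_type, seq, payload=b""):
    """Build one netlink message, padded to the 4-byte alignment boundary."""
    length = HDR.size + len(payload)
    padding = -length % 4
    return HDR.pack(length, msg_type, NLM_F_MULTI, seq, 0) + payload + b"\0" * padding


def read_dump(datagrams):
    """Collect payloads until NLMSG_DONE, wherever it shows up.

    `datagrams` is an iterator standing in for successive recv() calls.
    Returns (payloads, number_of_datagrams_read).
    """
    payloads, reads = [], 0
    for buf in datagrams:
        reads += 1
        offset = 0
        while offset + HDR.size <= len(buf):
            length, msg_type, _flags, _seq, _pid = HDR.unpack_from(buf, offset)
            if length < HDR.size or offset + length > len(buf):
                raise ValueError("truncated netlink message")
            if msg_type == NLMSG_DONE:
                return payloads, reads       # stop here: no further recv()
            payloads.append(buf[offset + HDR.size:offset + length])
            offset += (length + 3) & ~3      # same rounding as NLMSG_ALIGN()
    raise ValueError("dump ended without NLMSG_DONE")
```

The point of the sketch is the early `return` on NLMSG_DONE: a reader that always issues one more recv() after the data, expecting the terminator there, would block forever once the terminator has already been delivered with the data.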
@sbrivio-rh I suppose your question to me is obsolete, but kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64 has a git reference in it. I suppose that's the upstream one? Otherwise it's somewhere in https://src.fedoraproject.org/rpms/kernel/c/77aed3137518b5c493265a6c9fa1eec2a13ea6ca?branch=rawhide |
@martinpitt yes |
Ah, sorry, it is! I was just using a net.git tree which didn't include b0546776ad3f yet. Switched to mainline torvalds/linux.git now. |
Another workaround for pasta: |
Patch at: https://archives.passt.top/passt-dev/[email protected]/. I think we should report this as a possible kernel regression, even if the new behaviour is correct. 87d381973e49 made its way to mainline through net-next, and I doubt it will be included in any -stable branch, so releasing a fixed version of pasta in the near future is probably enough. I didn't bisect, though. |
Reported here: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth/. In any case, I'll prepare a new release in a bit. I'm not aware of any other distribution already shipping that commit in their latest kernel package. |
The pasta fix is in passt-0^20240318.g615d370 (https://bodhi.fedoraproject.org/updates/FEDORA-2024-4b5b35a749) Thus I am going to close this one. |
I'll note here that we had some other non-pasta related tests that were failing in CoreOS: coreos/fedora-coreos-tracker#1693 Though, it does appear these tests do now also pass with https://bodhi.fedoraproject.org/updates/FEDORA-2024-4b5b35a749 Not sure why container building would have been affected, but figured I'd mention it anyway. |
Because pasta is the default for rootless networking now. So |
Makes sense. Thanks |
That kernel fix for the original issue here seems to have landed. But very recent runs now fail differently:
I'll file this as a proper/separate issue tomorrow morning, just a little pre-warning in case that rings a bell, and to explain why rawhide runs are still red everywhere. |
|
It wasn't actually a kernel fix -- the kernel's behaviour was in fact correct, even though different from what it has been in the past ~5 (at least?) years.
It looks like either a kernel issue, or we fail to account for four messages from the kernel (most likely) after kernel commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families"). Some relevant information meanwhile: the kernel version was meanwhile bumped to
|
@martinpitt I've installed the rawhide kernel and have reproduced... well, not exactly the problem you describe, but something wrong in the netlink handling. I'm debugging now. |
FYI I just saw another user run into this on a fedora bz: https://bugzilla.redhat.com/show_bug.cgi?id=2270257#c9 He confirmed that the new passt version from today solves the issue for him. |
The rawhide log came back, and bad news: It did install the latest passt-0^20240319.gd35bcbe-1.fc41.x86_64, but it still has some failures with "netlink: Unexpected sequence number (6 != 10)". However, that bugzilla also mentions "with podman 5.0.0 |
The podman version rc 6 or 7 should not make a difference. |
"fun" -- I changed cockpit-project/cockpit-podman#1636 to also install the latest podman and passt koji builds for F40, and now both F40 and rawhide fail on the "Unexpected sequence number (6 != 10)" issue. |
Drat. @martinpitt @Luap99 have either of you seen this outside of the difficult-to-adjust CI context? At present my next step is to ask for some additional logging from the reporter of the BZ. |
I filed pasta upstream bug 83 to track the |
I now have a draft fix for upstream bug 83, here. I'm talking to @sbrivio-rh about a release. |
Meh, I first tried to reproduce this with "tmt" itself, but that is annoyingly difficult. I then tried to boot an F40 cloud image out of thin air with just curl and QEMU: Run F40 cloud VM:
Log in with SSH (more comfortable than the console), password "foobar":
inside the VM:
But that again works fine. But it seems @dgibson is ahead of that anyway -- can you reproduce it now? If not, the Testing Farm has a cli for "reserve" , which you can use to get an interactive real TF instance. It does require some initial approval by @thrix though. |
I have reproduced something that looks very similar, so I think so. To make it work, I've had to create a second network interface, which doesn't appear to be the case in the other cases we've seen. My working theory is that in those other cases something has created an additional interface after the |
Wednesday's release coming. |
There you go, |
I updated the test PR to that version. As expected, f40 still failed the same way, as there's no new passt there. But the rawhide run PASSED 💯 🥳 🎉 Thanks @dgibson and @sbrivio-rh ! Can you please release this to F40, too? |
Sure, here's |
Thanks all, I am going to close given it is fixed in the latest release |
Updated cockpit-project/cockpit-podman#1636 to that F40 update and it's squeaky green now 💚 Thanks all! |
Pasta 03-20 is now in CI VMs, PR #22082 |
Since f919dc7 ("conf, netlink: Don't require a default route to start"), if there is only one host interface with routes, we will pick that as the template interface, even if there are no default routes for an IP version. Unfortunately this selection had a serious flaw: in some cases it would 'return' in the middle of an nl_foreach() loop, meaning we wouldn't consume all the netlink responses for our query. This could cause later netlink operations to fail as we read leftover responses from the aborted query.

Rewrite the interface detection to avoid this problem. While we're there:
* Perform detection of both default and non-default routes in a single pass, avoiding an ugly goto
* Give more detail on error and working but unusual paths about the situation (no suitable interface, multiple possible candidates, etc.).

Fixes: f919dc7 ("conf, netlink: Don't require a default route to start")
Link: https://bugs.passt.top/show_bug.cgi?id=83
Link: containers/podman#22052
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257
Signed-off-by: David Gibson <[email protected]>
[sbrivio: Use info(), not warn() for somewhat expected cases where one IP version has no default routes, or no routes at all]
Signed-off-by: Stefano Brivio <[email protected]>
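The flaw described in this commit message can be modelled in a few lines. The sketch below is a toy simulation, not pasta's code: `FakeNetlink`, `first_match_buggy()` and `first_match_fixed()` are hypothetical names, and the queue merely stands in for a socket whose replies stay pending until someone reads them. It shows how returning from the middle of the response loop leaves replies behind, so that the next query trips over a stale sequence number.

```python
from collections import deque

DONE = "DONE"


class FakeNetlink:
    """Toy stand-in for a netlink socket: replies queue up until they are read."""

    def __init__(self):
        self.queue = deque()
        self.seq = 0

    def request(self, replies):
        """Send a query; the 'kernel' queues its replies plus a terminating DONE."""
        self.seq += 1
        for reply in replies:
            self.queue.append((self.seq, reply))
        self.queue.append((self.seq, DONE))
        return self.seq

    def responses(self, seq):
        """Yield replies to query `seq`, rejecting anything from another query."""
        while True:
            got_seq, reply = self.queue.popleft()
            if got_seq != seq:
                raise RuntimeError(f"Unexpected sequence number ({got_seq} != {seq})")
            if reply == DONE:
                return
            yield reply


def first_match_buggy(nl, replies, wanted):
    """Returns from inside the loop, so the remaining replies stay queued."""
    seq = nl.request(replies)
    for reply in nl.responses(seq):
        if reply == wanted:
            return reply
    return None


def first_match_fixed(nl, replies, wanted):
    """Remembers the match but keeps reading until DONE has been consumed."""
    seq = nl.request(replies)
    found = None
    for reply in nl.responses(seq):
        if found is None and reply == wanted:
            found = reply
    return found
```

With the buggy variant, a first query that matches early succeeds, and the second query on the same socket then raises the sequence-number error. With the fixed variant, the queue is empty after every call.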
A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families") changed netlink behaviour so that the NLMSG_DONE terminating a bunch of responses can go in the same datagram as those responses, rather than in a separate one.

Our netlink code is supposed to handle that behaviour, and indeed does so for most cases, using the nl_foreach() macro. However, there was a subtle error in nl_route_dup() which doesn't work with this change. f00b153 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own subtle error.

The problem arises because nl_route_dup(), unlike other cases doesn't just make a single pass through all the responses to a netlink request. It needs to get all the routes, then make multiple passes through them. We don't really have anywhere to buffer multiple datagrams, so we only support the case where all the routes fit in a single datagram - but we need to fail gracefully when that's not the case.

After receiving the first datagram of responses (with nl_next()) we have a first loop scanning them. It needs to exit when either we run out of messages in the datagram (!NLMSG_OK()) or when we get a message indicating the last response (nl_status() <= 0).

What we do after the loop depends on which exit case we had. If we saw the last response, we're done, but otherwise we need to receive more datagrams to discard the rest of the responses.

We attempt to check for that second case by re-checking NLMSG_OK(nh, status). However in the got-last-response case, we've altered status from the number of remaining bytes to the error code (usually 0). That means NLMSG_OK() now returns false even if it didn't during the loop check.

To fix this we need separate variables for the number of bytes left and the final status code. We also checked status after the loop, but this was redundant: we can only exit the loop with NLMSG_OK() == true if status <= 0.

Reported-by: Martin Pitt <[email protected]>
Fixes: f00b153 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0 ("netlink: Always process all responses to a netlink request")
Link: containers/podman#22052
Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
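The variable mix-up described in this commit message is easy to miss when reading, so here is a rough model of it. This is a sketch under stated assumptions, not the actual nl_route_dup() code: messages are plain dicts, the lengths are arbitrary, and `nlmsg_ok()`, `nl_status()`, `needs_more_buggy()` and `needs_more_fixed()` are hypothetical Python analogues of the macros and logic named above. Each function answers one question after scanning the first datagram: do we still have to receive more datagrams?

```python
HDR_LEN = 16        # size of struct nlmsghdr
DONE = "done"


def nlmsg_ok(msg, remaining):
    """Analogue of NLMSG_OK(): does a whole message fit in `remaining` bytes?"""
    return msg is not None and remaining >= HDR_LEN and msg["len"] <= remaining


def nl_status(msg, remaining):
    """Modelled on the description above: <= 0 once the last response is seen."""
    return 0 if msg["type"] == DONE else remaining


def msg_at(datagram, i):
    """Message at index i, or None when the datagram is exhausted."""
    return datagram[i] if i < len(datagram) else None


def needs_more_buggy(datagram):
    """One variable holds both the bytes left and the final status code."""
    status = sum(m["len"] for m in datagram)
    i = 0
    while nlmsg_ok(msg_at(datagram, i), status):
        status = nl_status(msg_at(datagram, i), status)
        if status <= 0:
            break                            # last response: status is now 0
        status -= datagram[i]["len"]         # like NLMSG_NEXT()
        i += 1
    # After DONE, status == 0, so this re-check wrongly reports "not OK".
    return not nlmsg_ok(msg_at(datagram, i), status) or status > 0


def needs_more_fixed(datagram):
    """Separate variables: `left` counts bytes, `status` holds the result."""
    left = sum(m["len"] for m in datagram)
    i = 0
    while nlmsg_ok(msg_at(datagram, i), left):
        status = nl_status(msg_at(datagram, i), left)
        if status <= 0:
            break
        left -= datagram[i]["len"]
        i += 1
    return not nlmsg_ok(msg_at(datagram, i), left)
```

When the terminator sits in a separate datagram, both variants agree that more data must be read. When it shares the datagram with the routes, only the buggy variant still asks for more, which in the real program would mean waiting on a socket that has nothing left to deliver.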
Issue Description
Since around yesterday, the "cockpit-podman revdeps" test has started to fail, in e.g. #22047, #22048, #22050 (these three were landed red). This happens both with the "podman-next" copr, e.g. this or this run, and last night it also started on pure rawhide runs without the -next COPR, as the newer podman versions are being pushed into rawhide.
Steps to reproduce the issue
Install the latest podman on rawhide. It's not yet on the mirrors, so need to install from koji:
As user:
Describe the results you received
It hangs forever without any message, and doesn't give a shell. There is no journal message or even a resource consuming process. It doesn't react to Control-C or Control-\, the only way is to kill -9 them.
With --log-level=debug:
So that "Received Stop()" is certainly unusual.
After that, podman is completely blocked. In a separate terminal, podman ps also hangs.
Describe the results you expected
Well, it should work 😉
It does work fine with system containers.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Fedora rawhide cloud image in a VM, with latest dnf update:
Additional information
No response