[v3.2] Cannot get journald cursor when running in namespace with /var/log overmounted #10863
Labels: locked - please file new issue/PR
/kind bug
Description
v3.2.1 introduced a breakage in my environment where I run the vanilla Fedora `podman` (i.e. with systemd support) under namespaces - specifically injected underneath a usernetes namespace which has its own mount namespace and overmounts `/var/log/`. The issue manifests as `podman logs -f <somecontainer>` printing the following error and then exiting non-zero.

An `strace` shows:

The repro below (step 4) works on 3.2.0, breaks in 3.2.1 due to c3f6ef6 having been backported, and is not fixed in 3.2.2 (#10744) because the bug was actually only exposed by c3f6ef6 rather than being caused by it. I asked for the changes backported in #10744 to land in 3.2.2 and @vrothberg obliged (thanks!), but after seeing the same misbehaviour I had to do a bit more bisecting.
It turns out that the issue is triggered by c3f6ef6 (confirmed by reverting that change on top of the affected tags), which changes how we wait until a container exits by making use of runtime events instead of attempting to follow a file until EOF. However, on `main`, cherry-picking that change on top of any commit from the 3.2.0-rc0 divergence up to fb4a0c5 also causes the misbehaviour.

fb4a0c5 contains some changes which landed from containers/common and which actually appear to be the correct fix for this misbehaviour, rather than the changes landed in #10744. Specifically containers/common@7c0d472, and containers/common@a1b80b5 which fixed some missed imports in the other commit. Those changes landed in common v0.40.0 and don't appear to have been backported to 0.38.
My suggested fix is to backport containers/common@7c0d472 and containers/common@a1b80b5 into a common v0.38.15 and then include that in a podman 3.2.3. For the moment, I'm just going to move back to the tip of main to unblock my own work :)
Steps to reproduce the issue:

I've sanitised the following repro which I think demonstrates the behaviour portably enough.

1. Use a `podman` built with `systemd` support.

2. Unshare a mount NS and overmount /var/log with tmpfs or something (you also need to unshare the userns and map root): `unshare -Urm`, then `mount -t tmpfs tmpfs /var/log`.

3. Run a container, `podman logs -f` it, and observe the error message.

You'll also need to set the `_CONTAINERS` env vars to let podman know that the userns is set up already:

export _CONTAINERS_ROOTLESS_UID=<youruid>
export _CONTAINERS_USERNS_CONFIGURED="true"

As a one-liner:

This is enough to demonstrate how the breakage occurs and strace things, but not exactly the same as my environment (see step 4). I've separated this out since it may be an easier system to reason about along the way to step 4 when confirming my observations :)

4. Repeat with `unshare -Urmpf`; overmount `/var/log` again and also mount the pidns' procfs with `mount -t proc proc /proc`.

As a one-liner again:

On 3.2.1/3.2.2 and the current tip of the 3.2 branch (6f0bf16) this breaks.

With the following diff (changes to the vendorised containers/common) applied to the tip of the 3.2 branch, it works:
Describe the results you received:

Failed to get a cursor for the journal at `/var/log/journal`, because it does not exist (it is overmounted).

Describe the results you expected:

In step 3 of my repro, I should have seen that pid 1 was still systemd, and dying because attempting to access `/var/log/journal` (and `/run/log/journal` I suppose, which podman does try first) fails is the correct choice to make. In step 4 of my repro, pid 1 is no longer systemd and we should fall back to "file" logging rather than "journald" per the changes in common. The container should run and logs should be followed until the container exits.
Additional information you deem important (e.g. issue happens only occasionally):

Output of `podman version`:

Output of `podman info --debug`:

Package info (e.g. output of `rpm -q podman` or `apt list podman`):

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes
Additional environment details (AWS, VirtualBox, physical, etc.):