Podman misinterprets the %h specifier in a bind volume source when the container is created/started from a unit (BTRFS file system) #11547
Comments
Sorry! It may be unrelated, because it happens regardless of the volume type. Simply put, podman run and podman container create return error code 125 when executed by a rootless user, i.e. from a systemd unit in the user session on my Fedora Workstation, as in #2197. The cid file is then not created, so stop and rm don't work. It never happens when podman runs in the terminal, only when started from the unit using systemctl --user start, daemon-reexec, etc. Hint: the Podman REST API service is running in the background. |
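For context, a unit of the shape being described might look like the sketch below, written in the style of podman generate systemd output. The unit name, image, and paths are hypothetical placeholders, not the reporter's actual file:

```ini
# ~/.config/systemd/user/mycontainer.service -- hypothetical sketch
[Unit]
Description=Rootless container started from a user unit

[Service]
Type=forking
Restart=on-failure
# %t expands to the runtime dir, e.g. /run/user/1000
ExecStartPre=/bin/rm -f %t/mycontainer.pid %t/mycontainer.cid
ExecStart=/usr/bin/podman run --conmon-pidfile %t/mycontainer.pid \
    --cidfile %t/mycontainer.cid -d registry.fedoraproject.org/fedora:34 sleep inf
ExecStop=/usr/bin/podman stop --ignore --cidfile %t/mycontainer.cid -t 10
PIDFile=%t/mycontainer.pid

[Install]
WantedBy=default.target
```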
@vrothberg PTAL |
@giuseppe I think this is exactly #2172, because the main difference between the working case (WSL distro, i.e. ext4 filesystem) and the failing one (Fedora 34 Workstation, i.e. btrfs filesystem) is the filesystem. But I expected the configuration file to have been corrected in 3.1. Please look at the attachment. |
could you share the systemd service file generated by Podman? |
The recent version working in rootless WSL for Theia IDE with the Podman backend enabled, slightly modified to reference a project workspace in the user's home and to start Chrome in kiosk mode. It can be used as a test scenario. I use a pre-pulled image ID to avoid pull delay and interaction with the user, which is impossible inside systemd units.
Everything works in WSL!
|
The root cause is certainly related to the filesystem type: Fedora 34 uses BTRFS by default, mounted at /home. If user data is located under /home too, but not necessarily on a btrfs subvolume, a cross-filesystem mount will never work. The generated unit checks that the Podman tree and container storage exist, but it neither checks or triggers mounting of the subvolume nor validates that the user's data are on the same filesystem as the container storage. The same problem appears in the anonymous-volume scenario: nothing enforces that everything is on the same volume, and storage.conf allows anything. The WSL VM with its single ext4 filesystem works perfectly: no btrfs, and host folders are mounted using an MS-specific mechanism. How should bind mounts work on BTRFS and subvolumes? |
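A quick way to check these filesystem claims on a given setup; the paths are illustrative and the commands are standard coreutils/btrfs-progs:

```shell
# Filesystem type backing $HOME and the rootless container storage
stat -f -c '%T' "$HOME"
stat -f -c '%T' "$HOME/.local/share/containers/storage"

# On btrfs, list the subvolumes under /home (may need privileges)
sudo btrfs subvolume list /home

# A bind mount re-mounts an existing directory tree and is FS-agnostic,
# so it also works across btrfs subvolumes:
sudo mount --bind "$HOME/some-dir" /mnt/target
```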
@PavelSosin-320, please share the systemd service file. |
From the Podman-on-WSL instance, with some comments and TODOs |
Thanks, can you also share the contents of run-r8df511c2cb034c33a1ec70d63b670529.service? |
Sorry for the long delay due to the holidays. I tried to run the theia image via systemd-run as a transient unit and the result is: |
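For reference, running an image as a transient user unit looks roughly like this; the unit name and image reference are placeholders:

```shell
# Start podman inside a transient systemd user unit
systemd-run --user --unit=theia-test \
    podman run --rm docker.io/theiaide/theia:latest

# Check what happened
systemctl --user status theia-test.service
journalctl --user -u theia-test.service
```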
Since all the scenarios I tested, including image volumes and anonymous volumes, worked OK, I suppose the root cause is that Podman parses the -v option value exactly as described in the documentation:
|
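Worth noting: in a unit, systemd expands %h before podman ever parses the -v option, so podman should only see an ordinary absolute source path. A hypothetical fragment:

```ini
# Fragment of a user unit; %h expands to the user's home directory,
# so podman receives e.g. -v /home/user/workspace:/workspace:Z
[Service]
ExecStart=/usr/bin/podman run --rm -v %h/workspace:/workspace:Z docker.io/theiaide/theia:latest
```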
@vwbusguy Unfortunately, I can now say definitely that the issue is related to the btrfs filesystem, because exactly the same syntax works correctly on the ext4 filesystem of the WSL Fedora VM instance but doesn't work on the Fedora 34 desktop with its default btrfs FS. Since BTRFS has its own kernel module and mount utilities, it can conflict with the FUSE mount. This issue is addressed explicitly in Docker's documentation for the BTRFS storage driver. Although podman info on Fedora reports that the backing FS is btrfs, all other configuration looks the same as on ext4. |
After eliminating BTRFS-related issues via Podman with BTRFS, I found a very simple thing: when Podman tries to create a container from a systemd unit run by a rootless user, it can't find the storage configuration, and that causes the "invalid reference" error. Testing with systemd-run results in |
I see the dependency on the btrfs storage driver: it creates every container as a subvolume (!) under $HOME/.local/share/containers/storage/btrfs/subvolumes/. So the real runroot for rootless containers has to be adjusted. It would be better to use %h in the runroot option, because HOME has to be imported into the systemd environment to match the systemctl --user value; that doesn't happen automatically. |
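For what it's worth, storage.conf itself does not understand systemd % specifiers; in rootless mode it does expand $HOME in rootless_storage_path, which is the closest supported equivalent. A minimal sketch, with an illustrative UID in runroot:

```toml
# ~/.config/containers/storage.conf -- minimal rootless sketch
[storage]
driver = "btrfs"
# runroot must be writable by the user; /run/user/<uid> is tmpfs
runroot = "/run/user/1000/containers"
# $HOME is expanded by containers/storage in rootless mode
rootless_storage_path = "$HOME/.local/share/containers/storage"
```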
Indeed, it does, and that's generally not a problem for btrfs, unless you want to try to fsck all of them at once for some reason. Otherwise, subvols in btrfs are cheap. But yeah, the problem is that systemd won't grok the default PID file location and will assume the container isn't healthy and running when it is and will continuously restart it (depending on container/service restart policy) after a minute or so. Oddly enough, just commenting out the PIDFile line in the service file seems to make it work just fine, but I haven't tried this with a bunch of different container services on one host. |
Really sorry, colleagues! But I found that the containers-storage BTRFS driver code intensively uses the OS home directory to build the path to the individual container's subvolume. Unfortunately, a container created as part of a generated systemd unit can't use the HOME env variable safely, because the service that invokes podman run, podman container create, etc. doesn't have the session environment and runs "homeless". Only units that create nothing in the storage can work safely. The session environment is created by PAM or systemd generators with no synchronization with unit execution. The same is true for the other XDG-based env variables: XDG_HOME, CONFIG_HOME, DATA_HOME, etc. The %h placeholder doesn't mean that HOME exists in the environment; everything is created during session creation. A systemd service needs a statically created environment that comes from the unit's .service file, an environment file, or per-user/service unit configuration. |
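The statically defined environment asked for here can be written into the unit itself; a sketch, where the environment-file path is hypothetical:

```ini
# Fragment of a user unit: provide the environment explicitly instead of
# relying on the login session. systemd expands %h in Environment= values.
[Service]
Environment=HOME=%h
Environment=XDG_CONFIG_HOME=%h/.config
Environment=XDG_DATA_HOME=%h/.local/share
# Or load several variables from a file ("-" ignores it if missing):
EnvironmentFile=-%h/.config/containers/podman.env
```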
@vwbusguy I don't think IO throughput is so critical in a development environment, Podman's niche. But btrfs has a lot of benefits when used by a rootless user due to the features it offers: isolation, data safety, and ease of maintenance. The artifact volume's content is clearly visible without the irrelevant high/low/merged details, and a container volume snapshot is ready to use for debugging failure states, available without additional machinery. |
Some progress: after importing HOME and all CONTAINERS_* env variables into the systemd environment using systemctl --user import-environment, the "Invalid reference" message disappears, i.e. the storage driver works, but the return code is still 125. podman info runs OK outside the unit context, but now container creation fails without any error message. What data can I collect in such a situation? |
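The import step described above, roughly; the CONTAINERS_* names shown are the standard containers-libraries override variables, and the unit name is a placeholder:

```shell
# Expose the session's HOME and containers variables to the user manager
systemctl --user import-environment HOME CONTAINERS_CONF CONTAINERS_STORAGE_CONF
systemctl --user daemon-reload
systemctl --user restart mycontainer.service

# Verify what the user manager now sees
systemctl --user show-environment | grep -E 'HOME|CONTAINERS'
```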
How would it know to use btrfs driver vs overlay if storage.conf is ignored? |
To eliminate "Invalid reference" message I after learning lessons from running "Podman container create" using systemd-run,
|
Hint: something went wrong in Docker's BTRFS driver too: moby/moby#42253. It's interesting what https://github.com/AkihiroSuda did there. Podman only describes the failure in the wrong way. Indeed, some operations like subvolume create and show don't need root privileges, but in some cases the /home subvolume is not accessible to the rootless user. Yet simple ls, read, and write into the /home..... subvolume as a rootless user work without a mount? Does mount fail without FUSE outside the user session? Maybe the mount namespaces of conmon and the one created by systemd conflict? A systemd unit created for a pod without conmon and the --new option works OK; the pod provides its own cgroup as a parent for the inner containers. |
Finally, I suppose I hit #4678. It's a 1.5-year-old issue without a solution; only a workaround was proposed. But it looks very similar to the zombie-process issue when starting the Podman REST API. Systemd can't tolerate unorganized packs of processes; otherwise the systemd-based system fills with zombie processes and leaked cgroups. Systemd tends to organize processes into groups, and if a long-living cgroup is needed, a podman.scope under the user.slice managed by logind can be used. It creates cgroups with predictable names. I played with it to get rid of the zombie REST API server process and it worked well: everything that belongs to the scope disappears. |
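A transient scope of the kind described can be created with systemd-run; a sketch, with a placeholder unit name:

```shell
# Run the Podman API service inside a transient scope under user.slice;
# when the process exits, the scope's cgroup is cleaned up with it.
systemd-run --user --scope --unit=podman-api \
    podman system service --time=0
```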
Please open a PR to fix this in containers/storage. |
@rhatdan Scope creation is purely crun's duty. I don't think "external" transient scope creation using systemd-run can be used in production. I just upgraded crun on Fedora 34 to the recent version 1.2 and will test it as soon as possible to be sure it works correctly. Meanwhile, can somebody from the Podman team check that Podman invokes crun correctly with the --rootless and --systemd-cgroup option values, and then processes the exit code properly? |
crun has been tested and exposes the same very old cgroup-manager issue for rootless users: Podman has to follow the containers.conf configuration and use systemd as the cgroup manager for both root and rootless users. To manage a container running as a systemd service, a unit of kind scope is needed, and either the systemd manager DBus API or a manually executed systemd-run is absolutely necessary. busctl works for rootless users, and every user has their own bus socket, so there is no reason to suspect the API has additional restrictions. |
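The containers.conf knob in question, with the value this comment argues for:

```toml
# ~/.config/containers/containers.conf
[engine]
# Use systemd (not cgroupfs) as the cgroup manager, for root and rootless
cgroup_manager = "systemd"
```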
are you using BTRFS as the storage backend or is the storage configured to use overlay? |
@giuseppe 1. crun is not guilty! I reverted the Podman configuration to the old runc and got the same result: error 125/n/a. |
I am curious to know if this works when using overlay instead of btrfs. Have you tried changing the storage driver? |
I've also had this happen with overlay.
|
I experienced some strange adverse effects after a Fedora update brought upgrades of systemd and DBus along with their utilities. They expect certain environment variables and access rights in the scope of a systemd service: |
Playing with runc vs crun, I found that runc has a strong requirement that the runroot where the bundle is stored must be on tmpfs, i.e. /run/user/... But Fedora (and a possible future WSL Fedora distro based on the WinBtrfs driver) boxes users inside the distro's /home FS, which is always BTRFS. Does somebody know how to bind mount across different FS types? |
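To verify the tmpfs requirement for a given user (standard util-linux findmnt):

```shell
# Show the filesystem backing the rootless runtime directory;
# FSTYPE should report tmpfs
findmnt -T "/run/user/$(id -u)"
```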
This issue was moved to a discussion.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When a container that uses bind volume mounting is created or run from a unit generated by the podman generate systemd command, the source path can be expressed as an absolute path or via % built-in specifiers, %h for example. All % built-in "path" specifiers like %t and %h are expanded into an absolute path, i.e. they must be accepted by podman as a valid bind volume source. All subdirectories relative to %h, %v, etc. should be accepted. I hope the local driver and fuse mount support this scenario.
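To make the expectation concrete (user name, uid, image, and paths are invented for illustration):

```ini
# In a user unit, %h -> /home/<user> and %t -> /run/user/<uid>, so podman
# only ever sees absolute paths as the bind-volume source.
[Service]
ExecStart=/usr/bin/podman run --rm -v %h/project:/project quay.io/example/app
# After expansion for user "alice" this is equivalent to:
#   /usr/bin/podman run --rm -v /home/alice/project:/project quay.io/example/app
```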
Steps to reproduce the issue:
Describe the results you received:
"Invalid reference" error message
Describe the results you expected:
It must work, because a systemd unit must be shareable between users
Additional information you deem important (e.g. issue happens only occasionally):
It works OK in WSL because the workspace is located in the host filesystem
Output of podman version:
Version: 3.3.1
API Version: 3.3.1