Podman fail to autostart containers through quadlet/systemd, works when launched manually, error with pasta #22197
Comments
You have to make sure your network is fully set up before the unit is started. |
This feels like it could be related to the same question in #22057 |
I have not been able to get a rootless user quadlet to wait for my network to be ready even adding
No issues on 4.9.3 |
@flyingfishflash You cannot wait for system units from user units, see systemd/systemd#3312 I wasn't aware that the user units start before the network is fully set up and that it causes such big trouble with pasta. Note you do not need to downgrade, you can just change the default back to slirp4netns in containers.conf, see the last part in the pasta section on https://blog.podman.io/2024/03/podman-5-0-breaking-changes-in-detail/ You could also do something like this #22190 (comment) Of course none of this is a proper solution but I am sure we will find something to address this in a better way soon. |
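The containers.conf change mentioned above can be sketched as a small shell helper. This is a minimal sketch, assuming the `default_rootless_network_cmd` key in the `[network]` table as described in the linked blog post; the function name `write_slirp_conf` is hypothetical, and the helper refuses to overwrite an existing file, so an existing containers.conf would need the key added by hand.

```shell
# Sketch: create a containers.conf that switches the rootless default
# network backend back to slirp4netns.
# Argument: path of the file to create (hypothetical helper; for a real
# user this would typically be ~/.config/containers/containers.conf).
write_slirp_conf() {
    target=$1
    # Never clobber an existing configuration file.
    if [ -e "$target" ]; then
        echo "write_slirp_conf: $target already exists, edit it by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
[network]
default_rootless_network_cmd = "slirp4netns"
EOF
}
```

A call such as `write_slirp_conf "$HOME/.config/containers/containers.conf"` would apply it for the current user, provided slirp4netns is installed.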
@Luap99 - thank you for this tip re containers.conf! |
No. It's as much of a bad practice today as it was 50 years ago. |
I ran into this issue today and finally learned that systemd user level units apparently can't depend on system level units (such as I've managed a workaround that satisfies my desire to avoid arbitrary timeouts by creating a user-level
# ~/.config/systemd/user/network-online.service
[Unit]
Description=User-level proxy to system-level network-online.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until systemctl --machine=%u@.host is-active network-online.target; do sleep 1; done'
[Install]
WantedBy=default.target
# ~/.config/systemd/user/network-online.target
[Unit]
Description=User-level network-online.target
Requires=network-online.service
Wants=network-online.service
After=network-online.service
Then in your quadlet units:
|
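The polling loop in the unit above waits forever if the target never becomes active. A minimal sketch of the same idea with a deadline follows; the function name `wait_until` and its argument order are hypothetical and not part of systemd or Podman.

```shell
# Sketch: poll a command once per second until it succeeds or a deadline
# passes. Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "wait_until: gave up after ${timeout_s}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

For example, `wait_until 90 systemctl is-active --quiet network-online.target` would give up after roughly 90 seconds instead of blocking the user session indefinitely.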
seems it just work after you can ping an external ip (include gateway ip) |
I'll share my workaround, but it might be a good idea to have a
[Unit]
Description=Wait for network to be online via NetworkManager or Systemd-Networkd
[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.
Type=oneshot
# At least one of these should work depending if using NetworkManager or Systemd-Networkd
ExecStart=/bin/bash -c ' \
if command -v nm-online &>/dev/null; then \
nm-online -s -q; \
elif command -v /usr/lib/systemd/systemd-networkd-wait-online &>/dev/null; then \
/usr/lib/systemd/systemd-networkd-wait-online; \
else \
echo "Error: Neither nm-online nor systemd-networkd-wait-online found."; \
exit 1; \
fi'
ExecStartPost=ip -br addr
RemainAfterExit=yes
# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl edit <THIS SERVICE NAME>`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60
[Install]
WantedBy=default.target |
Another workaround: We can copy
$ cat /etc/systemd/user/network-online.target
[Unit]
Description=Network online for systemd --user
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
#After=network.target
$ cat /etc/systemd/user/systemd-networkd-wait-online.service
[Unit]
Description=Wait network online for systemd --user
Documentation=man:systemd-networkd-wait-online.service(8)
Before=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online
RemainAfterExit=yes
[Install]
WantedBy=network-online.target
or you can put these files to
Then enable the service as a user:
$ systemctl --user enable systemd-networkd-wait-online.service
Finally we can make podman wait for the network to be online, like this:
$ cat ~/.config/containers/systemd/my-app.container
[Unit]
Wants=network-online.target
After=network-online.target
reference link: https://unix.stackexchange.com/questions/216919/how-can-i-make-my-user-services-wait-till-the-network-is-online |
Hi, Any idea for a workaround when using NetworkManager? I tried to adapt @secext2022 's workaround, but the user service still "thinks" the Network is online approx. 7 seconds too early. I tried to change the parameter for nm-online by removing the dog /etc/systemd/user/network-online.target:
/etc/systemd/user/NetworkManager-wait-online.service:
The above is the system log, 12:43:09 is the user service. As the user running the podman container,
Not sure why the NetworkManager-wait-online is not in the user log, it is enabled for the user:
As another workaround, I'm thinking for now adding to the Quadlet another dirty workaround: |
I haven't used |
Please check this in the container service:
[Unit]
Wants=network-online.target
After=network-online.target |
$ systemctl --user status my-app.service
● my-app.service - example deno/fresh app
Loaded: loaded (/var/home/fc-test/.config/containers/systemd/my-app.container; generated)
Drop-In: /usr/lib/systemd/user/service.d
└─10-timeout-abort.conf
Active: active (running) since Wed 2024-07-17 04:21:49 UTC; 20h ago
Main PID: 2026 (conmon)
$ systemctl --user list-dependencies my-app
my-app.service
● ├─app.slice
● ├─basic.target
● │ ├─systemd-tmpfiles-setup.service
● │ ├─paths.target
● │ ├─sockets.target
● │ │ └─dbus.socket
● │ └─timers.target
● │ └─systemd-tmpfiles-clean.timer
● └─network-online.target
● └─systemd-networkd-wait-online.service |
Hi @secext2022 , The Unit section is defined correctly. As per my log, the problem is that the NetworkManager-wait-online user service finishes much too soon, much sooner than the system level one. I believe (meaning I'm not sure) that nm-online does not work correctly when run as a user (not designed to be run as a user?). As yet another workaround, I've added
|
After reading this thread and also the comments in systemd/systemd#3312 , I think that thread has much cleaner workarounds than many of the ones in this thread. The problem with the workarounds in here is that they are often quite long and convoluted for this relatively simple issue, and may or will break if the system configuration changes, as they are not agnostic to the configuration. But the systemd issue has much cleaner and simpler workarounds:
I haven't tested those, but they should work judging from the thumbs =). I'm also starting to think maybe we should not be discussing workarounds here that much since it adds noise to actually solving the issue (which is: podman user containers should not fail at boot if networking is up). (As a general remark, no service should fail on a network error; it should handle the situation instead, as network connections are unreliable. All these workarounds should be unnecessary!) I'm sorry for adding noise here myself, too =).
EDIT: My chosen workaround for the issue (cleanest in my opinion, less prone to break; I chose to name it check-network-online.service but it could be whatever you want it to be):
/etc/systemd/user/check-network-online.service:
Enable this service for the user. In badly behaving user services (such as podman quadlets), add:
Of course, YMMV! |
I personally don't find it distracting.
The thing is, pasta(1) picks host addresses and routes by default. This is by design as it allows you to avoid (implicit) NAT altogether. If there's nothing there, it doesn't know what to pick, so it exits. We're now considering implementing an optional netlink monitoring function that would dynamically create and delete routes and addresses as they come and go on the host, see also #22959 (comment). That should be robust enough. |
If the doc said "Quadlets are currently broken. Please see that bug report XXX we have with systemd.", at the top in red and bold, I guess the situation would be improved tremendously. Acknowledging current limits and bugs is a big part of establishing trust with users. As it is, users stumble across this again and again. I can't speak for the general industry but here, no one wants to hear about podman again for instance. |
#24305 implements the work around, would be great if some folks can test it. |
This service is meant to be used by quadlet as replacement for network-online.target as this does not work for rootless users. see containers#22197 Signed-off-by: Paul Holzinger <[email protected]>
As documented in the issue, there is no way to wait for system units from the user session[1]. This causes problems for rootless quadlet units, as they might be started before the network is fully up. While this was always the case and thus was never really noticed, the main thing that triggered a bunch of errors was the switch to pasta. Pasta requires the network to be fully up in order to correctly select the right "template" interface based on the routes. If it cannot find a suitable interface it just fails and we cannot start the container, understandably leading to a lot of frustration from users.
As there is no sign of any movement on the systemd issue, we work around it here by using our own user unit that checks if the system session network-online.target is ready.
Now for testing it is a bit complicated. While we do now correctly test the root and rootless generator since commit ada75c0, the resulting Wants/After= lines differ between them and there is no logic in the test files themselves to say if root/rootless to match specifics. One idea was to use `assert-key-is-rootless/root`, but that seemed like more duplication for little reason, so use a regex and allow both to make it pass always. To still have some test coverage, add a check in the system test to ask systemd if we did indeed have the right dependencies, where we can check for exact root/rootless name match.
[1] systemd/systemd#3312
Fixes containers#22197
Signed-off-by: Paul Holzinger <[email protected]>
Thanks! I think I have this issue because
When switching to Now I tested My machine is simply speaking built up from five network interfaces from which the machine is accessed. |
Why do you think 5.2.5 included this fix? The releases notes are very clear what it contains https://github.com/containers/podman/releases/tag/v5.2.5. |
Because I guessed that when a PR is merged and a release is created then those changes are in. I took the time to read through the release notes and of course didn't find the change listed there. Since it was missing I looked at a previous release, too, to find out how much I can rely on the release notes. Some projects don't mention all the changes in there. And: people make mistakes. Things that should be in the release notes are sometimes left out. Then I looked at the branch that the fix was merged in. Since it wasn't merged in master or main (which I expected) I tried to find out what the merge strategy looks like. I didn't find a graph view on github and then gave up. I didn't want to spend the time to clone it, which I should've done - yes. So that's why. Thanks for letting me know in which release the fix is in. I just want to help and I'll try to check better next time. |
Tip, as I'm familiar with git but not with GitHub and it took me a while to spot this: information equivalent to:
is found, on "commits" pages, just after the end of the commit message. Say, at the page for 57b0227:
|
Yes it shows it on the commits page, however that only works for things going forward. Generally speaking fixes for a new patch (.z) release will not show up in there as it will not pick up the backport commits into the release branch. So for that you would manually need to check the backport commits in the release branch which of course is annoying but I would say the release notes for the patch releases should be complete and not miss stuff as we only do a few backports most of the time. But of course we are human and sometimes things are missed. |
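The distinction described above can be checked from a local clone. The following is a minimal sketch using plain git; the helper names `tags_containing` and `find_backport` are hypothetical, and the release branch name in the usage note is an assumption about how the repository names its branches.

```shell
# Sketch: list the tags whose history contains a given commit.
# Must be run inside a clone of the repository.
tags_containing() {
    git tag --contains "$1"
}

# Sketch: search a branch for commits whose message matches a pattern.
# A backport is a different commit with its own hash, so it has to be
# found by its message rather than by the original hash.
find_backport() {
    branch=$1
    pattern=$2
    git log --oneline --grep="$pattern" "$branch"
}
```

`tags_containing 57b0227` lists the releases that contain that exact commit, while something like `find_backport origin/v5.2 'network-online'` would look for a backported copy on a release branch.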
I think this bug may need to be re-opened. I am on Podman 5.3 and I am still getting issues where my rootless containers are not properly starting when I log in.
This occurs when NetworkManager is set to connect to my Wifi on log in (Connection is set to be only available for my user and wifi password is stored in an encrypted form). If I set it to be available for all users with a key stored in plaintext, then the wifi connects long before I get to log in, and my containers restart properly. |
Which means that Podman/systemd units should wait quite a long time before bringing containers up. What would be your expectation? We could also decide that pasta, instead of refusing to start, would assign the container some fake address and routes (like slirp4netns used to do), but then you lose the (default) seamless/transparent addressing. Or would you expect that your containers start only as you log in and your WiFi password is decrypted?
I'm not sure, it covered a scenario that's different enough to be considered another issue altogether, I think. |
I think that's exactly what I would expect for rootless containers created by users that don't have linger enabled. Here's my situation. I have several containers that I start up using podman-compose. I enable the user-level podman-restart service in an attempt to have them restart whenever I log in. The problem is that I think podman-restart isn't waiting for network at all. It is attempting to start up the containers while my computer is connecting to the wifi. Container startup fails as a result. Essentially, the user level podman-restart service is functionally useless if the network takes a long time to come up. |
I have a server without wifi - and it does not work either. The systemd query as installed in I have tried many things to work around this issue. The only one working for me is waiting until the sshd is accessible at port 22. This is what I ended up with:
The difference is about 5s - meaning that the sshd is accessible on the outside statically assigned IP only about 5s after systemctl says that the network-online.target is active. I don't have NetworkManager and/or wifi on the server. Only ethernet with static IP. Nothing fancy. The containers are rootless with linger enabled. |
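A port-based wait of the kind described above could look roughly like the following. This is a minimal sketch and not the commenter's actual unit; it assumes bash (for its `/dev/tcp` pseudo-device) and coreutils `timeout` are installed, and the function name `port_open` is hypothetical.

```shell
# Sketch: succeed if a TCP connection to HOST:PORT can be opened within
# two seconds. Usage: port_open HOST PORT
port_open() {
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # the connection is closed again when the child bash exits.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

A wait loop would then be `until port_open 192.0.2.10 22; do sleep 1; done`, where 192.0.2.10 is a placeholder for the server's static address.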
Any idea why? |
If network-online.target succeeds too early then this is out of scope for podman/quadlet. We cannot possibly handle every network setup and know what done means, which is exactly why I check for network-online.target, because that already is such a definition. You can manually fix the target or overwrite podman-user-wait-network-online.service with whatever command you want
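Overriding the command of podman-user-wait-network-online.service can be done with a systemd drop-in. The helper below is a minimal sketch of writing one; the function name `write_wait_override` and the file name `override.conf` are hypothetical, and the unit only exists in Podman versions that ship this fix.

```shell
# Sketch: write a drop-in that replaces the ExecStart of
# podman-user-wait-network-online.service with a custom wait command.
# Arguments: systemd user unit directory, command line to run.
write_wait_override() {
    user_dir=$1
    wait_cmd=$2
    dropin_dir="$user_dir/podman-user-wait-network-online.service.d"
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStart= clears the packaged command before a new one is set.
    printf '[Service]\nExecStart=\nExecStart=%s\n' "$wait_cmd" > "$dropin_dir/override.conf"
}
```

After a call such as `write_wait_override "$HOME/.config/systemd/user" "/path/to/your-wait-script"`, run `systemctl --user daemon-reload` so that systemd picks up the change.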
I only fixed quadlet units, I forgot to change podman-restart.service and podman-kube@.service as they also start containers. |
I would love it if similar patches were made to those user services as well, as my workflow currently involves dealing with a lot of Docker Compose files and not Quadlet. |
Yes I was not trying to imply that we should not fix them. They definitely need to be fixed the same way, I filed a new issue #24637 to not keep spamming this long issue. |
Not really. When it exits, the eth interface is up, the static IP is assigned. It seems online. It is that just pasta does not like it yet. I am not sure what the ultimate precondition for starting pasta is. Being online as defined by systemd seems not enough.
This is the simplest setup possible. Single eth interface, static IP defined in |
It's also looking for an interface with a route. Not even a default route, just a route, because it shows that that interface is not completely useless. |
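A rough way to test for that condition from a script is to look for any route that goes through a non-loopback device. The sketch below only approximates the check described above and is not pasta's actual selection logic; the function name `has_routed_interface` is hypothetical, and it expects `ip route show` style lines on standard input.

```shell
# Sketch: read `ip route show` style lines on stdin and succeed if at
# least one route goes through a device other than loopback.
has_routed_interface() {
    awk '{
        # Find the token following "dev" on each route line.
        for (i = 1; i < NF; i++)
            if ($i == "dev" && $(i + 1) != "lo") found = 1
    }
    END { exit (found ? 0 : 1) }'
}
```

It can be used as `ip route show | has_routed_interface`, and in the same way with `ip -6 route show` for IPv6.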
In my case for this to work I had to just override the newly supplied
user service that waits for system level network-online with this:
[Service]
ExecStart=
ExecStart=sh -c 'until ping -c 1 google.com; do sleep 5; done'
only then did everything work as it should.
…-Greg
On Thu, 2024-11-21 at 06:13 -0800, sbrivio-rh wrote:
> What else is missing for pasta to start?
It's also looking for an interface with a route. Not even a default
route, just a route, because it shows that that interface is not
completely useless.
|
@Luap99 thanks for the workaround. important note : I'm from nixos and the current release of podman is v5.2.3. Then i test other targets and the
|
Issue Description
Hi,
Since the upgrade to Fedora Silverblue 40 / Podman 5, systemd fails to launch containers at boot.
If I try to launch them manually through systemctl --user start container.service, it works as expected.
Thank you!
Steps to reproduce the issue
~/.config/containers/systemd files
Describe the results you received
Containers don't launch at boot; they need to be started manually.
Describe the results you expected
Containers should start at boot.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Fedora Silverblue 40 up-to-date
Additional information
Logs of a container :
mars 28 12:15:09 homeserver jellyfin[7039]: Error: pasta failed with exit code 1:
mars 28 12:15:09 homeserver jellyfin[7039]: External interface not usable