Podman fails to autostart containers through quadlet/systemd, works when launched manually, error with pasta #22197

Froggy232 · 2024-03-28T11:29:21Z

Issue Description

Hi,
Since the upgrade to Fedora Silverblue 40 / Podman 5, systemd fails to launch containers at boot.
If I try to launch them manually through systemctl --user start container.service, it works as expected.
Thank you!

Steps to reproduce the issue

Manage containers through quadlet files in ~/.config/containers/systemd
Restart the server and see that the containers failed to launch

Describe the results you received

Containers don't launch at boot and need to be started manually.

Describe the results you expected

Containers should start at boot.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-4.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 99.37
    systemPercent: 0.21
    userPercent: 0.42
  cpus: 32
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: silverblue
    version: "40"
  eventLogger: journald
  freeLocks: 2047
  hostname: homeserver
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1020
      size: 1
    - container_id: 1
      host_id: 1703936
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1020
      size: 1
    - container_id: 1
      host_id: 1703936
      size: 65536
  kernel: 6.8.1-300.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 64334761984
  memTotal: 67334115328
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-3.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1020/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240320.g71dd405-1.fc40.x86_64
    version: |
      pasta 0^20240320.g71dd405-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1020/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 146028879872
  swapTotal: 146028879872
  uptime: 0h 14m 2.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/srv/media-server/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /srv/media-server/.local/share/containers/storage
  graphRootAllocated: 3999065440256
  graphRootUsed: 1034920087552
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 14
  runRoot: /run/user/1020/containers
  transientStore: false
  volumePath: /var/srv/media-server/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.0
  Built: 1710806400
  BuiltTime: Tue Mar 19 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Fedora Silverblue 40 up-to-date

Additional information

Logs of a container :

mars 28 12:15:09 homeserver jellyfin[7039]: Error: pasta failed with exit code 1:
mars 28 12:15:09 homeserver jellyfin[7039]: External interface not usable

Luap99 · 2024-03-28T12:23:19Z

You have to make sure your network is fully set up before the unit is started.

rhatdan · 2024-03-29T11:08:43Z

This feels like it could be related to the same question in #22057

flyingfishflash · 2024-03-29T20:13:03Z

I have not been able to get a rootless user quadlet to wait for my network to be ready even adding

[Unit]
wants=nss-online.target
after=nss-online.target

No issues on 4.9.3

Luap99 · 2024-03-29T20:46:15Z

@flyingfishflash You cannot wait for system units from user units, see systemd/systemd#3312

I wasn't aware that the user units start before the network is fully set up and that it causes such big trouble with pasta. Note you do not need to downgrade, you can just change the default back to slirp4netns in containers.conf, see the last part in the pasta section on https://blog.podman.io/2024/03/podman-5-0-breaking-changes-in-detail/

You could also do something like this #22190 (comment)

Of course none of this is a proper solution but I am sure we will find something to address this in a better way soon.
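
The containers.conf change mentioned above can be sketched as a small shell helper. This is only an illustration, not an official procedure: the function name and the file name 10-slirp4netns.conf are made up, and it assumes your Podman version reads user drop-ins from ~/.config/containers/containers.conf.d (otherwise the same two lines can go into ~/.config/containers/containers.conf). Note that the podman info output in the report shows an empty slirp4netns executable, so the slirp4netns package may need to be installed first.

```shell
#!/bin/sh
# Sketch: write a containers.conf drop-in that switches the default rootless
# network backend from pasta back to slirp4netns.
# The function name and the drop-in file name are hypothetical.
write_slirp4netns_dropin() {
    # $1: target directory, assumed to be
    #     ~/.config/containers/containers.conf.d for a rootless user
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/10-slirp4netns.conf" <<'EOF'
[network]
default_rootless_network_cmd = "slirp4netns"
EOF
}

# Usage (commented out so that sourcing this file changes nothing):
# write_slirp4netns_dropin "${XDG_CONFIG_HOME:-$HOME/.config}/containers/containers.conf.d"
```

Removing the drop-in file again restores pasta as the default.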

flyingfishflash · 2024-03-29T22:00:29Z

@Luap99 - thank you for this tip re containers.conf!

gdonval · 2024-04-12T14:42:49Z

You could also do something like this #22190 (comment)

No. It's as much of a bad practice today as it was 50 years ago.

Klowner · 2024-04-25T20:18:56Z

I ran into this issue today and finally learned that systemd user level units apparently can't depend on system level units (such as network-online.target)

I've managed a workaround that satisfies my desire to avoid arbitrary timeouts by creating a user-level network-online.service and network-online.target

# ~/.config/systemd/user/network-online.service
[Unit]
Description=User-level proxy to system-level network-online.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until systemctl --machine=%[email protected] is-active network-online.target; do sleep 1; done'

[Install]
WantedBy=default.target

# ~/.config/systemd/user/network-online.target
[Unit]
Description=User-level network-online.target
Requires=network-online.service
Wants=network-online.service
After=network-online.service

Then in your quadlet units:

[Unit]
After=network-online.target

soiamsoNG · 2024-04-27T03:03:07Z

It seems to just work once you can ping an external IP (including the gateway IP).

djarbz · 2024-05-06T21:37:47Z

I'll share my workaround, but it might be a good idea to have a podman network --health command to verify by driver and network and such.

[Unit]
Description=Wait for network to be online via NetworkManager or Systemd-Networkd

[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.

Type=oneshot
# At least one of these should work depending if using NetworkManager or Systemd-Networkd
ExecStart=/bin/bash -c ' \
    if command -v nm-online &>/dev/null; then \
        nm-online -s -q; \
    elif command -v /usr/lib/systemd/systemd-networkd-wait-online &>/dev/null; then \
        /usr/lib/systemd/systemd-networkd-wait-online; \
    else \
        echo "Error: Neither nm-online nor systemd-networkd-wait-online found."; \
        exit 1; \
    fi'
ExecStartPost=ip -br addr
RemainAfterExit=yes

# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl edit <THIS SERVICE NAME>`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60

[Install]
WantedBy=default.target

secext2022 · 2024-06-23T23:15:49Z

Another workaround:

We can copy network-online.target from the system units to the user units, with a small modification, like this:

$ cat /etc/systemd/user/network-online.target
[Unit]
Description=Network online for systemd --user
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
#After=network.target

$ cat /etc/systemd/user/systemd-networkd-wait-online.service
[Unit]
Description=Wait network online for systemd --user
Documentation=man:systemd-networkd-wait-online.service(8)
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online
RemainAfterExit=yes

[Install]
WantedBy=network-online.target

Or you can put these files in ~/.config/systemd/user to set this up for a single user.

Then enable the service as a user:

$ systemctl --user enable systemd-networkd-wait-online.service

Finally, we can make podman wait for the network to be online, like this:

$ cat ~/.config/containers/systemd/my-app.container
[Unit]
Wants=network-online.target
After=network-online.target

reference link: https://unix.stackexchange.com/questions/216919/how-can-i-make-my-user-services-wait-till-the-network-is-online

WildPenquin · 2024-07-17T10:30:08Z

Hi,

Any idea for a workaround when using NetworkManager?

I tried to adapt @secext2022 's workaround, but the user service still "thinks" the Network is online approx. 7 seconds too early. I tried to change the parameter for nm-online by removing the -s, but the behavior is still the same.

dog /etc/systemd/user/network-online.target:

#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Network is Online
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
# After=network.target

/etc/systemd/user/NetworkManager-wait-online.service:

[Unit]
Description=Network Manager Wait Online for Users
Documentation=man:NetworkManager-wait-online.service(8)
Requires=NetworkManager.service
After=NetworkManager.service
Before=network-online.target

[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.

Type=oneshot
ExecStart=/usr/bin/nm-online -q
RemainAfterExit=yes

# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl edit NetworkManager-wait-online`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60

[Install]
WantedBy=network-online.target

journalctl -b0 | grep Online:

Jul 17 12:43:09 archnuke systemd[1]: Starting Network Manager Wait Online...
Jul 17 12:43:09 archnuke systemd[706]: Reached target Network is Online.
Jul 17 12:43:16 archnuke systemd[1]: Finished Network Manager Wait Online.
Jul 17 12:43:16 archnuke systemd[1]: Reached target Network is Online.

The above is the system log, 12:43:09 is the user service. As the user running the podman container, LANG=C journalctl --user -b0 | grep Online:

Jul 17 12:43:09 archnuke systemd[706]: Reached target Network is Online.

Not sure why the NetworkManager-wait-online is not in the user log, it is enabled for the user:

systemctl --user status NetworkManager-wait-online.service 
○ NetworkManager-wait-online.service - Network Manager Wait Online for Users
     Loaded: loaded (/etc/xdg/systemd/user/NetworkManager-wait-online.service; enabled; preset: enabled)
     Active: inactive (dead)
       Docs: man:NetworkManager-wait-online.service(8)

As another workaround, I'm thinking of adding this dirty workaround to the Quadlet for now:
ExecStartPre=/bin/sh -c 'until ping -c1 google.com; do sleep 1; done;'

djarbz · 2024-07-17T10:46:00Z

I haven't used /etc/systemd/user, but my unit works (at least I haven't noticed an issue) when placed in ~/.config/systemd/user.

secext2022 · 2024-07-18T00:48:10Z

@WildPenquin

Please check this in the container service:

[Unit]
Wants=network-online.target
After=network-online.target

secext2022 · 2024-07-18T00:56:48Z

$ systemctl --user status my-app.service
● my-app.service - example deno/fresh app
     Loaded: loaded (/var/home/fc-test/.config/containers/systemd/my-app.container; generated)
    Drop-In: /usr/lib/systemd/user/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Wed 2024-07-17 04:21:49 UTC; 20h ago
   Main PID: 2026 (conmon)

$ systemctl --user list-dependencies my-app
my-app.service
● ├─app.slice
● ├─basic.target
● │ ├─systemd-tmpfiles-setup.service
● │ ├─paths.target
● │ ├─sockets.target
● │ │ └─dbus.socket
● │ └─timers.target
● │   └─systemd-tmpfiles-clean.timer
● └─network-online.target
●   └─systemd-networkd-wait-online.service

WildPenquin · 2024-07-19T13:15:13Z

Hi @secext2022 ,

The Unit section is defined correctly.

As per my log, the problem is that the NetworkManager-wait-online user service finishes much too soon, much sooner than the system level one. I believe (meaning I'm not sure) that nm-online does not work correctly when run as a user (not designed to be run as a user?).

As yet another workaround, I've added ExecStartPre=/bin/sh -c 'until ping -c1 192.168.66.6; do sleep 1; done;' under [Service]. On the TODO list, I'm going to test if this works correctly if I change my interface to be managed by systemd-networkd and use the systemd-networkd-wait-online service instead.

$ systemctl --user status pande-pmc.service

● pande-pmc.service - PandESportS MC-serveri
     Loaded: loaded (/home/minecraft/.config/containers/systemd/pande-pmc.container; generated)
     Active: active (running) since Fri 2024-07-19 16:04:56 EEST; 4min 55s ago
 Invocation: 9858022ff77a4dd38327d8c513324e7d
    Process: 829 ExecStartPre=/bin/sh -c until ping -c1 192.168.66.6; do sleep 1; done; (code=exited, status=0/SUCCESS)
   Main PID: 906 (conmon)
      Tasks: 82 (limit: 28525)
     Memory: 6.2G (peak: 6.2G)
        CPU: 1min 6.141s

$ systemctl --user list-dependencies pande-pmc.service

pande-pmc.service
● ├─app.slice
● ├─basic.target
● │ ├─paths.target
● │ ├─sockets.target
● │ │ ├─dbus.socket
● │ │ ├─dirmngr.socket
● │ │ ├─drkonqi-coredump-launcher.socket
● │ │ ├─gpg-agent-browser.socket
● │ │ ├─gpg-agent-extra.socket
● │ │ ├─gpg-agent-ssh.socket
● │ │ ├─gpg-agent.socket
● │ │ ├─keyboxd.socket
● │ │ ├─p11-kit-server.socket
● │ │ ├─pipewire-pulse.socket
● │ │ └─pipewire.socket
● │ └─timers.target
○ │   ├─drkonqi-coredump-cleanup.timer
○ │   └─drkonqi-sentry-postman.timer
● └─network-online.target
○   └─NetworkManager-wait-online.service

config/containers/systemd/pande-pmc.container:

[Unit]
Description=PandESportS MC-serveri

After=network-online.target
Wants=network-online.target


[Container]
AutoUpdate=registry
ContainerName=PandEPMC
Image=docker.io/gameservermanagers/gameserver:pmc
Volume=pandepmc:/data
LogDriver=k8s-file
PublishPort=25560:25560/tcp
PublishPort=25560:25560/udp
PodmanArgs=--log-opt=path=/home/minecraft/PandEPMClog.k8s
Timezone=local

[Service]
ExecStartPre=/bin/sh -c 'until ping -c1 192.168.66.6; do sleep 1; done;'
# Restart=always
Restart=no

[Install]
WantedBy=multi-user.target default.target
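
The ExecStartPre loop in this quadlet retries forever if the address never answers. A bounded variant could look like the sketch below. wait_for is a hypothetical helper, not something Podman or systemd ships, and the retry count and delay are arbitrary choices.

```shell
#!/bin/sh
# wait_for MAX_TRIES DELAY CMD [ARGS...]
# Runs CMD until it succeeds, at most MAX_TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt failed.
# (Hypothetical helper for illustration.)
wait_for() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        try=$((try + 1))
        # Do not sleep after the final failed attempt.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage, mirroring the ExecStartPre line above (60 attempts, 1 second apart):
# wait_for 60 1 ping -c1 192.168.66.6
```

With a bound in place, a network that never comes up makes the unit fail visibly instead of leaving it stuck in the start phase until systemd's own start timeout fires.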

WildPenquin · 2024-07-19T14:59:21Z

After reading this thread and also the comments in systemd/systemd#3312, I think that thread has much cleaner workarounds than many of the ones here. The problem with the workarounds here is that they are often quite long and convoluted for this relatively simple issue, and may or will break if the system configuration changes, as they are not configuration-agnostic. But the systemd issue has much cleaner and simpler workarounds:

Make the whole user@UID service depend on network-online (RFE: monitor system units from user manager systemd/systemd#3312 (comment)) - but read the whole comment for caveats! This will work if you have a dedicated user for running containers which are useless without a network (so the caveats don't matter). Rename the [email protected] to include the UID to not enable this for all users. 3 lines, changing user@ service.
Make one user service which checks the system level network-online.target (RFE: monitor system units from user manager systemd/systemd#3312 (comment)). Then make quadlets depend on this service. This is ~~4 lines of code~~ one simple service file which should work as long as the system network-online.target is configured properly. You could replace the systemctl is-active with a ping to your GW or, say, Google, depending on what your services actually need, to work around badly written software ("online" does not necessarily mean a connection to the Internet, nor, I presume, even to your default GW). But there's no need to "copy" *-wait-online to the user services, which is prone to break (and does not work for NM at all, it seems).

I haven't tested those, but they should work judging from the thumbs =).

I'm also starting to think maybe we should not be discussing workarounds here that much since it adds noise to actually solving the issue (which is: podman user containers should not fail at boot if networking is up). (As a general remark, no services should fail for whatever network error, but instead handle the situation, as network connections are unreliable. All these workarounds should be unnecessary!).

I'm sorry for adding noise here myself, too =).

EDIT: My chosen workaround for the issue (cleanest in my opinion, less prone to break; I chose to name it check-network-online.service but it could be whatever you want it to be):

/etc/systemd/user/check-network-online.service:

[Unit]
Description=Check for system level network-online.target (for users)

[Service]
Type=oneshot
ExecStart=bash -c 'until systemctl is-active network-online.target; do sleep 1; done'
RemainAfterExit=yes

[Install]
WantedBy=default.target

Enable this service for the user. In badly behaving user services (such as podman quadlets), add:

After=check-network-online.service

Of course, YMMV!
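
For several quadlet files, adding the After= line by hand gets tedious. The sketch below inserts it right after the [Unit] header and does nothing if the line is already there. add_unit_ordering is a made-up helper name, and the sketch assumes each file already contains a [Unit] section (files without one are left unchanged).

```shell
#!/bin/sh
# add_unit_ordering FILE UNIT
# Inserts "After=UNIT" directly below the first "[Unit]" header of FILE.
# Idempotent: returns early if the exact line already exists.
# (Hypothetical helper for illustration.)
add_unit_ordering() {
    file=$1
    unit=$2
    # -x: match whole lines, -F: treat the dots in the unit name literally.
    if grep -qxF "After=$unit" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    if awk -v unit="$unit" '
        { print }
        $0 == "[Unit]" && !added { print "After=" unit; added = 1 }
    ' "$file" > "$tmp"; then
        # Overwrite in place so the file keeps its owner and permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage (then run `systemctl --user daemon-reload` to regenerate the units):
# add_unit_ordering ~/.config/containers/systemd/pande-pmc.container check-network-online.service
```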

sbrivio-rh · 2024-07-25T21:25:49Z

I'm also starting to think maybe we should not be discussing workarounds here that much since it adds noise to actually solving the issue

I personally don't find it distracting.

(which is: podman user containers should not fail at boot if networking is up). (As a general remark, no services should fail for whatever network error, but instead handle the situation, as network connections are unreliable. All these workaround should be unnecessary!).

The thing is, pasta(1) picks host addresses and routes by default. This is by design as it allows you to avoid (implicit) NAT altogether. If there's nothing there, it doesn't know what to pick, so it exits.

We're now considering implementing an optional netlink monitoring function that would dynamically create and delete routes and addresses as they come and go on the host, see also #22959 (comment). That should be robust enough.
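
As a rough way to check the condition described above (no host routes for pasta to pick from), one can look for a default route before starting a container. This is only an approximation for debugging and not pasta's actual interface selection logic. has_default_route is a hypothetical helper name.

```shell
#!/bin/sh
# has_default_route: reads `ip route show` style output on stdin and succeeds
# if at least one line describes a default route.
# (Hypothetical helper; a rough approximation, not what pasta does internally.)
has_default_route() {
    grep -q '^default '
}

# Usage: succeed if either an IPv4 or an IPv6 default route exists.
# ip -4 route show | has_default_route || ip -6 route show | has_default_route
```

Running this from a user unit at boot and logging the result can help confirm whether the "External interface not usable" error coincides with a missing default route.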

vrothberg · 2024-09-20T13:51:19Z

@Luap99 @rhatdan @ygalblum shall we update the quadlet docs to point that out?

Sitting in a meeting where this issue was brought up.

gdonval · 2024-09-20T14:04:00Z

If the doc said "Quadlets are currently broken. Please see that bug report XXX we have with systemd.", at the top in red and bold, I guess the situation would be improved tremendously. Acknowledging current limits and bugs is a big part of establishing trust with users.

As it is, users stumble across this again and again. I can't speak for the general industry but here, no one wants to hear about podman again for instance.

Luap99 · 2024-10-17T13:10:45Z

#24305 implements the workaround; it would be great if some folks could test it.

This service is meant to be used by quadlet as replacement for network-online.target as this does not work for rootless users. see containers#22197 Signed-off-by: Paul Holzinger <[email protected]>

As documented in the issue there is no way to wait for system units from the user session[1]. This causes problems for rootless quadlet units as they might be started before the network is fully up. While this was always the case and thus was never really noticed, the main thing that triggered a bunch of errors was the switch to pasta.

Pasta requires the network to be fully up in order to correctly select the right "template" interface based on the routes. If it cannot find a suitable interface it just fails and we cannot start the container, understandably leading to a lot of frustration from users.

As there is no sign of any movement on the systemd issue we work around here by using our own user unit that checks if the system session network-online.target is ready.

Now for testing it is a bit complicated. While we do now correctly test the root and rootless generator since commit ada75c0, the resulting Wants/After= lines differ between them and there is no logic in the testfiles themselves to say if root/rootless to match specifics. One idea was to use `assert-key-is-rootless/root` but that seemed like more duplication for little reason, so use a regex and allow both to make it pass always.

To still have some test coverage add a check in the system test to ask systemd if we did indeed have the right dependencies where we can check for exact root/rootless name match.

[1] systemd/systemd#3312

Fixes containers#22197

Signed-off-by: Paul Holzinger <[email protected]>

topas-rec · 2024-10-27T08:01:48Z

Thanks!

I think I have this issue because

I cannot access my container after host reboot
restarting the container without host reboot makes it accessible and
I have pasta[539]: External interface not usable in my hosts boot logs

Switching to slirp4netns, as suggested in this issue, makes container access work after reboot as well.

Now I tested podman 5.2.5 which should include #24305.
With pasta as the default backend the issue is not solved. Should it be solved? How can I help?

Simply put, my machine has five network interfaces through which it is accessed.
(I also use a bond, rate limiting that uses IFB interfaces, and VLAN interfaces, but I don't think any of this is the cause, since "everything else works"™ and the issue is gone with slirp4netns.)

Luap99 · 2024-10-27T13:10:45Z

Now I tested podman 5.2.5 which should include #24305.

Why do you think 5.2.5 included this fix? The release notes are very clear about what it contains: https://github.com/containers/podman/releases/tag/v5.2.5.
It is in v5.3.0-rc1 so it will land in 5.3.0 final.

topas-rec · 2024-10-28T05:38:46Z

Because I guessed that when a PR is merged and a release is created then those changes are in.

I took the time to read through the release notes and of course didn't find the change listed there. Since it was missing I looked at a previous release, too, to find out how much I can rely on the release notes. Some projects don't mention all the changes in there. And: people make mistakes. Things that should be in the release notes are sometimes left out.

Then I looked at the branch that the fix was merged in. Since it wasn't merged in master or main (which I expected) I tried to find out how the merge strategy looks like. I didn't find a graph view on github and then gave up. I didn't want to spend the time to clone it, which I should've done - yes.

So that's why.

Thanks for letting me know in which release the fix is in. I just want to help and I'll try to check better next time.

sbrivio-rh · 2024-10-28T17:42:07Z

Then I looked at the branch that the fix was merged in. Since it wasn't merged in master or main (which I expected) I tried to find out how the merge strategy looks like. I didn't find a graph view on github and then gave up. I didn't want to spend the time to clone it, which I should've done - yes.

Tip, as I'm familiar with git but not with GitHub and it took me a while to spot this: information equivalent to:

$ git describe --contains 57b022782bba8cd48865f9dd84e9fea8a1588e4c
v5.3.0-rc1~10^2~1

is found, on "commits" pages, just after the end of the commit message. Say, at the page for 57b0227:

main (#24305) 
v5.3.0-rc1
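
To turn the git describe output above into just the tag name, the suffix starting at the first ~ or ^ can be stripped with plain shell parameter expansion. first_tag_containing is a made-up helper name for this sketch.

```shell
#!/bin/sh
# first_tag_containing DESCRIBE_OUTPUT
# Takes the output of `git describe --contains <commit>` (for example
# v5.3.0-rc1~10^2~1) and prints only the tag name.
# (Hypothetical helper for illustration.)
first_tag_containing() {
    desc=$1
    # Remove the longest suffix that starts at the first '~' or '^'.
    # Git tag names cannot contain either character, so the tag stays intact.
    printf '%s\n' "${desc%%[~^]*}"
}

# Usage:
# first_tag_containing "$(git describe --contains 57b022782bba8cd48865f9dd84e9fea8a1588e4c)"
```

If the commit is itself tagged, the describe output has no suffix and is printed unchanged.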

Luap99 · 2024-10-29T10:46:06Z

Yes it shows it on the commits page, however that only works for things going forward. Generally speaking fixes for a new patch (.z) release will not show up in there as it will not pick up the backport commits into the release branch. So for that you would manually need to check the backport commits in the release branch which of course is annoying but I would say the release notes for the patch releases should be complete and not miss stuff as we only do a few backports most of the time. But of course we are human and sometimes things are missed.

urbenlegend · 2024-11-21T01:32:29Z

I think this bug may need to be re-opened. I am on Podman 5.3 and I am still getting issues where my rootless containers are not properly starting when I log in.

Nov 20 17:24:13 arch-desktop podman[1042]: time="2024-11-20T17:24:13-08:00" level=error msg="Starting some container dependencies"
Nov 20 17:24:13 arch-desktop podman[1042]: time="2024-11-20T17:24:13-08:00" level=error msg="\"setting up Pasta: pasta failed with exit code 1:\\nExternal interface not usable\\n\""
Nov 20 17:24:13 arch-desktop podman[1042]: Error: unable to start container "658c7a404e78463fefbb4ecd8ae413efefdb0b49ce44af8c838edecd92f3084b": setting up Pasta: pasta failed with exit code 1:
Nov 20 17:24:13 arch-desktop podman[1042]: External interface not usable
Nov 20 17:24:13 arch-desktop podman[1042]: Error: unable to start container "67226d53cf17d652b1cedb5ad563e5a4416c8e119223231a83e9e33d61fd26d7": starting some containers: internal libpod error
Nov 20 17:24:13 arch-desktop podman[1042]: time="2024-11-20T17:24:13-08:00" level=info msg="Received shutdown.Stop(), terminating!" PID=1042
Nov 20 17:24:13 arch-desktop systemd[974]: podman-restart.service: Main process exited, code=exited, status=125/n/a
Nov 20 17:24:13 arch-desktop systemd[974]: podman-restart.service: Failed with result 'exit-code'.
Nov 20 17:24:13 arch-desktop systemd[974]: Failed to start Podman Start All Containers With Restart Policy Set To Always.
Nov 20 17:24:13 arch-desktop systemd[974]: podman-restart.service: Consumed 187ms CPU time, 100.1M memory peak.

This occurs when NetworkManager is set to connect to my Wifi on log in (Connection is set to be only available for my user and wifi password is stored in an encrypted form). If I set it to be available for all users with a key stored in plaintext, then the wifi connects long before I get to log in, and my containers restart properly.

sbrivio-rh · 2024-11-21T04:23:53Z

This occurs when NetworkManager is set to connect to my Wifi on log in

Which means that Podman/systemd units should wait quite a long time before bringing containers up.

What would be your expectation? We could also decide that pasta, instead of refusing to start, would assign the container some fake address and routes (like slirp4netns used to do), but then you lose the (default) seamless/transparent addressing.

Or would you expect that your containers start only as you log in and your WiFi password is decrypted?

I think this bug may need to be re-opened

I'm not sure, it covered a scenario that's different enough to be considered another issue altogether, I think.

urbenlegend · 2024-11-21T05:47:29Z

Or would you expect that your containers start only as you log in and your WiFi password is decrypted?

I think that's exactly what I would expect for rootless containers created by users that don't have linger enabled.

Here's my situation. I have several containers that I start up using podman-compose. I enable the user-level podman-restart service in an attempt to have them restart whenever I log in. The problem is that I think podman-restart isn't waiting for network at all. It is attempting to start up the containers while my computer is connecting to the wifi. Container startup fails as a result.

Essentially, the user level podman-restart service is functionally useless if the network takes a long time to come up.

zbynekwinkler · 2024-11-21T07:28:25Z

I have a server without wifi, and it does not work there either. In the unit installed at /usr/lib/systemd/user/podman-user-wait-network-online.service I find ExecStart=sh -c 'until systemctl is-active network-online.target; do sleep 0.5; done', which exits very early in the boot process, certainly at a time when pasta still complains.

I have tried many things to work around this issue. The only one working for me is waiting until the sshd is accessible at port 22. This is what I ended up with:

ExecStart=/bin/bash -c 'until nc -vzw 1 $(hostname -I | cut -f1 -d" ") 22; do sleep 1; done'

The difference is about 5s - meaning that the sshd is accessible on the outside statically assigned IP only about 5s after systemctl says that the network-online.target is active.

I don't have NetworkManager and/or wifi on the server. Only ethernet with static IP. Nothing fancy. The containers are rootless with linger enabled.
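
A bare until loop like the ones above keeps waiting for as long as the condition stays false. The following is a minimal sketch of the same polling idea with an upper bound. The function name wait_until and its arguments are hypothetical, not something Podman or systemd ships, and the intervals are assumed to be whole seconds.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 0 on success, or 1 once
# roughly TIMEOUT_SECONDS of sleeping has accumulated without success.
# Both numeric arguments must be whole seconds (shell integer arithmetic).
wait_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
    return 0
}

# Example (not run here): give up after about 60 seconds.
#   wait_until 60 1 systemctl is-active --quiet network-online.target
```

Returning a non-zero status lets the calling unit fail visibly instead of blocking indefinitely, which may or may not be what you want for containers that can tolerate a late network.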

sbrivio-rh · 2024-11-21T07:35:24Z

I find ExecStart=sh -c 'until systemctl is-active network-online.target; do sleep 0.5; done' exits very early in the boot process

Any idea why?

Luap99 · 2024-11-21T09:54:40Z

If network-online.target succeeds too early, then this is out of scope for podman/quadlet. We cannot possibly handle every network setup and know what "done" means, which is exactly why I check for network-online.target, because that already is such a definition.

You can manually fix the target or overwrite podman-user-wait-network-online.service with whatever command you want
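
One way to do such an override is a systemd drop-in for the user unit. The sketch below only writes the drop-in file. The function name write_wait_override and the file name override.conf are illustrative assumptions, and the per-user drop-in location is assumed to be the usual systemd/user directory under the user's config directory.

```shell
#!/bin/sh
# write_wait_override CONFIG_DIR WAIT_COMMAND
# Writes a drop-in that replaces the ExecStart of
# podman-user-wait-network-online.service with WAIT_COMMAND.
write_wait_override() {
    config_dir=$1
    wait_command=$2
    dropin_dir="$config_dir/systemd/user/podman-user-wait-network-online.service.d"
    mkdir -p "$dropin_dir" || return 1
    # The empty "ExecStart=" clears the command shipped with the unit
    # before the replacement command is set.
    printf '[Service]\nExecStart=\nExecStart=%s\n' "$wait_command" \
        > "$dropin_dir/override.conf"
}

# Example (not run here):
#   write_wait_override "${XDG_CONFIG_HOME:-$HOME/.config}" \
#       "sh -c 'until ping -c 1 google.com; do sleep 5; done'"
#   systemctl --user daemon-reload
```

Which wait command is appropriate depends on the network setup; the ping loop in the example is simply the one reported to work later in this thread.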

podman-restart.service

I only fixed quadlet units; I forgot to change podman-restart.service and podman-kube@.service, as they also start containers.
(Note: we strongly recommend using quadlet units over podman-restart.service, as systemd's restart logic is much better than the podman run --restart flag.)

urbenlegend · 2024-11-21T10:01:36Z

I forgot to change podman-restart.service and podman-kube@.service, as they also start containers.
(Note: we strongly recommend using quadlet units over podman-restart.service, as systemd's restart logic is much better than the podman run --restart flag.)

I would love it if similar patches were made to those user services as well, as my workflow currently involves dealing with a lot of Docker Compose files and not Quadlet.

Luap99 · 2024-11-21T10:21:00Z

Yes, I was not trying to imply that we should not fix them. They definitely need to be fixed the same way. I filed a new issue, #24637, to avoid spamming this long issue any further.

zbynekwinkler · 2024-11-21T14:09:24Z

I find ExecStart=sh -c 'until systemctl is-active network-online.target; do sleep 0.5; done' exits very early in the boot process

Any idea why?

Not really. When it exits, the eth interface is up and the static IP is assigned. It seems online. It is just that pasta does not like it yet. I am not sure what the ultimate precondition for starting pasta is. Being online as defined by systemd does not seem to be enough.

If network-online.target succeeds too early, then this is out of scope for podman/quadlet. We cannot possibly handle every network setup and know what "done" means, which is exactly why I check for network-online.target, because that already is such a definition.

This is the simplest setup possible. Single eth interface, static IP defined in /etc/network/interfaces. What else is missing for pasta to start?

sbrivio-rh · 2024-11-21T14:12:37Z

What else is missing for pasta to start?

It's also looking for an interface with a route. Not even a default route, just a route, because it shows that that interface is not completely useless.
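
A rough way to approximate that condition from a shell is to check whether any route names an interface. This is only a sketch of the idea, not pasta's actual logic. The helper has_routed_interface is hypothetical, it parses text in the format printed by ip route show, and skipping the loopback device is an assumption made here.

```shell
#!/bin/sh
# has_routed_interface: reads "ip route show" style output on stdin and
# succeeds if at least one route line names a device ("dev NAME") other
# than the loopback device "lo" (skipping "lo" is an assumption).
has_routed_interface() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "dev" && $(i + 1) != "lo") { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example (not run here), checking IPv4 and then IPv6 routes:
#   ip -4 route show | has_routed_interface || ip -6 route show | has_routed_interface
```

Because the helper works on text rather than calling ip itself, it can be tried against saved output from a failing boot to see whether a route was present at that moment.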

flyingfishflash · 2024-11-25T14:31:21Z

In my case, for this to work I had to override the newly supplied user service that waits for the system-level network-online with this:

[Service]
ExecStart=
ExecStart=sh -c 'until ping -c 1 google.com; do sleep 5; done'

Only then did everything work as it should.

…

-Greg


stratdev3 · 2024-12-29T12:52:39Z

@Luap99 thanks for the workaround.

Important note:

I'm on NixOS, where the current release of podman is v5.2.3.
After spending a couple of days testing your workaround, I could never get it to work: although network-online was active, pasta still threw an error.

Then I tested other targets, and multi-user.target was the only one that worked:

/bin/bash -c 'until systemctl is-active multi-user.target; do sleep 0.5; done;'

Froggy232 added the kind/bug label Mar 28, 2024

Luap99 added the network and pasta labels Mar 28, 2024

Luap99 mentioned this issue Mar 28, 2024

rootless networking error on container startup since v5.0.0 #22168

Closed

sbrivio-rh mentioned this issue Mar 29, 2024

Quadlet wont start on boot, pasta says "Couldn't set IPv4 route(s) in guest: Invalid argument" #22190

Closed

gdonval mentioned this issue Apr 12, 2024

Revert to slirp4netns #22363

Closed

Luap99 mentioned this issue Apr 22, 2024

User-level podman-restart.service does not work properly on reboot #22451

Closed

Luap99 mentioned this issue May 8, 2024

Rootless Podman networking not working on boot until after removing user rootless-netns folder and restarting service #22637

Closed

sbrivio-rh mentioned this issue Jun 11, 2024

Connections to the port exposed via --publish are dropped and do not reach the contained process #22959

Open

Luap99 mentioned this issue Jun 23, 2024

[Quadlet] [rootless] dependencies not working #23077

Closed

sbrivio-rh mentioned this issue Jul 25, 2024

Rootless containers do not start due to pasta failing after reboot #23406

Closed

Luap99 mentioned this issue Oct 17, 2024

quadlet: make user units wait for network #24305

Merged

openshift-merge-bot bot closed this as completed in #24305 Oct 18, 2024

sbrivio-rh mentioned this issue Nov 20, 2024

Latest version of Podman available for RHEL 9 fails when trying to run container on disconnected host #24614

Closed

Luap99 mentioned this issue Nov 21, 2024

rootless podman units should wait for podman-user-wait-network-online.service #24637

Open

Luap99 mentioned this issue Dec 9, 2024

User units and network-online.target #24796

Open

stratdev3 mentioned this issue Dec 28, 2024

podman error with slirp4netns/pasta in systemd user services [nixos 24.11] NixOS/nixpkgs#368830

Closed
