
automatic start of rootless containers with systemd fails #4678

Closed
fansari opened this issue Dec 11, 2019 · 15 comments
fansari commented Dec 11, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

For some days now my rootless containers no longer start automatically. This does not happen to my root containers.

Steps to reproduce the issue:

Describe the results you received:

After I boot my PC, my root containers are started, but my rootless containers are not.

When I start the container manually after booting, it starts with no errors.

This is what I find after booting:

[fansari@bat ~]$ systemctl --user status podman-ldap
● podman-ldap.service - Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
   Loaded: loaded (/var/home/fansari/.config/systemd/user/podman-ldap.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-12-11 17:38:33 CET; 39s ago
     Docs: man:podman-generate-systemd(1)
  Process: 2495 ExecStart=/usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f (code=exited, status=125)
      CPU: 103ms

Dec 11 17:38:33 bat.localdomain systemd[1759]: podman-ldap.service: Control process exited, code=exited, status=125/n/a
Dec 11 17:38:33 bat.localdomain systemd[1759]: podman-ldap.service: Failed with result 'exit-code'.
Dec 11 17:38:33 bat.localdomain systemd[1759]: Failed to start Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service.
Dec 11 17:38:33 bat.localdomain systemd[1759]: podman-ldap.service: Service RestartSec=100ms expired, scheduling restart.
Dec 11 17:38:33 bat.localdomain systemd[1759]: podman-ldap.service: Scheduled restart job, restart counter is at 5.
Dec 11 17:38:33 bat.localdomain systemd[1759]: Stopped Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service.
Dec 11 17:38:33 bat.localdomain systemd[1759]: podman-ldap.service: Start request repeated too quickly.
Dec 11 17:38:33 bat.localdomain systemd[1759]: podman-ldap.service: Failed with result 'exit-code'.
Dec 11 17:38:33 bat.localdomain systemd[1759]: Failed to start Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service.

I use the autogenerated systemd service.

# container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
# autogenerated by Podman 1.6.2
# Tue Dec 10 18:48:56 CET 2019

[Unit]
Description=Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
Documentation=man:podman-generate-systemd(1)

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f
ExecStop=/usr/bin/podman stop -t 10 ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f
KillMode=none
Type=forking
PIDFile=/run/user/1000/overlay-containers/ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f/userdata/conmon.pid

[Install]
WantedBy=multi-user.target

Describe the results you expected:

All containers should be started after booting my PC.

Additional information you deem important (e.g. issue happens only occasionally):

This started happening a few days ago (maybe a week ago). Before that it was working, and I did not make any changes to these containers.

Output of podman version:

Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.13.1
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.13.1
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.2-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
  Distribution:
    distribution: fedora
    version: "31"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 4961218560
  MemTotal: 8229089280
  OCIRuntime:
    name: crun
    package: crun-0.10.6-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 8589930496
  SwapTotal: 8589930496
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: bat.localdomain
  kernel: 5.3.15-300.fc31.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.3+dev
      commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
  uptime: 6m 43.48s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
store:
  ConfigFile: /var/home/fansari/.config/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-2.fc31.x86_64
      Version: |-
        fusermount3 version: 3.6.2
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.6.2
        using FUSE kernel interface version 7.29
  GraphRoot: /var/mnt/data/podman/fansari/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 7
  RunRoot: /run/user/1000
  VolumePath: /var/home/fansari/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.6.2-2.fc31.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):

@rhatdan
Member

rhatdan commented Dec 11, 2019

@giuseppe @vrothberg Ideas?

@smekkley

I use podman run, but I always have to add ExecStartPre=podman system migrate on CentOS 7. It might solve this problem as well. I'll investigate the cause soon.
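
For reference, a minimal sketch of where such a line would sit in the generated unit from this issue (the container ID is the one above; whether podman system migrate actually addresses the resolv.conf failure reported here is not established):

[Service]
Restart=on-failure
# run "podman system migrate" before the container starts, as suggested above
ExecStartPre=/usr/bin/podman system migrate
ExecStart=/usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f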

@mheon
Member

mheon commented Dec 12, 2019

More detailed logs from the service would be nice - using --log-level=debug for debug logs (and maybe --syslog to force them to print to the journal) would give us a lot more to work with.
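
For example, the ExecStart line in the unit above could temporarily be changed to the following (a sketch; --log-level and --syslog are global podman options and must come before the start subcommand):

ExecStart=/usr/bin/podman --log-level=debug --syslog start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f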

@vrothberg
Member

Yes, logs would be hugely appreciated. I am under the impression that podman start fails, triggering the restart policy to kick in.

@fansari
Author

fansari commented Dec 12, 2019

Here are more logs. Something goes wrong with resolv.conf. The strange thing is that when I start the container manually after boot, everything works.

podman-ldap.log

@vrothberg
Member

> Yes, logs would be hugely appreciated. I am under the impression that podman start fails, triggering the restart policy to kick in.

That's it. I am convinced that systemd is not at fault, as starting the container itself fails with:

[...] level=error msg="unable to start container \"ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f\": error creating resolv.conf for container ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f: lstat /var/../run/NetworkManager/resolv.conf: no such file or directory"

Starting the container manually with $ podman start ae3453a9ad should fail as well. @fansari can you try that to verify?

The lstat error comes from the stdlib trying to follow the symlink, which for some reason doesn't exist. It could be a volume or mount issue, or something else. @fansari, can you share some information about the container (e.g., podman inspect ae3453a9ad)?

@fansari
Author

fansari commented Dec 14, 2019

"podman start ae3453a9ada5" works without problems.

And yes - on the host system there is a link /etc/resolv.conf -> /var/run/NetworkManager/resolv.conf.

Is it perhaps possible that the container tries to start before this link is available?

podman-inpect.log

To check this, I removed the symbolic link and put the content into a regular file /etc/resolv.conf.

I also had to put dns=none into the [main] section of /etc/NetworkManager/NetworkManager.conf.
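
The resulting configuration would look roughly like this (a sketch; only the dns=none line comes from this comment, the rest of the file is assumed to be left as-is):

# /etc/NetworkManager/NetworkManager.conf
[main]
# stop NetworkManager from managing /etc/resolv.conf
dns=none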

After doing this, the container starts automatically after boot, as it should.

This is a workaround, not a solution.

@rhatdan
Member

rhatdan commented Dec 16, 2019

Should the systemd unit file depend on the network being started then?

@fansari
Author

fansari commented Dec 17, 2019

First of all, I cannot remember having this issue before, so it should not happen this way.

I have now tried adding dependencies to the systemd unit files, but this does not help.

For example I tried this:

After=network-online.target
Requires=network-online.target

or this:

After=NetworkManager
Requires=NetworkManager

I do not understand why, with these dependencies configured, the containers are still not able to find /var/run/NetworkManager/resolv.conf. When is this file created?

With the dependency on NetworkManager as shown here, I checked the log files and ran stat on the file.

Here is the error from podman:

Dec 17 15:08:32 bat.localdomain podman[2564]: time="2019-12-17T15:08:32+01:00" level=error msg="unable to start container \"ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f\": error creating resolv.conf for container ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f: lstat /run/NetworkManager/resolv.conf: no such file or directory"

And here is resolv.conf:

[fansari@bat ~]$ stat /run/NetworkManager/resolv.conf 
  File: /run/NetworkManager/resolv.conf
  Size: 137             Blocks: 8          IO Block: 4096   regular file
Device: 19h/25d Inode: 46798       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:net_conf_t:s0
Access: 2019-12-17 15:08:51.390296781 +0100
Modify: 2019-12-17 15:08:35.515738983 +0100
Change: 2019-12-17 15:08:35.515738983 +0100
 Birth: -

So from this it is clear that podman tries to read resolv.conf before NetworkManager has created it.

To avoid this I configured the dependency on NetworkManager (I also tried network-online.target, which did not work either).

Obviously this is not the correct way to define a dependency on /run/NetworkManager/resolv.conf. So what is the correct way to do this?

This is again a workaround and should not be the solution:

# container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
# autogenerated by Podman 1.6.2
# Tue Dec 10 18:48:56 CET 2019

[Unit]
Description=Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
Documentation=man:podman-generate-systemd(1)
After=network-online.target
Requires=network-online.target

[Service]
Restart=on-failure
ExecStartPre=/bin/sleep 30
ExecStart=/usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f
ExecStop=/usr/bin/podman stop -t 10 ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f
KillMode=none
Type=forking
PIDFile=/run/user/1000/overlay-containers/ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f/userdata/conmon.pid

[Install]
WantedBy=default.target

@fansari
Author

fansari commented Dec 21, 2019

Here is my next ugly workaround:

# container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
# autogenerated by Podman 1.6.2
# Tue Dec 10 18:48:56 CET 2019

[Unit]
Description=Podman container-ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f.service
Documentation=man:podman-generate-systemd(1)
After=monitor-resolv-conf.service
Requires=monitor-resolv-conf.service

[Service]
ExecStart=/usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f
ExecStop=/usr/bin/podman stop -t 10 ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f
KillMode=none
Type=forking
PIDFile=/run/user/1000/overlay-containers/ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f/userdata/conmon.pid

[Install]
WantedBy=default.target

Then I define the monitor-resolv-conf units:

monitor-resolv-conf.service

[Unit]
Description=check whether resolv.conf exists

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/sleep 5

[Install]
WantedBy=default.target

monitor-resolv-conf.path

[Unit]
Description=check whether resolv.conf exists

[Path]
PathExists=/etc/resolv.conf

[Install]
WantedBy=default.target

The path unit needs to be enabled:

systemctl --user enable monitor-resolv-conf.path

Even this solution needs a sleep, which should not be necessary, but without it the container is still not able to find the resolv.conf file.

By the way, I noticed that the same issue does affect my root containers after all. The only difference is that they come up on the second restart attempt. I have now configured this workaround for them as well.

From my point of view this is just a workaround. If podman containers need the host's /etc/resolv.conf, there should be a more generic way to make sure they wait until this file is available. Any solution should also keep in mind that the user might not use NetworkManager but some other tool (e.g. systemd-networkd).
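
One NetworkManager-agnostic variant (purely a sketch, not something proposed in this thread) would be to have the unit itself wait for /etc/resolv.conf before starting the container:

[Service]
# wait, up to the unit's start timeout, until the host's resolv.conf exists;
# assumes a POSIX shell and does not depend on which tool manages the file
ExecStartPre=/bin/sh -c 'until [ -e /etc/resolv.conf ]; do sleep 1; done'
ExecStart=/usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f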

@rhatdan
Member

rhatdan commented Dec 23, 2019

@msekletar PTAL

@smekkley

I had the same issue, probably due to a different cause, but I'll leave a comment anyway for anyone who comes across the same symptom as in the title.

In short, it was caused by a cgroup kill from one of my podman systemd units.
I have multiple podman systemd units running. On the first container run, podman spawns the podman pause process. If that first run happens inside a systemd unit, then when the unit is killed the pause process is killed with it, leaving a stale pause pid file behind, and any subsequent podman command as the same user fails.

My workaround is to deploy a podman systemd unit that always runs before any other containers run.
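
A minimal sketch of what such a "bootstrap" unit could look like (the unit name, and the choice of podman system migrate as the command it runs, are assumptions rather than details from this comment):

# ~/.config/systemd/user/podman-bootstrap.service (hypothetical name)
[Unit]
Description=Initialize rootless podman before container units start
# container units would then add:
#   After=podman-bootstrap.service
#   Requires=podman-bootstrap.service

[Service]
Type=oneshot
RemainAfterExit=yes
# running a podman command here sets up the per-user podman state before the
# container units start; "podman system migrate" was mentioned earlier in this
# thread as one candidate
ExecStart=/usr/bin/podman system migrate

[Install]
WantedBy=default.target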


@rhatdan
Member

rhatdan commented Feb 18, 2020

I think we have documented the issues here in the troubleshooting guide. Please reopen if I am incorrect.

@PavelSosin-320

It was very hard for me to accept the proposed workaround, but I tested it and can say that the root cause of the problem is the following: the systemd scope needed to create and run the container must already exist beforehand. The cases where the scope is created via systemd-run --scope, and where pod service units are used, work fine; in both of them either systemd-run or podman pod create ensures the scope already exists. The problems with the scope are: 1. it can die without any kill, simply because all of its inner processes have exited; 2. it cannot be defined in a unit file. The podman pause process cannot guarantee it either, because it can be too late to the party: the podman pause scope may have been created and died before the unit starts.
If the only duty of podman pause is to be the housekeeper of the scope and cgroup, it may be possible to run it via something like ExecReload=podman info >> /dev/null.
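
For reference, the scope-created case mentioned above looks roughly like this (a sketch using the container ID from this issue):

systemd-run --user --scope /usr/bin/podman start ae3453a9ada535ba4bd3c53be0bb756b0e4a1f9da87e3b5541c7aecb729e694f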
