Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadlet .image file should wait for dns name resolution #21873

Closed
jbtrystram opened this issue Feb 29, 2024 · 21 comments · Fixed by #22057
Closed

Quadlet .image file should wait for dns name resolution #21873

jbtrystram opened this issue Feb 29, 2024 · 21 comments · Fixed by #22057
Assignees
Labels
In Progress This issue is actively being worked by the assignee, please do not work on this at this time. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@jbtrystram
Copy link
Contributor

Issue Description

When creating a .image quadlet file, the service fails to start if DNS name resolution is to early.

Steps to reproduce the issue

Reproducer :

/etc/containers/systemd/immich.image

[Image]
Image=ghcr.io/immich-app/immich-server:v1.94.1

Then have another quadlet container file which requires this image and start on boot :
/etc/containers/systemd/immich.container

[Container]

ContainerName=immich-server
EnvironmentFile=/etc/containers/systemd/immich.env
Image=immich.image 

[Service]
Restart=always

[Install]
WantedBy=multi-user.target default.target

Describe the results you received

The image fails to pull because it starts too early and DNS name resolution is not ready yet :

Starting immich-image.service...
Trying to pull ghcr.io/immich-app/immich-server:v1.94.1...
Error: initializing source docker://ghcr.io/immich-app/immich-server:v1.94.1: pinging container registry ghcr.io: Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io: Temporary failure in name resolution
immich-image.service: Main process exited, code=exited, status=125/n/a
immich-image.service: Failed with result 'exit-code'.
Failed to start immich-image.service.

Describe the results you expected

As my immich-server.container declares a WantedBy dependency on multi-user.target i'd expect my service to come up at boot time.
However this fails due to the name resolution failure when pulling the image.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.5
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 98.35
    systemPercent: 0.75
    userPercent: 0.9
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: server
    version: "38"
  eventLogger: journald
  freeLocks: 1550
  hostname: nagao
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.7.6-100.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 306696192
  memTotal: 12418871296
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.9.0-1.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.9.0
  ociRuntime:
    name: crun
    package: crun-1.14.3-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.3
      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231230.gf091893-1.fc38.x86_64
    version: |
      pasta 0^20231230.gf091893-1.fc38.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc38.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8064856064
  swapTotal: 8589930496
  uptime: 5h 13m 46.00s (Approximately 0.21 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/jibou/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/jibou/.local/share/containers/storage
  graphRootAllocated: 62409146368
  graphRootUsed: 30237601792
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 7
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/jibou/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.3
  Built: 1708357383
  BuiltTime: Mon Feb 19 16:43:03 2024
  GitCommit: ""
  GoVersion: go1.20.13
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.3

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

systemd-253.15-2.fc38.x86_64

Additional information

A workaround that is to add a After requirement on nss-lookup.target, on the service.
So this could probably added by default on generated .image oneshot services.

@jbtrystram jbtrystram added the kind/bug Categorizes issue or PR as related to a bug. label Feb 29, 2024
@rhatdan
Copy link
Member

rhatdan commented Feb 29, 2024

@alexlarsson Do we need to force?

Wants=network-online.target
After=network-online.target

@jbtrystram
Copy link
Contributor Author

@rhatdan

network-online.target do not guarantee DNS resolution, only that interfaces are up and have an IP address.
See https://systemd.io/NETWORK_ONLINE/

For name resolution the target should be nss-lookup.target

@rhatdan
Copy link
Member

rhatdan commented Feb 29, 2024

Seems reasonable to me, I am not sure how quadlets setup those types of requirements right now.

@jbtrystram
Copy link
Contributor Author

yeah, I should have included that in the initial issue :

cat /run/systemd/generator/immich-server-image.service 
# Automatically generated by /usr/lib/systemd/system-generators/podman-system-generator
# 
[X-Image]
Image=ghcr.io/immich-app/immich-server:v1.94.1

[Service]
ExecStart=/usr/bin/podman image pull ghcr.io/immich-app/immich-server:v1.94.1
Type=oneshot
RemainAfterExit=yes
SyslogIdentifier=%N

[Unit]
SourcePath=/etc/containers/systemd/immich/immich-server.image
RequiresMountsFor=%t/containers

Note that I have not tried with podman 5

@rhatdan
Copy link
Member

rhatdan commented Mar 1, 2024

If you add to your image quadlet

[Unit]
Wants=nss-lookup.target
After=nss-lookup.target

Does this fix your issue?

@jbtrystram
Copy link
Contributor Author

jbtrystram commented Mar 1, 2024

@rhatdan just did some further testing :

Adding just this solves the issue in my case

[Unit]
Wants=network-online.target
After=network-online.target

I got used to put the nss-lookup because I had issues before where loaded CI systems would fail name resolution even if network-online is reached

Ordering after systemd-resolved.service could work as well

@rhatdan
Copy link
Member

rhatdan commented Mar 2, 2024

I guess the question is should we do this automatically or just document it?

@ygalblum
Copy link
Contributor

ygalblum commented Mar 3, 2024

I don't think it is for Quadlet to decide the dependencies of the unit files. Users should treat Quadlet files the same as they do any other systemd service file, just with an additional section (each file type with its own special section). What if DNS resolution is not required?
Documenting it is a good idea

@rhatdan
Copy link
Member

rhatdan commented Mar 3, 2024

SGTM
@jbtrystram Want to open a PR to document what you had to do to get t his to work?

@vrothberg
Copy link
Member

I think units should run out-of-the-box without further massaging by users, especially when all units needed editing to just run.

podman generate systemd adds the network dependency for exactly this reason based on user/customer feedback

Wants=network-online.target
After=network-online.target

Quadlet only adds RequiresMountsFor=%t/containers so there is a risk. I don't see a use case where a unit should run before the network is up, so I think it's safe to add the network dependency by default.

@jbtrystram
Copy link
Contributor Author

the .image could point to a locally available image, in which case waiting for network don't have any purpose

@vrothberg
Copy link
Member

the .image could point to a locally available image, in which case waiting for network don't have any purpose

Good point! Yet, .image on its own ships no functionality until being used by a .container which may still require network.

@ygalblum
Copy link
Contributor

ygalblum commented Mar 4, 2024

@vrothberg I understand your point. However, if you recall, Quadlet started out very opinionated and we have since removed most (if not all) of its opinions. I'd hate to add this functionality just to have to add a nob that overrides it later.
Keep in mind that in generate systemd the user owns the generated file and can edit it. This is not the case in Quadlet

@vrothberg
Copy link
Member

However, if you recall, Quadlet started out very opinionated and we have since removed most (if not all) of its opinions.

The opinions usually changed backed by data. Here we have a data point that can help refine earlier decisions.

I'd hate to add this functionality just to have to add a nob that overrides it later.

Do you have a use case in mind where a container would need to run before the network is online? If there are valid concerns of breaking important use cases, I am OK with documenting. At the moment, I cannot find one and fear of more users running into the problem.

@ygalblum
Copy link
Contributor

ygalblum commented Mar 4, 2024

Famous last words:

The user will never do X

:)

If you think it's OK, I don't mind adding it.

@vrothberg
Copy link
Member

Let's call in @rhatdan :^)

@Luap99
Copy link
Member

Luap99 commented Mar 5, 2024

I'd hate to add this functionality just to have to add a nob that overrides it later.

Do you have a use case in mind where a container would need to run before the network is online? If there are valid concerns of breaking important use cases, I am OK with documenting. At the moment, I cannot find one and fear of more users running into the problem.

Well depends there could be early boot stuff running in container that shouldn't wait for the network.
However I agree that by default waiting for the network should be the best thing.

Also you do no really need to implement an option to overwrite it, in systemd syntax something like this should work

After=

This would unset all previously set After's, so as long as quadlet adds the user provided dependencies after the default ones by quadlet users will have the option to unset the network dependency if they do not want it.

@rhatdan
Copy link
Member

rhatdan commented Mar 6, 2024

I would like to go with the least surprise here. Especially after @Luap99 found a get out of jail card for the users who want the container to run in early boot.

I like the idea of doing After network. And then adding docs about it, and the After= flag to disable it.

@vrothberg
Copy link
Member

SGTM 👍

@jbtrystram
Copy link
Contributor Author

I'll cook up a PR with the changes. Can someone assign the issue to me and put the corresponding label ?

@Luap99 Luap99 added the In Progress This issue is actively being worked by the assignee, please do not work on this at this time. label Mar 6, 2024
@dustymabe
Copy link
Contributor

just noting here that for user units (i.e. not system units) depending on network-online.target doesn't work. Upstream RFE for being able to do something like that:

I think in the very least we should make sure we don't generate user units that have this setting set if it doesn't work (could lead to confusion). We can mention the upstream RFE and say we should enable it if the RFE ever gets implemented maybe?

@dustymabe dustymabe changed the title Qualdet .image file should wait for dns name resolution Quadlet .image file should wait for dns name resolution Apr 1, 2024
jbtrystram added a commit to jbtrystram/podman that referenced this issue May 22, 2024
If a container unit starts on boot with a dependency on `default.target`
the image unit may start too soon, before network is ready. This cause
the unit to fail to pull the image.
- Add a dependency on `network-online.target` to make sure image pulls
don't fail.
See containers#21873

- Document the hardcoded dependency on `network-online.target` for images unit
and explain how it can be overriden if necessary.

- tests/e2e/quadlet: Add `assert-last-key-regex`

Required to test the `After=` override in [Unit] section
See containers#22057 (comment)

- quadlet/unitfile: add a prepenUnitLine method

Requirements on networks should be inserted at the top of the
section so the user can override them.

Signed-off-by: jbtrystram <[email protected]>
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 22, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
In Progress This issue is actively being worked by the assignee, please do not work on this at this time. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants