[Bug]: Listing network from Docker fails during container removal #17341

Closed
Agalin opened this issue Feb 2, 2023 · 4 comments · Fixed by #17376
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@Agalin

Agalin commented Feb 2, 2023

Issue Description

If you start the Podman API server and inspect a network from Docker (or from a Docker-compatible library, e.g. the one used by GitLab Runner), the response also includes a list of the containers in that network. This is done for backward compatibility with Docker; Podman itself doesn't show that data.

But if a container is currently being removed (or added, I'm not sure which), the request fails with:

Error response from daemon: container <container id> does not exist in database: no such container

The interesting part is that you get the same error even if you run docker network ls instead of docker network inspect <network>.

I believe this may be the cause (or at least one of the causes) of this GitLab Runner issue, as well as of a similar error that I have only observed with Podman 4.4 and that, as far as I know, has not been reported to GitLab yet.

Steps to reproduce the issue

  1. Start the Podman API server (podman system service).
  2. Create a network (podman network create test).
  3. Configure the native Docker client to use Podman's socket (export DOCKER_HOST=unix://<path to socket>).
  4. Create containers in a loop (see the example loop below the list).
  5. Watch the output of either docker network list or docker network inspect test (see the example watch command below the list).

Example creation loop:

while true
do
    podman run -d --rm -ti --network test fedora sleep 1;
done

Example watch (the terminal control sequences used to clear the screen are stored in the log as well, so a simple cat won't work; open the file in an editor to find the error lines):

watch -tn 0.1 --exec docker network ls | tee -a test.log

Describe the results you received

The Podman server sometimes fails with a "container not found" error. Log entry:

INFO[0052] Request Failed(Internal Server Error): container cf3b535ee60be15c9b5ed36240caa923119be4f06f9ff80bd98620d3c7e3ef3e does not exist in database: no such container 
@ - - [02/Feb/2023:17:15:09 +0000] "GET /v1.41/networks HTTP/1.1" 500 178 "" "Docker-Client/20.10.23 (linux)"

Describe the results you expected

No errors for either request.

If Podman finds it cannot retrieve a container's details because the container no longer exists, it should simply omit that container from the network inspect output.

In the case of network list, I'm not even sure there is a reason to build this container list in the first place: does the JSON response contain that field? I don't see an option in Docker's CLI to show containers in this view.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpuset
  - cpu
  - memory
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/local/libexec/podman/conmon
    version: 'conmon version 2.1.5, commit: 4cb1e4d73699ce0cef2c3d89b652b3d15be429b3'
  cpuUtilization:
    idlePercent: 98.16
    systemPercent: 0.57
    userPercent: 1.27
  cpus: 4
  distribution:
    distribution: fedora
    variant: cloud
    version: "37"
  eventLogger: file
  hostname: <cut>
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 993
      size: 1
    - container_id: 1
      host_id: 200000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 993
      size: 1
    - container_id: 1
      host_id: 200000
      size: 65536
  kernel: 6.1.8-200.fc37.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1381150720
  memTotal: 8329515008
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7.2-3.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/993/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/993/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8328572928
  swapTotal: 8328835072
  uptime: 27h 58m 5.00s (Approximately 1.12 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors: <cut>
    Prefix: docker.io
    PullFromMirror: ""
    ...: <cut>
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/gitlab-runner/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 1
    stopped: 9
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/gitlab-runner/.local/share/containers/storage
  graphRootAllocated: 52527345664
  graphRootUsed: 5485887488
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/user/993/containers
  transientStore: false
  volumePath: /home/gitlab-runner/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.0
  Built: 1675343283
  BuiltTime: Thu Feb  2 13:08:03 2023
  GitCommit: ""
  GoVersion: go1.19.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Running on Fedora 37 in a VM with self-compiled Podman and conmon. SELinux enabled.

The same error was observed earlier using the latest Fedora 37 packages (Podman 4.3.1, conmon 2.1.5).

The same GitLab Runner issue was observed even earlier (on Podman 4.3.0 with older conmon, runc, netavark, aardvark, etc.), although I have neither the exact versions nor a way to confirm that it is caused by the same problem. If it is, then the oldest report (from the author of that GitLab issue) comes from Podman 3.4.2.

Additional information

No response

@Agalin Agalin added the kind/bug Categorizes issue or PR as related to a bug. label Feb 2, 2023
@vrothberg
Member

Thanks for reaching out, @Agalin!

I bookmarked the issue to fix it on Monday unless others beat me to it :)

@Agalin
Author

Agalin commented Feb 3, 2023

Thanks for the quick reaction. For now I can confirm that simply skipping non-existent containers in convertLibpodNetworktoDockerNetwork seems to fix the issue.
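To illustrate the idea of skipping containers that vanish between listing and inspecting, here is a minimal, self-contained sketch. It is not the actual Podman code or patch: the store type, its lookup method, endpointsFor and errNoSuchCtr are hypothetical stand-ins for the real libpod state and error values.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchCtr is a hypothetical stand-in for the "no such container" error.
var errNoSuchCtr = errors.New("no such container")

// store is a hypothetical container database: container ID -> name.
type store map[string]string

// lookup simulates inspecting a container that may have been removed
// after its ID was listed.
func (s store) lookup(id string) (string, error) {
	name, ok := s[id]
	if !ok {
		return "", fmt.Errorf("container %s does not exist in database: %w", id, errNoSuchCtr)
	}
	return name, nil
}

// endpointsFor builds the container map for a network. IDs that no longer
// exist are skipped; any other error still aborts the request.
func endpointsFor(s store, ids []string) (map[string]string, error) {
	out := make(map[string]string, len(ids))
	for _, id := range ids {
		name, err := s.lookup(id)
		if err != nil {
			if errors.Is(err, errNoSuchCtr) {
				continue // removed concurrently: leave it out of the response
			}
			return nil, err
		}
		out[id] = name
	}
	return out, nil
}

func main() {
	s := store{"aaa": "web"}
	// "bbb" was listed earlier but has since been removed.
	eps, err := endpointsFor(s, []string{"aaa", "bbb"})
	fmt.Println(eps, err) // prints: map[aaa:web] <nil>
}
```

Matching on the specific error with errors.Is keeps unrelated failures visible instead of silently dropping them.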

@rhatdan
Member

rhatdan commented Feb 3, 2023

Care to open a PR?

@Agalin
Author

Agalin commented Feb 3, 2023

I don't believe my golang skills are good enough to do it right. What I have right now is an ugly hack that doesn't even pass the linter checks.

vrothberg added a commit to vrothberg/libpod that referenced this issue Feb 7, 2023
Handle a race condition in the REST API when listing networks.
In between listing all containers and inspecting them, they may have
already been removed, so handle this case gracefully.

[NO NEW TESTS NEEDED] as it's a race condition.

Fixes: containers#17341
Signed-off-by: Valentin Rothberg <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 2, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 2, 2023