Cannot down docker-compose environment with shared network #17990

Closed
cristianrgreco opened this issue Mar 30, 2023 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
network Networking related issue or feature

Comments

@cristianrgreco

Issue Description

I have 2 docker-compose apps running with a shared network.

With Docker, I am able to stop the 1st app and then down the 2nd, because stopping the 1st app disconnects its container from the shared network.

With Podman, downing the 2nd app fails because Podman still considers containers to be connected to the network.

It seems similar to issue #9632, but in a docker-compose setting.

Steps to reproduce the issue

docker-compose file:

version: "3.5"

services:
  container:
    image: alpine
    command: ["sleep", "infinity"]
    networks:
      - shared-network

networks:
  shared-network:
    name: test-network

export DOCKER_HOST=unix://${XDG_RUNTIME_DIR}/podman/podman.sock

$ docker-compose -p app1 up -d
Creating app1_container_1 ... done

$ docker-compose -p app2 up -d
Creating app2_container_1 ... done

$ docker-compose -p app1 stop 
Stopping app1_container_1 ... done

$ docker-compose -p app2 down
Stopping app2_container_1 ... done
Removing app2_container_1 ... done
Removing network test-network
ERROR: "test-network" has associated containers with it. Use -f to forcibly delete containers and pods: network is being used

Describe the results you received

ERROR: "test-network" has associated containers with it. Use -f to forcibly delete containers and pods: network is being used

Describe the results you expected

Without exporting DOCKER_HOST (i.e. when using Docker), no error message is logged; containers and networks are removed successfully.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2:2.1.7-0debian12+obs15.12_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 97.49
    systemPercent: 0.58
    userPercent: 1.93
  cpus: 32
  distribution:
    codename: kinetic
    distribution: ubuntu
    version: "22.10"
  eventLogger: journald
  hostname: cristian
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.19.0-38-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 877772800
  memTotal: 16771223552
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun_101:1.8.3-0debian12+obs54.1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.3
      commit: 59f2beb7efb0d35611d5818fd0311883676f6f7e
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-1_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 1326510080
  swapTotal: 2147479552
  uptime: 8h 11m 55.00s (Approximately 0.33 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/cristian/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/cristian/.local/share/containers/storage
  graphRootAllocated: 262553440256
  graphRootUsed: 38168629248
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 72
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/cristian/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.4
  Built: 0
  BuiltTime: Thu Jan  1 01:00:00 1970
  GitCommit: ""
  GoVersion: go1.19.6
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.4

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

No response

@cristianrgreco cristianrgreco added the kind/bug Categorizes issue or PR as related to a bug. label Mar 30, 2023
@Luap99
Member

Luap99 commented Apr 3, 2023

So what does docker do here?
It looks like app1_container_1 still exists and is connected to the network, so it seems correct to me that we refuse to delete it. Does Docker remove the container as well?

@cristianrgreco
Author

cristianrgreco commented Apr 3, 2023

Hi @Luap99, when I docker-compose -p app1 stop, the container app1_container_1 is in an Exited state. Inspecting this container I see the following for the network:

"NetworkSettings": {
    "Bridge": "",
    "SandboxID": "8d3d86971e5ecbce54c08a682a8ae277f20130ddda4b019bdd257919d3dd00dd",
    "HairpinMode": false,
    "LinkLocalIPv6Address": "",
    "LinkLocalIPv6PrefixLen": 0,
    "Ports": {},
    "SandboxKey": "/var/run/docker/netns/8d3d86971e5e",
    "SecondaryIPAddresses": null,
    "SecondaryIPv6Addresses": null,
    "EndpointID": "",
    "Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "IPAddress": "",
    "IPPrefixLen": 0,
    "IPv6Gateway": "",
    "MacAddress": "",
    "Networks": {
        "test-network": {
            "IPAMConfig": null,
            "Links": null,
            "Aliases": [
                "de28b4734aa0",
                "container"
            ],
            "NetworkID": "ab4cfafa9d7ed4b70de5307e1443960eb6ebebb7957ac248939c9935052a7b52",
            "EndpointID": "",
            "Gateway": "",
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "MacAddress": "",
            "DriverOpts": null
        }
    }
}
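
(For reference, output like the above can be obtained with a standard docker inspect format string, e.g.:)

$ docker inspect --format '{{json .NetworkSettings}}' app1_container_1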

When I docker-compose -p app2 down, container app1_container_1 remains in an Exited state and the network is gone. Indeed, if I try to start the exited container, Docker logs this error:

➜  ~ docker start app1_container_1
Error response from daemon: network ab4cfafa9d7ed4b70de5307e1443960eb6ebebb7957ac248939c9935052a7b52 not found
Error: failed to start containers: app1_container_1

@Luap99
Member

Luap99 commented Apr 3, 2023

What docker-compose version are you using?
docker-compose v2.17.2 doesn't remove the network on down, which seems like the right thing to do to me, as the network is still used by app1.

I observe the following behaviour: Docker lets you remove a network while a stopped container is still connected to it:

[root@fedora ~]# docker network create test
723d3616a7eb3a9b7380da6814796ac911c68fb51c9705335e129b8e2bd2b4e7
[root@fedora ~]# docker run --network test -d alpine top
251d6b1c410e80cfe090983109f0116b3e119f203a426b2c9522331b88c1eca6
[root@fedora ~]# docker network rm test 
Error response from daemon: error while removing network: network test id 723d3616a7eb3a9b7380da6814796ac911c68fb51c9705335e129b8e2bd2b4e7 has active endpoints
[root@fedora ~]# docker stop 251d6b1c410e80cfe090983109f0116b3e119f203a426b2c9522331b88c1eca6
251d6b1c410e80cfe090983109f0116b3e119f203a426b2c9522331b88c1eca6
[root@fedora ~]# docker network rm test 
test
[root@fedora ~]# docker start 251d6b1c410e80cfe090983109f0116b3e119f203a426b2c9522331b88c1eca6
Error response from daemon: network 723d3616a7eb3a9b7380da6814796ac911c68fb51c9705335e129b8e2bd2b4e7 not found
Error: failed to start containers: 251d6b1c410e80cfe090983109f0116b3e119f203a426b2c9522331b88c1eca6

This looks like a bug to me. I am using Docker v20.10.23; I haven't checked with the latest dockerd version.


Therefore I would argue Podman is correct here: removing this network just results in a broken container.

Is there a specific case where you need this to work?

@Luap99 Luap99 added the network Networking related issue or feature label Apr 3, 2023
@cristianrgreco
Author

I can confirm the same behaviour you observed when using docker-compose 2.17.2: the network is not removed.
My original message was using docker-compose 1.29.2.

The issue I have is within a library called testcontainers. With Docker you are able to stop/down the environment, whereas with Podman it throws an error. I do not know which is correct, but the difference in behaviour is causing me a problem.

@Luap99
Member

Luap99 commented Apr 3, 2023

AFAICT compose v1 is deprecated, so you need to move to v2 eventually.

With Docker you are able to stop/down the environment

Would switching it to use down in both cases work?

I understand that this is a difference in behaviour, but I cannot see why anyone would want that behaviour: it leaves your container in a broken state. The Docker docs even say that you need to disconnect the containers before you can remove the network:

To remove a network, you must first disconnect any containers connected to it.
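
(For illustration only, using the container and network names from the reproducer above, the manual sequence the docs describe would look roughly like this:)

$ docker network disconnect test-network app1_container_1
$ docker network rm test-network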

@cristianrgreco
Author

cristianrgreco commented Apr 3, 2023

AFAICT compose v1 is deprecated, so you need to move to v2 eventually.

Agreed; however, both v1 and v2 "work" for my case (no error is thrown from either).

Would switching it to use down in both cases work?

Sorry, I don't understand what you mean here.

I cannot see why anyone would want that behaviour: it leaves your container in a broken state

I agree with you, but it seems this is only the case for docker-compose v1. In this issue scenario, for docker-compose v2, Docker leaves the network alone even after down, so the container is not broken.

Summary:

  • Docker-compose v1: Down incorrectly removes the network. The stopped container is broken. No error thrown.
  • Docker-compose v2: Down does not remove the network. The stopped container can continue. No error thrown. (What does down even mean here if it leaves resources around?)
  • Podman: Down throws an error because it sees the stopped container is attached to the network.

I guess the question is whether Podman should adopt docker-compose v2 behaviour, and "successfully" down the environment, leaving behind the network.

@Luap99
Member

Luap99 commented Apr 3, 2023

Would switching it to use down in both cases work?

Sorry I don't understand what you mean here

Your reproducer uses stop && down. Why not use down && down? The first container was broken anyway and never used again, so you can just remove it via down directly.

$ docker-compose -p app1 down 
$ docker-compose -p app2 down

Summary:

* Docker-compose v1: Down incorrectly removes the network. The stopped container is broken. No error thrown.

* Docker-compose v2: Down does not remove the network. The stopped container can continue. No error thrown. (What does down even mean here if it leaves resources around?)

You want it to leave things around: you only stopped the container, so it still exists and therefore still requires the network. The network is only removed when you down both apps.

* Podman: Down throws an error because it sees the stopped container is attached to the network.

I guess the question is whether Podman should adopt docker-compose v2 behaviour, and "successfully" down the environment, leaving behind the network.

compose v2 works against the podman socket because it never calls network rm on it when it knows that the network is still in use.

The problem is that compose just talks to us via the Docker API; we cannot know what the intention is.
If the call is DELETE /v1.40/networks/<ID>, then we attempt it and return an error if it fails. We cannot change that behaviour without breaking other things.
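
(For reference, a rough sketch of that API call issued directly against the Podman socket with curl; the socket path matches the reproducer above, and the invocation is only illustrative:)

$ curl --unix-socket ${XDG_RUNTIME_DIR}/podman/podman.sock \
       -X DELETE http://localhost/v1.40/networks/test-network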

@cristianrgreco
Author

Why not use down && down?

The test I have is specifically testing stop, to show that the environment can be downed after another environment with shared components is stopped.

compose v2 works against the podman socket because it never calls network rm on it when it knows that the network is still in use.

I think this is the answer for me: I need the compose v2 change in behaviour for this test to work with Podman. This issue can be closed; thank you for your help, @Luap99.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 28, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2023