
static macvlan IP is duplicated after container restart #302

Closed · nivekuil opened this issue May 24, 2022 · 18 comments

@nivekuil

repro:

sudo podman network create macvtest -d macvlan --subnet=fd22::/16
sudo podman run --net=macvtest:ip=fd22::3 docker.io/nginx 

In a second window run:

sudo nsenter -t $(pgrep nginx | tail -n1) -p -n ip -6 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
27: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 censored-slaac-address/64 scope global tentative dynamic mngtmpaddr 
       valid_lft 2592000sec preferred_lft 604800sec
    inet6 fd22::3/16 scope global tentative 
       valid_lft forever preferred_lft forever
    inet6 fe80::28a7:41ff:feb3:9497/64 scope link 
       valid_lft forever preferred_lft forever

Looks OK. Now Ctrl+C the nginx container and start it again with the same podman run command. sudo nsenter -t $(pgrep nginx | tail -n1) -p -n ip -6 a now shows:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
28: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 censored-slaac-address/64 scope global dynamic mngtmpaddr 
       valid_lft 2591990sec preferred_lft 604790sec
    inet6 fd22::3/16 scope global dadfailed tentative 
       valid_lft forever preferred_lft forever
    inet6 fe80::54c8:36ff:fe70:2b5b/64 scope link 
       valid_lft forever preferred_lft forever

Notice the dadfailed flag on the fd22::3 address on eth0. Does this address need to be released somehow?
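For scripted detection: the kernel marks an address that lost IPv6 Duplicate Address Detection with the dadfailed flag, so affected addresses can be picked out of ip -6 addr output. A minimal sketch (it parses a captured sample here so it runs anywhere; live use would pipe ip -6 addr show eth0 into the same awk):

```shell
#!/bin/sh
# Sketch: print any IPv6 address carrying the dadfailed flag.
# 'sample' stands in for live output of: ip -6 addr show eth0
sample='    inet6 fd22::3/16 scope global dadfailed tentative
    inet6 fe80::54c8:36ff:fe70:2b5b/64 scope link'

printf '%s\n' "$sample" | awk '/dadfailed/ { print "DAD failed:", $2 }'
# prints: DAD failed: fd22::3/16
```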

@mheon (Member) commented May 25, 2022

Please provide the full output of podman info.

@mheon (Member) commented May 25, 2022

(I wonder if we should make that part of the issue template whenever the reproducer involves Podman; having the full podman info output really helps with reproducing.)

@nivekuil (Author)

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.0-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: bdb4f6e56cd193d40b75ffc9725d4b74a18cb33c'
  cpuUtilization:
    idlePercent: 84.99
    systemPercent: 5.1
    userPercent: 9.91
  cpus: 8
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  hostname: machina
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.17.9-zen1-1-zen
  linkmode: dynamic
  logDriver: journald
  memFree: 28018368512
  memTotal: 67311255552
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.4.5-1
    path: /usr/bin/crun
    version: |-
      crun version 1.4.5
      commit: c381048530aa750495cf502ddb7181f2ded5b400
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 13462249472
  swapTotal: 13462249472
  uptime: 15h 23m 33.87s (Approximately 0.62 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 14
    paused: 0
    running: 14
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 491787206656
  graphRootUsed: 282405359616
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 10
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1651861110
  BuiltTime: Fri May  6 11:18:30 2022
  GitCommit: e4b03902052294d4f342a185bb54702ed5bed8b1
  GoVersion: go1.18.1
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.0

@nivekuil (Author) commented Jun 3, 2022

Repro'd on another machine, podman info:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 92.02
    systemPercent: 2.64
    userPercent: 5.33
  cpus: 2
  distribution:
    distribution: fedora
    variant: coreos
    version: "36"
  eventLogger: journald
  hostname: lax-1
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.17.9-300.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1947394048
  memTotal: 8331001856
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4165464064
  swapTotal: 4165464064
  uptime: 84h 36m 19.53s (Approximately 3.50 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 35
    paused: 0
    running: 34
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 42401247232
  graphRootUsed: 7014465536
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 24
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1651853754
  BuiltTime: Fri May  6 16:15:54 2022
  GitCommit: ""
  GoVersion: go1.18
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.0

@cathay4t

I have reproduced this problem locally; hopefully I can find a fix for it.

@mheon (Member) commented Jun 13, 2022

@flouthoc PTAL

@flouthoc (Collaborator)

Since @cathay4t is interested in working on this issue as his availability allows, I am assigning it to him.

@nivekuil (Author)

Any luck so far?

@nivekuil (Author) commented Aug 2, 2022

@cathay4t are you blocked on troubleshooting or on implementation? How can I help? As it stands, netavark cannot use assigned IPs with macvlan at all in practice.

@nivekuil (Author) commented Sep 5, 2022

Is this fixed now?

@cathay4t

Sorry for the long delay. Investigating.

@cathay4t

The root cause of this problem is a leftover network namespace that keeps holding the macvlan interface after the container stops. The next podman start creates a new network namespace with a new macvlan interface in it. In this macvlan mode, the leftover and the new eth0 can communicate with each other, so DAD on the same IPv6 address fails in the new eth0.

This issue is fixed in the latest main branch (e466e39).

Please let me know if you want me to bisect which commit fixed the problem.
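One way to check the leftover-namespace theory from the host is to walk the named network namespaces and look for one that still holds the static address. A hedged sketch (requires root and iproute2, and only finds namespaces created as named entries under /run/netns; find_stale_ns is an illustrative helper, not a podman or netavark command):

```shell
#!/bin/sh
# Sketch: report any named network namespace that still holds a given
# IPv6 address after the container has stopped.
find_stale_ns() {
    addr="$1"
    for ns in $(ip netns list 2>/dev/null | awk '{print $1}'); do
        if ip netns exec "$ns" ip -6 addr show 2>/dev/null | grep -q "inet6 $addr"; then
            echo "$ns"
        fi
    done
}

# Usage (as root); deleting the stale netns destroys the macvlan link
# inside it and with it the duplicated address:
#   stale=$(find_stale_ns fd22::3)
#   [ -n "$stale" ] && ip netns del "$stale"
```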

@Luap99 (Member) commented Sep 16, 2022

> The root cause of this problem is a leftover network namespace that keeps holding the macvlan interface after the container stops. The next podman start creates a new network namespace with a new macvlan interface in it. In this macvlan mode, the leftover and the new eth0 can communicate with each other, so DAD on the same IPv6 address fails in the new eth0.
>
> This issue is fixed in the latest main branch (e466e39).
>
> Please let me know if you want me to bisect which commit fixed the problem.

Do you know what is keeping the netns open? Podman should remove the bind mount and close the open fd on container stop, and netavark cannot keep anything open since it is short-lived and will exit.

@cathay4t

I got this error in netavark 1.1:

ERRO[0000] Unable to clean up network for container 15d1ec7d1f5287953327cd85b348ac7c6f5bc47a802b22d1dcf2f942536d4490: "error tearing down network namespace configuration for container 15d1ec7d1f5287953327cd85b348ac7c6f5bc47a802b22d1dcf2f942536d4490: netavark (exit code 1): IO error: Unable to resolve bridge interface : Received a netlink error message No such device (os error 19)"

The network namespace is created under the /run/netns/ directory instead of being bound to a PID. That is why the network namespace still exists after the container stops.
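To illustrate the distinction (a hedged sketch): a namespace referenced only through a process's /proc entry is freed by the kernel once its last process exits, whereas a named namespace persists as a bind-mounted file under /run/netns until it is explicitly deleted. The PID-bound handle is visible without root:

```shell
#!/bin/sh
# Every process exposes its network namespace as /proc/<pid>/ns/net;
# once the last process exits and no bind mount pins the namespace,
# the kernel frees it automatically.
readlink /proc/self/ns/net        # e.g. net:[4026531992]

# A named netns, by contrast, stays alive via a bind mount:
#   ip netns add demo   -> creates /run/netns/demo (requires root)
#   ip netns del demo   -> unmounts and removes it, freeing the namespace
ls /run/netns 2>/dev/null || echo "no named network namespaces"
```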

@Luap99 (Member) commented Sep 16, 2022

Podman should still remove the netns even if netavark or CNI cleanup fails, IMO, so this would be a Podman bug. cc @mheon

For the netavark error message you linked, I do not see how the tokio update would fix that, but if it does we should make a new release.

@Luap99 (Member) commented Sep 16, 2022

Looking at the error, I think this was fixed by my netavark restructure in #360

@mheon (Member) commented Sep 16, 2022

It looks like the Podman cleanup logic actually refuses to unmount the NS if tearing down the network setup failed - ref: https://github.com/containers/podman/blob/main/libpod/networking_linux.go#L699-L711

Luap99 added a commit to Luap99/libpod that referenced this issue Sep 30, 2022
We should not keep the netns if there was a cleanup problem. Deleting
the netns will also delete the virtual links inside and thus make the IPs
available again for the next use.

context: containers/netavark#302

[NO NEW TESTS NEEDED] This is very hard to trigger reliably and it would
need to work with both cni and netavark. This mostly happens because of
specific bugs, but those will be fixed and then this test would fail.

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 (Member) commented Sep 30, 2022

I created containers/podman#16016 for the podman cleanup issue.
Either way, I believe this specific issue is fixed in the latest netavark version, so I will close it.

Luap99 closed this as completed Sep 30, 2022