CI flake: podman run with new:pod and static-ip: Error adding network: failed to set bridge addr #7583

edsantiago · 2020-09-10T16:54:27Z

The symptom is:

  podman run with new:pod and static-ip
 ...
Running: podman [options] network create --subnet 10.25.40.0/24 podmantestnetwork2
/etc/cni/net.d/podmantestnetwork2.conflist
Running: podman [options] run -t -i --rm --pod new:testpod --net podmantestnetwork2 --ip 10.25.40.128 docker.io/library/alpine:latest ip addr
time="2020-09-03T15:37:25-05:00" level=error msg="Error adding network: failed to set bridge addr: \"cni-podman1\" already has an IP address different from 10.25.40.1/24"
time="2020-09-03T15:37:25-05:00" level=error msg="Error while adding pod to CNI network \"podmantestnetwork2\": failed to set bridge addr: \"cni-podman1\" already has an IP address different from 10.25.40.1/24"
time="2020-09-03T15:37:25-05:00" level=error msg="error starting some container dependencies"
time="2020-09-03T15:37:25-05:00" level=error msg="\"error configuring network namespace for container 6091376fe654ca512bc885c89c0026c76549f8a47e174c37819e77f8c5678cfe: failed to set bridge addr: \\\"cni-podman1\\\" already has an IP address different from 10.25.40.1/24\""
Error: error starting some containers: internal libpod error

These are all three-time failures, causing the entire CI job to fail, probably because the first failure leaves things in a bad state for the test reruns -- subsequent attempts fail with:

Running: podman [options] network create --subnet 10.25.40.0/24 podmantestnetwork2
Error: network 10.25.40.0/24 is already being used by a cni configuration

So, at a minimum, I would suggest fixing the test so as to clean up after itself.

Flake history:

fedora-32 : test fedora-32
- PR Update nix pin with make nixpkgs #7408
fedora-c5809900649447424 : test fedora
- PR Update VM images for new crun; adapt Cap tests to work with new kernel #7538
ubuntu-19 : test ubuntu-19
- PR Update vendor of buildah to latest code #7335


Problem: if either of the two "podman network create" tests fails, all subsequent retries will also fail because the created network has not been cleaned up (so "network create" will fail with EEXIST).
Solution: run "podman network rm" as deferred cleanup instead of in each test.
This is NOT a fix for containers#7583 - it is just a way to allow ginkgo to retry a failing test.
Signed-off-by: Ed Santiago <[email protected]>
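The deferred-cleanup idea in that commit message can be sketched as follows. This is only an illustration in Python, not the actual ginkgo test code: the helper name temporary_network and the injectable run parameter are hypothetical, and the only podman commands used are the two that appear in this thread ("podman network create --subnet" and "podman network rm").

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def temporary_network(name, subnet, run=subprocess.run):
    """Create a podman network and always try to remove it afterwards.

    `run` is injectable (hypothetical parameter) so the sketch can be
    exercised without podman installed.
    """
    try:
        run(["podman", "network", "create", "--subnet", subnet, name], check=True)
        yield name
    finally:
        # Deferred cleanup: this runs even when the test body raises,
        # so a retry does not trip over a leftover network definition.
        run(["podman", "network", "rm", name], check=False)
```

A test body would then sit inside "with temporary_network(...)", and a failure inside the block still leads to "podman network rm" being attempted before the retry.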

Luap99 · 2020-09-11T12:51:49Z

I will take a look.

Luap99 · 2020-09-13T15:25:21Z

OK, I don't think there is anything wrong with the "new:pod and static-ip" command itself. I ran this test for hours on both a slow and a fast computer without any problems.

Looking at the error, it seems that podman is trying to set an IP address on an existing bridge interface (cni-podman1) which already has a different IP address.

You could force this error by doing something like this:

$ podman network create --subnet 10.25.10.0/24 testnet1
/etc/cni/net.d/testnet1.conflist
$ podman network inspect testnet1 | grep bridge
        "bridge": "cni-podman1",
        "type": "bridge"
$ podman run --rm --name test1 --net testnet1 --ip 10.25.10.10 alpine ip addr 
...

Do not delete this network; it has to stay in place to trigger the error.
Create the second network:
$ conf=$(podman network create --subnet 10.25.20.0/24 testnet2)
Edit the config so that its bridge name is the same as testnet1's (cni-podman1):
$ vi "$conf"
Now run this and you get the error:

$ podman run --rm --name test2 --net testnet2 --ip 10.25.20.10 alpine ip addr
ERRO[0000] Error adding network: failed to set bridge addr: "cni-podman1" already has an IP address different from 10.25.20.1/24 
ERRO[0000] Error while adding pod to CNI network "testnet2": failed to set bridge addr: "cni-podman1" already has an IP address different from 10.25.20.1/24 
Error: error configuring network namespace for container 9f39e72bbf437b2a3bf59da51f0d1d4ce7e350cb04a0d8013d8250bc3754b9f7: failed to set bridge addr: "cni-podman1" already has an IP address different from 10.25.20.1/24

So the real problem could be that podman network create creates a config with an already used bridge interface name. I'm not sure if there is a relation to the previous test which also creates and removes a network.
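One way to check for that condition is to scan the CNI config directory for networks that claim the same bridge. The sketch below is only an illustration: the function names are hypothetical, and it assumes each *.conflist file is JSON with a top-level "name" and a "plugins" list whose bridge entry carries "type": "bridge" and a "bridge" key, as the inspect output above suggests.

```python
import json
from collections import defaultdict
from pathlib import Path


def bridges_in_use(conf_dir):
    """Map each bridge interface name to the network names that use it."""
    users = defaultdict(list)
    for path in sorted(Path(conf_dir).glob("*.conflist")):
        conf = json.loads(path.read_text())
        for plugin in conf.get("plugins", []):
            # Only the bridge plugin entry names a bridge interface.
            if plugin.get("type") == "bridge" and "bridge" in plugin:
                users[plugin["bridge"]].append(conf.get("name", path.stem))
    return dict(users)


def shared_bridges(conf_dir):
    """Return only the bridges claimed by more than one network."""
    return {
        bridge: names
        for bridge, names in bridges_in_use(conf_dir).items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # /etc/cni/net.d is the directory shown in the output above.
    print(shared_bridges("/etc/cni/net.d"))
```

With the testnet1/testnet2 setup described above, shared_bridges would be expected to report cni-podman1 with both network names.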

PavelSosin-320 · 2020-09-14T06:20:18Z

I see a similar error with Podman on a WSL CentOS 8.1 distro when I try to run the Theia container:
failed to set bridge addr: "cni-podman0" already has an IP address different from 10.88.2.1/24
while podman network inspect podman | grep bridge shows:
"bridge": "cni-podman0",
"type": "bridge"
and ip addr shows:
6: cni-podman0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0
valid_lft forever preferred_lft forever
inet6 fe80::84db:ff:fe4e:2a72/64 scope link
valid_lft forever preferred_lft forever
But in my case the IP may already be in use on the host, via the WSL2 virtual switch or WSL's localhost. ip addr reports:
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:15:5d:b7:2f:ec brd ff:ff:ff:ff:ff:ff
inet 172.22.74.96/20 brd 172.22.79.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::215:5dff:feb7:2fec/64 scope link
valid_lft forever preferred_lft forever
6: cni-podman0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0
valid_lft forever preferred_lft forever
inet6 fe80::84db:ff:fe4e:2a72/64 scope link
valid_lft forever preferred_lft forever

Which IP does the Podman bridge listen on? It must be 0.0.0.0, because the WSL 2 virtual switch implements only a NIC.
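The addresses quoted above can be compared directly. The sketch below is a simplified model of the condition the error message describes (the bridge already holds an address different from the wanted gateway). It is not the CNI plugin's actual code, and both function names are hypothetical.

```python
import ipaddress


def bridge_addr_conflict(existing, wanted):
    """True when the bridge's current address/prefix differs from the wanted one."""
    return ipaddress.ip_interface(existing) != ipaddress.ip_interface(wanted)


def wanted_inside_existing(existing, wanted):
    """True when the wanted gateway's subnet lies inside the existing subnet."""
    existing_net = ipaddress.ip_interface(existing).network
    wanted_net = ipaddress.ip_interface(wanted).network
    return wanted_net.subnet_of(existing_net)
```

For the values in this comment, bridge_addr_conflict("10.88.0.1/16", "10.88.2.1/24") is True, and wanted_inside_existing with the same arguments is also True: the requested 10.88.2.0/24 is carved out of the 10.88.0.0/16 range that cni-podman0 already holds.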

Luap99 · 2020-09-14T11:28:24Z

@PavelSosin-320 What command are you using?

PavelSosin-320 · 2020-09-14T12:14:17Z

I use
podman run -P --pod new:cheWsServerNetLog --log-driver="json-file" --net podman docker.io/theiaide/theia

on a WSL2 CentOS 8.1 image.
But it looks like a generic issue, because the output of ip -j -4 a show cni-podman0 shows that the Podman bridge interface already exists in my VM:
[{"ifindex":6,"ifname":"cni-podman0" ....... "addr_info":[{"family":"inet","local":"10.88.0.1","prefixlen":16,"broadcast":"10.88.255.255","scope":"global"
I suppose that in a WSL VM, or any other VM scenario that uses a virtual switch, only one bridge can exist. I would expect an [INFO] "already exists" message instead.

mheon · 2020-09-14T13:41:20Z

@PavelSosin-320 I'm confused as to the question about what IP the CNI bridge listens on - bridges don't listen on any IP, because they're a layer 2 construct. There's an interface on the host in the bridge and another interface for each container to allow communications to the gateway, and then we configure iptables in masquerade mode to NAT the subnet in use on the bridge. It seems like we're not even getting that far, as it's complaining that it's unable to configure the bridge?

PavelSosin-320 · 2020-09-14T16:15:04Z

Yes, it is true! I don't see how to configure the bridge. There is a long-running discussion about bridge vs. NIC in the WSL 2 VM. There is neither a bridge nor even a fully functional virtual switch in WSL2. The eth0 in the VM is created on the fly, and its IP address is regenerated every time the Windows LXSS service starts. The VM itself is a singleton, and I see the cni-podman0 interface from the CentOS7 distro running Docker. And vice versa: I see the docker0 bridge from the CentOS8.1 distro running Podman.
This is a very challenging WSL limitation :( from my point of view because it requires hard network separation between Docker and Podman.

edsantiago · 2020-10-05T18:52:02Z

Still happening: log (on #7926)

github-actions · 2020-11-07T00:13:51Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2020-11-07T12:18:55Z

@edsantiago @Luap99 Any movement on this issue?

edsantiago · 2020-11-09T13:23:11Z

Still happening, although it looks like the cleanup has worked, so it's no longer triple-failing (and hence not causing complete CI-run failures). The logs below are from November and October only; I chose to skip the many late-September instances:

gce_instance:prior-fedora : int podman fedora-31 root host
- PR fedora rootless cpu settings #8231
  - 11-03 11:13
gce_instance:prior-ubuntu : int podman ubuntu-19 root host
- PR Change http ConnState actions between new and active #8209
  - 10-31 23:13
- PR Cirrus: Workaround F32 BFQ Kernel bug #8188
  - 10-30 10:00
prior-fedora : int podman fedora-31 root container
- PR Cirrus: Skip deep testing on branches #7926
prior-fedora : int podman fedora-31 root host
- PR Add hostname to /etc/hosts for --net=none #8101
  - 10-21 14:32
prior-ubuntu : int podman ubuntu-19 root host
- PR Don't error if resolv.conf does not exists #8111
  - 10-22 14:17
- PR Tests: Check different log driver can work with podman logs #8096
  - 10-22 03:23
ubuntu : int podman ubuntu-20 root host
- PR APIv2 compatibility network connect|disconnect #8078
  - 10-22 10:14

github-actions · 2020-12-10T00:17:51Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2020-12-10T22:02:39Z

@edsantiago Does this still happen, or was this fixed by #7943 ?

edsantiago · 2020-12-14T12:36:46Z

Looks like #7943 merged on October 7. In addition to the October/November incidents mentioned in a comment above, we have the following since then:

gce_instance:prior-ubuntu : int podman ubuntu-19 root host
- PR Handle --rm when starting a container #8688
  - 12-11 06:50
- PR Allow multiple --network flags for podman run/create #8410
gce_instance:ubuntu : int podman ubuntu-20 root host
- PR Bump master to v3.0.0-dev #8523
  - 11-30 17:40

github-actions · 2021-01-14T00:52:43Z

A friendly reminder that this issue had no activity for 30 days.

edsantiago · 2021-01-14T13:06:50Z

Only one instance in the last month, and it was a one-off (i.e. didn't cause a CI failure). Guess I'll close and hope for the best.

gce_instance:prior-ubuntu : int podman ubuntu-19 root host
- PR Handle --rm when starting a container #8688
  - 12-11 06:50

edsantiago added flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. labels Sep 10, 2020

edsantiago mentioned this issue Sep 10, 2020

run_networking e2e test: add cleanup to some tests #7593

Merged

Luap99 self-assigned this Sep 11, 2020

edsantiago mentioned this issue Sep 17, 2020

Evict containers before removing via V2 API #7662

Merged

Luap99 mentioned this issue Oct 5, 2020

podman network create: race: error reading */conflist: ENOENT #7807

Closed

rhatdan added the kind/test-flake Categorizes issue or PR as related to test flakes. label Oct 7, 2020

github-actions bot added the stale-issue label Nov 7, 2020

edsantiago removed the stale-issue label Nov 9, 2020

github-actions bot added the stale-issue label Dec 10, 2020

edsantiago removed the stale-issue label Dec 14, 2020

github-actions bot added the stale-issue label Jan 14, 2021

edsantiago closed this as completed Jan 14, 2021

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
