
CI flake: podman run with new:pod and static-ip: Error adding network: failed to set bridge addr #7583

Closed
edsantiago opened this issue Sep 10, 2020 · 16 comments
Assignees
Labels
flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. kind/test-flake Categorizes issue or PR as related to test flakes. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. stale-issue

Comments

@edsantiago
Member

The symptom is:

  podman run with new:pod and static-ip
 ...
Running: podman [options] network create --subnet 10.25.40.0/24 podmantestnetwork2
/etc/cni/net.d/podmantestnetwork2.conflist
Running: podman [options] run -t -i --rm --pod new:testpod --net podmantestnetwork2 --ip 10.25.40.128 docker.io/library/alpine:latest ip addr
time="2020-09-03T15:37:25-05:00" level=error msg="Error adding network: failed to set bridge addr: \"cni-podman1\" already has an IP address different from 10.25.40.1/24"
time="2020-09-03T15:37:25-05:00" level=error msg="Error while adding pod to CNI network \"podmantestnetwork2\": failed to set bridge addr: \"cni-podman1\" already has an IP address different from 10.25.40.1/24"
time="2020-09-03T15:37:25-05:00" level=error msg="error starting some container dependencies"
time="2020-09-03T15:37:25-05:00" level=error msg="\"error configuring network namespace for container 6091376fe654ca512bc885c89c0026c76549f8a47e174c37819e77f8c5678cfe: failed to set bridge addr: \\\"cni-podman1\\\" already has an IP address different from 10.25.40.1/24\""
Error: error starting some containers: internal libpod error

Each of these failed on all three attempts, causing the entire CI job to fail. This is probably because the first failure leaves things in a bad state for the test reruns; subsequent attempts fail with:

Running: podman [options] network create --subnet 10.25.40.0/24 podmantestnetwork2
Error: network 10.25.40.0/24 is already being used by a cni configuration

So, at a minimum, I would suggest fixing the test so as to clean up after itself.

Flake history:

@edsantiago edsantiago added flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. labels Sep 10, 2020
edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 10, 2020
Problem: if either of the two "podman network create" tests
fail, all subsequent retries will also fail because the
created network has not been cleaned up (so "network create"
will fail with EEXIST).

Solution: run "podman network rm" as deferred cleanup instead
of in each test.

This is NOT a fix for containers#7583 - it is just a way to allow
ginkgo to retry a failing test.

Signed-off-by: Ed Santiago <[email protected]>
@Luap99
Member

Luap99 commented Sep 11, 2020

I will take a look.

@Luap99 Luap99 self-assigned this Sep 11, 2020
@Luap99
Member

Luap99 commented Sep 13, 2020

OK, I don't think there is anything wrong with the "new:pod and static-ip" command itself. I ran this test for hours on both a slow and a fast computer without any problems.

Looking at the error, it seems that podman is trying to set an IP address on an existing bridge interface (cni-podman1) which already has a different IP address.

You could force this error by doing something like this:

$ podman network create --subnet 10.25.10.0/24 testnet1
/etc/cni/net.d/testnet1.conflist
$ podman network inspect testnet1 | grep bridge
        "bridge": "cni-podman1",
        "type": "bridge"
$ podman run --rm --name test1 --net testnet1 --ip 10.25.10.10 alpine ip addr 
...

Don't delete this network; leaving it in place is what triggers the error.
Create a second network:
$ conf=$(podman network create --subnet 10.25.20.0/24 testnet2)
Edit the config so its bridge name matches testnet1's (cni-podman1):
$ vi $conf
Now run this and you get the error:

$ podman run --rm --name test2 --net testnet2 --ip 10.25.20.10 alpine ip addr
ERRO[0000] Error adding network: failed to set bridge addr: "cni-podman1" already has an IP address different from 10.25.20.1/24 
ERRO[0000] Error while adding pod to CNI network "testnet2": failed to set bridge addr: "cni-podman1" already has an IP address different from 10.25.20.1/24 
Error: error configuring network namespace for container 9f39e72bbf437b2a3bf59da51f0d1d4ce7e350cb04a0d8013d8250bc3754b9f7: failed to set bridge addr: "cni-podman1" already has an IP address different from 10.25.20.1/24

So the real problem could be that podman network create writes a config that reuses a bridge interface name which is already in use. I'm not sure whether this is related to the previous test, which also creates and removes a network.
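One way to check this hypothesis is to look for bridge names referenced by more than one config, and to pick the first cni-podmanN that no config uses. The following Go sketch does both. It is hypothetical and is not podman's actual name-allocation logic. It models only the `plugins[].type`/`plugins[].bridge` fields of a CNI conflist, and the helpers `bridgeCounts` and `nextBridgeName` are invented for illustration. The sketch reads config files only. A fuller check might also consult live host interfaces, for example via Go's `net.Interfaces`, because a bridge can outlive the config that created it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// conflist models only the CNI conflist fields needed here.
type conflist struct {
	Name    string `json:"name"`
	Plugins []struct {
		Type   string `json:"type"`
		Bridge string `json:"bridge"`
	} `json:"plugins"`
}

// bridgeCounts counts how many conflist documents reference each bridge name.
func bridgeCounts(docs [][]byte) (map[string]int, error) {
	counts := make(map[string]int)
	for _, d := range docs {
		var c conflist
		if err := json.Unmarshal(d, &c); err != nil {
			return nil, err
		}
		for _, p := range c.Plugins {
			if p.Type == "bridge" && p.Bridge != "" {
				counts[p.Bridge]++
			}
		}
	}
	return counts, nil
}

// nextBridgeName returns the first cni-podmanN not referenced by any config.
func nextBridgeName(counts map[string]int) string {
	for i := 0; ; i++ {
		name := fmt.Sprintf("cni-podman%d", i)
		if counts[name] == 0 {
			return name
		}
	}
}

func main() {
	docs := [][]byte{
		[]byte(`{"name":"podman","plugins":[{"type":"bridge","bridge":"cni-podman0"}]}`),
		[]byte(`{"name":"testnet1","plugins":[{"type":"bridge","bridge":"cni-podman1"},{"type":"portmap"}]}`),
		// testnet2 after the manual edit in the reproduction above
		[]byte(`{"name":"testnet2","plugins":[{"type":"bridge","bridge":"cni-podman1"}]}`),
	}
	counts, err := bridgeCounts(docs)
	if err != nil {
		panic(err)
	}
	for name, n := range counts {
		if n > 1 {
			fmt.Printf("%s is used by %d configs\n", name, n)
		}
	}
	fmt.Println("next free name:", nextBridgeName(counts))
}
```

With the three sample configs, this should report cni-podman1 as used by 2 configs and suggest cni-podman2 as the next free name.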

@PavelSosin-320

I see something similar with Podman on WSL CentOS 8.1 when I try to run the Theia container:
failed to set bridge addr: "cni-podman0" already has an IP address different from 10.88.2.1/24
while podman network inspect podman | grep bridge shows:
"bridge": "cni-podman0",
"type": "bridge"
and ip -a shows:
6: cni-podman0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.88.0.1/16 brd 10.88.255.255 scope global cni-podman0
       valid_lft forever preferred_lft forever
    inet6 fe80::84db:ff:fe4e:2a72/64 scope link
       valid_lft forever preferred_lft forever
But in my case the IP address may already be in use on the host via the WSL2 virtual switch or WSL's localhost. ip -a also shows:
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:b7:2f:ec brd ff:ff:ff:ff:ff:ff
    inet 172.22.74.96/20 brd 172.22.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:feb7:2fec/64 scope link
       valid_lft forever preferred_lft forever

Which IP does the Podman bridge listen on? It must be 0.0.0.0, because the WSL 2 virtual switch implements only a NIC.

@Luap99
Member

Luap99 commented Sep 14, 2020

@PavelSosin-320 What command are you using?

@PavelSosin-320

I use

podman run -P --pod new:cheWsServerNetLog --log-driver="json-file" --net podman docker.io/theiaide/theia

on a WSL2 CentOS 8.1 image.
But it looks like a generic issue, because the output of ip -j -4 a show cni-podman0 shows that the Podman bridge interface already exists in my VM:
[{"ifindex":6,"ifname":"cni-podman0" ....... "addr_info":[{"family":"inet","local":"10.88.0.1","prefixlen":16,"broadcast":"10.88.255.255","scope":"global"
I suppose that in a WSL VM, or any other VM setup that uses a virtual switch, only one bridge can exist. I would expect an [INFO] "already exists" message instead.

@mheon
Member

mheon commented Sep 14, 2020

@PavelSosin-320 I'm confused as to the question about what IP the CNI bridge listens on - bridges don't listen on any IP, because they're a layer 2 construct. There's an interface on the host in the bridge and another interface for each container to allow communications to the gateway, and then we configure iptables in masquerade mode to NAT the subnet in use on the bridge. It seems like we're not even getting that far, as it's complaining that it's unable to configure the bridge?

@PavelSosin-320

Yes, that's right! I don't see how to configure the bridge. There is a long-running discussion about bridge vs. NIC in the WSL 2 VM. There is neither a bridge nor even a fully functional virtual switch in WSL2. The eth0 in the VM is created on the fly, and its IP address is regenerated every time the Windows LXSS service starts. The VM itself is a singleton: from the CentOS 7 distro running Docker I see the cni-podman0 interface, and vice versa, from the CentOS 8.1 distro running Podman I see the docker0 bridge.
From my point of view this is a very challenging WSL limitation :( because it requires hard network separation between Docker and Podman.

@edsantiago
Member Author

Still happening: log (on #7926)

@rhatdan rhatdan added the kind/test-flake Categorizes issue or PR as related to test flakes. label Oct 7, 2020
@github-actions

github-actions bot commented Nov 7, 2020

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Nov 7, 2020

@edsantiago @Luap99 Any movement on this issue?

@edsantiago
Member Author

Still happening, although it looks like the cleanup has worked, so it's no longer triple-failing (and hence no longer causing complete CI-run failures). The logs below are from November and October only; I chose to skip the many late-September instances:

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Dec 10, 2020

@edsantiago Does this still happen, or was this fixed by #7943 ?

@edsantiago
Member Author

Looks like #7943 merged on October 7. In addition to the October/November incidents mentioned in a comment above, we have the following since then:

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Member Author

Only one instance in the last month, and it was a one-off (i.e. didn't cause a CI failure). Guess I'll close and hope for the best.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
5 participants