
ci: unable to find network with name or ID podman-default-kube-network #17946

Closed · edsantiago opened this issue Mar 27, 2023 · 10 comments · Fixed by #18085 or #18281
Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug), locked - please file new issue/PR

Comments

@edsantiago (Member)

In e2e tests:

podman play kube --no-host
...
podman [options] play kube --no-hosts /tmp/podman_test3817601232/kube.yaml
...
starting container <sha>: unable to find network with name or ID podman-default-kube-network: network not found
starting container <sha>: a dependency of container <sha> failed to start: container state improper
Error: failed to start 2 containers

Probably a collision between multiple tests. Predicted solution: rewrite the tests to stop using the default network, or at least ensure that at most one test uses it.

edsantiago added the flakes label Mar 27, 2023
@edsantiago (Member Author)

...but then again, there's this flake:

  podman network create with name and IPv6 flag (dual-stack)
...
# podman [options] run -it --rm --network dual-36384dcdcad634f5feb9a53eadb5e202d18e479679d0d891d61b6cf7340b1a56 quay.io/libpod/alpine:latest sh -c ip addr show eth0 |  grep global | awk ' /inet6 / {print $2}'
time="2023-03-24T18:57:46-05:00" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
fd00:4:3:2::2/64

# podman [options] run -it --rm --network dual-36384dcdcad634f5feb9a53eadb5e202d18e479679d0d891d61b6cf7340b1a56 quay.io/libpod/alpine:latest sh -c ip addr show eth0 |  awk ' /inet / {print $2}'
time="2023-03-24T18:57:47-05:00" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
Error: unable to find network with name or ID dual-36384dcdcad634f5feb9a53eadb5e202d18e479679d0d891d61b6cf7340b1a56: network not found

The string "3638" does not appear anywhere else in this log. And it's generated via stringid.GenerateRandomID(), hence is unlikely to be a collision. I'm wondering if some test is doing podman network rm -a (could not find that in test dir), or maybe the system reset test is not being properly locked?

@Luap99 (Member) commented Mar 27, 2023

There is no --all flag for podman network rm, but yes, the system reset or prune commands can cause issues.
In the past I fixed these tests to use their own custom network config dir, but I guess there are still some without that fix.
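A rough sketch of that kind of fix: --network-config-dir is a real podman global option, while the wiring around it below is illustrative, not the actual test helpers.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Give this "test" its own network config dir so a prune/reset
	// running in a parallel test cannot delete its network definitions.
	confDir, err := os.MkdirTemp("", "netconf")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(confDir)

	out, err := exec.Command("podman",
		"--network-config-dir", confDir,
		"network", "create", "isolated-test-net").CombinedOutput()
	fmt.Printf("%s(err=%v)\n", out, err)
}
```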

@edsantiago (Member Author)

Yet another possibly similar failure:

  podman verify network scoped DNS server and also verify updating network dns server
...
# podman-remote [options] network update IntTestf438e02a77 --dns-add 7.7.7.7
Error: unable to find network with name or ID IntTestf438e02a77: network not found
# podman-remote [options] network rm -f IntTestf438e02a77
time="2023-03-27T17:05:55-05:00" level=error msg="IPAM error: could not find network \"IntTestf438e02a77\""
time="2023-03-27T17:05:55-05:00" level=error msg="Unable to clean up network for
      container 528ec6ba9f4e0d266b138bb10579250e83dd9d7fa5ce6fbc77e8bee3ce367d7d:
      \"tearing down network namespace configuration for
      container 528ec6ba9f4e0d266b138bb10579250e83dd9d7fa5ce6fbc77e8bee3ce367d7d: 
     failed to convert net opts: unable to find network with name or ID IntTestf438e02a77: network not found\""

edsantiago added the kind/bug label Mar 28, 2023
@Luap99 (Member) commented Mar 29, 2023

I think it is time to go with the big hammer and make every test case use its own network config dir, just like --root and --runroot.
This will make sure there are no conflicts, and in theory we can remove this stupid extra defer podmanTest.removeNetwork(...) that we have to use in every single test.
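A hedged sketch of what that big hammer could look like in the Ginkgo suite; podmanTest.removeNetwork is quoted from the comment above, while the setup names here are illustrative only:

```go
package integration

import (
	"os"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// netConfDir would be threaded into every podman invocation via
// --network-config-dir, mirroring the existing per-test --root/--runroot
// isolation. Illustrative sketch, not the actual suite API.
var netConfDir string

var _ = BeforeEach(func() {
	var err error
	netConfDir, err = os.MkdirTemp("", "netconf")
	Expect(err).ToNot(HaveOccurred())
})

var _ = AfterEach(func() {
	// With a per-test dir there is nothing to unregister by hand, so the
	// per-test `defer podmanTest.removeNetwork(...)` becomes unnecessary.
	Expect(os.RemoveAll(netConfDir)).To(Succeed())
})
```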

@vrothberg (Member)

That sounds very reasonable, @Luap99.

Luap99 self-assigned this Mar 29, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue Mar 30, 2023
The e2e tests are isolated and have their own --root/--runroot arguments.
However, networks were always shared; this causes problems with tests that
do a prune or reset, because they can affect other parallel running
tests.

Over time I fixed some of these cases to use their own config dir,
but containers#17946 suggests that this is not enough. Instead of trying to find
and fix these tests, just go with the big hammer and make every test use
a new clean network config directory.

This will also make the use of `defer podmanTest.removeNetwork(...)`
unnecessary. It is required at the moment for every test which creates
a network. However, to keep the diff small and to see if it is even
working, I will do it later in a follow-up commit.

Fixes containers#17946

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 (Member) commented Mar 30, 2023

Just linking #17975 (comment) here again: my change will not work, so we actually have to go through all tests which do prune or reset.

@edsantiago (Member Author)

Flakes in the past six days; I'm reporting them in case it's helpful to see which tests are failing, so you can at least target those:

Luap99 added a commit to Luap99/libpod that referenced this issue Apr 6, 2023
Since commit f250560 the play kube command uses its own network.
This is racy by design, because we create the network followed by
creating/running the pod/containers. In the meantime another
prune or reset process could wipe out the network config, because we have
to share the network config directory by design in the tests.

The problem is we only have one host netns, which is shared between
tests. If the network config dir is not shared we cannot make conflict
checks for interface names and IP addresses. This results in different
tests trying to use the same interface and/or IP address, which will
cause runtime failures in CNI and netavark.

The only solution I see is to make sure only the reset/prune tests are
using a custom network dir. This makes sure they do not wipe configs
that are otherwise required by other parallel running tests.

Fixes containers#17946

Signed-off-by: Paul Holzinger <[email protected]>
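To illustrate the inverted approach the commit describes, a small sketch: only the destructive test isolates itself. Here --network-config-dir and podman system reset --force are real podman options; the surrounding wiring is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// The reset test runs against a private network config dir and
	// therefore cannot wipe configs that parallel tests still depend on.
	confDir, err := os.MkdirTemp("", "reset-netconf")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(confDir)

	out, err := exec.Command("podman",
		"--network-config-dir", confDir,
		"system", "reset", "--force").CombinedOutput()
	fmt.Printf("%s(err=%v)\n", out, err)
}
```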
edsantiago added a commit to edsantiago/libpod that referenced this issue Apr 20, 2023
...in "built using Dockerfile" test and "play kube fail with
custom selinux label" test. The latter, since it's in a test
file with lots of other kube tests, I just put into BeforeEach().

References: Issue containers#17946, PR containers#18085

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago (Member Author)

Seen yesterday, in a fully-rebased PR, f36 root. Reopening.

edsantiago reopened this Apr 20, 2023
@Luap99 (Member) commented Apr 20, 2023

I found two prune tests which were missing the custom network dir.

@Luap99 (Member) commented Apr 20, 2023

podman system prune --volume is logged directly after the failing test; it deletes the config, which explains the flake. I'll create a PR.

Luap99 added a commit to Luap99/libpod that referenced this issue Apr 20, 2023
Adds two custom config dirs to tests that were missed in
commit dc9a65e.

Fixes containers#17946 (hopefully finally)

Signed-off-by: Paul Holzinger <[email protected]>
github-actions bot added the locked - please file new issue/PR label Aug 26, 2023
github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2023