test/system: pasta_test_do add explicit port check #23561

Luap99 · 2024-08-09T13:47:33Z

Do not rely on an arbitrary delay in order to ensure the port was bound in the container. Instead this approach checks if the port is bound in the netns and only then starts the client. This speeds up the entire test file by 50% but more importantly in parallel testing it solves hangs as the timeout there was unreliable.

Fixes #23471

Does this PR introduce a user-facing change?

None

Luap99 · 2024-08-09T13:47:59Z

@edsantiago @sbrivio-rh PTAL

sbrivio-rh · 2024-08-09T14:23:19Z

Hah, it's similar to what I was drafting (but I was nowhere near the point you are, that is, ...completion). Using ss is great, I didn't know it was available in the images and I was trying to come up with decent helpers for /proc/net/tcp and friends.

Review coming in a bit.

sbrivio-rh · 2024-08-09T14:25:21Z

test/system/505-networking-pasta.bats

-                   sh -c 'for port in $(seq '"${xseq}"'); do '\
-'                             socat -u '"${bind}"' '"${recv}"' & '\
-'                         done; wait'
+    # Wait for the client childs to finish.


"children" (those weird plurals that look like German :D)

... maybe I say "kids" then :)
Will fix that up on the next push

Luap99 · 2024-08-09T15:00:16Z

Hah, it's similar to what I was drafting (but I was nowhere near the point you are, that is, ...completion). Using ss is great, I didn't know it was available in the images and I was trying to come up with decent helpers for /proc/net/tcp and friends.

I used podman unshare nsenter to join only the users+netns so I can use the hosts executables. But well that doesn't seem to work because the podman unshare is not supported via remote client. But sounds like you are right I Can us ss in the image which allows me to convert that to podman exec which works via remote

sbrivio-rh

Github is telling me: "This diff is outdated, please refresh and try again." and it doesn't let me add any further comment, but I was in the middle of the review. So here comes a first part of it.

sbrivio-rh · 2024-08-09T14:49:21Z

test/system/505-networking-pasta.bats

-        # With Podman up, and socat still starting, UDP clients send to nowhere
-        delay=2
-    fi
+    # start server


# Start server

sbrivio-rh · 2024-08-09T14:50:37Z

test/system/505-networking-pasta.bats

+    run_podman container inspect --format "{{.NetworkSettings.SandboxKey}}" $cname
+    netns="$output"
+
+    # Make sure all ports in the contianer are bound


sbrivio-rh · 2024-08-09T14:58:41Z

test/system/505-networking-pasta.bats

+    for cport in $(seq ${xseq}); do
+        retries=50
+        while [[ $retries -gt 0 ]]; do
+            run_podman unshare nsenter -n"$netns" ss -Htuln -$ip_ver sport = $cport


I'm not sure if this is robust enough in practice: you pick both UDP sockets and TCP sockets. What happens if a UDP port, with the same number as the TCP port we're using for a TCP test, happens to be bound, while our TCP port isn't?

I guess you could do, before this, something like [ "${proto}" = "tcp" ] && ss_opts="Htln" || ss_opts="Huln".

Or even, above, where proto_upper is assigned, proto_short="$(expr substr "${proto}" 1 1)", and then ss -H"${proto_short}"ln ...

yes I ignored that case because there doesn't seem to be any test using both tcp and udp so this isn't a concern in practise. I find that easier than adding extra conditions to filter for tcp/udp

True, but this is on the "fast path", and ss isn't always cheap, because it has to fetch the whole procfs entry:

$ unshare -rUn # time for i in $(seq 1 1000); do ss -Htuln -4 sport = 80 >/dev/null; done real 0m10.573s user 0m0.248s sys 0m10.356s # time for i in $(seq 1 1000); do ss -Huln -4 sport = 80 >/dev/null; done real 0m1.178s user 0m0.869s sys 0m0.343s # time for i in $(seq 1 1000); do ss -Htln -4 sport = 80 >/dev/null; done real 0m10.576s user 0m0.318s sys 0m10.289s

And it's not negligible either for 50 retries:

# time for i in $(seq 1 50); do ss -Htln -4 sport = 80 >/dev/null; done real 0m0.522s user 0m0.022s sys 0m0.502s # time for i in $(seq 1 50); do ss -Huln -4 sport = 80 >/dev/null; done real 0m0.073s user 0m0.048s sys 0m0.027s

ss doesn't fetch anything from procfs? It uses netlink (NETLINK_SOCK_DIAG) to get the sockets.

Ok I agree if there are timing benefits, actually no need for special parsing in the test I can add --$proto instead of using the short option.

Aah, right, it uses NETLINK_SOCK_DIAG for everything as long as tcp_show_netlink() succeeds. Still, the cost doesn't come from the interface, but rather from the kernel preparing the TCP socket table dump (UDP is somewhat more straightforward).

test/system/505-networking-pasta.bats

sbrivio-rh · 2024-08-09T15:03:42Z

test/system/505-networking-pasta.bats

+        retries=50
+        while [[ $retries -gt 0 ]]; do
+            run_podman unshare nsenter -n"$netns" ss -Htuln -$ip_ver sport = $cport
+            if [[ $output =~ $cport ]]; then


For some reason, this isn't working on IPv6 "tap" tests. It works for IPv4, and for IPv6 loopback tests. I'm not really familiar with non-POSIX shell syntax, so here comes the relevant output of one failing test as example:

✗ [505] Single UDP port forwarding, IPv6, tap (from function `run_podman' in file helpers.bash, line 527, from function `pasta_test_do' in file 505-networking-pasta.bats, line 247, in test file 505-networking-pasta.bats, line 634) `pasta_test_do' failed [16:59:47.268721829] $ podman info --format {{.Host.Pasta.Executable}} [16:59:48.118448097] /usr/bin/pasta [16:59:48.179590796] $ podman run -d --name=c-socat-t13-1bowvp8k --net=pasta -p [2a01:4f8:222:904::2]:5095:5095/udp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5095 5095); do socat -u UDP6-LISTEN:${port},null-eof STDOUT & done; wait [16:59:48.291610727] d07505ed44b255ca84dff14d3fd0f7c527fb2c93bbcafc1f8aebb320686cb9e8 [16:59:48.296102453] $ podman container inspect --format {{.NetworkSettings.SandboxKey}} c-socat-t13-1bowvp8k [16:59:48.325410606] /run/user/1000/netns/netns-760759d3-9a23-882f-bb67-63c86c556a40 [16:59:48.332287966] $ podman unshare nsenter -n/run/user/1000/netns/netns-760759d3-9a23-882f-bb67-63c86c556a40 ss -Htuln -6 sport = 5095 [16:59:48.398745612] udp UNCONN 0 0 *:5095 *:* [16:59:48.411588810] $ podman logs --follow c-socat-t13-1bowvp8k [17:01:48.439216198] timeout: sending signal TERM to command ‘podman’ [17:01:48.442469452] [ rc=124 (** EXPECTED 0 **) ] *** TIMED OUT *** # [teardown]

perhaps you could simply check if the output is non-empty [ -n "${output}" ] && break instead?

I think the issue is that I didn't quote the vars

Even if I quote them ("$output" and "$cport"):

✗ [505] Single TCP port forwarding, IPv6, tap (from function `run_podman' in file helpers.bash, line 527, from function `pasta_test_do' in file 505-networking-pasta.bats, line 247, in test file 505-networking-pasta.bats, line 534) `pasta_test_do' failed [17:26:22.335618426] $ podman info --format {{.Host.Pasta.Executable}} [17:26:23.199697073] /usr/bin/pasta [17:26:23.290020913] $ podman run -d --name=c-socat-t1-wk8cq5z3 --net=pasta -p [2a01:4f8:222:904::2]:5782:5782/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5782 5782); do socat -u TCP6-LISTEN:${port} STDOUT & done; wait [17:26:23.398206770] 557a806bc97e4d61782e6fd20f58aef75085a7723b938ed94144d53595879cb5 [17:26:23.402789587] $ podman container inspect --format {{.NetworkSettings.SandboxKey}} c-socat-t1-wk8cq5z3 [17:26:23.473008406] /run/user/1000/netns/netns-8f9ed356-7b29-dd1e-af2a-e905164d429e [17:26:23.479608036] $ podman unshare nsenter -n/run/user/1000/netns/netns-8f9ed356-7b29-dd1e-af2a-e905164d429e ss -Htuln -6 sport = 5782 [17:26:23.514649930] tcp LISTEN 0 5 *:5782 *:* [17:26:23.527285450] $ podman logs --follow c-socat-t1-wk8cq5z3 [17:28:23.551330353] timeout: sending signal TERM to command ‘podman’ [17:28:23.554501527] [ rc=124 (** EXPECTED 0 **) ] *** TIMED OUT *** # [teardown]

by the way, I'm not sure if the assert for the 5-second timeout is actually working: the test times out after approximately two minutes.

[ -n "${output}" ] && break doesn't work either, actually :(

I cannot test this on my v6 setup it seems the test still require a public routeable address even though pasta doesn't anymore.

ok so that means we still need to work around here until it gets fixed on your side

Well, it's not a workaround and it wouldn't be a fix :) ...but yes.

@Luap99 with Fedora's passt-0^20240821.g1d6142f-1.fc40, passt-0^20240821.g1d6142f-1.fc41, passt-0^20240821.g1d6142f-1.fc42, and Debian's passt-0.0~git20240814.61c0b0d-1, we can drop the additional delay here. I can also submit a pull request once the CI images are updated.

As of yesterday pasta was 2024-08-14 on debian, f40, and rawhide; but still 06-24 on f39. I've just +1'ed its bodhi.

but still 06-24 on f39

Oops, I kept pushing updates and obsoleting the previous ones. At least I avoided that this time.

sbrivio-rh · 2024-08-09T15:10:21Z

I used podman unshare nsenter to join only the users+netns so I can use the hosts executables.

Oh, I missed this "subtlety" altogether.

But well that doesn't seem to work because the podman unshare is not supported via remote client. But sounds like you are right I Can us ss in the image which allows me to convert that to podman exec which works via remote

I'm happy my uninformed comment led to something useful. :)

By the way, it was also like that in the approach I attempted (but simply because I didn't know I could easily run host stuff in the namespace -- I was going through procfs entries with grep).

test/system/505-networking-pasta.bats

Do not rely on an arbitrary delay in order to ensure the port was bound in the container. Instead this approach checks if the port is bound in the netns and only then starts the client. This speeds up the entire test file by 50% but more importantly in parallel testing it solves hangs as the timeout there was unreliable. Fixes containers#23471 Signed-off-by: Paul Holzinger <[email protected]>

edsantiago · 2024-08-12T12:34:33Z

This is one of the flakes I've been seeing but I don't think I've filed yet. f39 EDIT: no, I did file it, #23482

<+060ms> # $ podman logs --follow c-socat-t612-o9ooo3we
<+155ms> # af4739753578c80b5f9f151c3c2db3fe  -
         # 2024/08/12 12:24:36 socat[3] W waitpid(0, 0x7ffd6999885c, 0): No child process
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #|     FAIL: Mismatch between data sent and received
         # #| expected: = af4739753578c80b5f9f151c3c2db3fe  -
         # #|   actual:   af4739753578c80b5f9f151c3c2db3fe  -
         # #|         >   2024/08/12 12:24:36 socat\[3\] W waitpid\(0\, 0x7ffd6999885c\, 0\): No child process
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Luap99 · 2024-08-12T12:37:50Z

#23482

I have not seen this locally running running over one hour but I don't think it is related to this here.

edsantiago · 2024-08-13T17:13:05Z

Finally got around to conflict-resolving this onto my bats-parallel. I've seen no failures either in a long while loop.

LGTM.

openshift-ci · 2024-08-13T17:30:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, sbrivio-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mheon · 2024-08-13T18:00:40Z

/lgtm

…erface It makes no sense for a container or a guest to try and perform duplicate address detection for their link-local address, as we'll anyway not relay neighbour solicitations with an unspecified source address. While they perform duplicate address detection, the link-local address is not usable, which prevents us from bringing up especially containers and communicate with them right away via IPv6. This is not enough to prevent DAD and reach the container right away: we'll need a couple more patches. As we send NLM_F_REPLACE requests right away, while we still have to read out other addresses on the same socket, we can't use nl_do(): keep track of the last sequence we sent (last address we changed), and deal with the answers to those NLM_F_REPLACE requests in a separate loop, later. Link: containers/podman#23561 (comment) Signed-off-by: Stefano Brivio <[email protected]> Reviewed-by: David Gibson <[email protected]>

openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 9, 2024

sbrivio-rh reviewed Aug 9, 2024

View reviewed changes

Luap99 force-pushed the test-pasta-port branch from 3c918af to f75979a Compare August 9, 2024 15:03

sbrivio-rh reviewed Aug 9, 2024

View reviewed changes

test/system/505-networking-pasta.bats Outdated Show resolved Hide resolved

Luap99 force-pushed the test-pasta-port branch 2 times, most recently from 306df3c to 58141d0 Compare August 9, 2024 15:47

Luap99 force-pushed the test-pasta-port branch from 58141d0 to 20f3e89 Compare August 12, 2024 11:47

sbrivio-rh approved these changes Aug 13, 2024

View reviewed changes

openshift-ci bot assigned mheon Aug 13, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2024

openshift-merge-bot bot merged commit 17baab0 into containers:main Aug 13, 2024
58 checks passed

Luap99 deleted the test-pasta-port branch August 13, 2024 18:18

stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 20, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test/system: pasta_test_do add explicit port check #23561

test/system: pasta_test_do add explicit port check #23561

Luap99 commented Aug 9, 2024

Luap99 commented Aug 9, 2024

sbrivio-rh commented Aug 9, 2024

sbrivio-rh Aug 9, 2024

Luap99 Aug 9, 2024

Luap99 commented Aug 9, 2024

sbrivio-rh left a comment

sbrivio-rh Aug 9, 2024

sbrivio-rh Aug 9, 2024

sbrivio-rh Aug 9, 2024

Luap99 Aug 9, 2024

sbrivio-rh Aug 9, 2024

Luap99 Aug 9, 2024

sbrivio-rh Aug 9, 2024

sbrivio-rh Aug 9, 2024

Luap99 Aug 9, 2024

sbrivio-rh Aug 9, 2024

sbrivio-rh Aug 9, 2024

Luap99 Aug 9, 2024

Luap99 Aug 12, 2024

sbrivio-rh Aug 12, 2024

sbrivio-rh Aug 21, 2024

edsantiago Aug 21, 2024

sbrivio-rh Aug 21, 2024

sbrivio-rh commented Aug 9, 2024

edsantiago commented Aug 12, 2024 •

edited

Loading

Luap99 commented Aug 12, 2024

edsantiago commented Aug 13, 2024

openshift-ci bot commented Aug 13, 2024

mheon commented Aug 13, 2024

test/system: pasta_test_do add explicit port check #23561

test/system: pasta_test_do add explicit port check #23561

Conversation

Luap99 commented Aug 9, 2024

Does this PR introduce a user-facing change?

Luap99 commented Aug 9, 2024

sbrivio-rh commented Aug 9, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Luap99 commented Aug 9, 2024

sbrivio-rh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbrivio-rh commented Aug 9, 2024

edsantiago commented Aug 12, 2024 • edited Loading

Luap99 commented Aug 12, 2024

edsantiago commented Aug 13, 2024

openshift-ci bot commented Aug 13, 2024

mheon commented Aug 13, 2024

edsantiago commented Aug 12, 2024 •

edited

Loading