-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make encryption-related connectivity tests more reliable #2035
Conversation
27108e5
to
2ee4044
Compare
2ee4044
to
16e2425
Compare
5bb2293
to
51d7a88
Compare
Closed and reopened to retrigger tests. |
51d7a88
to
33b2591
Compare
33b2591
to
98147cf
Compare
EKS (tunnel) failure is addressed by #2045 |
Converting back to draft while investigating one failure surfaced in https://github.com/cilium/cilium/actions/runs/6544264527/job/17770427183 |
98147cf
to
48a0d80
Compare
E2E and ipsec tests completed correctly in cilium/cilium#28635 |
Back in review state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Small nit.
48a0d80
to
472531f
Compare
The EKS failure looks legit - https://github.com/cilium/cilium-cli/actions/runs/6550230402/job/17788873914?pr=2035. |
Yes, it is addressed by #2045 (that's why this PR is still blocked). Essentially node to pod connectivity is broken due to the leftover iptables chains, and apparently we are not testing that path anywhere else. |
Currently, we always create the host-netns daemonset in a single cluster only, even when clustermesh is enabled. Yet, this can cause issues when connectivity tests depend on its presence, such as node-to-node encryption. Hence, let's adapt the logic to create it in both clusters. Signed-off-by: Marco Iorio <[email protected]>
When running in tunnel mode, we currently validate that pod to pod connectivity does not leak any unencrypted packet (if either IPSec or WireGuard are enabled) by constructing a tcpdump filter that uses as source IP address the one of the interface the traffic is routed to, instead of the source IP of the client. According to the commit message of f512df4, this was required to correctly test node to node connectivity, which leverages the same logic. Yet, this breaks the check for pod to pod traffic, which does not get masquerated. Indeed, the current check would not fail when forcefully run on a cluster with encryption disabled. Let's instead modify the filter, so that we 'OR' together both the original client IP and the one of the outgoing interface, accounting for both scenarios. Signed-off-by: Marco Iorio <[email protected]>
When running in ENI mode, the outgoing interface of pod originating traffic is different from the one that would be used by host originating traffic towards the same destination. This breaks the current pod-to-pod encryption validation, as the source interface for the tcpdump filter is determined based on the routes towards the given destination only. Let's update the source interface determination to additionally consider the source address. This approach had been initially suggested by Paul Chaignon in cilium#1241, but then reverted in [1] because `ip route get` returns and error in case the `from` address is not assigned to any local interface. We can work around this by specifying an input interface: let's use lo as it should be always present. [1]: 2fc0835 ("connectivity: Fix iface derivation in encrypt tests") Signed-off-by: Marco Iorio <[email protected]>
When both tunneling and host firewall is enabled, traffic from a pod to a remote node gets forwarded through the tunnel to preserve the source identity. This breaks the current node-to-node encryption tests, though, which expect the packets to be seen on the native device. Let's adapt the source interface determination to override the detected device in this situation, and always sniff on the tunnel interface. Signed-off-by: Marco Iorio <[email protected]>
Currently, encryption checks are run only when encryption (either IPSec or WireGuard) is enabled. While this approach is sensible, it also led to cases in which the tests were broken in certain circumstances, and we didn't notice [1,2,3,4]. Let's instead always run these tests, asserting that unencrypted packets are either seen or not based on the feature set. Hence, we can immediately notice if there's a regression, and no unencrypted packets are detected even if encryption is not enabled. [1]: f512df4 ("connectivity: fix node-to-node encryption tests") [2]: HEAD~3 ("connectivity: fix pod-to-pod encryption validation") [3]: HEAD~2 ("connectivity: fix encryption validation when running in ENI mode") [4]: HEAD~1 ("connectivity: fix encryption validation when host firewall is enabled") Signed-off-by: Marco Iorio <[email protected]>
Currently, the encryption tests validate that no unencrypted packets are leaked by capturing them on the source only. Since we can capture only egressing packets (as the XFRM stack recirculates the ingressing ones during decryption), this means that we are checking only one direction. Let's extend this check to validate both directions, starting a tcpdump capture both on the source and destination hosts. We leverage the bidirectional validation for the pod-to-pod encryption case though, as with the node-to-node one it is particularly tricky to construct the correct filter without additional information (as packets might be masquerated at the source, and in that case they should be sniffed on a different interface). Signed-off-by: Marco Iorio <[email protected]>
Disable the encryption tests when running the connectivity tests on a single node cluster, as they require that source and destination are hosted on different nodes. Signed-off-by: Marco Iorio <[email protected]>
472531f
to
e2431dc
Compare
Kind / installation-and-connectivity (helm) hit #1894 |
Please refer to the individual commit messages for additional details.