connectivity/check: Fix wrong NodePort service selection on validation
It is possible for the tuples of node IP and port to be mismatched in the case of NodePort services, causing the connectivity test to try to establish a connection to a non-existent tuple. For example, see the following output:

```
⌛ [gke_cilium-dev_us-west2-a_chris] Waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node) to become ready...
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 1:
🐛 Error waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node): command terminated with exit code 130:
Connectivity test failed: timeout reached waiting for NodePort 10.168.0.14:30774 (cilium-test/echo-other-node)
```

Nodes:

```
$ k get nodes -o wide
NAME                                   INTERNAL-IP
gke-chris-default-pool-1602ae11-bn2n   10.168.0.14
gke-chris-default-pool-1602ae11-ffsh   10.168.0.3
```

Cilium pods:

```
$ ks get pods -o wide | rg cilium
cilium-7sq59   10.168.0.14   gke-chris-default-pool-1602ae11-bn2n
cilium-mbvxl   10.168.0.3    gke-chris-default-pool-1602ae11-ffsh
```

Services:

```
$ k -n cilium-test get svc
NAME              TYPE       CLUSTER-IP    PORT(S)
echo-other-node   NodePort   10.28.29.66   8080:30774/TCP
echo-same-node    NodePort   10.28.23.18   8080:32186/TCP
```

Echo pods:

```
$ k -n cilium-test get pods -o wide
NAME                              READY   STATUS    IP            NODE
client-6488dcf5d4-bxlcp           1/1     Running   10.32.1.176   gke-chris-default-pool-1602ae11-bn2n
client2-5998d566b4-lgxrt          1/1     Running   10.32.1.191   gke-chris-default-pool-1602ae11-bn2n
echo-other-node-f4d46f75b-rgzbk   1/1     Running   10.32.0.11    gke-chris-default-pool-1602ae11-ffsh
echo-same-node-745bd5c77-mxwp7    1/1     Running   10.32.1.63    gke-chris-default-pool-1602ae11-bn2n
```

If we take the pod "echo-other-node-f4d46f75b-rgzbk", it resides on node "gke-chris-default-pool-1602ae11-ffsh", which has a node IP of 10.168.0.3. However, the CLI output shows the test trying to establish a connection to the other node's IP, 10.168.0.14, which is wrong: the port 30774 belongs to the echo-other-node service, but the IP belongs to the node that does not host its pod.

Fix this by checking if the echo pod resides on the same node as the node for the service.

Fixes: #342

Signed-off-by: Chris Tarazi <[email protected]>
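The check described above can be illustrated with a minimal Go sketch. This is not the code from this commit: the `Pod`, `Node`, and `Service` types and the `nodePortTarget` helper are hypothetical, simplified stand-ins introduced only to show the idea of pairing a NodePort with the IP of the node that actually hosts the echo pod. The sample data is taken from the output in the commit message.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Pod, Node, and Service are hypothetical, simplified stand-ins for the
// Kubernetes objects involved; they are not the types used by the CLI.
type Pod struct {
	Name     string
	NodeName string
}

type Node struct {
	Name       string
	InternalIP string
}

type Service struct {
	Name     string
	NodePort int
}

// nodePortTarget (hypothetical helper) returns the "nodeIP:nodePort" tuple
// for the node on which the echo pod resides. It reports false when no node
// with a matching name is found, so the caller never waits on a tuple that
// mixes one service's port with an unrelated node's IP.
func nodePortTarget(pod Pod, svc Service, nodes []Node) (string, bool) {
	for _, n := range nodes {
		if n.Name != pod.NodeName {
			continue // skip nodes that do not host this echo pod
		}
		return net.JoinHostPort(n.InternalIP, strconv.Itoa(svc.NodePort)), true
	}
	return "", false
}

func main() {
	nodes := []Node{
		{Name: "gke-chris-default-pool-1602ae11-bn2n", InternalIP: "10.168.0.14"},
		{Name: "gke-chris-default-pool-1602ae11-ffsh", InternalIP: "10.168.0.3"},
	}
	pod := Pod{
		Name:     "echo-other-node-f4d46f75b-rgzbk",
		NodeName: "gke-chris-default-pool-1602ae11-ffsh",
	}
	svc := Service{Name: "echo-other-node", NodePort: 30774}

	if target, ok := nodePortTarget(pod, svc, nodes); ok {
		fmt.Println(target) // prints 10.168.0.3:30774
	}
}
```

With this matching in place, the tuple for echo-other-node would be 10.168.0.3:30774 rather than the mismatched 10.168.0.14:30774 seen in the failing output.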