
Kubernetes LoadBalancer Service + Requests towards Load Balancer external IP from within same cluster = Connection refused #6244

Closed
tnn-simon opened this issue Apr 19, 2024 · 11 comments · Fixed by #6251
Labels
  • area/proxy: Issues or PRs related to proxy functions in Antrea
  • kind/bug: Categorizes issue or PR as related to a bug.
  • reported-by/end-user: Issues reported by end users.

Comments

@tnn-simon

Describe the bug
Traffic does not reach the external IP associated with a v1.Service (type: LoadBalancer) when the source workloads and the v1.Service are hosted in the same cluster, unless the source workloads are running on the same nodes as the endpoints associated with the v1.Service. This happens even though Antrea is running in networkPolicyOnly mode.

To Reproduce

  1. Provision an AKS cluster with network plugin azure and network policy none.
  2. Install Antrea in networkPolicyOnly mode.
  3. Deploy an application exposed through a v1.Service of type LoadBalancer, with externalTrafficPolicy: Local and the annotation service.beta.kubernetes.io/azure-load-balancer-internal: "true" (an illustrative manifest is sketched below this list).
  4. Deploy a second application on a different node than the application from the previous step. It does not need to be exposed. The image should contain an HTTP client (e.g. curl).
  5. Open a shell into the application deployed in step 4 and try to call the service from step 3 through its .status.loadBalancer.ingress[0].ip. Add the verbose flag if using curl.
  6. You should now receive an immediate error, e.g. connection refused.
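
For reference, a minimal manifest matching steps 3 and 4 could look like the following sketch; the names and image are illustrative and not taken from the actual cluster:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    # Ask Azure for an internal load balancer (ILB) instead of a public one.
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: client
spec:
  # Step 4 requires this Pod to land on a different node than the nginx
  # endpoints; a nodeSelector or anti-affinity rule is omitted for brevity.
  containers:
  - name: curl
    image: curlimages/curl
    command: ["sleep", "infinity"]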

Expected
Antrea was expected not to drop traffic destined for an IP outside both the Pod CIDR and the Service CIDR (ClusterIP) of the cluster, regardless of externalTrafficPolicy being set to Local.

Actual behavior
Traffic gets dropped when the node of the source workload does not host any endpoints associated with the target service.

Versions:
Antrea version: 1.15.1
Kubernetes version: 1.28.5
Container runtime: 1.7.14-1
Linux kernel version: 5.15.0-1059-azure

Additional context

@tnn-simon tnn-simon added the kind/bug Categorizes issue or PR as related to a bug. label Apr 19, 2024
@tnqn
Member

tnqn commented Apr 19, 2024

@tnn-simon thank you for the report.

As we discussed in Slack, the problem is that Antrea relies on the PodCIDR of Node objects to determine whether traffic comes from local Pods, but AKS doesn't set it in this mode. The solution we may apply is to use a more generic way to identify locally generated Pod traffic. I want to mention that, with this solution, Pod-to-LoadBalancerIP traffic will be processed by Antrea and DNATed to a random endpoint of the Service in the cluster, instead of going to the external load balancer (if there is one) to perform load balancing. Will that work for you?
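
To illustrate the point about the missing PodCIDR, a Node object in such an AKS cluster typically carries nothing for Antrea to read in spec.podCIDR; the node name and providerID below are hypothetical:

apiVersion: v1
kind: Node
metadata:
  name: aks-nodepool1-00000000-vmss000000
spec:
  providerID: azure:///subscriptions/.../virtualMachineScaleSets/...
  # spec.podCIDR / spec.podCIDRs are not set in this mode, so the
  # PodCIDR-based "local Pod traffic" check described above cannot work.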

@tnqn tnqn added area/proxy Issues or PRs related to proxy functions in Antrea reported-by/end-user Issues reported by end users. labels Apr 19, 2024
@tnn-simon
Author

@tnqn: Sounds like something that could work, but I'd still have to test it to be completely certain. What is the rationale behind not sending the traffic through the external load balancer?

@tnqn
Member

tnqn commented Apr 19, 2024

It's the default behavior we assume most users would expect, as it's the shorter path, and it's also the same way kube-proxy handles it. It's configurable and can be disabled by setting proxyLoadBalancerIPs to false, which is mainly useful for users who want the external load balancer to do some extra work like TLS termination.

When you tried proxyLoadBalancerIPs=false, kube-proxy actually took over the traffic (if you keep its default configuration), and it wouldn't reach the external load balancer either. However, I'm not sure why it doesn't eventually get to the endpoint at the moment; @hongliangl tried a similar topology but couldn't reproduce it.

@tnn-simon
Author

tnn-simon commented Apr 19, 2024

If I disable Antrea and reboot the nodes, the issues disappear - just cross-checking. I'm testing using Helm (chart version 1.15.1) with these values:

trafficEncapMode: "networkPolicyOnly"
nodeIPAM:
  enable: false
antreaProxy:
  proxyLoadBalancerIPs: false

When running with proxyLoadBalancerIPs: false, the flow towards the load balancer IP does not show up in the flow-dump anymore (table=ServiceLB), but this entry is new:
cookie=0x6000000000000, duration=5138.787s, table=ServiceLB, n_packets=8, n_bytes=592, idle_age=2899, priority=0 actions=goto_table:EndpointDNAT

I inspected the dump of table=EndpointDNAT and found this entry, corresponding to the endpoint associated with the target service:
cookie=0x6030000000000, duration=5146.591s, table=EndpointDNAT, n_packets=0, n_bytes=0, idle_age=5146, priority=200,tcp,reg3=0xaf40103,reg4=0x20050/0x7ffff actions=ct(commit,table=AntreaPolicyEgressRule,zone=65520,nat(dst=10.244.1.3:80),exec(set_field:0x10/0x10->ct_mark,move:NXM_NX_REG0[0..3]->NXM_NX_CT_MARK[0..3]))

Not sure what this indicates. My cluster does not have any network policy resources.

I'll attach a complete dump (ovs-ofctl dump-flows br-int).
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.0.0.0/16
Source Pod IP: 10.244.0.244
Target Pod IP: 10.244.1.3
Target Service ILB IP: 10.34.208.4

dump.log

@tnn-simon
Author

tnn-simon commented Apr 19, 2024

@tnqn: I'm attempting to get my head around this issue. Just out of curiosity, can you point to the source showing that kube-proxy does influence the traffic path for egress traffic destined for IPs assigned in .status of v1.Service of type LoadBalancer?

As you mentioned, one will typically send traffic through the external load-balancer to piggyback on features implemented by the load-balancer - in our case cluster-independent DNS. If one doesn't want the traffic to flow through the external load-balancer, wouldn't one use the ClusterIP of the service instead? I also struggle to understand why externalTrafficPolicy has significance here; I thought this field was intended for traffic arriving through the NodePort.

@tnqn
Member

tnqn commented Apr 20, 2024

Just out of curiosity, can you point to the source showing that kube-proxy does influence the traffic path for egress traffic destined for IPs assigned in .status of v1.Service of type LoadBalancer?

You can find how it handles "locally-originated pod -> external destination" by combining the following code:
https://github.com/kubernetes/kubernetes/blob/7f68d014e5d785472ba148c983c9d0abc6df9a36/pkg/proxy/iptables/proxier.go#L1245-L1253
https://github.com/kubernetes/kubernetes/blob/7f68d014e5d785472ba148c983c9d0abc6df9a36/pkg/proxy/iptables/proxier.go#L1110-L1119

one will typically send traffic through the external load-balancer to piggyback on features implemented by the load-balancer - in our case cluster-independent DNS. If one doesn't want the traffic to flow through the external load-balancer, wouldn't one use the ClusterIP of the service instead(?).

My understanding is that this handling assumes users' applications are given a unified address (the LB IP) for a service regardless of where the clients are, but the intended destination is always the same Endpoints, so the traffic is short-circuited since it would come back anyway. However, since K8s 1.29, a feature called LoadBalancerIPMode was added to change this behavior: the .status.loadBalancer.ingress[].ipMode field can be set to Proxy to prevent kube-proxy from handling traffic towards the LB IP. You can get more details from https://kubernetes.io/blog/2023/12/18/kubernetes-1-29-feature-loadbalancer-ip-mode-alpha/.
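
For illustration, with this feature the ipMode field is published in the Service status (typically by the cloud-provider controller); the sketch below reuses the ILB IP from this issue and is not an actual dump:

status:
  loadBalancer:
    ingress:
    - ip: 10.34.208.4
      # "Proxy" tells kube-proxy not to short-circuit traffic to this IP;
      # the default value, "VIP", keeps the short-circuiting behavior.
      ipMode: Proxy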

I also struggle to understand why externalTrafficPolicy has significance here, I thought this field was intended for traffic arriving through the NodePort.

externalTrafficPolicy and internalTrafficPolicy apply to service traffic based on the destination address. You can find the following explanations in their API spec:

  • ServiceExternalTrafficPolicy describes how nodes distribute service traffic they receive on one of the Service's "externally-facing" addresses (NodePorts, ExternalIPs, and LoadBalancer IPs).
  • ServiceInternalTrafficPolicy describes how nodes distribute service traffic they receive on the ClusterIP.
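
As a sketch of how the two fields sit side by side on a Service (name, selector, and ports are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  # Applies to the externally-facing addresses: NodePorts, ExternalIPs, LB IPs.
  externalTrafficPolicy: Local
  # Applies to traffic addressed to the ClusterIP.
  internalTrafficPolicy: Cluster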

If I disable Antrea and reboot the nodes, the issues disappear - just cross-checking.

I saw that the CIDR of the Pod IPs has changed; is this the same cluster? By disabling Antrea, do you mean removing Antrea from the cluster or disabling Antrea's proxyLoadBalancerIPs? And was dump.log collected when it works or when it doesn't?

@tnn-simon
Author

tnn-simon commented Apr 21, 2024

I saw that the CIDR of the Pod IPs has changed; is this the same cluster? By disabling Antrea, do you mean removing Antrea from the cluster or disabling Antrea's proxyLoadBalancerIPs? And was dump.log collected when it works or when it doesn't?

This is a different cluster from the one referred to in Slack. It is based on the Azure CNI Overlay network plugin, but the issue is the same. I created it to get a dedicated test environment for this issue. By disabling Antrea, I mean removing Antrea from the cluster and rebooting the nodes. The dump.log was collected when running Antrea with proxyLoadBalancerIPs: false, and this does not work - the connection times out.

Thank you so much for sharing your insights on the kube-proxy behaviour. I can confirm that traffic originating in the same cluster does not flow through the external load-balancer, regardless of whether Antrea is installed or not.

Regarding your first suggestion:

But I want to mention that, with the solution, Pod to LoadBalancerIP traffic will be processed by Antrea and DNATed to a random endpoint of the Service in the cluster, instead of going to the external load balancer (if there is one) to perform load balancing, will it work for you?

I think this sounds even better now that I know more about kube-proxy. I guess the suggested solution is to ignore externalTrafficPolicy - which, from our experience, is the current behaviour of kube-proxy for traffic originating internally.

Still digging for the root cause of the timeouts. Our experience so far is as follows:

  • proxyLoadBalancerIPs: true, externalTrafficPolicy: Local = Connection refused
  • proxyLoadBalancerIPs: false, externalTrafficPolicy: Local = Connection timeout

When proxyLoadBalancerIPs is false, the traffic hits both the veth of the Pod netns and antrea-gw0, but never reaches eth0. I'm comparing this to my local Kind setup, with identical configuration, where the traffic reaches eth0 and connects successfully to the LB-associated endpoint on another node.

hongliangl added a commit to hongliangl/antrea that referenced this issue Apr 22, 2024
Before this commit, in AntreaProxy, to respect short-circuiting when installing
flows for an external Service, an extra flow with higher priority was installed
to match traffic sourced from local (local Pods or the local Node) and destined
for the external Service. This was achieved by matching the local Pod CIDR
obtained from the local Node object. However, when Antrea is deployed in
networkPolicyOnly mode, the Pod CIDR in the local Node object is nil, resulting
in a failure to install the extra flow mentioned above. To fix the issue, a new
reg mark `FromLocalRegMark` identifying traffic from local Pods or the local
Node is introduced to mark the traffic from local. This reg mark can be used in
all traffic modes.

Fix antrea-io#6244

Signed-off-by: Hongliang Liu <[email protected]>
@tnqn
Member

tnqn commented Apr 22, 2024

I think this sounds even better now that I know more about kube-proxy. I guess the suggested solution is to ignore externalTrafficPolicy - which, from our experience, is the current behaviour of kube-proxy for traffic originating internally.

Yes, Antrea is implemented to be identical to kube-proxy for most Service features. Other modes already work this way; it's just that networkPolicyOnly mode lacks the PodCIDR required to implement it.

When proxyLoadBalancerIPs is false, the traffic hits both the veth of the Pod netns and antrea-gw0, but never reaches eth0. I'm comparing this to my local Kind setup, with identical configuration, where the traffic reaches eth0 and connects successfully to the LB-associated endpoint on another node.

Could you share the output of iptables-save -c of that Node? If the traffic reaches antrea-gw0, it has passed Antrea's datapath, and should be handled by iptables rules installed by kube-proxy.

@tnn-simon
Author

Could you share the output of iptables-save -c of that Node? If the traffic reaches antrea-gw0, it has passed Antrea's datapath, and should be handled by iptables rules installed by kube-proxy.

Here is the output: iptables.log

IP of LB-Service: 10.34.208.4
IP of source workload: 10.244.0.61 (banana/nginx)
IP of target endpoint: 10.244.1.138 (apple/nginx)

@tnqn
Member

tnqn commented Apr 22, 2024

Thanks for the output. I figured out why it's dropped by iptables: like Antrea, kube-proxy needs to know whether traffic comes from local Pods in order to short-circuit it. In this cluster, it detects locally-originated Pod traffic by checking the interface name prefix. If you look at the kube-proxy configmap, there should be a field like the following:

interfaceNamePrefix: "azv"

However, in Antrea networkPolicyOnly mode, the Pod interfaces are not connected directly to the host network but to an OVS bridge (so Antrea can enforce NetworkPolicy), and locally-originated Pod traffic arrives in the host network via "antrea-gw0" instead of an "azv+" interface as expected by kube-proxy. The relevant rules are as below:

# Check -i azv+ for pod traffic
[0:0] -A KUBE-EXT-YDDIPJNZZET3UCBJ -i azv+ -m comment --comment "pod traffic for apple/nginx external destinations" -j KUBE-SVC-YDDIPJNZZET3UCBJ
# Dropped because it's not classified as pod traffic
[1:60] -A KUBE-SERVICES -d 10.34.208.4/32 -p tcp -m comment --comment "apple/nginx loadbalancer IP" -m tcp --dport 80 -j KUBE-EXT-YDDIPJNZZET3UCBJ

I think updating interfaceNamePrefix to antrea should fix it.
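
As a sketch only, since the exact layout of the AKS-managed kube-proxy ConfigMap may differ, the relevant KubeProxyConfiguration fields would look roughly like this:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
detectLocalMode: "InterfaceNamePrefix"
detectLocal:
  # Default on this cluster: Pod veths are named azv*.
  # interfaceNamePrefix: "azv"
  # With Antrea in networkPolicyOnly mode, local Pod traffic enters the host
  # network via antrea-gw0, so the prefix would need to cover that interface:
  interfaceNamePrefix: "antrea"

As noted in the next comment, changing this is only practical if AKS allows the managed kube-proxy configuration to be customized.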

@tnn-simon
Author

Thanks for the clarification! I will give this some thought. Kube-proxy is managed by Azure AKS and is not something we will tamper with without proper research.

hongliangl added a commit to hongliangl/antrea that referenced this issue Apr 23, 2024
tnqn pushed a commit that referenced this issue Apr 24, 2024 (#6251)
hongliangl added a commit to hongliangl/antrea that referenced this issue Apr 28, 2024 (antrea-io#6251)
antoninbas pushed a commit that referenced this issue Apr 30, 2024 (#6251) (#6268)
antoninbas pushed a commit that referenced this issue May 1, 2024 (#6251) (#6269)