
Kubernetes LoadBalancer Service + Requests towards Load Balancer external IP from within same cluster = Connection refused #6244

Closed
tnn-simon opened this issue Apr 19, 2024 · 11 comments · Fixed by #6251
Labels
  • area/proxy: Issues or PRs related to proxy functions in Antrea
  • kind/bug: Categorizes issue or PR as related to a bug.
  • reported-by/end-user: Issues reported by end users.

Comments

@tnn-simon

Describe the bug
Traffic does not reach the external IP associated with a v1.Service (type: LoadBalancer) when the source workloads and the v1.Service are hosted in the same cluster, unless the source workloads are running on the same nodes as the endpoints associated with the v1.Service. This happens even though Antrea is running in networkPolicyOnly mode.

To Reproduce

  1. Provision an AKS cluster with network plugin azure and network policy none.
  2. Install Antrea in networkPolicyOnly mode.
  3. Deploy an application exposed through a v1.Service of type LoadBalancer, with externalTrafficPolicy: Local and the annotation service.beta.kubernetes.io/azure-load-balancer-internal: "true" (an illustrative manifest is sketched below this list).
  4. Deploy a second application on a different node than the application from the previous step. It does not need to be exposed. The image should contain an HTTP client (e.g. curl).
  5. Open a shell into the application deployed in step 4 and try to call the service from step 3 through its .status.loadBalancer.ingress[0].ip. Add the verbose flag if using curl.
  6. You should now receive an immediate error, e.g. connection refused.
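
For reference, a minimal manifest matching steps 3 and 4 could look like the following sketch; the names and image are illustrative and not taken from the actual cluster:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    # Ask Azure for an internal load balancer (ILB) instead of a public one.
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: client
spec:
  # Step 4 requires this Pod to land on a different node than the nginx
  # endpoints; a nodeSelector or anti-affinity rule is omitted for brevity.
  containers:
  - name: curl
    image: curlimages/curl
    command: ["sleep", "infinity"]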

Expected
Antrea was expected not to drop traffic destined for an IP outside both the Pod CIDR and the Service CIDR (ClusterIP) of the cluster, regardless of externalTrafficPolicy being set to Local.

Actual behavior
Traffic gets dropped when the node of the source workload does not host any endpoints associated with the target service.

Versions:
Antrea version: 1.15.1
Kubernetes version: 1.28.5
Container runtime: 1.7.14-1
Linux kernel version: 5.15.0-1059-azure

Additional context

@tnn-simon tnn-simon added the kind/bug Categorizes issue or PR as related to a bug. label Apr 19, 2024
@tnqn
Member

tnqn commented Apr 19, 2024

@tnn-simon thank you for the report.

As we discussed in Slack, the problem is that Antrea relies on the PodCIDR of Node objects to determine whether traffic comes from local Pods, but AKS doesn't set it in this mode. The solution we may apply is to use a more generic way to identify locally generated Pod traffic. I want to mention that, with this solution, Pod-to-LoadBalancerIP traffic will be processed by Antrea and DNATed to a random endpoint of the Service in the cluster, instead of going to the external load balancer (if there is one) to perform load balancing. Will that work for you?
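
To illustrate the point about the missing PodCIDR, a Node object in such an AKS cluster typically carries nothing for Antrea to read in spec.podCIDR; the node name and providerID below are hypothetical:

apiVersion: v1
kind: Node
metadata:
  name: aks-nodepool1-00000000-vmss000000
spec:
  providerID: azure:///subscriptions/.../virtualMachineScaleSets/...
  # spec.podCIDR / spec.podCIDRs are not set in this mode, so the
  # PodCIDR-based "local Pod traffic" check described above cannot work.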

@tnqn tnqn added area/proxy Issues or PRs related to proxy functions in Antrea reported-by/end-user Issues reported by end users. labels Apr 19, 2024
@tnn-simon
Author

@tnqn: Sounds like something that could work, but I'd still have to test it to be completely certain. What is the rationale behind not sending the traffic through the external load balancer?

@tnqn
Member

tnqn commented Apr 19, 2024

It's the default behavior we assume most users would expect, as it's the shorter path, and it's also the same way kube-proxy handles it. It's configurable and can be disabled by setting proxyLoadBalancerIPs to false, which is mainly useful for users who want the external load balancer to do some extra work like TLS termination.

When you tried proxyLoadBalancerIPs=false, kube-proxy actually took over the traffic (if you keep its default configuration), and it wouldn't reach the external load balancer either. However, I'm not sure why it doesn't eventually get to the endpoint at the moment; @hongliangl tried a similar topology but couldn't reproduce it.

@tnn-simon
Author

tnn-simon commented Apr 19, 2024

If I disable Antrea and reboot the nodes, the issues disappear - just cross-checking. I'm testing using Helm (chart version 1.15.1) with these values:

trafficEncapMode: "networkPolicyOnly"
nodeIPAM:
  enable: false
antreaProxy:
  proxyLoadBalancerIPs: false

When running with proxyLoadBalancerIPs: false, the flow towards the load balancer IP does not show up in the flow-dump anymore (table=ServiceLB), but this entry is new:
cookie=0x6000000000000, duration=5138.787s, table=ServiceLB, n_packets=8, n_bytes=592, idle_age=2899, priority=0 actions=goto_table:EndpointDNAT

I inspected the dump of table=EndpointDNAT and found this entry, corresponding to the endpoint associated with the target service:
cookie=0x6030000000000, duration=5146.591s, table=EndpointDNAT, n_packets=0, n_bytes=0, idle_age=5146, priority=200,tcp,reg3=0xaf40103,reg4=0x20050/0x7ffff actions=ct(commit,table=AntreaPolicyEgressRule,zone=65520,nat(dst=10.244.1.3:80),exec(set_field:0x10/0x10->ct_mark,move:NXM_NX_REG0[0..3]->NXM_NX_CT_MARK[0..3]))

Not sure what this indicates. My cluster does not have any network policy resources.

I'll attach a complete dump (ovs-ofctl dump-flows br-int).
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.0.0.0/16
Source Pod IP: 10.244.0.244
Target Pod IP: 10.244.1.3
Target Service ILB IP: 10.34.208.4

dump.log

@tnn-simon
Author

tnn-simon commented Apr 19, 2024

@tnqn: I'm attempting to get my head around this issue. Just out of curiosity, can you point to the source showing that kube-proxy does influence the traffic path for egress traffic destined for IPs assigned in .status of v1.Service of type LoadBalancer?

As you mentioned, one will typically send traffic through the external load-balancer to piggyback on features implemented by the load-balancer - in our case cluster-independent DNS. If one doesn't want the traffic to flow through the external load-balancer, wouldn't one use the ClusterIP of the service instead? I also struggle to understand why externalTrafficPolicy has significance here; I thought this field was intended for traffic arriving through the NodePort.

@tnqn
Member

tnqn commented Apr 20, 2024

Just out of curiosity, can you point to the source showing that kube-proxy does influence the traffic path for egress traffic destined for IPs assigned in .status of v1.Service of type LoadBalancer?

You can find how it handles "locally-originated pod -> external destination" by combining the following code:
https://github.com/kubernetes/kubernetes/blob/7f68d014e5d785472ba148c983c9d0abc6df9a36/pkg/proxy/iptables/proxier.go#L1245-L1253
https://github.com/kubernetes/kubernetes/blob/7f68d014e5d785472ba148c983c9d0abc6df9a36/pkg/proxy/iptables/proxier.go#L1110-L1119

one will typically send traffic through the external load-balancer to piggyback on features implemented by the load-balancer - in our case cluster-independent DNS. If one doesn't want the traffic to flow through the external load-balancer, wouldn't one use the ClusterIP of the service instead(?).

My understanding is that this handling assumes users' applications are given a unified address (the LB IP) for a service regardless of where the clients are, but the intended destination is always the same Endpoints, so the traffic is short-circuited since it would come back anyway. However, since K8s 1.29, a feature called LoadBalancerIPMode was added to change this behavior: the .status.loadBalancer.ingress[].ipMode field can be set to Proxy to prevent kube-proxy from handling traffic towards the LB IP. You can get more details from https://kubernetes.io/blog/2023/12/18/kubernetes-1-29-feature-loadbalancer-ip-mode-alpha/.
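
For illustration, with this feature the ipMode field is published in the Service status (typically by the cloud-provider controller); the sketch below reuses the ILB IP from this issue and is not an actual dump:

status:
  loadBalancer:
    ingress:
    - ip: 10.34.208.4
      # "Proxy" tells kube-proxy not to short-circuit traffic to this IP;
      # the default value, "VIP", keeps the short-circuiting behavior.
      ipMode: Proxy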

I also struggle to understand why externalTrafficPolicy has significance here, I thought this field was intended for traffic arriving through the NodePort.

externalTrafficPolicy and internalTrafficPolicy apply to service traffic based on the destination address. You can find the following explanations in their API spec:

  • ServiceExternalTrafficPolicy describes how nodes distribute service traffic they receive on one of the Service's "externally-facing" addresses (NodePorts, ExternalIPs, and LoadBalancer IPs).
  • ServiceInternalTrafficPolicy describes how nodes distribute service traffic they receive on the ClusterIP.
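
As a sketch of how the two fields sit side by side on a Service (name, selector, and ports are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  # Applies to the externally-facing addresses: NodePorts, ExternalIPs, LB IPs.
  externalTrafficPolicy: Local
  # Applies to traffic addressed to the ClusterIP.
  internalTrafficPolicy: Cluster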

If I disable Antrea and reboot the nodes, the issues disappear - just cross-checking.

I saw that the CIDR of the Pod IPs has changed; is this the same cluster? By disabling Antrea, do you mean removing Antrea from the cluster or disabling Antrea's proxyLoadBalancerIPs? And was dump.log collected when it works or when it doesn't?

@tnn-simon
Author

tnn-simon commented Apr 21, 2024

I saw that the CIDR of the Pod IPs has changed; is this the same cluster? By disabling Antrea, do you mean removing Antrea from the cluster or disabling Antrea's proxyLoadBalancerIPs? And was dump.log collected when it works or when it doesn't?

This is a different cluster from the one referred to in Slack. It is based on the Azure CNI Overlay network plugin, but the issue is the same. I created it to get a dedicated test environment for this issue. By disabling Antrea, I mean removing Antrea from the cluster and rebooting the nodes. The dump.log was collected when running Antrea with proxyLoadBalancerIPs: false, and this does not work - the connection times out.

Thank you so much for sharing your insights on the kube-proxy behaviour. I can confirm that traffic originating in the same cluster does not flow through the external load-balancer, regardless of whether Antrea is installed or not.

Regarding your first suggestion:

But I want to mention that, with the solution, Pod to LoadBalancerIP traffic will be processed by Antrea and DNATed to a random endpoint of the Service in the cluster, instead of going to the external load balancer (if there is one) to perform load balancing, will it work for you?

I think this sounds even better now that I know more about kube-proxy. I guess the suggested solution is to ignore externalTrafficPolicy - which, from our experience, is the current behaviour of kube-proxy for traffic originating internally.

Still digging for the root cause of the timeouts. Our experience so far is as follows:

  • proxyLoadBalancerIPs: true, externalTrafficPolicy: Local = Connection refused
  • proxyLoadBalancerIPs: false, externalTrafficPolicy: Local = Connection timeout

When proxyLoadBalancerIPs is false, the traffic hits both the veth of the Pod netns and antrea-gw0, but never reaches eth0. I'm comparing this to my local Kind setup, with identical configuration, where the traffic reaches eth0 and connects successfully to the LB-associated endpoint on another node.

hongliangl added a commit to hongliangl/antrea that referenced this issue Apr 22, 2024
Before this commit, in AntreaProxy, to respect short-circuiting when installing
flows for an external Service, an extra flow with higher priority was installed
to match traffic sourced from local (local Pods or the local Node) and destined
for the external Service. This was achieved by matching the local Pod CIDR
obtained from the local Node object. However, when Antrea is deployed in
networkPolicyOnly mode, the Pod CIDR in the local Node object is nil, resulting
in a failure to install the extra flow mentioned above. To fix the issue, a new
reg mark `FromLocalRegMark` identifying traffic from local Pods or the local
Node is introduced to mark the traffic from local. This reg mark can be used in
all traffic modes.

Fix antrea-io#6244

Signed-off-by: Hongliang Liu <[email protected]>
@tnqn
Member

tnqn commented Apr 22, 2024

I think this sounds even better now that I know more about kube-proxy. I guess the suggested solution is to ignore externalTrafficPolicy - which, from our experience, is the current behaviour of kube-proxy for traffic originating internally.

Yes, Antrea is implemented to be identical to kube-proxy for most Service features. Other modes already work this way; it's just that networkPolicyOnly mode lacks the PodCIDR required to implement it.

When proxyLoadBalancerIPs is false, the traffic hits both the veth of the Pod netns and antrea-gw0, but never reaches eth0. I'm comparing this to my local Kind setup, with identical configuration, where the traffic reaches eth0 and connects successfully to the LB-associated endpoint on another node.

Could you share the output of iptables-save -c of that Node? If the traffic reaches antrea-gw0, it has passed Antrea's datapath, and should be handled by iptables rules installed by kube-proxy.

@tnn-simon
Author

Could you share the output of iptables-save -c of that Node? If the traffic reaches antrea-gw0, it has passed Antrea's datapath, and should be handled by iptables rules installed by kube-proxy.

Here is the output: iptables.log

IP of LB-Service: 10.34.208.4
IP of source workload: 10.244.0.61 (banana/nginx)
IP of target endpoint: 10.244.1.138 (apple/nginx)

@tnqn
Member

tnqn commented Apr 22, 2024

Thanks for the output. I figured out why it's dropped by iptables: like Antrea, kube-proxy needs to know whether traffic comes from local Pods in order to short-circuit it. In this cluster, it detects locally-originated Pod traffic by checking the interface name prefix. If you look at the kube-proxy configmap, there should be a field like the following:

interfaceNamePrefix: "azv"

However, in Antrea networkPolicyOnly mode, the Pod interfaces are not connected directly to the host network but to an OVS bridge (so Antrea can enforce NetworkPolicy), and locally-originated Pod traffic arrives in the host network via "antrea-gw0" instead of an "azv+" interface as expected by kube-proxy. The relevant rules are as below:

# Check -i azv+ for pod traffic
[0:0] -A KUBE-EXT-YDDIPJNZZET3UCBJ -i azv+ -m comment --comment "pod traffic for apple/nginx external destinations" -j KUBE-SVC-YDDIPJNZZET3UCBJ
# Dropped because it's not classified as pod traffic
[1:60] -A KUBE-SERVICES -d 10.34.208.4/32 -p tcp -m comment --comment "apple/nginx loadbalancer IP" -m tcp --dport 80 -j KUBE-EXT-YDDIPJNZZET3UCBJ

I think updating interfaceNamePrefix to antrea should fix it.
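
As a sketch only, since the exact layout of the AKS-managed kube-proxy ConfigMap may differ, the relevant KubeProxyConfiguration fields would look roughly like this:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
detectLocalMode: "InterfaceNamePrefix"
detectLocal:
  # Default on this cluster: Pod veths are named azv*.
  # interfaceNamePrefix: "azv"
  # With Antrea in networkPolicyOnly mode, local Pod traffic enters the host
  # network via antrea-gw0, so the prefix would need to cover that interface:
  interfaceNamePrefix: "antrea"

As noted in the next comment, changing this is only practical if AKS allows the managed kube-proxy configuration to be customized.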

@tnn-simon
Author

Thanks for the clarification! I will give this some thought. Kube-proxy is managed by Azure AKS and is not something we will tamper with without proper research.

hongliangl added a commit to hongliangl/antrea that referenced this issue Apr 23, 2024
tnqn pushed a commit that referenced this issue Apr 24, 2024 (#6251)
hongliangl added a commit to hongliangl/antrea that referenced this issue Apr 28, 2024 (antrea-io#6251)
antoninbas pushed a commit that referenced this issue Apr 30, 2024 (#6251) (#6268)
antoninbas pushed a commit that referenced this issue May 1, 2024 (#6251) (#6269)