-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix intra-Node service access when both Egress and AntreaProxy is enabled #2332
Conversation
/test-all |
Codecov Report
@@ Coverage Diff @@
## main #2332 +/- ##
==========================================
+ Coverage 61.81% 62.08% +0.26%
==========================================
Files 281 281
Lines 21738 21768 +30
==========================================
+ Hits 13437 13514 +77
+ Misses 6899 6849 -50
- Partials 1402 1405 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
/test-conformance |
ca9ad10
to
995bc65
Compare
/test-all |
995bc65
to
701292f
Compare
/test-all |
// traffic from a local server Pod, but this case is also | ||
// covered by other flows (the flows matching the local and | ||
// remote Pod subnets) anyway. | ||
l3FwdTable.BuildFlow(priorityNormal). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this flow?
See:
// This flow is for the return traffic of connections to a local // Pod through the gateway interface (so gatewayCTMark is set). // For example, the return traffic of a connection from an IP // address (not the Node's management IP or gateway interface IP // which are covered by other flows already) of the local Node // to a local Pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned the reason in the commit message "It also removes a flow in L3Forwarding table that specially handles gatewayCT related traffic, which has been taken care of by another more generic flow in same table." It's added in #2318.
antrea/pkg/agent/openflow/pipeline.go
Lines 1184 to 1204 in 4405aaa
// Rewrite the destination MAC address with the local host gateway MAC if the packet is in the reply direction and | |
// is marked with gatewayCTMark. This is for connections which were initiated through the gateway, to ensure that | |
// this reply traffic gets forwarded correctly (back to the host network namespace, through the gateway). In | |
// particular, it is necessary in the following 2 cases: | |
// 1) reply traffic for connections from a local Pod to a ClusterIP Service (when AntreaProxy is disabled and | |
// kube-proxy is used). In this case the destination IP address of the reply traffic is the Pod which initiated the | |
// connection to the Service (no SNAT). We need to make sure that these packets are sent back through the gateway | |
// so that the source IP can be rewritten (Service backend IP -> Service ClusterIP). | |
// 2) when hair-pinning is involved, i.e. connections between 2 local Pods, for which NAT is performed. This | |
// applies regardless of whether AntreaProxy is enabled or not, and thus also applies to Windows Nodes (for which | |
// AntreaProxy is enabled by default). One example is a Pod accessing a NodePort Service for which | |
// externalTrafficPolicy is set to Local, using the local Node's IP address. | |
for _, proto := range c.ipProtocols { | |
flows = append(flows, l3FwdTable.BuildFlow(priorityHigh).MatchProtocol(proto). | |
MatchCTMark(gatewayCTMark, nil). | |
MatchCTStateRpl(true).MatchCTStateTrk(true). | |
Action().SetDstMAC(localGatewayMAC). | |
Action().GotoTable(l3FwdTable.GetNext()). | |
Cookie(c.cookieAllocator.Request(category).Raw()). | |
Done()) | |
} |
I think currently the deleted flow will never be hit:
table=70, priority=210,ct_state=+rpl+trk,ct_mark=0x20,ip actions=mod_dl_dst:e2:e5:a4:9b:1c:b1,goto_table:80
table=70, priority=200,ip,reg0=0x2/0xffff,ct_mark=0x20 actions=goto_table:80
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did not see that! But would you update the comments for the new flow to include the removed case too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to keep the comments in this func, and saying this case is covered by another flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it makes sense to only send non reply traffic to snatTable so we don't need to have special handling for various return traffic? I see the deleted flow is for gatewayCT related return traffic, and the comment also said Service return traffic may be caught by it. I think generally no reply traffic would need to go to snatTable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but how we identify non-reply traffic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "-rpl+trk" should work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me, but I did not think through all cases to be confident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was trying to ensure there would be no regression if only forwarding "-rpl+trk" only traffic to snatTable, I didn't figure out whether it would really affect windows case. As the patch needs to be backported, to minimize the risk, I would just make minimal change to the case of local pod subnet to fix the specific issue. Maybe let's consider the windows case together later.
I have added a comment here to say the case of gatewayCT flow has been covered by other flow, please take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strategy works for me.
6030e98
to
afe3423
Compare
/test-all |
docs/design/windows-design.md
Outdated
table=31, priority=200, ip,reg0=0x4/0xffff actions=output:br-int | ||
|
||
L3Forwarding Table: 70 | ||
// Forward the packet to L2ForwardingCalculation table if it is traffic to the local Pods. | ||
table=70, priority=200, ip,reg0=0x2/0xffff,nw_dst=10.10.0.0/24 actions=goto_table:80 | ||
// Forward the packet to L2ForwardingCalculation table if it is traffic to the local Pods and don't require MAC rewriting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: don't -> doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks
…bled When Egress enabled, extra flows will be added to L3Forwarding table, one of which make the packets to local Pods jump to L2ForwardingCalculation directly to prevent them from entering SNAT table. However, it would also prevent the packets' MAC from being rewritten even when they are marked as requiring it, which leads to local Pods cannot access local Pods via their Services' ClusterIPs. This patch fixes it by making the SNAT skipping flow apply to packets that don't have macRewriteMark set only, with which all traffic to local Pods will either be forwarded to L2ForwardingCalculation directly or be MAC rewritten first before going to L2ForwardingCalculation if they are required to do so. It also removes a flow in L3Forwarding table that specially handles gatewayCT related traffic, which has been taken care of by another more generic flow in same table. Signed-off-by: Quan Tian <[email protected]>
afe3423
to
36883f1
Compare
/test-all |
/skip-e2e |
When Egress enabled, extra flows will be added to L3Forwarding table,
one of which make the packets to local Pods jump to
L2ForwardingCalculation directly to prevent them from entering SNAT
table. However, it would also prevent the packets' MAC from being
rewritten even when they are marked as requiring it, which leads to
local Pods cannot access local Pods via their Services' ClusterIPs.
This patch fixes it by making the SNAT skipping flow apply to packets
that don't have macRewriteMark set only, with which all traffic to local
Pods will either be forwarded to L2ForwardingCalculation directly or be
MAC rewritten first before going to L2ForwardingCalculation if they are
required to do so. It also removes a flow in L3Forwarding table that
specially handles gatewayCT related traffic, which has been taken care
of by another more generic flow in same table.
Signed-off-by: Quan Tian [email protected]
Fixes #2330