Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix intra-Node service access when both Egress and AntreaProxy is enabled #2332

Merged
merged 1 commit into from
Jul 5, 2021

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jun 30, 2021

When Egress enabled, extra flows will be added to L3Forwarding table,
one of which make the packets to local Pods jump to
L2ForwardingCalculation directly to prevent them from entering SNAT
table. However, it would also prevent the packets' MAC from being
rewritten even when they are marked as requiring it, which leads to
local Pods cannot access local Pods via their Services' ClusterIPs.

This patch fixes it by making the SNAT skipping flow apply to packets
that don't have macRewriteMark set only, with which all traffic to local
Pods will either be forwarded to L2ForwardingCalculation directly or be
MAC rewritten first before going to L2ForwardingCalculation if they are
required to do so. It also removes a flow in L3Forwarding table that
specially handles gatewayCT related traffic, which has been taken care
of by another more generic flow in same table.

Signed-off-by: Quan Tian [email protected]

Fixes #2330

@tnqn
Copy link
Member Author

tnqn commented Jun 30, 2021

/test-all

@codecov-commenter
Copy link

codecov-commenter commented Jun 30, 2021

Codecov Report

Merging #2332 (36883f1) into main (4405aaa) will increase coverage by 0.26%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2332      +/-   ##
==========================================
+ Coverage   61.81%   62.08%   +0.26%     
==========================================
  Files         281      281              
  Lines       21738    21768      +30     
==========================================
+ Hits        13437    13514      +77     
+ Misses       6899     6849      -50     
- Partials     1402     1405       +3     
Flag Coverage Δ
e2e-tests ∅ <ø> (?)
kind-e2e-tests 52.45% <0.00%> (+0.31%) ⬆️
unit-tests 41.67% <0.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/openflow/pipeline.go 72.63% <0.00%> (+1.75%) ⬆️
pkg/agent/flowexporter/connections/connections.go 75.00% <0.00%> (-12.81%) ⬇️
...erinformation/v1beta1/clusterinformation_client.go 50.00% <0.00%> (-8.34%) ⬇️
pkg/flowaggregator/flowaggregator.go 62.92% <0.00%> (-0.89%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go 79.30% <0.00%> (-0.29%) ⬇️
...agent/flowexporter/connections/deny_connections.go 92.68% <0.00%> (-0.18%) ⬇️
pkg/agent/route/route_linux.go 44.68% <0.00%> (+0.53%) ⬆️
pkg/controller/networkpolicy/mutate.go 72.41% <0.00%> (+0.64%) ⬆️
...ntroller/networkpolicy/networkpolicy_controller.go 71.06% <0.00%> (+0.96%) ⬆️
pkg/controller/networkpolicy/status_controller.go 86.45% <0.00%> (+1.29%) ⬆️
... and 9 more

@tnqn
Copy link
Member Author

tnqn commented Jun 30, 2021

/test-conformance
/test-networkpolicy
/test-windows-e2e

@tnqn tnqn force-pushed the fix-intra-node-service branch from ca9ad10 to 995bc65 Compare June 30, 2021 13:29
@tnqn
Copy link
Member Author

tnqn commented Jun 30, 2021

/test-all

@tnqn tnqn requested review from jianjuns, wenyingd and antoninbas June 30, 2021 13:31
@tnqn tnqn force-pushed the fix-intra-node-service branch from 995bc65 to 701292f Compare June 30, 2021 13:34
@tnqn
Copy link
Member Author

tnqn commented Jun 30, 2021

/test-all

// traffic from a local server Pod, but this case is also
// covered by other flows (the flows matching the local and
// remote Pod subnets) anyway.
l3FwdTable.BuildFlow(priorityNormal).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this flow?

See:

  // This flow is for the return traffic of connections to a local
  // Pod through the gateway interface (so gatewayCTMark is set).
  // For example, the return traffic of a connection from an IP
  // address (not the Node's management IP or gateway interface IP
  // which are covered by other flows already) of the local Node
  // to a local Pod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned the reason in the commit message "It also removes a flow in L3Forwarding table that specially handles gatewayCT related traffic, which has been taken care of by another more generic flow in same table." It's added in #2318.

// Rewrite the destination MAC address with the local host gateway MAC if the packet is in the reply direction and
// is marked with gatewayCTMark. This is for connections which were initiated through the gateway, to ensure that
// this reply traffic gets forwarded correctly (back to the host network namespace, through the gateway). In
// particular, it is necessary in the following 2 cases:
// 1) reply traffic for connections from a local Pod to a ClusterIP Service (when AntreaProxy is disabled and
// kube-proxy is used). In this case the destination IP address of the reply traffic is the Pod which initiated the
// connection to the Service (no SNAT). We need to make sure that these packets are sent back through the gateway
// so that the source IP can be rewritten (Service backend IP -> Service ClusterIP).
// 2) when hair-pinning is involved, i.e. connections between 2 local Pods, for which NAT is performed. This
// applies regardless of whether AntreaProxy is enabled or not, and thus also applies to Windows Nodes (for which
// AntreaProxy is enabled by default). One example is a Pod accessing a NodePort Service for which
// externalTrafficPolicy is set to Local, using the local Node's IP address.
for _, proto := range c.ipProtocols {
flows = append(flows, l3FwdTable.BuildFlow(priorityHigh).MatchProtocol(proto).
MatchCTMark(gatewayCTMark, nil).
MatchCTStateRpl(true).MatchCTStateTrk(true).
Action().SetDstMAC(localGatewayMAC).
Action().GotoTable(l3FwdTable.GetNext()).
Cookie(c.cookieAllocator.Request(category).Raw()).
Done())
}

I think currently the deleted flow will never be hit:

table=70, priority=210,ct_state=+rpl+trk,ct_mark=0x20,ip actions=mod_dl_dst:e2:e5:a4:9b:1c:b1,goto_table:80
table=70, priority=200,ip,reg0=0x2/0xffff,ct_mark=0x20 actions=goto_table:80

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I did not see that! But would you update the comments for the new flow to include the removed case too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to keep the comments in this func, and saying this case is covered by another flow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to only send non reply traffic to snatTable so we don't need to have special handling for various return traffic? I see the deleted flow is for gatewayCT related return traffic, and the comment also said Service return traffic may be caught by it. I think generally no reply traffic would need to go to snatTable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but how we identify non-reply traffic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "-rpl+trk" should work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, but I did not think through all cases to be confident.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was trying to ensure there would be no regression if only forwarding "-rpl+trk" only traffic to snatTable, I didn't figure out whether it would really affect windows case. As the patch needs to be backported, to minimize the risk, I would just make minimal change to the case of local pod subnet to fix the specific issue. Maybe let's consider the windows case together later.
I have added a comment here to say the case of gatewayCT flow has been covered by other flow, please take another look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strategy works for me.

@tnqn tnqn force-pushed the fix-intra-node-service branch 2 times, most recently from 6030e98 to afe3423 Compare July 2, 2021 16:05
@tnqn
Copy link
Member Author

tnqn commented Jul 2, 2021

/test-all

jianjuns
jianjuns previously approved these changes Jul 2, 2021
table=31, priority=200, ip,reg0=0x4/0xffff actions=output:br-int

L3Forwarding Table: 70
// Forward the packet to L2ForwardingCalculation table if it is traffic to the local Pods.
table=70, priority=200, ip,reg0=0x2/0xffff,nw_dst=10.10.0.0/24 actions=goto_table:80
// Forward the packet to L2ForwardingCalculation table if it is traffic to the local Pods and don't require MAC rewriting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't -> doesn't

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks

…bled

When Egress enabled, extra flows will be added to L3Forwarding table,
one of which make the packets to local Pods jump to
L2ForwardingCalculation directly to prevent them from entering SNAT
table. However, it would also prevent the packets' MAC from being
rewritten even when they are marked as requiring it, which leads to
local Pods cannot access local Pods via their Services' ClusterIPs.

This patch fixes it by making the SNAT skipping flow apply to packets
that don't have macRewriteMark set only, with which all traffic to local
Pods will either be forwarded to L2ForwardingCalculation directly or be
MAC rewritten first before going to L2ForwardingCalculation if they are
required to do so. It also removes a flow in L3Forwarding table that
specially handles gatewayCT related traffic, which has been taken care
of by another more generic flow in same table.

Signed-off-by: Quan Tian <[email protected]>
@tnqn
Copy link
Member Author

tnqn commented Jul 5, 2021

/test-all

@tnqn
Copy link
Member Author

tnqn commented Jul 5, 2021

/skip-e2e
which failed on "TestFlowAggregator" which has been tracked in #2283

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot access ClusterIP service if the endpoint is on local Node when both AntreaProxy and Egress are enabled
3 participants