Change L2 forwarding flows for traffic to gateway and tunnel #1594
Conversation
Thanks for your PR. The following commands are available:
Codecov Report
@@ Coverage Diff @@
## master #1594 +/- ##
==========================================
- Coverage 63.31% 60.53% -2.79%
==========================================
Files 170 181 +11
Lines 14250 15562 +1312
==========================================
+ Hits 9023 9421 +398
- Misses 4292 5155 +863
- Partials 935 986 +51
Flags with carried forward coverage won't be shown.
Putting the L2ForwardCalculation logic in one table looks good to me. Just a few comments about the pipeline and the global variables.
l3DecTTLTable: bridge.CreateTable(l3DecTTLTable, conntrackCommitTable, binding.TableMissActionNext),
l2ForwardingCalcTable: bridge.CreateTable(l2ForwardingCalcTable, IngressEntryTable, binding.TableMissActionNext),
l3DecTTLTable: bridge.CreateTable(l3DecTTLTable, l2ForwardingCalcTable, binding.TableMissActionNext),
l2ForwardingCalcTable: bridge.CreateTable(l2ForwardingCalcTable, conntrackCommitTable, binding.TableMissActionNext),
Why is the next table of l2ForwardingCalcTable changed to conntrackCommitTable? I feel that from the pipeline's perspective its next table is still ingressEntryTable; just some particular traffic can skip it.
Right. But I hope to change the default flow (which is for BUM traffic) to go to conntrackCommitTable and be dropped there.
What do you think?
Sounds good to me, but I would prefer we don't re-declare that the next table is conntrackCommitTable in l2ForwardCalcFlow.
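To illustrate the point above: a minimal, self-contained sketch of the table wiring being discussed (the types and table IDs here are invented for illustration and are not Antrea's real binding API). With conntrackCommitTable as the default (table-miss) next table of l2ForwardingCalcTable, BUM traffic falls through to conntrackCommitTable without any flow re-declaring the goto; only flows for local Pods name ingressEntryTable explicitly, so a global variable for it would not be needed.

```go
package main

import "fmt"

// Simplified model of the wiring discussed above; the real types live in
// Antrea's openflow binding package and look different.
type tableID uint8

// Illustrative table IDs, not necessarily Antrea's real ones.
const (
	l2ForwardingCalcTable tableID = 80
	ingressEntryTable     tableID = 90
	conntrackCommitTable  tableID = 105
)

// table records only the default (table-miss) next table.
type table struct {
	id   tableID
	next tableID
}

// flow optionally overrides the default with an explicit goto.
type flow struct {
	match     string
	gotoTable tableID // 0 means "follow the table's default next table"
}

func nextFor(t table, f flow) tableID {
	if f.gotoTable != 0 {
		return f.gotoTable
	}
	return t.next
}

func main() {
	// conntrackCommitTable is the default next table, so BUM traffic that
	// only hits the default flow ends up there (and can be dropped), with
	// no explicit goto in the flow itself.
	l2Calc := table{id: l2ForwardingCalcTable, next: conntrackCommitTable}

	bumFlow := flow{match: "dl_dst=ff:ff:ff:ff:ff:ff"}
	podFlow := flow{match: "dl_dst=<local Pod MAC>", gotoTable: ingressEntryTable}

	fmt.Println("BUM traffic goes to table", nextFor(l2Calc, bumFlow)) // 105
	fmt.Println("Pod traffic goes to table", nextFor(l2Calc, podFlow)) // 90
}
```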
pkg/agent/openflow/pipeline.go
Outdated
@@ -87,6 +87,8 @@ const (
)

var (
	ingressEntryTable, egressEntryTable binding.TableIDType
If we still make ingressEntryTable the next table of l2ForwardCalcTable, the global variable won't be needed.
Force-pushed from 90fc5c7 to d74d739.
@@ -785,8 +794,7 @@ func (c *client) l3FwdFlowToPod(localGatewayMAC net.HardwareAddr, podInterfaceIP
	flows = append(flows, flowBuilder.MatchDstIP(ip).
		Action().SetSrcMAC(localGatewayMAC).
		Action().SetDstMAC(podInterfaceMAC).
		Action().DecTTL().
		Action().GotoTable(l3FwdTable.GetNext()).
		Action().GotoTable(l3DecTTLTable).
@wenyingd : I made this change because later, with SNAT policy, we can have packets from gw0 which will hit this flow as well after being un-SNAT'd (such packets' dst MAC will be rewritten to vMAC in conntrackStateTable), but we do not want to decrement the TTL for them.
Also, we can keep all routed packets (handled by the l3Fwd funcs) consistent.
Let me know what you think.
I note that you have changed the flows in L3DecTTLTable: 1) do not dec TTL if the packet is from gw0; 2) dec TTL in other cases. And I guess you want to use L3DecTTLTable on both the source and destination Node (originally I only wanted to use it on the source Node). I am not sure whether a packet sent from a local Pod to the gateway port would also go into L3DecTTLTable (e.g. in NoEncap mode); if yes, then the logic in L3DecTTLTable might not be as expected.
Yes, I hope to use L3DecTTLTable on both the source and destination Node, for the reasons I mentioned above.
> if a packet sent from a local Pod to the gateway port would also go into L3DecTTLTable (e.g. in NoEncap mode)

I think in that case the host stack should decrement the TTL? So, in l3FwdFlowRouteToGW() I skip L3DecTTLTable.
func (c *client) l3FwdFlowRouteToGW(gwMAC net.HardwareAddr, category cookie.Category) []binding.Flow {
	l3FwdTable := c.pipeline[l3ForwardingTable]
	var flows []binding.Flow
	for _, ipProto := range c.ipProtocols {
		flows = append(flows, l3FwdTable.BuildFlow(priorityLow).MatchProtocol(ipProto).
			Action().SetDstMAC(gwMAC).
			Action().DecTTL().
@wenyingd : do you think this case should bypass DecTTL()?
Yes, I think the packet should be forwarded to the gateway interface, and the host should dec TTL, so we don't need TTL decrement in OpenFlow. @tnqn what do you think?
Yes, I think so.
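Summing up the TTL discussion as a small, self-contained sketch (this is not Antrea code; the names are invented for clarity): the TTL should be decremented exactly once per routed hop, either by the host stack or by the OVS pipeline, never by both.

```go
package main

import "fmt"

// Where a routed packet is heading; names are invented for this sketch.
type egress int

const (
	toLocalPod egress = iota // routed to a local Pod (e.g. from the tunnel)
	toGateway                // routed to gw0; the host stack forwards it on
)

// decTTLInOVS reports whether the OVS pipeline (L3DecTTLTable) should
// decrement the TTL. If the packet is handed to the gateway interface, the
// host stack routes it and decrements the TTL itself; if it came from gw0,
// the host stack has already done so. Decrementing in OpenFlow as well
// would count the same hop twice.
func decTTLInOVS(e egress, fromGateway bool) bool {
	if e == toGateway {
		return false // host stack will decrement
	}
	return !fromGateway // skip packets that already went through the host stack
}

func main() {
	fmt.Println(decTTLInOVS(toLocalPod, false)) // true: OVS decrements once
	fmt.Println(decTTLInOVS(toLocalPod, true))  // false: already routed by gw0
	fmt.Println(decTTLInOVS(toGateway, false))  // false: host stack will do it
}
```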
pkg/agent/openflow/pipeline.go
Outdated
		Done())
	flows = append(flows,
		// Skip packets from the gateway interface.
		decTTLTable.BuildFlow(priorityHigh).MatchPriority(priorityNormal).
@wenyingd : I assume kube-proxy packets (even if they come from the tunnel / a remote Node) should bypass DecTTL(). What do you think?
@@ -1096,7 +1094,6 @@ func (c *client) allowRulesMetricFlows(conjunctionID uint32, ingress bool) []bin
	metricFlow := func(isCTNew bool, protocol binding.Protocol) binding.Flow {
		return c.pipeline[metricTableID].BuildFlow(priorityNormal).
			MatchProtocol(protocol).
			MatchPriority(priorityNormal).
@wenyingd : I assume MatchPriority() is not needed?
I think we could remove the call to MatchPriority() in the functions in this file (pipeline.go), but please note that we should keep it in network_policy.go, as CNP might call this API to reset flow priority. The original discussion is in #803 (comment).
Got it.
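As a toy illustration of why the priority has to stay in the key there (this is not Antrea code; the map, match strings, and values are invented): when two rules share the same match fields but have different priorities, a lookup that ignores the priority cannot address just one of them, which is exactly what a CNP priority reset needs to do.

```go
package main

import "fmt"

// flowKey identifies a flow by its match fields plus its priority; dropping
// the priority from the key would make the two entries below collide.
type flowKey struct {
	match    string
	priority uint16
}

func main() {
	table := map[flowKey]string{
		// Two rules with identical match fields but different priorities.
		{match: "nw_src=10.0.0.1", priority: 200}: "allow",
		{match: "nw_src=10.0.0.1", priority: 100}: "drop",
	}

	// Resetting the priority of only the first rule is unambiguous because
	// the priority is part of the key.
	oldKey := flowKey{match: "nw_src=10.0.0.1", priority: 200}
	newKey := flowKey{match: "nw_src=10.0.0.1", priority: 210}
	table[newKey] = table[oldKey]
	delete(table, oldKey)

	fmt.Println(len(table), "flows after the priority reset") // 2
}
```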
Maybe you also need to change the SVG file if you are changing the pipeline.
pkg/agent/openflow/client.go
Outdated
	if err := c.ofEntryOperations.Add(flow); err != nil {
	flows := []binding.Flow{
		c.tunnelClassifierFlow(config.DefaultTunOFPort, cookie.Default),
		c.l2ForwardCalcFlow(globalVirtualMAC, config.DefaultTunOFPort, true, cookie.Default),
globalVirtualMAC is also used in some other cases, like Windows SNAT, as the mark to replace the dst MAC. Do you think we could unify the usage, e.g., using this MAC for tunnel traffic, and using macRewriteMark to rewrite the dst MAC on any Node?
That (using macRewriteMark to indicate MAC rewrite) makes sense to me. But I am not familiar with all the other cases. Would you make the change?
OK, I will make that change.
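A rough, self-contained sketch of the "mark then rewrite" idea agreed on above (the register layout, bit, and names are hypothetical, not Antrea's actual implementation): a register mark says "this packet's dst MAC must be rewritten", and the L3 forwarding logic keys on the mark rather than on globalVirtualMAC itself.

```go
package main

import "fmt"

// Hypothetical register bit meaning "rewrite the dst MAC later in the
// pipeline"; Antrea's real mark and register layout differ.
const macRewriteMark uint32 = 1 << 0

type packet struct {
	reg    uint32
	dstMAC string
}

// markForRewrite would be set early in the pipeline, e.g. for traffic from
// the tunnel or for SNAT'd traffic on Windows, wherever a later MAC rewrite
// is required.
func markForRewrite(p *packet) { p.reg |= macRewriteMark }

// l3Forward rewrites the dst MAC only when the mark is present, so all the
// cases share one mechanism instead of overloading globalVirtualMAC as the
// implicit signal.
func l3Forward(p *packet, podMAC string) {
	if p.reg&macRewriteMark != 0 {
		p.dstMAC = podMAC
	}
}

func main() {
	p := packet{dstMAC: "aa:bb:cc:dd:ee:ff"} // e.g. the virtual MAC on entry
	markForRewrite(&p)
	l3Forward(&p, "02:11:22:33:44:55")
	fmt.Println(p.dstMAC) // rewritten to the local Pod's MAC
}
```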
Sure. I will update ovs-pipeline.md after you agree on the design.
If you are fine with it, I hope to change ovs-pipeline.md after this change - we need @antoninbas, who will come back next week, to review the SVG change. @wenyingd
I am fine with it.
pkg/agent/openflow/pipeline.go
Outdated
	l2FwdCalcTable := c.pipeline[l2ForwardingCalcTable]
	nextTable := ingressEntryTable
	if skipIngressRules {
I am thinking, is it possible to leverage the "ofport" value to decide the next table? E.g., if the value is the tunnel port or the gateway port (I assume we have no ingress rules applied on gw0), then skip the ingress rules; otherwise go to ingressEntryTable.
Then we would not need the parameter "skipIngressRules".
Yes, that is possible too, but do you think it is better to put the logic in l2FwdCalcTable and pipeline.go, or in the callers?
Personally, I would prefer to put it inside l2FwdCalcTable's logic; that would make the call simpler and easier to understand. If it is in the callers, the reader has to learn the pipeline case by case.
OK. Changed as you suggested.
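For reference, a small, self-contained sketch of the approach discussed above (port numbers and table IDs are placeholders, not Antrea's real values): the next table is derived inside the L2 forwarding calculation logic from the output port, so callers no longer pass a skipIngressRules flag.

```go
package main

import "fmt"

// Placeholder table IDs and port numbers for illustration only.
type ofTableID uint8

const (
	ingressEntryTableID    ofTableID = 90
	conntrackCommitTableID ofTableID = 105
)

const (
	hostGatewayOFPort uint32 = 1 // gw0
	defaultTunOFPort  uint32 = 2 // the tunnel port
)

// nextTableForPort derives the next table from the output port: traffic
// destined to the gateway or the tunnel skips the ingress NetworkPolicy
// tables and goes straight to conntrackCommitTable; everything else (local
// Pod ports) continues to ingressEntryTable.
func nextTableForPort(outPort uint32) ofTableID {
	switch outPort {
	case hostGatewayOFPort, defaultTunOFPort:
		return conntrackCommitTableID
	default:
		return ingressEntryTableID
	}
}

func main() {
	fmt.Println(nextTableForPort(defaultTunOFPort)) // 105: skip ingress rules
	fmt.Println(nextTableForPort(7))                // 90: a local Pod port
}
```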
Let the packets to the tunnel port go through L2ForwardingCalcTable as well, to be consistent with other types of traffic. Redirect packets to the gateway and tunnel, as well as BUM packets, to conntrackCommitTable, and bypass ingress NetworkPolicy enforcement. Let l3DecTTLTable decrement the TTL for packets from the tunnel to local Pods.
LGTM, thanks for addressing my comments!
/test-all
Hi @jianjuns, one of the current PRs (#1617) has a failing check: https://github.com/vmware-tanzu/antrea/pull/1617/checks?check_run_id=1501453349
@srikartati Sure. Let me check if it is relevant to my PR. Adding @tnqn for his notice too.
@tnqn : from the test log, it seems the egress NP stats cannot be retrieved. However, I do not see how my PR is relevant to that. The only change that seems relevant is that I removed "MatchPriority" in allow/dropRulesMetricFlows, as it seems not needed.
@jianjuns looks like it's ingress NP stats, not egress. Could it be because of this: https://github.com/vmware-tanzu/antrea/blob/da021fa75c7eceb1082fbef8345826c28fbe39a3/pkg/agent/openflow/pipeline.go#L697-L699 ? I'm not 100% sure, but is it possible that in noEncap mode we do not create the tunnel port?
@antoninbas I thought the ingress NP stats check passed but the egress stats were not found. But you are right, it might be that the ingress stats were not correctly updated. And your theory about DefaultTunOFPort makes sense! At least we should not decide the next table based on the ofPort value. Let me change it.
Seems the failure is gone after changing the code: #1626.