-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add network policy info to IPFIX flow records as part of Flow Exporter #1268
Conversation
Thanks for your PR. The following commands are available:
|
Codecov Report
@@ Coverage Diff @@
## master #1268 +/- ##
===========================================
- Coverage 54.98% 44.78% -10.21%
===========================================
Files 110 73 -37
Lines 10573 5564 -5009
===========================================
- Hits 5814 2492 -3322
+ Misses 4185 2791 -1394
+ Partials 574 281 -293
Flags with carried forward coverage won't be shown. Click here to find out more.
|
cafd4cd
to
dc114c7
Compare
dc114c7
to
92b97e0
Compare
Codecov Report
@@ Coverage Diff @@
## master #1268 +/- ##
==========================================
- Coverage 63.31% 58.92% -4.40%
==========================================
Files 170 179 +9
Lines 14250 15022 +772
==========================================
- Hits 9023 8852 -171
- Misses 4292 5199 +907
- Partials 935 971 +36
Flags with carried forward coverage won't be shown. Click here to find out more.
|
92b97e0
to
34cc9ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR LGTM. I know you need to wait for the other one merged first.
@@ -145,6 +155,39 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) { | |||
} | |||
} | |||
} | |||
|
|||
// Add network policy name and namespace from Connection label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
network policy -> NetworkPolicy
ingressOfID := binary.LittleEndian.Uint32(conn.Labels[:4]) | ||
egressOfID := binary.LittleEndian.Uint32(conn.Labels[4:8]) | ||
// TODO: There's a chance that the ingressOfID is released and reused by another | ||
// NetworkPolicy rule could be modified in-between by the time the conntrack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess you need to rephrase the sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this. This is not applicable after PR #1411 is merged, so removed it. I will be changing asyncDeleteInterval
to pollInterval
of Flow Exporter.
Rephrased the below comment.
// metrics. We probably need a different solution. | ||
if ingressOfID != 0 { | ||
policy := cs.ofClient.GetPolicyFromConjunction(ingressOfID) | ||
// Same as above, this may be because the NetworkPolicy is removed right after the metrics are fetched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is "metrics" or "flow record"?
34cc9ca
to
2cd0128
Compare
Thanks for your PR. The following commands are available:
|
2cd0128
to
bc0369f
Compare
I updated the patch with modifications after asyncRuleCache PR. Could you please review the patch again? |
453fc3f
to
3198f96
Compare
@@ -145,6 +156,36 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) { | |||
} | |||
} | |||
} | |||
|
|||
// Add NetworkPolicy name and namespace from Connection label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: namespace -> Namespace
Add "." at EOL.
3198f96
to
ae923cc
Compare
Thanks for the review. Will wait for IPv6 PR to merge: #1518 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple nits, otherwise LGTM. Is there no plan to test that feature?
func newIDAllocator(flowPollInterval time.Duration, allocatedIDs ...uint32) *idAllocator { | ||
allocator := &idAllocator{ | ||
availableSet: make(map[uint32]struct{}), | ||
asyncRuleCache: cache.NewStore(asyncRuleCacheKeyFunc), | ||
deleteQueue: workqueue.NewNamedDelayingQueue("async_delete_networkpolicyrule"), | ||
} | ||
|
||
// Set the asyncDeleteInterval based on flowPollInterval value. | ||
if asyncDeleteInterval < flowPollInterval { | ||
asyncDeleteInterval = flowPollInterval | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like flowPollInterval
does not really belong in this code. Maybe the parameter should be named asyncDeleteInterval
instead and the current asyncDeleteInterval
variable should be renamed to minAsyncDeleteInterval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was not comfortable too. This is a good suggestion. Changed other parts as well to keep flow poll interval separate.
@@ -145,6 +156,35 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) { | |||
} | |||
} | |||
} | |||
|
|||
// Add NetworkPolicy Name and Namespace from the connection label. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Add NetworkPolicy Name and Namespace from the connection label. | |
// Retrieve NetworkPolicy Name and Namespace by using the ID stored in the connection label (for both ingress and egress directions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested exporter with multiple ingress and egress network policy using elk flow collector. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more unit tests and enhanced e2e test to test network policy support in Flow Exporter.
func newIDAllocator(flowPollInterval time.Duration, allocatedIDs ...uint32) *idAllocator { | ||
allocator := &idAllocator{ | ||
availableSet: make(map[uint32]struct{}), | ||
asyncRuleCache: cache.NewStore(asyncRuleCacheKeyFunc), | ||
deleteQueue: workqueue.NewNamedDelayingQueue("async_delete_networkpolicyrule"), | ||
} | ||
|
||
// Set the asyncDeleteInterval based on flowPollInterval value. | ||
if asyncDeleteInterval < flowPollInterval { | ||
asyncDeleteInterval = flowPollInterval | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was not comfortable too. This is a good suggestion. Changed other parts as well to keep flow poll interval separate.
b9afb1e
to
3e80300
Compare
cmd/antrea-agent/agent.go
Outdated
@@ -145,13 +145,19 @@ func run(o *Options) error { | |||
// notifying NetworkPolicyController to reconcile rules related to the | |||
// updated Pods. | |||
podUpdates := make(chan v1beta2.PodReference, 100) | |||
// We set flow poll interval as the time interval for rule deletion in the async | |||
// rule cache, which is implemented as the part of idAllocator. This is to preserve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/as the part of/as part of
cmd/antrea-agent/agent.go
Outdated
@@ -145,13 +145,19 @@ func run(o *Options) error { | |||
// notifying NetworkPolicyController to reconcile rules related to the | |||
// updated Pods. | |||
podUpdates := make(chan v1beta2.PodReference, 100) | |||
// We set flow poll interval as the time interval for rule deletion in the async | |||
// rule cache, which is implemented as the part of idAllocator. This is to preserve | |||
// the rule info for mapping in the flow records of Flow Exporter even after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rule info for populating NetworkPolicy fields in the Flow Exporter even after rule deletion
@@ -29,7 +29,8 @@ import ( | |||
) | |||
|
|||
var ( | |||
asyncDeleteInterval = time.Second * 5 | |||
minAsyncDeleteInterval = time.Second * 5 | |||
asyncDeleteInterval = time.Second * 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that should really be a parameter of newIDAllocator
3e80300
to
1a7e4a9
Compare
1a7e4a9
to
ab70af1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think in the future we may want to investigate breaking up TestFlowExporter
into several test cases
t.Fatalf("Records with PodAIP and PodBIP does not have Pod Namespace") | ||
} | ||
// Check if records have both ingress and egress network policies. | ||
if !strings.Contains(record, "octetDeltaCount: 0") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a weird condition, could you add a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@@ -118,13 +128,6 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) { | |||
conn.DestinationPodName = dIface.ContainerInterfaceConfig.PodName | |||
conn.DestinationPodNamespace = dIface.ContainerInterfaceConfig.PodNamespace | |||
} | |||
// Do not export flow records of connections whose destination is local Pod and source is remote Pod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so now that we generate 2 "copies", is there any confusion when we consume the flow records in ELK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's possible because of two copies of flows. Need to work with @zyiou to look into that.
I was in two minds whether to enable flow records from the destination node or disable because of flow collector implications. Need some more time. I am ok with continuing the old format of not to export to avoid confusion and take this up in a following PR that can be included in a patch release. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't export from the destination Node, then we will never have ingress network policy information unless the source & destination nodes are the same. However, unless I am mistaken this is already the case today for other fields (e.g. destination Pod name)
That wouldn't really be a candidate for a patch release, based on the impact on record consumers and the fact that this (FlowExporter) is an Alpha feature. But we could resolve this for 0.12.
I'll leave it up to you. Let's chat offline about the impact on the consumers of flow records, and especially on our reference ELK stack.
@@ -51,6 +52,56 @@ func TestFlowExporter(t *testing.T) { | |||
if err != nil { | |||
t.Fatalf("Error when getting the perftest server Pod's IP: %v", err) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's worth adding a comment somewhere, that because both source & destination Pods run on the same Node, we do not receive duplicate records from 2 different Nodes and records include "complete" information (e.g. both ingress & egress policy information)
ab70af1
to
aa17922
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an issue to enhance e2e tests Issue #1438. It is a good idea to break this up. Thanks.
@@ -118,13 +128,6 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) { | |||
conn.DestinationPodName = dIface.ContainerInterfaceConfig.PodName | |||
conn.DestinationPodNamespace = dIface.ContainerInterfaceConfig.PodNamespace | |||
} | |||
// Do not export flow records of connections whose destination is local Pod and source is remote Pod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's possible because of two copies of flows. Need to work with @zyiou to look into that.
I was in two minds whether to enable flow records from the destination node or disable because of flow collector implications. Need some more time. I am ok with continuing the old format of not to export to avoid confusion and take this up in a following PR that can be included in a patch release. What do you think?
t.Fatalf("Records with PodAIP and PodBIP does not have Pod Namespace") | ||
} | ||
// Check if records have both ingress and egress network policies. | ||
if !strings.Contains(record, "octetDeltaCount: 0") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
11e1cc4
to
360d0b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
360d0b5
to
3b02387
Compare
/test-all |
/test-conformance |
1 similar comment
/test-conformance |
/test-conformance |
Network policy info (name and namespace), both ingress and egress policies are added. As part of this commit, we support exporting both flow records, for the case of inter-Node flows. One from the source Node, where the flow originates from and other from the destination Node of the flow, where the destination Pod resides. The reason is that they contain different NetworkPolicy info.
3b02387
to
e8c6dda
Compare
/test-all |
As there are no IPv6 related changes in this patch. Merging it without testing ipv6 tests. |
Network policy info (name and namespace), both ingress and egress
policies are added.
Async rule cache in idAllocator is configured with asyncDeleteInterval that is maximum of either 5s or configured flowPollInterval in Antrea Agent.