Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FlowExporter memory bloat when export process is dead #3994

Merged
merged 1 commit into from
Jul 13, 2022

Conversation

wsquan171
Copy link
Contributor

@wsquan171 wsquan171 commented Jul 12, 2022

When flow exporter is enabled, but failed to connect to downstream IPFIX collector, connections added to the priority queue inside flow exporter won't be expired and removed from queue, causing memory to bloat. Furthermore, connection store polling will remove conn from its connections map when the flow is no longer in conntrack, but not the related items in priority queue. When a new flow with same flow key is reestablished, a duplicated item with same key will be added to the queue, while the reference to the old one is lost, essentially causing memory leak.

This change addresses above issue in the following aspects:

  1. Connection store polling removes stale conn from both connections map and the priority queue. Since CS polling is independent of exporting process liveness, this allows clean up to be done without connection to collector.
  2. Fixes init of Connection.LastExportTime to be connection start time to make sure CS polling logic works properly when the exporting process is dead. Previously LastExportTime will only be filled by exporting process at the time of export, causing zero value to be compare in certain cases.
  3. Adds guards in priority queue to prevent having two Connections with same connection key in the heap.

Benchmark test BenchmarkExportConntrackConns did not show observable difference before and after change.

Fixes item 1 and 2 in #3972. Severity of item 3 is lower, which will be addressed in a later change.

@codecov
Copy link

codecov bot commented Jul 12, 2022

Codecov Report

Merging #3994 (16e4907) into main (33e175b) will increase coverage by 0.67%.
The diff coverage is 96.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3994      +/-   ##
==========================================
+ Coverage   64.39%   65.06%   +0.67%     
==========================================
  Files         295      295              
  Lines       43781    44037     +256     
==========================================
+ Hits        28191    28654     +463     
+ Misses      13304    13113     -191     
+ Partials     2286     2270      -16     
Flag Coverage Δ
e2e-tests 41.27% <0.00%> (?)
kind-e2e-tests 51.66% <96.42%> (+0.79%) ⬆️
unit-tests 44.29% <57.14%> (+0.05%) ⬆️
Impacted Files Coverage Δ
...agent/flowexporter/connections/deny_connections.go 81.72% <87.50%> (+18.50%) ⬆️
.../flowexporter/connections/conntrack_connections.go 81.42% <100.00%> (+4.46%) ⬆️
.../agent/flowexporter/priorityqueue/priorityqueue.go 93.33% <100.00%> (+0.92%) ⬆️
pkg/controller/externalippool/controller.go 84.82% <0.00%> (-6.25%) ⬇️
pkg/apiserver/storage/ram/watch.go 90.66% <0.00%> (-2.67%) ⬇️
pkg/agent/proxy/endpoints.go 78.57% <0.00%> (-2.39%) ⬇️
...lowaggregator/clickhouseclient/clickhouseclient.go 81.41% <0.00%> (-1.58%) ⬇️
pkg/agent/cniserver/ipam/antrea_ipam_controller.go 79.45% <0.00%> (-1.37%) ⬇️
pkg/controller/ipam/antrea_ipam_controller.go 75.10% <0.00%> (-1.32%) ⬇️
pkg/agent/openflow/pipeline.go 73.26% <0.00%> (+0.18%) ⬆️
... and 27 more

@wsquan171
Copy link
Contributor Author

/test-e2e

@wsquan171
Copy link
Contributor Author

/test-networkpolicy

@wsquan171
Copy link
Contributor Author

/test-conformance

@wsquan171 wsquan171 marked this pull request as ready for review July 12, 2022 21:01
@wsquan171 wsquan171 requested review from heanlan and antoninbas July 12, 2022 21:01
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only small comments, LGTM

pkg/agent/flowexporter/connections/deny_connections.go Outdated Show resolved Hide resolved
func (pq *ExpirePriorityQueue) AddItemToQueue(connKey flowexporter.ConnectionKey, conn *flowexporter.Connection) {
// AddOrOverwriteItemToQueue adds conn with connKey into the queue. If an existing item
// has the same connKey, it will be overwritten by the new item.
func (pq *ExpirePriorityQueue) AddOrOverwriteItemToQueue(connKey flowexporter.ConnectionKey, conn *flowexporter.Connection) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just call this WriteItemToQueue.

pkg/agent/flowexporter/priorityqueue/priorityqueue.go Outdated Show resolved Hide resolved
@wsquan171 wsquan171 force-pushed the fe-memory-fix branch 3 times, most recently from 267842d to 9f011e5 Compare July 12, 2022 22:12
@wsquan171
Copy link
Contributor Author

/test-conformance
/test-e2e
/test-networkpolicy

Copy link
Contributor

@heanlan heanlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Shawn. LGTM

@@ -74,6 +74,7 @@ func TestDenyConnectionStore_AddOrUpdateConn(t *testing.T) {
denyConnStore.AddOrUpdateConn(&testFlow, refTime.Add(-(time.Second * 20)), uint64(60))
expConn := testFlow
expConn.DestinationServicePortName = servicePortName.String()
expConn.LastExportTime = refTime.Add(-(time.Second * 20))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think denyConnStore.AddOrUpdateConn has already taken care of assigning value to LastExportTime?

@wsquan171
Copy link
Contributor Author

/test-conformance
/test-e2e
/test-networkpolicy

@antoninbas antoninbas merged commit 23ef6f7 into antrea-io:main Jul 13, 2022
@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Jul 13, 2022
@antoninbas
Copy link
Contributor

@wsquan171 I assume that this should be backported? Could you take care of it?

@antoninbas antoninbas added the action/backport Indicates a PR that requires backports. label Jul 13, 2022
@wsquan171 wsquan171 deleted the fe-memory-fix branch July 14, 2022 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants