Add performance unit test for Flow Exporter #2129

Merged: 1 commit into antrea-io:main on Jul 1, 2021

Conversation

@zyiou (Contributor) commented Apr 27, 2021

This commit adds performance benchmarking for the Flow Exporter. It evaluates the Export() function under different numbers of conntrack connections, dying connections, idle records, deny connections, and idle deny connections. A local server receives the records and counts them. CPU and memory profiles are collected and visualized using pprof.

Benchmarking also led us to discover and remove redundant calls: GetNodeName(), which was called every time a record was exported, and ResetConnStatsWithoutLock, which unnecessarily called NewConnectionKey each time.
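For context, the final benchmarks follow the standard Go testing.B pattern. The sketch below is a simplified outline of their shape; the helper names (setupExporter, disableLogToStderr, the package-level count updated by the local collector) come from the test code discussed later in this thread, and the body is abbreviated rather than a verbatim copy of exporter_perf_test.go:

func BenchmarkExportConntrackConns(b *testing.B) {
    disableLogToStderr()
    // setupExporter starts a local collector goroutine that receives the
    // exported records and counts them, then returns an exporter wired to it.
    exp, err := setupExporter(false)
    if err != nil {
        b.Fatalf("error when setting up exporter: %v", err)
    }
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        exp.Export()
    }
    b.Logf("Total connections received: %d", count)
}

CPU and memory profiles are collected by passing -cpuprofile/-memprofile to go test, as shown in the commands later in this thread, and then inspected with go tool pprof.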

With the fixes and changes in go-ipfix v0.5.3 and the removal of the redundant calls in this PR, we saw the following improvements:
Before:

BenchmarkExportConntrackConns   	  50	  21747542 ns/op	 3708730 B/op	   80838 allocs/op
BenchmarkExportDenyConns      	          126	  14497807 ns/op	 2522699 B/op	   53345 allocs/op
BenchmarkPoll   	                 141	   8372127 ns/op	  813834 B/op	   52738 allocs/op

After:

BenchmarkExportConntrackConns   	 126	  11444126 ns/op	  459816 B/op	   18332 allocs/op
BenchmarkExportDenyConns   	     223	   6391654 ns/op	  261669 B/op	   10637 allocs/op
BenchmarkPoll   	         130	   8280194 ns/op	  818076 B/op	   52814 allocs/op

Improvement:

Test                           ns/op             B/op              allocs/op
BenchmarkExportConntrackConns  reduced by 47.4%  reduced by 87.6%  reduced by 77.3%
BenchmarkExportDenyConns       reduced by 42.9%  reduced by 89.6%  reduced by 80.0%
BenchmarkPoll                  <1% diff          <1% diff          <1% diff

@codecov-commenter commented Apr 27, 2021

Codecov Report

Merging #2129 (8f5f081) into main (6c350e0) will increase coverage by 0.24%.
The diff coverage is 81.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2129      +/-   ##
==========================================
+ Coverage   61.77%   62.02%   +0.24%     
==========================================
  Files         276      280       +4     
  Lines       21342    21714     +372     
==========================================
+ Hits        13184    13468     +284     
- Misses       6773     6846      +73     
- Partials     1385     1400      +15     
Flag Coverage Δ
kind-e2e-tests 52.39% <81.25%> (-0.62%) ⬇️
unit-tests 41.64% <13.33%> (+0.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/flowexporter/connections/connections.go 75.00% <0.00%> (-5.49%) ⬇️
...agent/flowexporter/connections/deny_connections.go 92.68% <ø> (-0.18%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go 79.76% <100.00%> (+1.09%) ⬆️
pkg/agent/flowexporter/flowrecords/flow_records.go 78.43% <100.00%> (ø)
pkg/apiserver/handlers/endpoint/handler.go 58.82% <0.00%> (-11.77%) ⬇️
pkg/agent/openflow/client.go 57.98% <0.00%> (-0.67%) ⬇️
pkg/ovs/openflow/ofctrl_bridge.go 49.65% <0.00%> (-0.35%) ⬇️
pkg/agent/openflow/packetin.go 59.25% <0.00%> (ø)
pkg/util/runtime/runtime_linux.go 0.00% <0.00%> (ø)
pkg/ovs/openflow/ofctrl_meter.go 44.00% <0.00%> (ø)
... and 8 more

@zyiou force-pushed the zyiou/performance_unit_test branch from e10ef81 to d532626 on May 18, 2021 22:58
@zyiou (Contributor, Author) commented May 18, 2021

summary metrics:

TOTAL_CONNECTIONS: 1062873
INIT_CONNECTIONS: 30000
NEW_CONNECTIONS/poll: 10000
DYING_CONNECTIONS/poll: 5000
POLL_INTERVAL(s):5
TEST_DURATION(s): 120
MEMORY(M): 2299

Benchmark result:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:97: 
        Summary metrics:
        TOTAL_CONNECTIONS: 1062873
        INIT_CONNECTIONS: 30000
        NEW_CONNECTIONS/poll: 10000
        DYING_CONNECTIONS/poll: 5000
        POLL_INTERVAL(s):5
        MEMORY(M): 2299
BenchmarkExport-2   	       1	128249778560 ns/op	3908352376 B/op	104729354 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	129.283s

Memory usage

go tool pprof memprofile.out
File: exporter.test
Type: alloc_space
Time: Jun 11, 2021 at 2:27am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3318.40MB, 86.55% of 3833.90MB total
Dropped 85 nodes (cum <= 19.17MB)
Showing top 10 nodes out of 61
      flat  flat%   sum%        cum   cum%
 1096.59MB 28.60% 28.60%  1245.60MB 32.49%  antrea.io/antrea/pkg/agent/flowexporter/exporter.getUpdateConnections
  565.59MB 14.75% 43.35%   776.60MB 20.26%  github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement
  526.60MB 13.74% 57.09%   526.60MB 13.74%  k8s.io/klog/v2.(*loggingT).header
  235.50MB  6.14% 63.23%  1632.20MB 42.57%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addRecordToSet
  214.54MB  5.60% 68.83%   288.05MB  7.51%  github.com/vmware/go-ipfix/pkg/exporter.(*ExportingProcess).createAndSendMsg
     211MB  5.50% 74.33%      211MB  5.50%  github.com/vmware/go-ipfix/pkg/entities.EncodeToIEDataType
  201.18MB  5.25% 79.58%   201.18MB  5.25%  antrea.io/antrea/pkg/agent/flowexporter/flowrecords.(*FlowRecords).AddOrUpdateFlowRecord
      94MB  2.45% 82.03%   111.50MB  2.91%  crypto/rand.Int
   87.50MB  2.28% 84.31%    87.50MB  2.28%  net.IP.String
   85.88MB  2.24% 86.55%   332.53MB  8.67%  antrea.io/antrea/pkg/agent/flowexporter/connections.(*ConntrackConnectionStore).AddOrUpdateConn

CPU usage

go tool pprof profile.out
File: exporter.test
Type: cpu
Time: Jun 11, 2021 at 2:25am (UTC)
Duration: 2.14mins, Total samples = 39.81s (31.04%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 23250ms, 58.40% of 39810ms total
Dropped 305 nodes (cum <= 199.05ms)
Showing top 10 nodes out of 151
      flat  flat%   sum%        cum   cum%
   14410ms 36.20% 36.20%    14680ms 36.88%  syscall.Syscall
    1650ms  4.14% 40.34%     3180ms  7.99%  runtime.scanobject
    1140ms  2.86% 43.21%     1140ms  2.86%  runtime.duffcopy
    1070ms  2.69% 45.89%     1070ms  2.69%  runtime.memmove
    1050ms  2.64% 48.53%    22160ms 55.66%  antrea.io/antrea/pkg/agent/flowexporter/flowrecords.(*FlowRecords).ForAllFlowRecordsDo
     970ms  2.44% 50.97%     2670ms  6.71%  runtime.mallocgc
     900ms  2.26% 53.23%      900ms  2.26%  aeshashbody
     730ms  1.83% 55.06%     9150ms 22.98%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addRecordToSet
     670ms  1.68% 56.74%      670ms  1.68%  syscall.RawSyscall
     660ms  1.66% 58.40%     1440ms  3.62%  runtime.pcvalue

@zyiou force-pushed the zyiou/performance_unit_test branch from d532626 to 04ffb35 on May 21, 2021 06:32
@zyiou marked this pull request as ready for review May 21, 2021 06:37
@zyiou requested review from srikartati and antoninbas May 21, 2021 17:30
@zyiou force-pushed the zyiou/performance_unit_test branch from 04ffb35 to 6604dbd on May 21, 2021 17:42
@zyiou (Contributor, Author) commented May 25, 2021

We are having several performance improvements on go-ipfix side. Here is the performance difference before and after these two changes: vmware/go-ipfix#204, vmware/go-ipfix#206

Before change:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:97: 
        Summary metrics:
        TOTAL_CONNECTIONS: 365800
        INIT_CONNECTIONS: 30000
        NEW_CONNECTIONS/poll: 10000
        DYING_CONNECTIONS/poll: 5000
        POLL_INTERVAL(s):5
        TEST_DURATION(s):60
        MEMORY(M): 856
BenchmarkExport-2   	       1	62873606917 ns/op	3860629856 B/op	80121095 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	63.271s

After change:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:97: 
        Summary metrics:
        TOTAL_CONNECTIONS: 368334
        INIT_CONNECTIONS: 30000
        NEW_CONNECTIONS/poll: 10000
        DYING_CONNECTIONS/poll: 5000
        POLL_INTERVAL(s):5
        TEST_DURATION(s):60
        MEMORY(M): 793
BenchmarkExport-2   	       1	63058384712 ns/op	1381808352 B/op	36653986 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	63.521s

64.06% reduction in memory usage and 54.25% reduction in allocs

@srikartati (Member) left a comment:

In addition to the metrics, can we add a check that our memory utilization is linear with the number of new connections added? Basically, the idea is to detect memory leaks.

go.mod (review comment resolved; outdated)
}
// Generate connections for Dumpflows, each round with update connections, new connections and dying connections
mockConnDumper.EXPECT().DumpFlows(testZoneFilter).Return(conns, testConnectionsCount, nil).Times(1)
for i := 0; i < 5; i++ {
Member:

what does 5 mean here? Are we polling only five times during the test?

Contributor (Author):

Yes, in the previous setting. Changed it to testDuration/testPollInterval to make it clear.

b.Fatalf("Got error when creating a local server: %v", err)
}
prevCount := 0
ticker := time.NewTicker(testIdleTimeOut)
Member:

use timer here?

Contributor (Author):

If we use a timer instead of a ticker, we will need to reset the timer each time. Is there any special reason to use a timer?

Member:

Since we are sleeping for testDuration, I thought this ticker was only expected to trigger once as a timeout, as the name testIdleTimeOut suggests. It looks like you want it to be a periodic ticker so that prevCount can be updated. Could we just time out once, without worrying about how many messages are received?
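For illustration only (this is not the code in the PR), the two approaches being discussed look roughly like this, reusing the existing testIdleTimeOut constant and the count variable that the receiving goroutine updates:

// Option 1: periodic ticker; keep waiting while new records are still arriving.
ticker := time.NewTicker(testIdleTimeOut)
defer ticker.Stop()
prevCount := 0
for {
    <-ticker.C
    if count == prevCount { // no new records since the last tick
        break
    }
    prevCount = count
}

// Option 2: a single fixed timeout; record whatever has been received by then.
<-time.After(testIdleTimeOut)
finalCount := count // use finalCount in the summary log

In a real test, reads of count would also need to be synchronized with the receiving goroutine (for example with sync/atomic).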


const (
testPollInterval = 2 * time.Second
testConnectionsCount = 30000
Member:

Looking at the other counts, it seems like you consider 9K connections to stay the same (30K - (20K + 1K)). I feel they need to be updated (at least their timestamps) or die.

Comment on lines 92 to 96
go connStore.Run(stopChan1)
go exp.Run(stopChan2)
Member:

why separate channels?

Contributor (Author):

I want polling to finish after the test duration and to close the exporter when it has not received records for some time (testIdleTimeOut). That's why we need two separate channels: one to stop the connection store and one to stop the exporter.

Member:

... close the exporter when it has not received records for some time (testIdleTimeOut)

Do you mean the collector could not receive the records? The exporter just sends the records in the test here.
Any particular reason to stop polling?
In other words, do we want to wait till we receive all messages at the collector? Could we just do the duration and close the collector/exporting processes when it finishes? We just record the number of messages at the collector before closing the process.
This way we can see within specific duration, how many messages are collected. As CPU time is optimized, we could see an increase in the number of messages at the collector. What do you think of this approach?

Contributor (Author):

Got it. Yes, this approach makes sense to me. As you mentioned, I was planning to wait for all messages so that the exporting process could be closed safely.

testNewConnectionsCount = 4000
testDyingConnectionsCount = 1000
testZoneFilter = uint16(65520)
testDuration = 15 * time.Second
Member:

It's better to have the duration in minutes to stress the flow exporter.

testDyingConnectionsCount = 1000
testZoneFilter = uint16(65520)
testDuration = 15 * time.Second
testIdleTimeOut = 5 * time.Second
Member:

is this timeout for the exporter channel?

Contributor (Author):

Yes

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
@zyiou force-pushed the zyiou/performance_unit_test branch 2 times, most recently from 5ee2324 to 73004f1 on June 3, 2021 22:40
@zyiou (Contributor, Author) commented Jun 3, 2021

In addition to the metrics, can we add a check that our memory utilization is linear with the number of new connections added? Basically, the idea is to detect memory leaks.

Sure. Will create a larger VM for testing.

@zyiou force-pushed the zyiou/performance_unit_test branch from 73004f1 to 6838a82 on June 3, 2021 22:50
@srikartati (Member):

In addition to the metrics, can we add a check that our memory utilization is linear with the number of new connections added? Basically, the idea is to detect memory leaks.

Sure. Will create a larger VM for testing.

I just meant changing the initial number of connections through an input parameter (from a lower number to a higher number) and tracking memory utilization. The current number can be the maximum.

@zyiou added the antrea/flow-visibility/test, area/flow-visibility, and area/flow-visibility/exporter labels Jun 9, 2021
@zyiou (Contributor, Author) commented Jun 11, 2021

Test results with different numbers of new connections per poll:

# new conns / poll  B/op           allocs/op
2000                660,247,144    17,082,729
4000                882,118,920    22,760,023
6000                1,098,164,016  28,307,021
8000                1,331,826,856  34,843,965
10000               1,603,746,952  40,960,480
12000               1,825,280,104  47,103,483
14000               2,050,849,072  53,434,063

Memory consumption grows roughly linearly as the number of new connections increases.

@zyiou force-pushed the zyiou/performance_unit_test branch from 6838a82 to b0cfa5f on June 11, 2021 03:36
@srikartati (Member) left a comment:

Thanks for doing the comparison by increasing the connection counts. Can that parameter be passed into the benchmark test via the go test command?
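One possible way to do this (a sketch, not what the PR implements) is to declare a custom flag in the test package; the testing framework parses it, and it can be passed after -args on the go test command line. The flag and variable names here are illustrative:

// Requires `import "flag"` in the test file.
var benchNumConns = flag.Int("numConns", 20000, "number of conntrack connections to generate for the benchmark")

// Inside the benchmark, use *benchNumConns wherever the testNumOfConns constant is used today.

An invocation could then look like: go test -run=XXX -bench=BenchmarkExportConntrackConns -benchmem -args -numConns=100000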

Comment on lines 216 to 261
func statMaxMemAlloc(maxAlloc *uint64, interval time.Duration, stopCh chan struct{}) {
    var memStats goruntime.MemStats
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            goruntime.ReadMemStats(&memStats)
            if memStats.Alloc > *maxAlloc {
                *maxAlloc = memStats.Alloc
            }
        case <-stopCh:
            return
        }
    }
}
Member:

Do we still need this memstat collection? Is it capturing anything different from what we already get from the benchmark output?

Contributor (Author):

No, we don't need this anymore. I have removed it. Thanks!

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
@antoninbas (Contributor) left a comment:

Sorry for the late review. Overall I am a bit confused by the format of this benchmark. Because we do not include the actual conntrack polling in the benchmark at all, it's not really an "end-to-end" benchmark of the FlowExporter. Given that, I think it would have been more convenient to write distinct benchmarks (using an actual Go benchmark) for the ConntrackConnectionStore (Poll() method) and the flowExporter (export() method). Then we would have a good idea of the performance of each one and I feel like the CPU / memory data would be more accurate (as the benchmarked function would be invoked multiple times).


var count = 0

func BenchmarkExport(b *testing.B) {
Contributor:

If we are not going to write a proper Go benchmark (which doesn't seem appropriate here anyway), I don't think we should call the function BenchmarkXXX or use testing.B. We can make it a regular test.

The benchmark function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably. The output

Contributor (Author):

Makes sense to me. Updated to a more standard way for benchmarking Export() only. Thanks!

go func() {
defer conn.Close()
for {
buff := make([]byte, 272)
Contributor:

Why 272? If this is related to the size of a flow record sent by the exporter, please find a way to compute it programmatically; otherwise I imagine this will break if we add new IEs. If instead it is a fixed amount that is not expected to change in the future, add a comment to explain the value.

Contributor (Author):

You are right. We should use a large number here to cover future growth of the record length. Changed it to a large constant.
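As a sketch of the first option mentioned in the review comment (deriving the size from the protocol rather than picking an arbitrary constant): the IPFIX message header carries a 16-bit length field (RFC 7011), so 65535 bytes is an upper bound for any single message regardless of how many IEs are added to the record later. The constant name below is illustrative:

const maxIPFIXMsgSize = 65535 // IPFIX message length is a 16-bit field (RFC 7011)

buff := make([]byte, maxIPFIXMsgSize)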

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
Comment on lines 131 to 132
src := net.ParseIP(fmt.Sprintf("192.168.0.%d", randomNum))
dst := net.ParseIP(fmt.Sprintf("192.169.0.%d", randomNum))
Contributor:

the source IP is always equal to the destination IP, is that on purpose?

Contributor (Author):

Actually, they are slightly different. Made some changes to differentiate them more clearly.

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
@zyiou force-pushed the zyiou/performance_unit_test branch from b0cfa5f to 87f76b4 on June 22, 2021 02:19
@zyiou (Contributor, Author) commented Jun 22, 2021

Updated test results:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:61: 
        Summary:
        Number of conntrack connections: 20000
        Number of dying conntrack connections: 10000
        Number of deny connections: 20000
        Number of idle deny connections: 10000
        Total connections received: 37788
BenchmarkExport-2   	       1	1123878204 ns/op	135576880 B/op	 3458292 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	1.298s
go tool pprof memprofile.out
File: exporter.test
Type: alloc_space
Time: Jun 22, 2021 at 2:01am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 103.32MB, 83.29% of 124.05MB total
Dropped 16 nodes (cum <= 0.62MB)
Showing top 10 nodes out of 68
      flat  flat%   sum%        cum   cum%
   22.50MB 18.14% 18.14%       27MB 21.77%  github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement
   15.76MB 12.70% 30.84%    15.76MB 12.70%  antrea.io/antrea/pkg/agent/flowexporter/flowrecords.(*FlowRecords).AddFlowRecordToMap
   15.50MB 12.50% 43.34%    15.50MB 12.50%  k8s.io/klog/v2.(*loggingT).header
   12.04MB  9.71% 53.05%    12.04MB  9.71%  antrea.io/antrea/pkg/agent/flowexporter/connections.(*connectionStore).AddConnToMap
    8.50MB  6.86% 59.91%    33.52MB 27.02%  antrea.io/antrea/pkg/agent/flowexporter/exporter.addConnsAndGetRecords
       8MB  6.45% 66.36%    11.50MB  9.27%  github.com/vmware/go-ipfix/pkg/exporter.(*ExportingProcess).createAndSendMsg
       7MB  5.65% 72.00%    15.29MB 12.33%  antrea.io/antrea/pkg/agent/flowexporter/exporter.addDenyConns
    5.50MB  4.43% 76.44%    35.50MB 28.62%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addRecordToSet
    4.50MB  3.63% 80.06%     4.50MB  3.63%  github.com/vmware/go-ipfix/pkg/entities.EncodeToIEDataType
       4MB  3.22% 83.29%        4MB  3.22%  github.com/vmware/go-ipfix/pkg/entities.NewDataRecord (inline)
go tool pprof profile.out
File: exporter.test
Type: cpu
Time: Jun 22, 2021 at 2:01am (UTC)
Duration: 1.24s, Total samples = 1.11s (89.81%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 770ms, 69.37% of 1110ms total
Showing top 10 nodes out of 149
      flat  flat%   sum%        cum   cum%
     510ms 45.95% 45.95%      510ms 45.95%  syscall.Syscall
      60ms  5.41% 51.35%      100ms  9.01%  runtime.scanobject
      40ms  3.60% 54.95%       70ms  6.31%  runtime.mallocgc
      40ms  3.60% 58.56%       40ms  3.60%  syscall.RawSyscall
      20ms  1.80% 60.36%      160ms 14.41%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addDenyConnToSet
      20ms  1.80% 62.16%       20ms  1.80%  cmpbody
      20ms  1.80% 63.96%       20ms  1.80%  runtime.heapBits.bits (inline)
      20ms  1.80% 65.77%       40ms  3.60%  runtime.makeslice
      20ms  1.80% 67.57%       20ms  1.80%  runtime.memmove
      20ms  1.80% 69.37%       20ms  1.80%  runtime.step

Benchmark results with different numbers of conntrack connections + deny connections as input:

# conns            ns/op          B/op         allocs/op
20,000 + 20,000    1,123,878,204  135,576,880  3,458,292
50,000 + 50,000    2,856,738,645  329,267,544  8,690,977
100,000 + 100,000  5,469,929,206  656,425,016  17,396,982

}

for n := 0; n < b.N; n++ {
exp.Export()
Member:

We are considering one set of connections for one export cycle. This does not exercise any code in the AddOrUpdateConn method of the conntrack connection store. Is there a plan to have a test for that too?

Contributor (Author):

AddOrUpdateConn is part of the Poll() function, which would involve mocking the dump-flows output from OVS. Do you mean we should test AddOrUpdateConn individually, or cover testing for Poll() too?

@srikartati (Member) commented Jun 22, 2021:

Covering Poll() with a test sounds good, as it would include AddOrUpdateConn. However, we may need multiple polls if we want to exercise the update code, or we should initialize the connection store with some connections before the poll.

I think we are expecting the Run() method in exporter.go to be exercised in a different e2e test by running it multiple times. Is that correct?

Contributor (Author):

Covering a test for Poll(): sure, will add that test.
Benchmarking Run() in e2e tests: yes, we should implement that in the future if you are referring to testing with actual polling, updating, and exporting.

Contributor:

If possible, let's test things in isolation (when it makes sense) in this type of benchmark

Number of deny connections: 100000
Number of idle deny connections: 10000
Total connections received: 182861
BenchmarkExport-2 1 5469929206 ns/op 656425016 B/op 17396982 allocs/op
@srikartati (Member) commented Jun 22, 2021:

Do you know why we are running the benchmark only once even after adding b.N in the test?
I see the same thing in the flow aggregator tests; not sure what the reason is.
Should we move to a regular test type and track the memory and allocation stats ourselves?
@antoninbas

Contributor (Author):

I found that if I reduce the number of connections, it runs multiple times. It may depend on the capacity of the machine the test is running on. Here is part of the result:

    exporter_perf_test.go:61: 
        Summary:
        Number of conntrack connections: 10000
        Number of dying conntrack connections: 1000
        Number of deny connections: 10000
        Number of idle deny connections: 1000
        Total connections received: 168554
    exporter_perf_test.go:61: 
        Summary:
        Number of conntrack connections: 10000
        Number of dying conntrack connections: 1000
        Number of deny connections: 10000
        Number of idle deny connections: 1000
        Total connections received: 199035
BenchmarkExport-2   	     206	   7352172 ns/op	  492372 B/op	   17727 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	7.627s

Member:

Interesting. I did not know that it depends on the runtime and resource consumption of the test.

Contributor:

It will stop running if one iteration takes too long.

I wonder if we should have different benchmarks for the different types of connections. Would it make sense?

@zyiou zyiou force-pushed the zyiou/performance_unit_test branch 2 times, most recently from 8ab86a2 to 978b069 Compare June 23, 2021 01:27
@zyiou (Contributor, Author) commented Jun 23, 2021

After removing the redundant calls to env.GetNodeName() (latest commit), execution time is reduced by 19.5%.

BenchmarkExport
    exporter_perf_test.go:80: 
        Summary:
        Number of conntrack connections: 100000
        Number of dying conntrack connections: 10000
        Number of deny connections: 100000
        Number of idle deny connections: 10000
        Total connections received: 186463
BenchmarkExport-2   	       1	4890531828 ns/op	604259720 B/op	16604105 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	5.258s

After change

BenchmarkExport
    exporter_perf_test.go:80: 
        Summary:
        Number of conntrack connections: 100000
        Number of dying conntrack connections: 10000
        Number of deny connections: 100000
        Number of idle deny connections: 10000
        Total connections received: 180096
BenchmarkExport-2   	       1	5096203117 ns/op	603647600 B/op	16591190 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	5.448s

@zyiou force-pushed the zyiou/performance_unit_test branch from 978b069 to 89f9a1c on June 23, 2021 17:24
Comment on lines 73 to 76
func BenchmarkPoll(b *testing.B) {
    disableLogToStderr()
    setupConntrackConnStore(b)
    for n := 0; n < b.N; n++ {
        mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(conns, testNumOfConns, nil)
        connStore.Poll()
        conns = generateUpdatedConns(conns)
    }
    b.Logf("\nSummary:\nNumber of initial connections: %d\nNumber of new connections/poll: %d\nNumber of deleted connections/poll: %d\n", testNumOfConns, testNumOfNewConns, testNumOfDeletedConns)
}
Contributor:

can you try this:

disableLogToStderr()
setupConntrackConnStore(b)
b.ResetTimer()
for n := 0; n < b.N; n++ {
    mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(conns, testNumOfConns, nil)
    connStore.Poll()
    b.StopTimer()
    conns = generateUpdatedConns(conns)
    b.StartTimer()
}

I feel like the performance of Poll() is not that great according to the results, and I wonder if connection generation accounts for a lot of measurement overhead.

Also because of how you designed the benchmark, it looks like we need to run Poll multiple times to get good results. So I would recommend reducing the number of connections if needed, to ensure that we have multiple iterations. I'm assuming that resource usage (CPU / memory) increases linearly with the number of connections?

Contributor (Author):

Yes, it makes sense to exclude the generation of new connections from the measured runtime and to run the test for multiple iterations. Tested with different numbers of connections; CPU and memory increase linearly with the number of conns:

# new conns  # iterations  ns/op     B/op     allocs/op
4000         256           4738835   483896   27244
6000         206           5989825   608971   36037
8000         157           7205885   741305   45061
10000        123           8710284   882714   54307
12000        97            11834417  1037488  63868

Comment on lines 49 to 48
var (
    connStore      *ConntrackConnectionStore
    conns          []*flowexporter.Connection
    mockConnDumper *connectionstest.MockConnTrackDumper
)
Contributor:

If possible, avoid global variables here. Wrap these in a struct and populate / return the struct during test setup.
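A minimal sketch of that suggestion (type and helper names are illustrative, not the final code): keep the benchmark state in a fixture struct that the setup helper returns, instead of package-level variables.

type pollBenchFixture struct {
    connStore      *ConntrackConnectionStore
    conns          []*flowexporter.Connection
    mockConnDumper *connectionstest.MockConnTrackDumper
}

func setupPollBenchmark(b *testing.B) *pollBenchFixture {
    ctrl := gomock.NewController(b)
    f := &pollBenchFixture{
        mockConnDumper: connectionstest.NewMockConnTrackDumper(ctrl),
    }
    // ... populate f.connStore and f.conns exactly as the current setup code does ...
    return f
}

Each benchmark would then call setupPollBenchmark(b) and use the returned fixture, avoiding shared mutable globals between runs.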

Comment on lines 161 to 165
SourcePodNamespace: "ns1",
SourcePodName: "pod1",
DestinationPodNamespace: "ns2",
DestinationPodName: "pod2",
DestinationServiceAddress: net.ParseIP("10.0.0.1"),
DestinationServicePort: 30000,
TCPState: "SYN_SENT",
Contributor:

This is a bit misleading IMO. I thought this information was populated by Poll itself; it should not be populated in the return value of DumpFlows.

Contributor (Author):

Yes you are right. Updated.

ctrl := gomock.NewController(b)
defer ctrl.Finish()
mockIfaceStore := interfacestoretest.NewMockInterfaceStore(ctrl)
mockIfaceStore.EXPECT().GetInterfaceByIP(gomock.Any()).Return(nil, false).AnyTimes()
Contributor:

Wouldn't it be more accurate to return Pod interface information? I think this can be postponed though, especially if you believe this part of Poll (fillPodInfo) is not CPU intensive.

Contributor (Author):

Replaced nil with a Pod interface. It turns out that calling into mockIfaceStore has some CPU cost. Maybe we should cache some IP-to-interface mappings in future work.

@zyiou force-pushed the zyiou/performance_unit_test branch 3 times, most recently from 3bd6a67 to 8f5f081 on June 24, 2021 23:35
@zyiou requested review from antoninbas and srikartati June 29, 2021 17:36
@antoninbas (Contributor) left a comment:

what are the final results, in terms of performance improvements?

}
randomNum := getRandomNum(int64(length - testNumOfDeletedConns))
for i := randomNum; i < testNumOfDeletedConns+randomNum; i++ {
updatedConns[i].DoneExport = true
Contributor:

what if updatedConns[i] is one of the new connections for this iteration? Does it still make sense?

Contributor (Author):

In my opinion it is still valid because the connection can be stored and will be deleted in the next round. What do you think?

Contributor:

I think it's ok since we are not testing export here. It would have been good to add a comment to this effect.

exp, err := setupExporter(false)
if err != nil {
b.Fatalf("error when setting up exporter: %v", err)
}
Contributor:

The golang testing documentation recommends resetting the timer after doing an "expensive" setup: https://golang.org/pkg/testing/#hdr-Benchmarks

Can you call b.ResetTimer() in all your benchmarks, just before the loop?

Contributor (Author):

Sure. Thanks!

@srikartati (Member) left a comment:

A couple of nits, otherwise LGTM. Thanks for working on this.

testNumOfConns = 20000
testNumOfDenyConns = 20000
testNumOfDyingConns = 2000
testNumOfInactiveRecords = 2000
Member:

Keep the name consistent with the timeout: testNumOfIdleRecords?

BenchmarkExportConntrackConns-2 75 13750074 ns/op 965550 B/op 22268 allocs/op
PASS
ok antrea.io/antrea/pkg/agent/flowexporter/exporter 5.494s
*/
Member:

Add the output that you have for the varying number of connections. It will be good to show that.

Contributor (Author):

Done. Thanks!

@zyiou (Contributor, Author) commented Jun 30, 2021

what are the final results, in terms of performance improvements?

Improvements below; details have also been added to the PR description.

Test                           ns/op             B/op              allocs/op
BenchmarkExportConntrackConns  reduced by 47.4%  reduced by 87.6%  reduced by 77.3%
BenchmarkExportDenyConns       reduced by 42.9%  reduced by 89.6%  reduced by 80.0%
BenchmarkPoll                  <1% diff          <1% diff          <1% diff

@zyiou force-pushed the zyiou/performance_unit_test branch from 8f5f081 to f7ecb65 on June 30, 2021 18:51
@antoninbas previously approved these changes Jun 30, 2021
@antoninbas (Contributor) left a comment:

LGTM

This commit adds performance benchmarking for the Flow Exporter. It
evaluates the Export() function under different numbers of conntrack
connections, dying connections, idle records, deny connections and
idle deny connections. A local server receives the records and
counts them. It also evaluates Poll() for adding and updating
connections. CPU and memory profiles are collected and visualized
using pprof.

Also from benchmarking, we discovered and removed redundant calls:
GetNodeName(), which was called every time a record was exported,
and ResetConnStatsWithoutLock, which unnecessarily called
NewConnectionKey each time.

Signed-off-by: zyiou <[email protected]>
@zyiou force-pushed the zyiou/performance_unit_test branch from f7ecb65 to 30759cf on June 30, 2021 21:46
@zyiou (Contributor, Author) commented Jun 30, 2021

/test-all

@srikartati (Member) left a comment:

LGTM

@antoninbas (Contributor):

/test-e2e

@antoninbas merged commit e936453 into antrea-io:main on Jul 1, 2021