Add performance unit test for Flow Exporter #2129

Merged: 1 commit into antrea-io:main on Jul 1, 2021

Conversation

@zyiou (Contributor) commented Apr 27, 2021

This commit adds performance benchmarking for the Flow Exporter. It evaluates the Export() function under different numbers of conntrack connections, dying connections, idle records, deny connections, and idle deny connections. A local server receives the records and counts them. CPU and memory profiles are collected and visualized using pprof.

Benchmarking also led us to discover and remove redundant calls: GetNodeName(), which was called every time a record was exported, and ResetConnStatsWithoutLock, which unnecessarily called NewConnectionKey each time.
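For context, the final benchmarks follow the standard Go testing.B pattern. The sketch below is a simplified outline of their shape; the helper names (setupExporter, disableLogToStderr, the package-level count updated by the local collector) come from the test code discussed later in this thread, and the body is abbreviated rather than a verbatim copy of exporter_perf_test.go:

func BenchmarkExportConntrackConns(b *testing.B) {
    disableLogToStderr()
    // setupExporter starts a local collector goroutine that receives the
    // exported records and counts them, then returns an exporter wired to it.
    exp, err := setupExporter(false)
    if err != nil {
        b.Fatalf("error when setting up exporter: %v", err)
    }
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        exp.Export()
    }
    b.Logf("Total connections received: %d", count)
}

CPU and memory profiles are collected by passing -cpuprofile/-memprofile to go test, as shown in the commands later in this thread, and then inspected with go tool pprof.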

With the fixes and changes in go-ipfix v0.5.3 and the removal of the redundant calls in this PR, we saw the following improvements:
Before:

BenchmarkExportConntrackConns   	  50	  21747542 ns/op	 3708730 B/op	   80838 allocs/op
BenchmarkExportDenyConns      	          126	  14497807 ns/op	 2522699 B/op	   53345 allocs/op
BenchmarkPoll   	                 141	   8372127 ns/op	  813834 B/op	   52738 allocs/op

After:

BenchmarkExportConntrackConns   	 126	  11444126 ns/op	  459816 B/op	   18332 allocs/op
BenchmarkExportDenyConns   	     223	   6391654 ns/op	  261669 B/op	   10637 allocs/op
BenchmarkPoll   	         130	   8280194 ns/op	  818076 B/op	   52814 allocs/op

Improvement:

Test                           ns/op             B/op              allocs/op
BenchmarkExportConntrackConns  reduced by 47.4%  reduced by 87.6%  reduced by 77.3%
BenchmarkExportDenyConns       reduced by 42.9%  reduced by 89.6%  reduced by 80.0%
BenchmarkPoll                  <1% diff          <1% diff          <1% diff

@codecov-commenter commented Apr 27, 2021

Codecov Report

Merging #2129 (8f5f081) into main (6c350e0) will increase coverage by 0.24%.
The diff coverage is 81.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2129      +/-   ##
==========================================
+ Coverage   61.77%   62.02%   +0.24%     
==========================================
  Files         276      280       +4     
  Lines       21342    21714     +372     
==========================================
+ Hits        13184    13468     +284     
- Misses       6773     6846      +73     
- Partials     1385     1400      +15     
Flag Coverage Δ
kind-e2e-tests 52.39% <81.25%> (-0.62%) ⬇️
unit-tests 41.64% <13.33%> (+0.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/flowexporter/connections/connections.go 75.00% <0.00%> (-5.49%) ⬇️
...agent/flowexporter/connections/deny_connections.go 92.68% <ø> (-0.18%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go 79.76% <100.00%> (+1.09%) ⬆️
pkg/agent/flowexporter/flowrecords/flow_records.go 78.43% <100.00%> (ø)
pkg/apiserver/handlers/endpoint/handler.go 58.82% <0.00%> (-11.77%) ⬇️
pkg/agent/openflow/client.go 57.98% <0.00%> (-0.67%) ⬇️
pkg/ovs/openflow/ofctrl_bridge.go 49.65% <0.00%> (-0.35%) ⬇️
pkg/agent/openflow/packetin.go 59.25% <0.00%> (ø)
pkg/util/runtime/runtime_linux.go 0.00% <0.00%> (ø)
pkg/ovs/openflow/ofctrl_meter.go 44.00% <0.00%> (ø)
... and 8 more

@zyiou force-pushed the zyiou/performance_unit_test branch from e10ef81 to d532626 on May 18, 2021 22:58
@zyiou (Contributor, Author) commented May 18, 2021

summary metrics:

TOTAL_CONNECTIONS: 1062873
INIT_CONNECTIONS: 30000
NEW_CONNECTIONS/poll: 10000
DYING_CONNECTIONS/poll: 5000
POLL_INTERVAL(s):5
TEST_DURATION(s): 120
MEMORY(M): 2299

Benchmark result:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:97: 
        Summary metrics:
        TOTAL_CONNECTIONS: 1062873
        INIT_CONNECTIONS: 30000
        NEW_CONNECTIONS/poll: 10000
        DYING_CONNECTIONS/poll: 5000
        POLL_INTERVAL(s):5
        MEMORY(M): 2299
BenchmarkExport-2   	       1	128249778560 ns/op	3908352376 B/op	104729354 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	129.283s

Memory usage

go tool pprof memprofile.out
File: exporter.test
Type: alloc_space
Time: Jun 11, 2021 at 2:27am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3318.40MB, 86.55% of 3833.90MB total
Dropped 85 nodes (cum <= 19.17MB)
Showing top 10 nodes out of 61
      flat  flat%   sum%        cum   cum%
 1096.59MB 28.60% 28.60%  1245.60MB 32.49%  antrea.io/antrea/pkg/agent/flowexporter/exporter.getUpdateConnections
  565.59MB 14.75% 43.35%   776.60MB 20.26%  github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement
  526.60MB 13.74% 57.09%   526.60MB 13.74%  k8s.io/klog/v2.(*loggingT).header
  235.50MB  6.14% 63.23%  1632.20MB 42.57%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addRecordToSet
  214.54MB  5.60% 68.83%   288.05MB  7.51%  github.com/vmware/go-ipfix/pkg/exporter.(*ExportingProcess).createAndSendMsg
     211MB  5.50% 74.33%      211MB  5.50%  github.com/vmware/go-ipfix/pkg/entities.EncodeToIEDataType
  201.18MB  5.25% 79.58%   201.18MB  5.25%  antrea.io/antrea/pkg/agent/flowexporter/flowrecords.(*FlowRecords).AddOrUpdateFlowRecord
      94MB  2.45% 82.03%   111.50MB  2.91%  crypto/rand.Int
   87.50MB  2.28% 84.31%    87.50MB  2.28%  net.IP.String
   85.88MB  2.24% 86.55%   332.53MB  8.67%  antrea.io/antrea/pkg/agent/flowexporter/connections.(*ConntrackConnectionStore).AddOrUpdateConn

CPU usage

go tool pprof profile.out
File: exporter.test
Type: cpu
Time: Jun 11, 2021 at 2:25am (UTC)
Duration: 2.14mins, Total samples = 39.81s (31.04%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 23250ms, 58.40% of 39810ms total
Dropped 305 nodes (cum <= 199.05ms)
Showing top 10 nodes out of 151
      flat  flat%   sum%        cum   cum%
   14410ms 36.20% 36.20%    14680ms 36.88%  syscall.Syscall
    1650ms  4.14% 40.34%     3180ms  7.99%  runtime.scanobject
    1140ms  2.86% 43.21%     1140ms  2.86%  runtime.duffcopy
    1070ms  2.69% 45.89%     1070ms  2.69%  runtime.memmove
    1050ms  2.64% 48.53%    22160ms 55.66%  antrea.io/antrea/pkg/agent/flowexporter/flowrecords.(*FlowRecords).ForAllFlowRecordsDo
     970ms  2.44% 50.97%     2670ms  6.71%  runtime.mallocgc
     900ms  2.26% 53.23%      900ms  2.26%  aeshashbody
     730ms  1.83% 55.06%     9150ms 22.98%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addRecordToSet
     670ms  1.68% 56.74%      670ms  1.68%  syscall.RawSyscall
     660ms  1.66% 58.40%     1440ms  3.62%  runtime.pcvalue

@zyiou force-pushed the zyiou/performance_unit_test branch from d532626 to 04ffb35 on May 21, 2021 06:32
@zyiou marked this pull request as ready for review May 21, 2021 06:37
@zyiou requested review from srikartati and antoninbas May 21, 2021 17:30
@zyiou force-pushed the zyiou/performance_unit_test branch from 04ffb35 to 6604dbd on May 21, 2021 17:42
@zyiou (Contributor, Author) commented May 25, 2021

We are having several performance improvements on go-ipfix side. Here is the performance difference before and after these two changes: vmware/go-ipfix#204, vmware/go-ipfix#206

Before change:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:97: 
        Summary metrics:
        TOTAL_CONNECTIONS: 365800
        INIT_CONNECTIONS: 30000
        NEW_CONNECTIONS/poll: 10000
        DYING_CONNECTIONS/poll: 5000
        POLL_INTERVAL(s):5
        TEST_DURATION(s):60
        MEMORY(M): 856
BenchmarkExport-2   	       1	62873606917 ns/op	3860629856 B/op	80121095 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	63.271s

After change:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:97: 
        Summary metrics:
        TOTAL_CONNECTIONS: 368334
        INIT_CONNECTIONS: 30000
        NEW_CONNECTIONS/poll: 10000
        DYING_CONNECTIONS/poll: 5000
        POLL_INTERVAL(s):5
        TEST_DURATION(s):60
        MEMORY(M): 793
BenchmarkExport-2   	       1	63058384712 ns/op	1381808352 B/op	36653986 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	63.521s

64.06% reduction in memory usage and 54.25% reduction in allocs

@srikartati (Member) left a comment:

In addition to the metrics, can we add a check that our memory utilization is linear with the number of new connections added? Basically, the idea is to detect memory leaks.

go.mod (review comment resolved; outdated)
}
// Generate connections for Dumpflows, each round with update connections, new connections and dying connections
mockConnDumper.EXPECT().DumpFlows(testZoneFilter).Return(conns, testConnectionsCount, nil).Times(1)
for i := 0; i < 5; i++ {
Member:

what does 5 mean here? Are we polling only five times during the test?

Contributor (Author):

Yes, in the previous setting. Changed it to testDuration/testPollInterval to make it clear.

b.Fatalf("Got error when creating a local server: %v", err)
}
prevCount := 0
ticker := time.NewTicker(testIdleTimeOut)
Member:

use timer here?

Contributor (Author):

If we use a timer instead of a ticker, we will need to reset the timer each time. Is there any special reason to use a timer?

Member:

Since we are sleeping for testDuration, I thought this ticker was only expected to trigger once as a timeout, as the name testIdleTimeOut suggests. It looks like you want it to be a periodic ticker so that prevCount can be updated. Could we just time out once, without worrying about how many messages are received?
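For illustration only (this is not the code in the PR), the two approaches being discussed look roughly like this, reusing the existing testIdleTimeOut constant and the count variable that the receiving goroutine updates:

// Option 1: periodic ticker; keep waiting while new records are still arriving.
ticker := time.NewTicker(testIdleTimeOut)
defer ticker.Stop()
prevCount := 0
for {
    <-ticker.C
    if count == prevCount { // no new records since the last tick
        break
    }
    prevCount = count
}

// Option 2: a single fixed timeout; record whatever has been received by then.
<-time.After(testIdleTimeOut)
finalCount := count // use finalCount in the summary log

In a real test, reads of count would also need to be synchronized with the receiving goroutine (for example with sync/atomic).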


const (
testPollInterval = 2 * time.Second
testConnectionsCount = 30000
Member:

Looking at the other counts, it seems like you consider 9K connections to stay the same (30K - (20K + 1K)). I feel they need to be updated (at least their timestamps) or die.

Comment on lines 92 to 96
go connStore.Run(stopChan1)
go exp.Run(stopChan2)
Member:

why separate channels?

Contributor (Author):

I want polling to finish after the test duration and to close the exporter when it has not received records for some time (testIdleTimeOut). That's why we need two separate channels: one to stop the connection store and one to stop the exporter.

Member:

... close the exporter when it has not received records for some time (testIdleTimeOut)

Do you mean the collector could not receive the records? The exporter just sends the records in the test here.
Any particular reason to stop polling?
In other words, do we want to wait till we receive all messages at the collector? Could we just do the duration and close the collector/exporting processes when it finishes? We just record the number of messages at the collector before closing the process.
This way we can see within specific duration, how many messages are collected. As CPU time is optimized, we could see an increase in the number of messages at the collector. What do you think of this approach?

Contributor (Author):

Got it. Yes, this approach makes sense to me. As you mentioned, I was planning to wait for all messages so that the exporting process could be closed safely.

testNewConnectionsCount = 4000
testDyingConnectionsCount = 1000
testZoneFilter = uint16(65520)
testDuration = 15 * time.Second
Member:

It's better to have the duration in minutes to stress the flow exporter.

testDyingConnectionsCount = 1000
testZoneFilter = uint16(65520)
testDuration = 15 * time.Second
testIdleTimeOut = 5 * time.Second
Member:

is this timeout for the exporter channel?

Contributor (Author):

Yes

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
@zyiou force-pushed the zyiou/performance_unit_test branch 2 times, most recently from 5ee2324 to 73004f1 on June 3, 2021 22:40
@zyiou (Contributor, Author) commented Jun 3, 2021

In addition to the metrics, can we add a check that our memory utilization is linear with the number of new connections added? Basically, the idea is to detect memory leaks.

Sure. Will create a larger VM for testing.

@zyiou force-pushed the zyiou/performance_unit_test branch from 73004f1 to 6838a82 on June 3, 2021 22:50
@srikartati (Member):

In addition to the metrics, can we add a check that our memory utilization is linear with the number of new connections added? Basically, the idea is to detect memory leaks.

Sure. Will create a larger VM for testing.

I just meant changing the initial number of connections through an input parameter (from a lower number to a higher number) and tracking memory utilization. The current number can be the maximum.

@zyiou added the antrea/flow-visibility/test, area/flow-visibility, and area/flow-visibility/exporter labels Jun 9, 2021
@zyiou (Contributor, Author) commented Jun 11, 2021

Test results with different numbers of new connections per poll:

# new conns / poll  B/op           allocs/op
2000                660,247,144    17,082,729
4000                882,118,920    22,760,023
6000                1,098,164,016  28,307,021
8000                1,331,826,856  34,843,965
10000               1,603,746,952  40,960,480
12000               1,825,280,104  47,103,483
14000               2,050,849,072  53,434,063

Memory consumption grows roughly linearly as the number of new connections increases.

@zyiou force-pushed the zyiou/performance_unit_test branch from 6838a82 to b0cfa5f on June 11, 2021 03:36
@srikartati (Member) left a comment:

Thanks for doing the comparison by increasing the connection counts. Can that parameter be passed into the benchmark test via the go test command?
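One possible way to do this (a sketch, not what the PR implements) is to declare a custom flag in the test package; the testing framework parses it, and it can be passed after -args on the go test command line. The flag and variable names here are illustrative:

// Requires `import "flag"` in the test file.
var benchNumConns = flag.Int("numConns", 20000, "number of conntrack connections to generate for the benchmark")

// Inside the benchmark, use *benchNumConns wherever the testNumOfConns constant is used today.

An invocation could then look like: go test -run=XXX -bench=BenchmarkExportConntrackConns -benchmem -args -numConns=100000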

Comment on lines 216 to 261
func statMaxMemAlloc(maxAlloc *uint64, interval time.Duration, stopCh chan struct{}) {
    var memStats goruntime.MemStats
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            goruntime.ReadMemStats(&memStats)
            if memStats.Alloc > *maxAlloc {
                *maxAlloc = memStats.Alloc
            }
        case <-stopCh:
            return
        }
    }
}
Member:

Do we still need this memstat collection? Is it capturing anything different from what we already get from the benchmark output?

Contributor (Author):

No, we don't need this anymore. I have removed it. Thanks!

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
@antoninbas (Contributor) left a comment:

Sorry for the late review. Overall I am a bit confused by the format of this benchmark. Because we do not include the actual conntrack polling in the benchmark at all, it's not really an "end-to-end" benchmark of the FlowExporter. Given that, I think it would have been more convenient to write distinct benchmarks (using an actual Go benchmark) for the ConntrackConnectionStore (Poll() method) and the flowExporter (export() method). Then we would have a good idea of the performance of each one and I feel like the CPU / memory data would be more accurate (as the benchmarked function would be invoked multiple times).


var count = 0

func BenchmarkExport(b *testing.B) {
Contributor:

If we are not going to write a proper Go benchmark (which doesn't seem appropriate here anyway), I don't think we should call the function BenchmarkXXX or use testing.B. We can make it a regular test.

The benchmark function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably. The output

Contributor (Author):

Makes sense to me. Updated to a more standard way for benchmarking Export() only. Thanks!

go func() {
defer conn.Close()
for {
buff := make([]byte, 272)
Contributor:

Why 272? If this is related to the size of a flow record sent by the exporter, please find a way to compute it programmatically; otherwise I imagine this will break if we add new IEs. If instead it is a fixed amount that is not expected to change in the future, add a comment to explain the value.

Contributor (Author):

You are right. We should use a large number here to cover future growth of the record length. Changed it to a large constant.
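As a sketch of the first option mentioned in the review comment (deriving the size from the protocol rather than picking an arbitrary constant): the IPFIX message header carries a 16-bit length field (RFC 7011), so 65535 bytes is an upper bound for any single message regardless of how many IEs are added to the record later. The constant name below is illustrative:

const maxIPFIXMsgSize = 65535 // IPFIX message length is a 16-bit field (RFC 7011)

buff := make([]byte, maxIPFIXMsgSize)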

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
Comment on lines 131 to 132
src := net.ParseIP(fmt.Sprintf("192.168.0.%d", randomNum))
dst := net.ParseIP(fmt.Sprintf("192.169.0.%d", randomNum))
Contributor:

the source IP is always equal to the destination IP, is that on purpose?

Contributor (Author):

Actually, they are slightly different. Made some changes to differentiate them more clearly.

pkg/agent/flowexporter/exporter/exporter_perf_test.go (review comment resolved; outdated)
@zyiou force-pushed the zyiou/performance_unit_test branch from b0cfa5f to 87f76b4 on June 22, 2021 02:19
@zyiou (Contributor, Author) commented Jun 22, 2021

Updated test results:

go test -test.v -run=BenchmarkExport -test.benchmem -bench=. -memprofile memprofile.out -cpuprofile profile.out
goos: linux
goarch: amd64
pkg: antrea.io/antrea/pkg/agent/flowexporter/exporter
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkExport
    exporter_perf_test.go:61: 
        Summary:
        Number of conntrack connections: 20000
        Number of dying conntrack connections: 10000
        Number of deny connections: 20000
        Number of idle deny connections: 10000
        Total connections received: 37788
BenchmarkExport-2   	       1	1123878204 ns/op	135576880 B/op	 3458292 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	1.298s
go tool pprof memprofile.out
File: exporter.test
Type: alloc_space
Time: Jun 22, 2021 at 2:01am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 103.32MB, 83.29% of 124.05MB total
Dropped 16 nodes (cum <= 0.62MB)
Showing top 10 nodes out of 68
      flat  flat%   sum%        cum   cum%
   22.50MB 18.14% 18.14%       27MB 21.77%  github.com/vmware/go-ipfix/pkg/entities.(*dataRecord).AddInfoElement
   15.76MB 12.70% 30.84%    15.76MB 12.70%  antrea.io/antrea/pkg/agent/flowexporter/flowrecords.(*FlowRecords).AddFlowRecordToMap
   15.50MB 12.50% 43.34%    15.50MB 12.50%  k8s.io/klog/v2.(*loggingT).header
   12.04MB  9.71% 53.05%    12.04MB  9.71%  antrea.io/antrea/pkg/agent/flowexporter/connections.(*connectionStore).AddConnToMap
    8.50MB  6.86% 59.91%    33.52MB 27.02%  antrea.io/antrea/pkg/agent/flowexporter/exporter.addConnsAndGetRecords
       8MB  6.45% 66.36%    11.50MB  9.27%  github.com/vmware/go-ipfix/pkg/exporter.(*ExportingProcess).createAndSendMsg
       7MB  5.65% 72.00%    15.29MB 12.33%  antrea.io/antrea/pkg/agent/flowexporter/exporter.addDenyConns
    5.50MB  4.43% 76.44%    35.50MB 28.62%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addRecordToSet
    4.50MB  3.63% 80.06%     4.50MB  3.63%  github.com/vmware/go-ipfix/pkg/entities.EncodeToIEDataType
       4MB  3.22% 83.29%        4MB  3.22%  github.com/vmware/go-ipfix/pkg/entities.NewDataRecord (inline)
go tool pprof profile.out
File: exporter.test
Type: cpu
Time: Jun 22, 2021 at 2:01am (UTC)
Duration: 1.24s, Total samples = 1.11s (89.81%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 770ms, 69.37% of 1110ms total
Showing top 10 nodes out of 149
      flat  flat%   sum%        cum   cum%
     510ms 45.95% 45.95%      510ms 45.95%  syscall.Syscall
      60ms  5.41% 51.35%      100ms  9.01%  runtime.scanobject
      40ms  3.60% 54.95%       70ms  6.31%  runtime.mallocgc
      40ms  3.60% 58.56%       40ms  3.60%  syscall.RawSyscall
      20ms  1.80% 60.36%      160ms 14.41%  antrea.io/antrea/pkg/agent/flowexporter/exporter.(*flowExporter).addDenyConnToSet
      20ms  1.80% 62.16%       20ms  1.80%  cmpbody
      20ms  1.80% 63.96%       20ms  1.80%  runtime.heapBits.bits (inline)
      20ms  1.80% 65.77%       40ms  3.60%  runtime.makeslice
      20ms  1.80% 67.57%       20ms  1.80%  runtime.memmove
      20ms  1.80% 69.37%       20ms  1.80%  runtime.step

Benchmark results with different numbers of conntrack connections + deny connections as input:

# conns            ns/op          B/op         allocs/op
20,000 + 20,000    1,123,878,204  135,576,880  3,458,292
50,000 + 50,000    2,856,738,645  329,267,544  8,690,977
100,000 + 100,000  5,469,929,206  656,425,016  17,396,982

}

for n := 0; n < b.N; n++ {
exp.Export()
Member:

We are considering one set of connections for one export cycle. This does not exercise any code in the AddOrUpdateConn method of the conntrack connection store. Is there a plan to have a test for that too?

Contributor (Author):

AddOrUpdateConn is part of the Poll() function, which would involve mocking the dump-flows output from OVS. Do you mean we should test AddOrUpdateConn individually, or cover testing for Poll() too?

@srikartati (Member) commented Jun 22, 2021:

Covering Poll() with a test sounds good, as it would include AddOrUpdateConn. However, we may need multiple polls if we want to exercise the update code, or we should initialize the connection store with some connections before the poll.

I think we are expecting the Run() method in exporter.go to be exercised in a different e2e test by running it multiple times. Is that correct?

Contributor (Author):

Covering a test for Poll(): sure, will add that test.
Benchmarking Run() in e2e tests: yes, we should implement that in the future if you are referring to testing with actual polling, updating, and exporting.

Contributor:

If possible, let's test things in isolation (when it makes sense) in this type of benchmark

Number of deny connections: 100000
Number of idle deny connections: 10000
Total connections received: 182861
BenchmarkExport-2 1 5469929206 ns/op 656425016 B/op 17396982 allocs/op
@srikartati (Member) commented Jun 22, 2021:

Do you know why we are running the benchmark only once even after adding b.N in the test?
I see the same thing in the flow aggregator tests; not sure what the reason is.
Should we move to a regular test type and track the memory and allocation stats ourselves?
@antoninbas

Contributor (Author):

I found that if I reduce the number of connections, it runs multiple times. It may depend on the capacity of the machine the test is running on. Here is part of the result:

    exporter_perf_test.go:61: 
        Summary:
        Number of conntrack connections: 10000
        Number of dying conntrack connections: 1000
        Number of deny connections: 10000
        Number of idle deny connections: 1000
        Total connections received: 168554
    exporter_perf_test.go:61: 
        Summary:
        Number of conntrack connections: 10000
        Number of dying conntrack connections: 1000
        Number of deny connections: 10000
        Number of idle deny connections: 1000
        Total connections received: 199035
BenchmarkExport-2   	     206	   7352172 ns/op	  492372 B/op	   17727 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	7.627s

Member:

Interesting. I did not know that it depends on the runtime and resource consumption of the test.

Contributor:

It will stop running if one iteration takes too long.

I wonder if we should have different benchmarks for the different types of connections. Would it make sense?

@zyiou zyiou force-pushed the zyiou/performance_unit_test branch 2 times, most recently from 8ab86a2 to 978b069 Compare June 23, 2021 01:27
@zyiou (Contributor, Author) commented Jun 23, 2021

After removing the redundant calls to env.GetNodeName() (latest commit), execution time is reduced by 19.5%.

BenchmarkExport
    exporter_perf_test.go:80: 
        Summary:
        Number of conntrack connections: 100000
        Number of dying conntrack connections: 10000
        Number of deny connections: 100000
        Number of idle deny connections: 10000
        Total connections received: 186463
BenchmarkExport-2   	       1	4890531828 ns/op	604259720 B/op	16604105 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	5.258s

After change

BenchmarkExport
    exporter_perf_test.go:80: 
        Summary:
        Number of conntrack connections: 100000
        Number of dying conntrack connections: 10000
        Number of deny connections: 100000
        Number of idle deny connections: 10000
        Total connections received: 180096
BenchmarkExport-2   	       1	5096203117 ns/op	603647600 B/op	16591190 allocs/op
PASS
ok  	antrea.io/antrea/pkg/agent/flowexporter/exporter	5.448s

@zyiou force-pushed the zyiou/performance_unit_test branch from 978b069 to 89f9a1c on June 23, 2021 17:24
Comment on lines 73 to 76
func BenchmarkPoll(b *testing.B) {
    disableLogToStderr()
    setupConntrackConnStore(b)
    for n := 0; n < b.N; n++ {
        mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(conns, testNumOfConns, nil)
        connStore.Poll()
        conns = generateUpdatedConns(conns)
    }
    b.Logf("\nSummary:\nNumber of initial connections: %d\nNumber of new connections/poll: %d\nNumber of deleted connections/poll: %d\n", testNumOfConns, testNumOfNewConns, testNumOfDeletedConns)
}
Contributor:

can you try this:

disableLogToStderr()
setupConntrackConnStore(b)
b.ResetTimer()
for n := 0; n < b.N; n++ {
    mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(conns, testNumOfConns, nil)
    connStore.Poll()
    b.StopTimer()
    conns = generateUpdatedConns(conns)
    b.StartTimer()
}

I feel like the performance of Poll() is not that great according to the results, and I wonder if connection generation accounts for a lot of measurement overhead.

Also because of how you designed the benchmark, it looks like we need to run Poll multiple times to get good results. So I would recommend reducing the number of connections if needed, to ensure that we have multiple iterations. I'm assuming that resource usage (CPU / memory) increases linearly with the number of connections?

Contributor (Author):

Yes, it makes sense to exclude the generation of new connections from the measured runtime and to run the test for multiple iterations. Tested with different numbers of connections; CPU and memory increase linearly with the number of conns:

# new conns  # iterations  ns/op     B/op     allocs/op
4000         256           4738835   483896   27244
6000         206           5989825   608971   36037
8000         157           7205885   741305   45061
10000        123           8710284   882714   54307
12000        97            11834417  1037488  63868

Comment on lines 49 to 48
var (
    connStore      *ConntrackConnectionStore
    conns          []*flowexporter.Connection
    mockConnDumper *connectionstest.MockConnTrackDumper
)
Contributor:

If possible, avoid global variables here. Wrap these in a struct and populate / return the struct during test setup.
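A minimal sketch of that suggestion (type and helper names are illustrative, not the final code): keep the benchmark state in a fixture struct that the setup helper returns, instead of package-level variables.

type pollBenchFixture struct {
    connStore      *ConntrackConnectionStore
    conns          []*flowexporter.Connection
    mockConnDumper *connectionstest.MockConnTrackDumper
}

func setupPollBenchmark(b *testing.B) *pollBenchFixture {
    ctrl := gomock.NewController(b)
    f := &pollBenchFixture{
        mockConnDumper: connectionstest.NewMockConnTrackDumper(ctrl),
    }
    // ... populate f.connStore and f.conns exactly as the current setup code does ...
    return f
}

Each benchmark would then call setupPollBenchmark(b) and use the returned fixture, avoiding shared mutable globals between runs.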

Comment on lines 161 to 165
SourcePodNamespace: "ns1",
SourcePodName: "pod1",
DestinationPodNamespace: "ns2",
DestinationPodName: "pod2",
DestinationServiceAddress: net.ParseIP("10.0.0.1"),
DestinationServicePort: 30000,
TCPState: "SYN_SENT",
Contributor:

This is a bit misleading IMO. I thought this information was populated by Poll itself; it should not be populated in the return value of DumpFlows.

Contributor (Author):

Yes you are right. Updated.

ctrl := gomock.NewController(b)
defer ctrl.Finish()
mockIfaceStore := interfacestoretest.NewMockInterfaceStore(ctrl)
mockIfaceStore.EXPECT().GetInterfaceByIP(gomock.Any()).Return(nil, false).AnyTimes()
Contributor:

Wouldn't it be more accurate to return Pod interface information? I think this can be postponed though, especially if you believe this part of Poll (fillPodInfo) is not CPU intensive.

Contributor (Author):

Replaced nil with a Pod interface. It turns out that calling into mockIfaceStore has some CPU cost. Maybe we should cache some IP-to-interface mappings in future work.

@zyiou force-pushed the zyiou/performance_unit_test branch 3 times, most recently from 3bd6a67 to 8f5f081 on June 24, 2021 23:35
@zyiou requested review from antoninbas and srikartati June 29, 2021 17:36
@antoninbas (Contributor) left a comment:

what are the final results, in terms of performance improvements?

}
randomNum := getRandomNum(int64(length - testNumOfDeletedConns))
for i := randomNum; i < testNumOfDeletedConns+randomNum; i++ {
updatedConns[i].DoneExport = true
Contributor:

what if updatedConns[i] is one of the new connections for this iteration? Does it still make sense?

Contributor (Author):

In my opinion it is still valid because the connection can be stored and will be deleted in the next round. What do you think?

Contributor:

I think it's ok since we are not testing export here. It would have been good to add a comment to this effect.

exp, err := setupExporter(false)
if err != nil {
b.Fatalf("error when setting up exporter: %v", err)
}
Contributor:

The golang testing documentation recommends resetting the timer after doing an "expensive" setup: https://golang.org/pkg/testing/#hdr-Benchmarks

Can you call b.ResetTimer() in all your benchmarks, just before the loop?

Contributor (Author):

Sure. Thanks!

@srikartati (Member) left a comment:

A couple of nits, otherwise LGTM. Thanks for working on this.

testNumOfConns = 20000
testNumOfDenyConns = 20000
testNumOfDyingConns = 2000
testNumOfInactiveRecords = 2000
Member:

Keep the name consistent with the timeout: testNumOfIdleRecords?

BenchmarkExportConntrackConns-2 75 13750074 ns/op 965550 B/op 22268 allocs/op
PASS
ok antrea.io/antrea/pkg/agent/flowexporter/exporter 5.494s
*/
Member:

Add the output that you have for the varying number of connections. It will be good to show that.

Contributor (Author):

Done. Thanks!

@zyiou (Contributor, Author) commented Jun 30, 2021

what are the final results, in terms of performance improvements?

Improvements below; details have also been added to the PR description.

Test                           ns/op             B/op              allocs/op
BenchmarkExportConntrackConns  reduced by 47.4%  reduced by 87.6%  reduced by 77.3%
BenchmarkExportDenyConns       reduced by 42.9%  reduced by 89.6%  reduced by 80.0%
BenchmarkPoll                  <1% diff          <1% diff          <1% diff

@zyiou force-pushed the zyiou/performance_unit_test branch from 8f5f081 to f7ecb65 on June 30, 2021 18:51
@antoninbas previously approved these changes Jun 30, 2021
@antoninbas (Contributor) left a comment:

LGTM

This commit adds performance benchmarking for the Flow Exporter. It
evaluates the Export() function under different numbers of conntrack
connections, dying connections, idle records, deny connections and
idle deny connections. A local server receives the records and
counts them. It also evaluates Poll() for adding and updating
connections. CPU and memory profiles are collected and visualized
using pprof.

Also from benchmarking, we discovered and removed redundant calls:
GetNodeName(), which was called every time a record was exported,
and ResetConnStatsWithoutLock, which unnecessarily called
NewConnectionKey each time.

Signed-off-by: zyiou <[email protected]>
@zyiou force-pushed the zyiou/performance_unit_test branch from f7ecb65 to 30759cf on June 30, 2021 21:46
@zyiou (Contributor, Author) commented Jun 30, 2021

/test-all

@srikartati (Member) left a comment:

LGTM

@antoninbas (Contributor):

/test-e2e

@antoninbas merged commit e936453 into antrea-io:main on Jul 1, 2021