Add performance unit test for Flow Exporter #2129
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2129 +/- ##
==========================================
+ Coverage 61.77% 62.02% +0.24%
==========================================
Files 276 280 +4
Lines 21342 21714 +372
==========================================
+ Hits 13184 13468 +284
- Misses 6773 6846 +73
- Partials 1385 1400 +15
Flags with carried forward coverage won't be shown.
Force-pushed from e10ef81 to d532626
Summary metrics / benchmark results: [memory usage and CPU usage charts]
Force-pushed from d532626 to 04ffb35
Force-pushed from 04ffb35 to 6604dbd
We have several performance improvements in progress on the go-ipfix side. Here is the performance difference before and after these two changes: vmware/go-ipfix#204, vmware/go-ipfix#206.
Before change:
After change:
64.06% reduction in memory usage and 54.25% reduction in allocs.
In addition to the metrics, can we add a check that our memory utilization grows linearly with the number of new connections added? Basically, the idea is to detect memory leaks.
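A minimal sketch of what such a check could look like, assuming a hypothetical setupConnStore helper that pre-populates a store with n connections (not the PR's actual code):

package exporter

import (
    "runtime"
    "testing"
)

// Sketch only: setupConnStore is a hypothetical helper that builds a
// connection store pre-populated with n connections.
func TestMemoryLinearity(t *testing.T) {
    var perConnBaseline uint64
    for i, n := range []int{10000, 20000, 40000} {
        runtime.GC()
        var before, after runtime.MemStats
        runtime.ReadMemStats(&before)
        store := setupConnStore(n) // hypothetical
        runtime.ReadMemStats(&after)
        perConn := (after.Alloc - before.Alloc) / uint64(n)
        if i == 0 {
            perConnBaseline = perConn
            continue
        }
        // Roughly linear growth means the per-connection cost stays flat;
        // a leak or superlinear structure shows up as growth here.
        if perConn > 2*perConnBaseline {
            t.Errorf("per-connection memory grew from %dB to %dB at n=%d", perConnBaseline, perConn, n)
        }
        runtime.KeepAlive(store)
    }
}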
}
// Generate connections for DumpFlows; each round has updated connections, new connections and dying connections
mockConnDumper.EXPECT().DumpFlows(testZoneFilter).Return(conns, testConnectionsCount, nil).Times(1)
for i := 0; i < 5; i++ {
what does 5 mean here? Are we polling only five times during the test?
Yes, in the previous setting. Changed it to testDuration/pollInterval to make it clear.
b.Fatalf("Got error when creating a local server: %v", err)
}
prevCount := 0
ticker := time.NewTicker(testIdleTimeOut)
use timer here?
If we use a timer instead of a ticker, we will need to reset the timer each time. Is there any special reason to prefer a timer?
As we are sleeping for testDuration, I thought this ticker was only expected to trigger once as a timeout, like the name testIdleTimeOut suggests. Looks like you want this to be a periodic ticker in order to update prevCount. Could we just time out once, without worrying about how many messages are received?
const (
    testPollInterval = 2 * time.Second
    testConnectionsCount = 30000
Looking at the other counts, it seems like you expect 9K connections to stay the same (30K - (20K + 1K)). I feel they need to be updated (at least the times) or die.
go connStore.Run(stopChan1)
go exp.Run(stopChan2)
why separate channels?
I want the poll to finish after the test duration, and to close the exporter when it does not receive records for some time (testIdleTimeOut). That's why we need two separate channels, to stop the connection store and the exporter respectively.
... close exporter when exporter does not receive records for some time (testIdleTimeOut)

Do you mean the collector could not receive the records? The exporter just sends the records in this test. Any particular reason to stop polling?
In other words, do we want to wait until we receive all messages at the collector? Could we just run for the test duration and close the collector/exporting processes when it finishes? We would just record the number of messages received at the collector before closing the process.
This way, we can see how many messages are collected within a specific duration. As CPU time is optimized, we should see an increase in the number of messages at the collector. What do you think of this approach?
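A sketch of that duration-based flow, reusing the two stop channels from the diff above; startCollector is a hypothetical helper that counts records arriving at the local collector, and the exporter type name is assumed:

package exporter

import (
    "sync/atomic"
    "testing"
    "time"
)

// Sketch only: run everything for a fixed window, tear down, and report
// how many records the collector saw within that window.
func runForDuration(b *testing.B, connStore *ConntrackConnectionStore, exp *flowExporter) {
    var received int64
    stopChan1 := make(chan struct{})
    stopChan2 := make(chan struct{})
    stopCollector := make(chan struct{})
    go connStore.Run(stopChan1)
    go exp.Run(stopChan2)
    go startCollector(&received, stopCollector) // hypothetical

    time.Sleep(testDuration) // fixed measurement window
    close(stopChan1)         // stop polling / the connection store
    close(stopChan2)         // stop the exporter
    close(stopCollector)     // freeze the collector's count
    b.Logf("records received within %v: %d", testDuration, atomic.LoadInt64(&received))
}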
Got it. Yes, this approach makes sense to me. As you mentioned, I was planning to wait for all messages in order to safely close the exporting process.
testNewConnectionsCount = 4000
testDyingConnectionsCount = 1000
testZoneFilter = uint16(65520)
testDuration = 15 * time.Second
It's better to have the duration in minutes to stress the flow exporter.
testDyingConnectionsCount = 1000
testZoneFilter = uint16(65520)
testDuration = 15 * time.Second
testIdleTimeOut = 5 * time.Second
is this timeout for the exporter channel?
Yes
Force-pushed from 5ee2324 to 73004f1
Sure. Will create a larger VM for testing.
Force-pushed from 73004f1 to 6838a82
I just meant changing the initial number of connections through an input parameter (from a lower number to a higher number) and tracking memory utilization. The current number can be the maximum.
Test results with different numbers of new connections per poll:
Memory consumption is roughly linear as the number of new connections increases.
Force-pushed from 6838a82 to b0cfa5f
Thanks for doing the comparison by increasing the connection counts. Can that parameter be passed into the benchmark test using the go test command?
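One way to do that (a sketch; the flag name and generateConns helper are made up) is a package-level flag that the testing framework parses, passed after -args:

package exporter

import (
    "flag"
    "testing"
)

// Hypothetical flag, parsed by the testing framework before benchmarks run.
var benchNumConns = flag.Int("perf.num-conns", 30000, "initial number of connections")

func BenchmarkExportParam(b *testing.B) {
    conns := generateConns(*benchNumConns) // hypothetical generator
    _ = conns
    // ... run the export benchmark against conns ...
}

Invoked as, for example: go test -run=NONE -bench=BenchmarkExportParam ./pkg/agent/flowexporter/exporter -args -perf.num-conns=100000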
// statMaxMemAlloc periodically samples the runtime's heap allocation and
// records the peak value observed, until stopCh is closed.
func statMaxMemAlloc(maxAlloc *uint64, interval time.Duration, stopCh chan struct{}) {
    var memStats goruntime.MemStats
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            goruntime.ReadMemStats(&memStats)
            if memStats.Alloc > *maxAlloc {
                *maxAlloc = memStats.Alloc
            }
        case <-stopCh:
            return
        }
    }
}
do we still need this memstats collection? Is this something different from what we are getting from the benchmark test?
No, we don't need this any more. I have removed it. Thanks!
Sorry for the late review. Overall I am a bit confused by the format of this benchmark. Because we do not include the actual conntrack polling in the benchmark at all, it's not really an "end-to-end" benchmark of the FlowExporter. Given that, I think it would have been more convenient to write distinct benchmarks (using actual Go benchmarks) for the ConntrackConnectionStore (Poll() method) and the flowExporter (export() method). Then we would have a good idea of the performance of each one, and I feel like the CPU / memory data would be more accurate (as the benchmarked function would be invoked multiple times).
var count = 0

func BenchmarkExport(b *testing.B) {
If we are not going to write a proper Go benchmark (which doesn't seem appropriate here anyway), I don't think we should call the function BenchmarkXXX or use testing.B. We can make it a regular test.
The benchmark function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably.
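For reference, the idiomatic shape that quote describes, sketched with names from this PR (setupExporter as it appears later in the diff; exact signatures may differ):

func BenchmarkExport(b *testing.B) {
    exp, err := setupExporter(false) // expensive setup, done once
    if err != nil {
        b.Fatalf("error when setting up exporter: %v", err)
    }
    b.ResetTimer() // exclude setup cost from the measurement
    for n := 0; n < b.N; n++ {
        exp.Export() // the framework grows b.N until timing is stable
    }
}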
Makes sense to me. Updated to a more standard way of benchmarking Export() only. Thanks!
go func() {
    defer conn.Close()
    for {
        buff := make([]byte, 272)
why 272? If this is related to the size of a flow record sent by the exporter, please find a way to compute it programmatically; otherwise I imagine this will break if we add new IEs. If instead this is a fixed amount that's not expected to change in the future, add a comment to explain the value.
You are right. We should use a large number here to cover future expansion of the record length. Changed it to a large constant.
src := net.ParseIP(fmt.Sprintf("192.168.0.%d", randomNum))
dst := net.ParseIP(fmt.Sprintf("192.169.0.%d", randomNum))
the source IP is always equal to the destination IP, is that on purpose?
Actually, they are a little different (192.168.0.x vs 192.169.0.x). Made some changes to differentiate them more clearly.
Force-pushed from b0cfa5f to 87f76b4
Updated testing results:
Benchmark results with different numbers of conntrack connections + deny connections as input:
}

for n := 0; n < b.N; n++ {
    exp.Export()
We are considering one set of connections for one export cycle. This does not exercise any code in the AddOrUpdateConn method of the conntrack connection store. Is there a plan to have a test for that too?
AddOrUpdateConn is part of the Poll() function, which will involve mocking the flow dump output from OVS. Do you mean we can test AddOrUpdateConn individually, or cover testing for Poll() too?
Covering the test for Poll() sounds good, as it would include AddOrUpdateConn. However, we may need multiple polls if we want to exercise the update code, or we should initialize the connection store with some connections before the poll.
I think we are expecting the Run() method in exporter.go to be exercised in a different e2e test by running it multiple times. Is that correct?
Covering the test for Poll() --- sure, will add that test.
Benchmarking Run() in e2e tests --- yes, we should implement it in the future, if you are referring to testing with actual polling, updating and exporting.
If possible, let's test things in isolation (when it makes sense) in this type of benchmark
Number of deny connections: 100000
Number of idle deny connections: 10000
Total connections received: 182861
BenchmarkExport-2 1 5469929206 ns/op 656425016 B/op 17396982 allocs/op
Do you know why we are running the benchmark only once, even after adding b.N in the test? I see the same thing in the flow aggregator tests; not sure what the reason is.
Should we move to a regular test type and track the memory or allocation stats ourselves?
@antoninbas
I found that if I reduce the number of connections, it will run multiple times. It may depend on the capacity of the machine the test is running on. Here is part of the result:
exporter_perf_test.go:61:
Summary:
Number of conntrack connections: 10000
Number of dying conntrack connections: 1000
Number of deny connections: 10000
Number of idle deny connections: 1000
Total connections received: 168554
exporter_perf_test.go:61:
Summary:
Number of conntrack connections: 10000
Number of dying conntrack connections: 1000
Number of deny connections: 10000
Number of idle deny connections: 1000
Total connections received: 199035
BenchmarkExport-2 206 7352172 ns/op 492372 B/op 17727 allocs/op
PASS
ok antrea.io/antrea/pkg/agent/flowexporter/exporter 7.627s
Interesting. Did not know that it depends on the runtime and resource consumption of the test.
It will stop running more iterations if one iteration takes too long.
I wonder if we should have different benchmarks for the different types of connections. Would that make sense?
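One option would be sub-benchmarks via b.Run, so each connection type gets its own b.N and its own ns/op line in the output; a sketch (the per-type setup helpers and the exporter type name are hypothetical):

func BenchmarkExportByConnType(b *testing.B) {
    cases := []struct {
        name  string
        setup func(b *testing.B) *flowExporter // hypothetical per-type setup
    }{
        {"ConntrackConns", setupWithConntrackConns},
        {"DenyConns", setupWithDenyConns},
        {"IdleDenyConns", setupWithIdleDenyConns},
    }
    for _, c := range cases {
        b.Run(c.name, func(b *testing.B) {
            exp := c.setup(b)
            b.ResetTimer()
            for n := 0; n < b.N; n++ {
                exp.Export()
            }
        })
    }
}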
Force-pushed from 8ab86a2 to 978b069
After removing redundant calls:
After change:
Force-pushed from 978b069 to 89f9a1c
func BenchmarkPoll(b *testing.B) {
    disableLogToStderr()
    setupConntrackConnStore(b)
    for n := 0; n < b.N; n++ {
        mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(conns, testNumOfConns, nil)
        connStore.Poll()
        conns = generateUpdatedConns(conns)
    }
    b.Logf("\nSummary:\nNumber of initial connections: %d\nNumber of new connections/poll: %d\nNumber of deleted connections/poll: %d\n", testNumOfConns, testNumOfNewConns, testNumOfDeletedConns)
}
Can you try this:

disableLogToStderr()
setupConntrackConnStore(b)
b.ResetTimer()
for n := 0; n < b.N; n++ {
    mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(conns, testNumOfConns, nil)
    connStore.Poll()
    b.StopTimer()
    conns = generateUpdatedConns(conns)
    b.StartTimer()
}
I feel like the performance of Poll() is not that great according to the results, and I wonder if connection generation accounts for a lot of measurement overhead.
Also, because of how you designed the benchmark, it looks like we need to run Poll multiple times to get good results. So I would recommend reducing the number of connections if needed, to ensure that we have multiple iterations. I'm assuming that resource usage (CPU / memory) increases linearly with the number of connections?
Yes, it makes sense not to count generation of new connections in the runtime, and to run the tests for multiple iterations. Tested with different numbers of connections; CPU and memory increase linearly with the number of conns:
# new conns | # iterations | ns/op | B/op | allocs/op |
---|---|---|---|---|
4000 | 256 | 4738835 ns/op | 483896 B/op | 27244 allocs/op |
6000 | 206 | 5989825 ns/op | 608971 B/op | 36037 allocs/op |
8000 | 157 | 7205885 ns/op | 741305 B/op | 45061 allocs/op |
10000 | 123 | 8710284 ns/op | 882714 B/op | 54307 allocs/op |
12000 | 97 | 11834417 ns/op | 1037488 B/op | 63868 allocs/op |
var (
    connStore *ConntrackConnectionStore
    conns []*flowexporter.Connection
    mockConnDumper *connectionstest.MockConnTrackDumper
)
If possible, avoid global variables here. Wrap these in a struct and populate / return the struct during test setup.
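A sketch of that suggestion, reusing the types from the snippet above (Antrea package imports elided; the newTestFixture body is omitted):

// testFixture bundles what the benchmark currently keeps in globals.
type testFixture struct {
    connStore      *ConntrackConnectionStore
    conns          []*flowexporter.Connection
    mockConnDumper *connectionstest.MockConnTrackDumper
}

func newTestFixture(b *testing.B) *testFixture {
    f := &testFixture{}
    // ... create the gomock controller, the mock dumper, the initial
    // connections and the connection store here ...
    return f
}

func BenchmarkPoll(b *testing.B) {
    f := newTestFixture(b)
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        f.mockConnDumper.EXPECT().DumpFlows(uint16(openflow.CtZone)).Return(f.conns, testNumOfConns, nil)
        f.connStore.Poll()
        b.StopTimer()
        f.conns = generateUpdatedConns(f.conns)
        b.StartTimer()
    }
}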
SourcePodNamespace: "ns1",
SourcePodName: "pod1",
DestinationPodNamespace: "ns2",
DestinationPodName: "pod2",
DestinationServiceAddress: net.ParseIP("10.0.0.1"),
DestinationServicePort: 30000,
TCPState: "SYN_SENT",
this is a bit misleading IMO. I thought this information was populated by Poll itself; it should not be populated in the return value of DumpFlows.
Yes you are right. Updated.
ctrl := gomock.NewController(b)
defer ctrl.Finish()
mockIfaceStore := interfacestoretest.NewMockInterfaceStore(ctrl)
mockIfaceStore.EXPECT().GetInterfaceByIP(gomock.Any()).Return(nil, false).AnyTimes()
wouldn't it be more accurate to return Pod interface information? I think this can be postponed though, especially if you believe this part of Poll (fillPodInfo) is not CPU intensive.
Replaced nil with Pod interface information. It turns out that the calls to mockIfaceStore have some CPU cost. Maybe we should cache some IP-to-interface mappings in future work.
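A sketch of what such a cache could look like in front of the interface store (illustrative only: not goroutine-safe, and a real version would need invalidation as Pods come and go):

// cachedIfaceStore memoizes GetInterfaceByIP lookups.
type cachedIfaceStore struct {
    store interfacestore.InterfaceStore
    cache map[string]*interfacestore.InterfaceConfig
}

func (c *cachedIfaceStore) GetInterfaceByIP(ip string) (*interfacestore.InterfaceConfig, bool) {
    if iface, ok := c.cache[ip]; ok {
        return iface, true
    }
    iface, ok := c.store.GetInterfaceByIP(ip)
    if ok {
        c.cache[ip] = iface // cache only successful lookups
    }
    return iface, ok
}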
Force-pushed from 3bd6a67 to 8f5f081
what are the final results, in terms of performance improvements?
}
randomNum := getRandomNum(int64(length - testNumOfDeletedConns))
for i := randomNum; i < testNumOfDeletedConns+randomNum; i++ {
    updatedConns[i].DoneExport = true
what if updatedConns[i] is one of the new connections for this iteration? Does it still make sense?
In my opinion it is still valid because the connection can be stored and will be deleted in the next round. What do you think?
I think it's ok since we are not testing export here. It would have been good to add a comment to this effect.
exp, err := setupExporter(false)
if err != nil {
    b.Fatalf("error when setting up exporter: %v", err)
}
The golang testing documentation recommends resetting the timer after doing an "expensive" setup: https://golang.org/pkg/testing/#hdr-Benchmarks
Can you call b.ResetTimer() in all your benchmarks, just before the loop?
Sure. Thanks!
A couple of nits, otherwise LGTM.
Thanks for working on this.
testNumOfConns = 20000
testNumOfDenyConns = 20000
testNumOfDyingConns = 2000
testNumOfInactiveRecords = 2000
keep the name consistent with the timeout: testNumOfIdleRecords?
BenchmarkExportConntrackConns-2 75 13750074 ns/op 965550 B/op 22268 allocs/op
PASS
ok antrea.io/antrea/pkg/agent/flowexporter/exporter 5.494s
*/
Add the output that you have for the varying number of connections. It will be good to show that.
Done. Thanks! Improvement below; also put details in the PR description.
Force-pushed from 8f5f081 to f7ecb65
LGTM
This commit adds performance benchmarking for the Flow Exporter. It evaluates the Export() function under different numbers of conntrack connections, dying connections, idle records, deny connections and idle deny connections. A local server receives the records and counts them. It also evaluates Poll() for adding and updating connections. CPU and memory profiles are collected and visualized using pprof. From the benchmarking, we also discovered and removed redundant calls such as GetNodeName(), which was called every time a record was exported, and ResetConnStatsWithoutLock, which unnecessarily called NewConnectionKey each time.

Signed-off-by: zyiou <[email protected]>
Force-pushed from f7ecb65 to 30759cf
/test-all
LGTM
/test-e2e
This commit adds performance benchmarking for the Flow Exporter. It evaluates the Export() function under different numbers of conntrack connections, dying connections, idle records, deny connections and idle deny connections. A local server receives the records and counts them. CPU and memory profiles are collected and visualized using pprof.
Also from benchmarking, we discovered and removed redundant calls like GetNodeName(), which was called every time a record was exported, and ResetConnStatsWithoutLock, which unnecessarily called NewConnectionKey each time.
With fixes and changes in go-ipfix v0.5.3 and the removal of redundant calls in this PR, we had the following improvements:
Before:
After:
Improvement: