
BatchLookupAndDelete is much slower than iterator #1078

Closed

msherif1234 opened this issue Jun 26, 2023 · 5 comments

msherif1234 commented Jun 26, 2023

Describe the bug
I expected BatchLookupAndDelete to perform much better than a regular iterate-and-delete loop, but it benchmarks significantly slower.

uname -r
6.1.5-100.fc36.x86_64

To Reproduce
https://github.com/msherif1234/ebpf/tree/batch_performance
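For context, the two patterns being compared look roughly like this (a minimal sketch assuming a hash map with uint64 keys and values, mirroring the benchmarks in the branch above; the function names are mine):

package bench

import "github.com/cilium/ebpf"

// drainIterate empties the map with one Delete syscall per entry.
func drainIterate(m *ebpf.Map) error {
	var k, v uint64
	iter := m.Iterate()
	for iter.Next(&k, &v) {
		if err := m.Delete(&k); err != nil {
			return err
		}
	}
	return iter.Err()
}

// drainBatch empties the map with a single BatchLookupAndDelete call,
// with output buffers sized to the map's capacity. Depending on the
// library version, reaching the end of the map may be reported as
// ebpf.ErrKeyNotExist alongside a valid count.
func drainBatch(m *ebpf.Map) (int, error) {
	keys := make([]uint64, m.MaxEntries())
	vals := make([]uint64, m.MaxEntries())
	var next uint32
	return m.BatchLookupAndDelete(nil, &next, keys, vals, nil)
}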

go test -exec sudo -bench=BenchmarkHashDelete -count 5 -run=^#
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf
cpu: Intel Core Processor (Skylake, IBRS)
BenchmarkHashDelete/MapIteratorDelete-12                  787828              1506 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  795927              1500 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  703017              1512 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  663597              1559 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  628033              1665 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             78336             16656 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             67497             16563 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             72202             17565 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             66657             16030 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             77671             15628 ns/op           32846 B/op          8 allocs/op
PASS
ok      github.com/cilium/ebpf  18.168s

Expected behavior
I was expecting much better CPU and memory usage from the batch APIs.

brycekahle (Contributor) commented:

Can you add b.ResetTimer() right before your loops? I wonder if the slice allocation is influencing the timing.

msherif1234 (Author) commented Jun 26, 2023

diff --git a/map_test.go b/map_test.go
index 6908077..5a95b93 100644
--- a/map_test.go
+++ b/map_test.go
@@ -2007,7 +2007,7 @@ func BenchmarkHashDelete(b *testing.B) {
                var k, v uint64
 
                b.ReportAllocs()
-
+               b.ResetTimer()
                for i := 0; i < b.N; i++ {
                        iter := m.Iterate()
                        for iter.Next(&k, &v) {
@@ -2030,7 +2030,7 @@ func BenchmarkHashDelete(b *testing.B) {
                v := make([]uint64, m.MaxEntries())
 
                b.ReportAllocs()
-
+               b.ResetTimer()
                for i := 0; i < b.N; i++ {
                        var next uint32
                        _, err := m.BatchLookupAndDelete(nil, &next, k, v, nil)

With b.ResetTimer() added:

BenchmarkHashDelete/MapIteratorDelete-12                  405836              3537 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  383348              2704 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  504504              3335 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  346116              2909 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  382950              3025 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             31327             39469 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             33157             37318 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             27322             38024 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             30834             39786 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             36780             38381 ns/op           32846 B/op          8 allocs/op
PASS

brycekahle (Contributor) commented:

I'm guessing this has to do with the marshaling of the slices. Can you try varying the map size across a few orders of magnitude, to see if the difference is still as drastic?

msherif1234 (Author) commented:

  • 100 entries
BenchmarkHashDelete/MapIteratorDelete-12                 1309647               908.2 ns/op            80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                 1238526               911.7 ns/op            80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                 1304388               905.3 ns/op            80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12            432500              2518 ns/op            3649 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12            389462              2595 ns/op            3649 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12            374091              2715 ns/op            3649 B/op          8 allocs/op
  • 10,000 entries
BenchmarkHashDelete/MapIteratorDelete-12                   74716             14481 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   81942             13710 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   81295             14660 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12              7556            145436 ns/op          327898 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12              8298            142024 ns/op          327899 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12              8420            135013 ns/op          327899 B/op          8 allocs/op
  • 100,000 entries
BenchmarkHashDelete/MapIteratorDelete-12                       1        5621548609 ns/op         6403464 B/op     400013 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   10000            105828 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   10000            103966 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12               885           1356382 ns/op         3212373 B/op          9 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12               949           1315810 ns/op         3212377 B/op          9 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12               742           1456063 ns/op         3212373 B/op          9 allocs/op

ti-mo (Collaborator) commented Jun 27, 2023

@msherif1234 Sorry, these benchmarks don't make any sense, except maybe this one:

BenchmarkHashDelete/MapIteratorDelete-12                       1        5621548609 ns/op         6403464 B/op     400013 allocs/op

That line demonstrates the problem: after the first benchmark iteration the hash map is empty, so every subsequent MapIterator returns false from its first Next() without iterating anything. Draining the map one entry at a time costs several syscalls per entry, and 20 000 syscalls in under 14 000 ns sounds a little fast even for 2023 hardware. I pushed a new commit to #1077 with benchmarks for BatchLookupAndDelete too; a sketch of a benchmark shape that avoids the pitfall follows the numbers below.

BenchmarkIterate/MapIterator-16         	             866	   1457681 ns/op	   64126 B/op	    4004 allocs/op
BenchmarkIterate/MapIteratorDelete-16   	             408	   2526392 ns/op	   80210 B/op	    6004 allocs/op
BenchmarkIterate/BatchLookup-16         	           59647	     19974 ns/op	   32851 B/op	       8 allocs/op
BenchmarkIterate/BatchLookupAndDelete-16         	   10000	    154757 ns/op	   32872 B/op	       8 allocs/op
BenchmarkIterate/BatchDelete-16                  	    8306	    136770 ns/op	   16416 B/op	       3 allocs/op
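
A minimal sketch of a drain benchmark that avoids the empty-map pitfall: refill the map with the timer stopped, so every timed iteration actually deletes entries. This is not the actual #1077 benchmark; newBenchMap is a helper I introduce here for illustration.

package bench

import (
	"testing"

	"github.com/cilium/ebpf"
)

// newBenchMap creates a small uint64->uint64 hash map for benchmarking.
func newBenchMap(b *testing.B) *ebpf.Map {
	m, err := ebpf.NewMap(&ebpf.MapSpec{
		Type:       ebpf.Hash,
		KeySize:    8,
		ValueSize:  8,
		MaxEntries: 1000,
	})
	if err != nil {
		b.Fatal(err)
	}
	b.Cleanup(func() { m.Close() })
	return m
}

func BenchmarkIteratorDelete(b *testing.B) {
	m := newBenchMap(b)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		// Repopulate the map so the timed loop below has work to do.
		for j := uint64(0); j < uint64(m.MaxEntries()); j++ {
			if err := m.Put(&j, &j); err != nil {
				b.Fatal(err)
			}
		}
		b.StartTimer()

		var k, v uint64
		iter := m.Iterate()
		for iter.Next(&k, &v) {
			if err := m.Delete(&k); err != nil {
				b.Fatal(err)
			}
		}
		if err := iter.Err(); err != nil {
			b.Fatal(err)
		}
	}
}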

That said, it does seem like Map.batchLookup() always tries to unmarshal the whole keyBuf and valueBuf regardless of the Count received from the batch syscall. I'll see if this could be improved. Closing this.
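The gist of that improvement, as a hypothetical sketch rather than the library's actual internals: once the batch syscall reports n entries, only the first n fixed-size records in the output buffers are valid, so decoding can stop there instead of walking the entire preallocated buffer.

package bench

import "encoding/binary"

// decodeFirstN is a hypothetical illustration, not cilium/ebpf code:
// decode only the first n fixed-size records from buf rather than the
// whole buffer. Records are uint64s, matching the benchmark's types.
func decodeFirstN(buf []byte, n int, out []uint64) {
	const recSize = 8 // bytes per uint64 record
	for i := 0; i < n; i++ {
		// Bytes past n*recSize are stale from earlier calls and
		// must not be unmarshaled.
		out[i] = binary.LittleEndian.Uint64(buf[i*recSize : (i+1)*recSize])
	}
}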
