
BatchLookupAndDelete is much slower than iterator #1078

Closed

msherif1234 opened this issue Jun 26, 2023 · 5 comments

msherif1234 commented Jun 26, 2023

Describe the bug
I expected BatchLookupAndDelete to perform much better than a regular iterate-and-delete loop, but it benchmarks significantly slower.

uname -r
6.1.5-100.fc36.x86_64

To Reproduce
https://github.com/msherif1234/ebpf/tree/batch_performance
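For context, the two patterns being compared look roughly like this (a minimal sketch assuming a hash map with uint64 keys and values, mirroring the benchmarks in the branch above; the function names are mine):

package bench

import "github.com/cilium/ebpf"

// drainIterate empties the map with one Delete syscall per entry.
func drainIterate(m *ebpf.Map) error {
	var k, v uint64
	iter := m.Iterate()
	for iter.Next(&k, &v) {
		if err := m.Delete(&k); err != nil {
			return err
		}
	}
	return iter.Err()
}

// drainBatch empties the map with a single BatchLookupAndDelete call,
// with output buffers sized to the map's capacity. Depending on the
// library version, reaching the end of the map may be reported as
// ebpf.ErrKeyNotExist alongside a valid count.
func drainBatch(m *ebpf.Map) (int, error) {
	keys := make([]uint64, m.MaxEntries())
	vals := make([]uint64, m.MaxEntries())
	var next uint32
	return m.BatchLookupAndDelete(nil, &next, keys, vals, nil)
}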

go test -exec sudo -bench=BenchmarkHashDelete -count 5 -run=^#
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf
cpu: Intel Core Processor (Skylake, IBRS)
BenchmarkHashDelete/MapIteratorDelete-12                  787828              1506 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  795927              1500 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  703017              1512 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  663597              1559 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  628033              1665 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             78336             16656 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             67497             16563 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             72202             17565 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             66657             16030 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             77671             15628 ns/op           32846 B/op          8 allocs/op
PASS
ok      github.com/cilium/ebpf  18.168s

Expected behavior
I was expecting much better CPU and memory usage from the batch APIs.

brycekahle (Contributor) commented:

Can you add b.ResetTimer() right before your loops? I wonder if the slice allocation is influencing the timing.

msherif1234 (Author) commented Jun 26, 2023

diff --git a/map_test.go b/map_test.go
index 6908077..5a95b93 100644
--- a/map_test.go
+++ b/map_test.go
@@ -2007,7 +2007,7 @@ func BenchmarkHashDelete(b *testing.B) {
                var k, v uint64
 
                b.ReportAllocs()
-
+               b.ResetTimer()
                for i := 0; i < b.N; i++ {
                        iter := m.Iterate()
                        for iter.Next(&k, &v) {
@@ -2030,7 +2030,7 @@ func BenchmarkHashDelete(b *testing.B) {
                v := make([]uint64, m.MaxEntries())
 
                b.ReportAllocs()
-
+               b.ResetTimer()
                for i := 0; i < b.N; i++ {
                        var next uint32
                        _, err := m.BatchLookupAndDelete(nil, &next, k, v, nil)

With b.ResetTimer() added:

BenchmarkHashDelete/MapIteratorDelete-12                  405836              3537 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  383348              2704 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  504504              3335 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  346116              2909 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                  382950              3025 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             31327             39469 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             33157             37318 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             27322             38024 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             30834             39786 ns/op           32846 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12             36780             38381 ns/op           32846 B/op          8 allocs/op
PASS

brycekahle (Contributor) commented:

I'm guessing this has to do with the marshaling of the slices. Can you try varying the map size across a few orders of magnitude, to see if the difference is still as drastic?

msherif1234 (Author) commented:

  • 100 entries
BenchmarkHashDelete/MapIteratorDelete-12                 1309647               908.2 ns/op            80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                 1238526               911.7 ns/op            80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                 1304388               905.3 ns/op            80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12            432500              2518 ns/op            3649 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12            389462              2595 ns/op            3649 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12            374091              2715 ns/op            3649 B/op          8 allocs/op
  • 10,000 entries
BenchmarkHashDelete/MapIteratorDelete-12                   74716             14481 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   81942             13710 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   81295             14660 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12              7556            145436 ns/op          327898 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12              8298            142024 ns/op          327899 B/op          8 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12              8420            135013 ns/op          327899 B/op          8 allocs/op
  • 100,000 entries
BenchmarkHashDelete/MapIteratorDelete-12                       1        5621548609 ns/op         6403464 B/op     400013 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   10000            105828 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapIteratorDelete-12                   10000            103966 ns/op              80 B/op          4 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12               885           1356382 ns/op         3212373 B/op          9 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12               949           1315810 ns/op         3212377 B/op          9 allocs/op
BenchmarkHashDelete/MapBatchLookupAndDelete-12               742           1456063 ns/op         3212373 B/op          9 allocs/op

ti-mo (Collaborator) commented Jun 27, 2023

@msherif1234 Sorry, these benchmarks don't make any sense, except maybe this one:

BenchmarkHashDelete/MapIteratorDelete-12                       1        5621548609 ns/op         6403464 B/op     400013 allocs/op

That line demonstrates the problem: after the first benchmark iteration the hash map is empty, so every subsequent MapIterator returns false from its first Next() without iterating anything. Draining the map one entry at a time costs several syscalls per entry, and 20 000 syscalls in under 14 000 ns sounds a little fast even for 2023 hardware. I pushed a new commit to #1077 with benchmarks for BatchLookupAndDelete too; a sketch of a benchmark shape that avoids the pitfall follows the numbers below.

BenchmarkIterate/MapIterator-16         	             866	   1457681 ns/op	   64126 B/op	    4004 allocs/op
BenchmarkIterate/MapIteratorDelete-16   	             408	   2526392 ns/op	   80210 B/op	    6004 allocs/op
BenchmarkIterate/BatchLookup-16         	           59647	     19974 ns/op	   32851 B/op	       8 allocs/op
BenchmarkIterate/BatchLookupAndDelete-16         	   10000	    154757 ns/op	   32872 B/op	       8 allocs/op
BenchmarkIterate/BatchDelete-16                  	    8306	    136770 ns/op	   16416 B/op	       3 allocs/op
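
A minimal sketch of a drain benchmark that avoids the empty-map pitfall: refill the map with the timer stopped, so every timed iteration actually deletes entries. This is not the actual #1077 benchmark; newBenchMap is a helper I introduce here for illustration.

package bench

import (
	"testing"

	"github.com/cilium/ebpf"
)

// newBenchMap creates a small uint64->uint64 hash map for benchmarking.
func newBenchMap(b *testing.B) *ebpf.Map {
	m, err := ebpf.NewMap(&ebpf.MapSpec{
		Type:       ebpf.Hash,
		KeySize:    8,
		ValueSize:  8,
		MaxEntries: 1000,
	})
	if err != nil {
		b.Fatal(err)
	}
	b.Cleanup(func() { m.Close() })
	return m
}

func BenchmarkIteratorDelete(b *testing.B) {
	m := newBenchMap(b)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		// Repopulate the map so the timed loop below has work to do.
		for j := uint64(0); j < uint64(m.MaxEntries()); j++ {
			if err := m.Put(&j, &j); err != nil {
				b.Fatal(err)
			}
		}
		b.StartTimer()

		var k, v uint64
		iter := m.Iterate()
		for iter.Next(&k, &v) {
			if err := m.Delete(&k); err != nil {
				b.Fatal(err)
			}
		}
		if err := iter.Err(); err != nil {
			b.Fatal(err)
		}
	}
}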

That said, it does seem like Map.batchLookup() always tries to unmarshal the whole keyBuf and valueBuf regardless of the Count received from the batch syscall. I'll see if this could be improved. Closing this.
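The gist of that improvement, as a hypothetical sketch rather than the library's actual internals: once the batch syscall reports n entries, only the first n fixed-size records in the output buffers are valid, so decoding can stop there instead of walking the entire preallocated buffer.

package bench

import "encoding/binary"

// decodeFirstN is a hypothetical illustration, not cilium/ebpf code:
// decode only the first n fixed-size records from buf rather than the
// whole buffer. Records are uint64s, matching the benchmark's types.
func decodeFirstN(buf []byte, n int, out []uint64) {
	const recSize = 8 // bytes per uint64 record
	for i := 0; i < n; i++ {
		// Bytes past n*recSize are stale from earlier calls and
		// must not be unmarshaled.
		out[i] = binary.LittleEndian.Uint64(buf[i*recSize : (i+1)*recSize])
	}
}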
