-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve flag parsing performance continuation #4198
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
c8e0490
to
a6f1b55
Compare
This reduces the complexity of the ParseArgs function by extracting different parts of it into separate helpers. This makes the code easier to read and understand and also makes it smaller. Before: 1a080c0 14863 T github.com/aquasecurity/tracee/pkg/events.ParseArgs After: 1a0cee0 11632 T github.com/aquasecurity/tracee/pkg/events.ParseArgs It's a reduction of 21.73% in the size of the function what should also improve cache locality and performance. The benchmark tests also show a significant improvement in performance, with a reduction of 6.4% in the time it takes to parse the arguments. Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem -run=^$ -tags ebpf -bench ^(BenchmarkParseArgsPrev|BenchmarkParseArgs)$ github.com/aquasecurity/tracee/pkg/events -benchtime=20000000x -race goos: linux goarch: amd64 pkg: github.com/aquasecurity/tracee/pkg/events cpu: AMD Ryzen 9 7950X 16-Core Processor === RUN BenchmarkParseArgsPrev BenchmarkParseArgsPrev BenchmarkParseArgsPrev-32 20000000 2002 ns/op 0 B/op 0 allocs/op === RUN BenchmarkParseArgs BenchmarkParseArgs BenchmarkParseArgs-32 20000000 1873 ns/op 0 B/op 0 allocs/op
a6f1b55
to
69241cc
Compare
arg.Type = "string" | ||
} | ||
// bypass the lock contention accessing the read-only map directly | ||
def, ok := CoreEvents[ID(id)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any issue of doing that, accessing directly and no using through the Core API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me double check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, good catch @rscampos. Currently this change isn't a problem, but in the future it will cause data races due to the upcoming runtime changes to CoreEvents
. That made me wonder why keep the non-changing definitions (syscalls) under lock contention? Would it be feasible to segregate them from the sigs? @yanivagman @NDStrahilevitz WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this brings a significant improvement in CPU time, although it will cause some concurrency issues in the future. I'm in favor of merging it (giving more information in the comment) to be a starting point for discussion about internal event segregation. If that's not feasible, we can easily fall back to the lock-contention way of retrieving the definition.
Replicate both benchmarks, those are the results: parseSyscall function, improve around 11%:
On ParseArgs, in my env., was not able to see a big improve. But the way you refactor the code is more clean.
Regarding the size of the function: reduction of 3455 bytes (22,33%) in the new version but the size of the Tracee binary increase by 736 bytes (probably because you create another file with the helpers function).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Exactly. Tks for bringing that data. |
Bypass the lock contention accessing the read-only map directly.
69241cc
to
3adc191
Compare
1. Explain what the PR does
3adc191 chore(events): parseSyscall with no lock
115a85c chore(events): reduce ParseArgs complexity
2. Explain how to test it
3. Other comments
Note that the benchmark measurements are non-deterministic.