When using Go auto-instrumentation, we need to make the performance trade-offs very clear to the end user.
We need to run performance tests to measure the throughput that the Go auto-instrumentation supports. We have two knobs that we can tweak
Related to this topic, but more on the eBPF side:
As a general note, I think the performance of the eBPF code has a more "direct" impact on the probed application than the performance of our Go code.
Currently, we use a perf buffer to transfer events from the eBPF code to the user code in Go.
I think we should switch from the perf buffer to a ring buffer, as it provides better performance in almost all scenarios, as explained in this blog post and in more detail in this patch.
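To make the switch concrete, here is a minimal sketch of the eBPF side, assuming a hypothetical `events` map and `struct event` (the real probe code and event layout differ): the perf-buffer map and `bpf_perf_event_output()` call are replaced by a `BPF_MAP_TYPE_RINGBUF` map and `bpf_ringbuf_output()`.

```c
// Sketch only: map name, event struct and probe section are illustrative.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
    __u64 start_time;
    __u64 end_time;
};

// BPF_MAP_TYPE_RINGBUF is a single shared buffer (not per-CPU like the perf
// buffer); max_entries is its size in bytes and must be a power-of-two
// multiple of the page size.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} events SEC(".maps");

SEC("uprobe/example")
int uprobe_example(struct pt_regs *ctx)
{
    struct event e = {};
    e.start_time = bpf_ktime_get_ns();

    // Perf buffer equivalent would have been:
    //   bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
    // With the ring buffer the event is copied in and the reader is woken up
    // (flags == 0 keeps the default "notify on every submit" behaviour).
    bpf_ringbuf_output(&events, &e, sizeof(e), 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```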
A key takeaway from the above links is that the biggest performance cost of using a ring/perf buffer is the mechanism by which the kernel signals the user code (which blocks on reading from the buffer) that an event is ready. Today we signal for each event. A much more efficient approach would be to not signal each time we push an event to the buffer, but to take the 'sampled' approach discussed in the links: wake the Go user code every X events, where X may be a configurable parameter. Alternatively, X can be a percentage of the buffer: signal only once more than X% of the buffer is full.
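As a rough sketch of how the sampled signaling could look on the eBPF side (reusing the hypothetical `events` map and `struct event` from the snippet above; `WAKEUP_THRESHOLD` stands in for the configurable X% parameter): `bpf_ringbuf_output()` accepts `BPF_RB_NO_WAKEUP`/`BPF_RB_FORCE_WAKEUP` flags, and `bpf_ringbuf_query()` exposes the current fill level.

```c
// Continues the previous snippet (same includes, map and event struct).
// WAKEUP_THRESHOLD is a hypothetical stand-in for the configurable X%.
#define WAKEUP_THRESHOLD(ring_size) ((ring_size) / 2)

static __always_inline void submit_event(struct event *e)
{
    // BPF_RB_NO_WAKEUP submits the event without signalling the epoll'ed
    // reader; BPF_RB_FORCE_WAKEUP signals it unconditionally.
    __u64 flags = BPF_RB_NO_WAKEUP;

    // bpf_ringbuf_query() lets the probe see how much unconsumed data is
    // sitting in the buffer, so we only wake the Go reader once it is X% full.
    __u64 avail = bpf_ringbuf_query(&events, BPF_RB_AVAIL_DATA);
    __u64 size  = bpf_ringbuf_query(&events, BPF_RB_RING_SIZE);

    if (avail >= WAKEUP_THRESHOLD(size))
        flags = BPF_RB_FORCE_WAKEUP;

    bpf_ringbuf_output(&events, e, sizeof(*e), flags);
}
```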
I did some basic testing using bpftool prog profile, and it looks like when we signal from the eBPF code to the user code, the time spent in the uprobe increases drastically. Hence, implementing the sampled ring buffer approach sounds like a good idea to me.
Another point to consider is what happens if the event throughput decreases and we wait a long time for X events without signaling the Go code. For this case, I did this PR, which adds the ability for the user code to flush the buffer after a timeout in case it didn't get a signal. This timeout can be another configurable parameter.
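To illustrate the timeout idea, here is a sketch of a user-space consumer loop using libbpf's C ring buffer API (our actual user code is Go, so this only shows the shape of the logic; `flush_timeout_ms` is the hypothetical configurable parameter): `ring_buffer__poll()` blocks up to the timeout, and on timeout we drain whatever has accumulated with `ring_buffer__consume()`.

```c
// Sketch only: libbpf user-space consumer with a flush timeout.
#include <bpf/libbpf.h>

static int handle_event(void *ctx, void *data, size_t len)
{
    // Decode and forward the event here.
    return 0;
}

int consume_loop(struct bpf_map *events_map, int flush_timeout_ms)
{
    struct ring_buffer *rb =
        ring_buffer__new(bpf_map__fd(events_map), handle_event, NULL, NULL);
    if (!rb)
        return -1;

    for (;;) {
        // ring_buffer__poll() returns when the kernel signals the buffer or
        // the timeout expires. With sampled wakeups, a low event rate can
        // mean no signal for a long time, so on timeout we flush whatever
        // has accumulated with ring_buffer__consume().
        int n = ring_buffer__poll(rb, flush_timeout_ms);
        if (n == 0)
            ring_buffer__consume(rb);
        else if (n < 0)
            break;
    }

    ring_buffer__free(rb);
    return 0;
}
```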
We should derive recommended values for the above parameters and also make them configurable.