LineReader: Reuse temporary buffer to reduce per-line allocation #27782
Conversation
@@ -0,0 +1,66 @@
package readfile
Not sure if you all have preferences on where/how to include benchmarks. Happy to move/remove this as desired.
It's fine where you put it.
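For context on what such a benchmark looks like: table-driven sub-benchmarks with `b.ReportAllocs()` over different line lengths produce output like the `EncoderReader/...` results quoted in the description below. The following is a minimal illustrative sketch that uses a `bufio.Scanner` as a stand-in for this package's reader; it is not the contents of bench_test.go.

```go
package readfile_test

import (
	"bufio"
	"bytes"
	"strings"
	"testing"
)

// Illustrative only: shows the table-driven, allocation-reporting benchmark
// structure; the real bench_test.go exercises this package's own reader.
func BenchmarkLineScanning(b *testing.B) {
	cases := []struct {
		name    string
		lineLen int
	}{
		{"short_lines", 10},
		{"buffer-sized_lines", 1024},
		{"long_lines", 10 * 1024},
	}

	for _, tc := range cases {
		// Build the input once per case so only line reading is measured.
		input := []byte(strings.Repeat(strings.Repeat("a", tc.lineLen)+"\n", 100))

		b.Run(tc.name, func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				scanner := bufio.NewScanner(bytes.NewReader(input))
				// Allow tokens longer than bufio's 64 KiB default, just in case.
				scanner.Buffer(make([]byte, 0, 128*1024), 128*1024)
				for scanner.Scan() {
					_ = scanner.Bytes() // consume each line
				}
				if err := scanner.Err(); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```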
💚 Build Succeeded
🧪 Flaky test report: Tests succeeded.
jenkins run tests
Hey @kvch, thanks for taking a look! I think I fixed the lint error (missing license header); could we run the tests again? 😄
/test
Pinging @elastic/agent (Team:Agent) |
Co-authored-by: Noémi Ványi <[email protected]>
Thank you!
Commits referencing this pull request:

* LineReader: Reuse temporary buffer to reduce per-line allocation (#27782) (cherry picked from commit 0e3788b)
* Merge master: [Auditbeat] scanner honor include_files (elastic#27722); chore(ci): remove not used param when triggering e2e tests (elastic#27823); LineReader: Reuse temporary buffer to reduce per-line allocation (elastic#27782)
* LineReader: Reuse temporary buffer to reduce per-line allocation (#27782) (#27824) (cherry picked from commit 0e3788b) Co-authored-by: Brad Moylan <[email protected]>

## What does this PR do?

Previously, the `LineReader` allocated a `[]byte` of size `config.BufferSize` before decoding each line. The backing array keeps that fixed size, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter. With this change, we store a single `tempBuffer []byte` that is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of `tempBuffer`, but it allocates space only for the written bytes.

## Why is it important?

In our production environment, we run Beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this change reduces the memory profile across various line lengths:

```
goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/reader/readfile
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz

name                                  old time/op    new time/op    delta
EncoderReader/buffer-sized_lines-16     125µs ± 3%      94µs ± 9%  -24.55%  (p=0.008 n=5+5)
EncoderReader/short_lines-16           52.6µs ± 4%    36.3µs ±10%  -30.88%  (p=0.008 n=5+5)
EncoderReader/long_lines-16            1.82ms ± 2%    1.70ms ±10%     ~     (p=0.151 n=5+5)
EncoderReader/skip_lines-16             133µs ± 3%     140µs ± 8%     ~     (p=0.151 n=5+5)

name                                  old alloc/op   new alloc/op   delta
EncoderReader/buffer-sized_lines-16     442kB ± 0%     239kB ± 0%  -46.07%  (p=0.000 n=4+5)
EncoderReader/short_lines-16            118kB ± 0%      15kB ± 0%  -87.27%  (p=0.008 n=5+5)
EncoderReader/long_lines-16            8.73MB ± 0%    7.63MB ± 0%  -12.62%  (p=0.000 n=4+5)
EncoderReader/skip_lines-16             270kB ± 0%     220kB ± 0%  -18.58%  (p=0.008 n=5+5)

name                                  old allocs/op  new allocs/op  delta
EncoderReader/buffer-sized_lines-16       718 ± 0%       519 ± 0%  -27.72%  (p=0.008 n=5+5)
EncoderReader/short_lines-16              522 ± 0%       421 ± 0%  -19.35%  (p=0.008 n=5+5)
EncoderReader/long_lines-16             2.65k ± 0%     1.58k ± 0%  -40.54%  (p=0.008 n=5+5)
EncoderReader/skip_lines-16               420 ± 0%       419 ± 0%   -0.24%  (p=0.008 n=5+5)
```
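To make the before/after concrete, here is a simplified, self-contained sketch of the allocation pattern described above. It uses a `bytes.Buffer` as a stand-in for the reader's output buffer, and the names (`tempBuffer`, `bufferSize`, `oldStyle`/`newStyle`) mirror the description rather than the actual `LineReader` code.

```go
package main

import (
	"bytes"
	"fmt"
)

const bufferSize = 64 * 1024 // stand-in for config.BufferSize

// oldStyle mimics the previous behavior: allocate a bufferSize-sized scratch
// slice for every line and hand a sub-slice to the output. If the output
// appends the slice without copying (as the description says outBuffer.Append
// did), each short line keeps the whole 64 KiB backing array alive.
func oldStyle(lines [][]byte) [][]byte {
	var retained [][]byte
	for _, line := range lines {
		scratch := make([]byte, bufferSize) // fresh allocation per line
		n := copy(scratch, line)            // stand-in for decoding into scratch
		retained = append(retained, scratch[:n])
	}
	return retained
}

// newStyle mimics the change: one reusable scratch buffer, and Write copies
// only the n decoded bytes into the output buffer.
func newStyle(lines [][]byte, out *bytes.Buffer) {
	tempBuffer := make([]byte, bufferSize) // allocated once, reused across lines
	for _, line := range lines {
		n := copy(tempBuffer, line)
		out.Write(tempBuffer[:n]) // copies just n bytes out of tempBuffer
	}
}

func main() {
	lines := [][]byte{[]byte("short line\n"), []byte("another short line\n")}

	old := oldStyle(lines)
	fmt.Printf("old: each of %d lines pins a %d-byte backing array\n", len(old), cap(old[0]))

	var out bytes.Buffer
	newStyle(lines, &out)
	fmt.Printf("new: output buffer holds %d bytes total\n", out.Len())
}
```

The old/new comparison above has the shape of `benchstat` output; presumably it was produced by running the benchmarks before and after the change with something like `go test -run '^$' -bench EncoderReader -benchmem -count 5 ./libbeat/reader/readfile` and feeding both result files to `benchstat`.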
## Checklist

- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

## Related issues