Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LineReader: Reuse temporary buffer to reduce per-line allocation #27782

Merged
merged 4 commits into from
Sep 9, 2021

Conversation

bmoylan
Copy link
Contributor

@bmoylan bmoylan commented Sep 7, 2021

What does this PR do?

Previously, the LineReader would allocate a []byte of size config.BufferSize before decoding each line. The underlying array's size allocation is fixed, so outBuffer.Append retains all of it even when the appended bytes are much shorter.

With this change, we store a single tempBuffer []byte which is reused across lines anywhere we need temporary storage. Converting to outBuffer.Write forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes.

Why is it important?

In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths:

goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/reader/readfile
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz

name                                 old time/op          new time/op          delta
EncoderReader/buffer-sized_lines-16           125µs ± 3%            94µs ± 9%  -24.55%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                 52.6µs ± 4%          36.3µs ±10%  -30.88%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  1.82ms ± 2%          1.70ms ±10%     ~     (p=0.151 n=5+5)
EncoderReader/skip_lines-16                   133µs ± 3%           140µs ± 8%     ~     (p=0.151 n=5+5)

name                                 old alloc/op         new alloc/op         delta
EncoderReader/buffer-sized_lines-16           442kB ± 0%           239kB ± 0%  -46.07%  (p=0.000 n=4+5)
EncoderReader/short_lines-16                  118kB ± 0%            15kB ± 0%  -87.27%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  8.73MB ± 0%          7.63MB ± 0%  -12.62%  (p=0.000 n=4+5)
EncoderReader/skip_lines-16                   270kB ± 0%           220kB ± 0%  -18.58%  (p=0.008 n=5+5)

name                                 old allocs/op        new allocs/op        delta
EncoderReader/buffer-sized_lines-16             718 ± 0%             519 ± 0%  -27.72%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                    522 ± 0%             421 ± 0%  -19.35%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                   2.65k ± 0%           1.58k ± 0%  -40.54%  (p=0.008 n=5+5)
EncoderReader/skip_lines-16                     420 ± 0%             419 ± 0%   -0.24%  (p=0.008 n=5+5)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 7, 2021
@@ -0,0 +1,66 @@
package readfile
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you all have preferences on where/how to include benchmarks. Happy to move/remove this as desired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine where you put it.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 7, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-09-08T15:18:34.144+0000

  • Duration: 152 min 44 sec

  • Commit: bd12458

Test stats 🧪

Test Results
Failed 0
Passed 53918
Skipped 5324
Total 59242

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 53918
Skipped 5324
Total 59242

@kvch kvch self-requested a review September 7, 2021 16:47
@kvch
Copy link
Contributor

kvch commented Sep 7, 2021

jenkins run tests

@bmoylan
Copy link
Contributor Author

bmoylan commented Sep 7, 2021

Hey @kvch thanks for taking a look! I think I fixed the lint error (missing license header) if we could test again 😄

@adriansr
Copy link
Contributor

adriansr commented Sep 7, 2021

/test

@ChrsMark ChrsMark added the Team:Elastic-Agent Label for the Agent team label Sep 8, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 8, 2021
CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
Co-authored-by: Noémi Ványi <[email protected]>
@kvch kvch added the backport-v7.16.0 Automated backport with mergify label Sep 8, 2021
@kvch
Copy link
Contributor

kvch commented Sep 9, 2021

Thank you!

@kvch kvch merged commit 0e3788b into elastic:master Sep 9, 2021
mergify bot pushed a commit that referenced this pull request Sep 9, 2021
)

## What does this PR do?

Previously, the `LineReader` would allocate a []byte of size `config.BufferSize` before decoding each line. The underlying array's size allocation is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter.

With this change, we store a single `tempBuffer []byte` which is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes.

## Why is it important?

In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths:

```
goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/reader/readfile
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz

name                                 old time/op          new time/op          delta
EncoderReader/buffer-sized_lines-16           125µs ± 3%            94µs ± 9%  -24.55%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                 52.6µs ± 4%          36.3µs ±10%  -30.88%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  1.82ms ± 2%          1.70ms ±10%     ~     (p=0.151 n=5+5)
EncoderReader/skip_lines-16                   133µs ± 3%           140µs ± 8%     ~     (p=0.151 n=5+5)

name                                 old alloc/op         new alloc/op         delta
EncoderReader/buffer-sized_lines-16           442kB ± 0%           239kB ± 0%  -46.07%  (p=0.000 n=4+5)
EncoderReader/short_lines-16                  118kB ± 0%            15kB ± 0%  -87.27%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  8.73MB ± 0%          7.63MB ± 0%  -12.62%  (p=0.000 n=4+5)
EncoderReader/skip_lines-16                   270kB ± 0%           220kB ± 0%  -18.58%  (p=0.008 n=5+5)

name                                 old allocs/op        new allocs/op        delta
EncoderReader/buffer-sized_lines-16             718 ± 0%             519 ± 0%  -27.72%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                    522 ± 0%             421 ± 0%  -19.35%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                   2.65k ± 0%           1.58k ± 0%  -40.54%  (p=0.008 n=5+5)
EncoderReader/skip_lines-16                     420 ± 0%             419 ± 0%   -0.24%  (p=0.008 n=5+5)
```

(cherry picked from commit 0e3788b)
@bmoylan bmoylan deleted the bm/linereader-reuse-buffer branch September 9, 2021 15:12
mdelapenya added a commit to mdelapenya/beats that referenced this pull request Sep 9, 2021
* master:
  [Auditbeat] scanner honor include_files (elastic#27722)
  chore(ci): remove not used param when triggering e2e tests (elastic#27823)
  LineReader: Reuse temporary buffer to reduce per-line allocation (elastic#27782)
kvch pushed a commit that referenced this pull request Sep 14, 2021
) (#27824)

## What does this PR do?

Previously, the `LineReader` would allocate a []byte of size `config.BufferSize` before decoding each line. The underlying array's size allocation is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter.

With this change, we store a single `tempBuffer []byte` which is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes.

## Why is it important?

In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths:

```
goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/reader/readfile
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz

name                                 old time/op          new time/op          delta
EncoderReader/buffer-sized_lines-16           125µs ± 3%            94µs ± 9%  -24.55%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                 52.6µs ± 4%          36.3µs ±10%  -30.88%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  1.82ms ± 2%          1.70ms ±10%     ~     (p=0.151 n=5+5)
EncoderReader/skip_lines-16                   133µs ± 3%           140µs ± 8%     ~     (p=0.151 n=5+5)

name                                 old alloc/op         new alloc/op         delta
EncoderReader/buffer-sized_lines-16           442kB ± 0%           239kB ± 0%  -46.07%  (p=0.000 n=4+5)
EncoderReader/short_lines-16                  118kB ± 0%            15kB ± 0%  -87.27%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  8.73MB ± 0%          7.63MB ± 0%  -12.62%  (p=0.000 n=4+5)
EncoderReader/skip_lines-16                   270kB ± 0%           220kB ± 0%  -18.58%  (p=0.008 n=5+5)

name                                 old allocs/op        new allocs/op        delta
EncoderReader/buffer-sized_lines-16             718 ± 0%             519 ± 0%  -27.72%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                    522 ± 0%             421 ± 0%  -19.35%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                   2.65k ± 0%           1.58k ± 0%  -40.54%  (p=0.008 n=5+5)
EncoderReader/skip_lines-16                     420 ± 0%             419 ± 0%   -0.24%  (p=0.008 n=5+5)
```

(cherry picked from commit 0e3788b)

Co-authored-by: Brad Moylan <[email protected]>
Icedroid pushed a commit to Icedroid/beats that referenced this pull request Nov 1, 2021
…stic#27782)

## What does this PR do?

Previously, the `LineReader` would allocate a []byte of size `config.BufferSize` before decoding each line. The underlying array's size allocation is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter.

With this change, we store a single `tempBuffer []byte` which is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of tempBuffer, but is able to only allocate space for the written bytes.

## Why is it important?

In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The LineReader code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show this reduces the memory profile with various line lengths:

```
goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/reader/readfile
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz

name                                 old time/op          new time/op          delta
EncoderReader/buffer-sized_lines-16           125µs ± 3%            94µs ± 9%  -24.55%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                 52.6µs ± 4%          36.3µs ±10%  -30.88%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  1.82ms ± 2%          1.70ms ±10%     ~     (p=0.151 n=5+5)
EncoderReader/skip_lines-16                   133µs ± 3%           140µs ± 8%     ~     (p=0.151 n=5+5)

name                                 old alloc/op         new alloc/op         delta
EncoderReader/buffer-sized_lines-16           442kB ± 0%           239kB ± 0%  -46.07%  (p=0.000 n=4+5)
EncoderReader/short_lines-16                  118kB ± 0%            15kB ± 0%  -87.27%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                  8.73MB ± 0%          7.63MB ± 0%  -12.62%  (p=0.000 n=4+5)
EncoderReader/skip_lines-16                   270kB ± 0%           220kB ± 0%  -18.58%  (p=0.008 n=5+5)

name                                 old allocs/op        new allocs/op        delta
EncoderReader/buffer-sized_lines-16             718 ± 0%             519 ± 0%  -27.72%  (p=0.008 n=5+5)
EncoderReader/short_lines-16                    522 ± 0%             421 ± 0%  -19.35%  (p=0.008 n=5+5)
EncoderReader/long_lines-16                   2.65k ± 0%           1.58k ± 0%  -40.54%  (p=0.008 n=5+5)
EncoderReader/skip_lines-16                     420 ± 0%             419 ± 0%   -0.24%  (p=0.008 n=5+5)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.16.0 Automated backport with mergify Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants