
[libbeat] Implement early event encoding for the Elasticsearch output #38572

Merged: 69 commits merged into elastic:main on Apr 10, 2024

Conversation

@faec (Contributor) commented Mar 23, 2024

Add early-encoding support to the queue and the Elasticsearch output.

Early event encoding lets outputs provide helpers that can perform output serialization on an event while it is still in the queue. This early encoding is done in the memory queue's producers, and in the disk queue's reader loop. Benchmarks while writing to Elasticsearch showed significant improvements (reported numbers were measured on Filebeat with a filestream input).
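To make the mechanism concrete, here is a minimal sketch of how such an early-encoding hook could look. The names and signatures (EventEncoder, EncoderFactory, encodedEntry, encodeForQueue) are illustrative assumptions for this description, not the exact API this PR adds to libbeat.

```go
package queue

// Illustrative sketch (not the actual libbeat API): the output registers
// an encoder factory with the queue; the memory queue's producer or the
// disk queue's reader loop then serializes each event as it is accepted,
// so the output later receives pre-encoded bytes instead of raw events.

// Event stands in for the event type handed to the queue.
type Event struct {
	Fields map[string]interface{}
}

// EventEncoder serializes one event into the output's wire format
// (for Elasticsearch, the bulk-request document body).
type EventEncoder interface {
	EncodeEntry(e Event) ([]byte, error)
}

// EncoderFactory lets each producer or reader goroutine create its own
// encoder, so individual encoders need not be safe for concurrent use.
type EncoderFactory func() EventEncoder

// encodedEntry is what the queue stores after early encoding: the
// serialized bytes, or the original event plus an error if encoding failed.
type encodedEntry struct {
	encoded []byte
	err     error
	raw     *Event // retained only on failure, for error reporting
}

// encodeForQueue is the step a memory-queue producer (or the disk
// queue's reader loop) would run before buffering the entry.
func encodeForQueue(enc EventEncoder, e Event) encodedEntry {
	buf, err := enc.EncodeEntry(e)
	if err != nil {
		return encodedEntry{err: err, raw: &e}
	}
	return encodedEntry{encoded: buf}
}
```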

Memory reduction in each preset relative to past versions:

| Preset | main | 8.13.2 | 8.12.2 |
|---|---|---|---|
| balanced | 21% | 31% | 41% |
| throughput | 43% | 47% | 56% |
| scale | 23% | 35% | 46% |
| latency | 24% | 24% | 41% |

CPU reduction for each preset relative to main (earlier versions had negligible difference):

| Preset | CPU reduction |
|---|---|
| balanced | 7% |
| throughput | 19% |
| scale | 7% |
| latency | 9% |

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Any Beats or Agent configuration that uses the Elasticsearch output goes through the early encoding process and therefore exercises this PR (other outputs still indirectly test the API changes). Some special cases to exercise, combined in the sample configuration sketched after this list, are:

  • Enable the disk queue
  • Disable output compression
  • Configure the Elasticsearch output with a low batch-size limit (to test retries / batch splitting)
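
A rough filebeat.yml combining those cases might look like the sketch below. It uses standard Beats option names, but the specific values (paths, hosts, sizes) are arbitrary examples, not recommendations from this PR.

```yaml
filebeat.inputs:
  - type: filestream
    id: early-encoding-test
    paths:
      - /var/log/*.log

# Case 1: route events through the disk queue instead of the memory queue.
queue.disk:
  max_size: 10GB

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  # Case 2: disable output compression.
  compression_level: 0
  # Case 3: small batches, to exercise retries / batch splitting.
  bulk_max_size: 50
```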

faec added 29 commits February 28, 2024 14:59
@faec faec marked this pull request as ready for review April 8, 2024 20:49
@faec faec requested a review from a team as a code owner April 8, 2024 20:49
@faec faec requested review from rdner and leehinman April 8, 2024 20:49
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@faec (Contributor, Author) commented Apr 8, 2024

There's still one integration test I'm looking at, in the logstash output, which calls through to the elasticsearch output with unencoded events... I'm just making sure there's no real dependency there. Otherwise things should be ready. [resolved]


```diff
-	batch := outest.NewBatch(beat.Event{
+	batch := encodeBatch(client, outest.NewBatch(beat.Event{
```
@faec (Contributor, Author) commented on this diff:

This linter error (and the similar ones below) is mistaken; there is no typecheck error here.

What's worse, if I add a nolint directive to skip it, it complains that the directive is "unused", so something may be wrong with the linter config...

@leehinman (Contributor) left a comment:

Can you add some of the benchmark data? It would be very nice to have for historical purposes.

Since we can't toggle back and forth for this change, do you have any tests that show documents ingested with it are exactly the same as without?

@faec (Contributor, Author) commented Apr 9, 2024

@leehinman

> Can you add some of the benchmark data?

I'm refreshing the benchmark data right now and will add it to the PR when it's done.

> do you have any tests that show documents ingested with it are exactly the same as without?

Only inspection on manual tests... I will try something more systematic, though I'm not sure how much variation arises just from ingestion metadata etc. (Do you have any suggestions for how to test this? I know you did something similar with the shipper at one point.)

@leehinman (Contributor)

> Only inspection on manual tests... I will try something more systematic, though I'm not sure how much variation arises just from ingestion metadata etc. (Do you have any suggestions for how to test this? I know you did something similar with the shipper at one point.)

https://github.com/leehinman/docdiff is what I did for the shipper. I don't think we need a ton of testing here; the encoding function is the same between the two. I'm probably just being paranoid.

@faec (Contributor, Author) commented Apr 9, 2024

@leehinman OK, I rigged up some generated data in Python (depth-3 JSON objects with many random keys of varying datatypes) and ingested it with and without this PR, using otherwise identical Filebeat configs (except an identifying es_encoding field added via a processor to distinguish the two cases). Then I used some jq calls to sort the documents by an id field and take the diff. The only event fields that differed between the two versions (on 100 events with ~100 fields each) were the following; a rough sketch of an equivalent comparison appears after the list:

  • @timestamp
  • fields.es_encoding (intentional, set by configuration)
  • agent.ephemeral_id
  • agent.id
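
For the record, a comparison along these lines can also be scripted; below is a rough Go sketch of that workflow. It is illustrative only: the file names, the top-level id field, and the list of volatile fields are assumptions about the generated test data, and the comparison described above actually used jq.

```go
// Sketch of the document comparison: read two newline-delimited JSON
// exports (one per Filebeat build), index documents by an assumed
// top-level "id" field, drop fields expected to differ between runs,
// and report any remaining differences.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"reflect"
)

// Top-level fields expected to differ between any two ingestion runs
// (@timestamp, agent.*, and the intentional fields.es_encoding marker).
var volatile = []string{"@timestamp", "agent", "fields"}

// load reads one export and indexes the documents by their id field,
// removing the volatile fields before comparison.
func load(path string) map[string]map[string]interface{} {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	docs := map[string]map[string]interface{}{}
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024)
	for scanner.Scan() {
		var doc map[string]interface{}
		if err := json.Unmarshal(scanner.Bytes(), &doc); err != nil {
			log.Fatal(err)
		}
		id := fmt.Sprint(doc["id"]) // adjust to wherever the generated id lands
		for _, key := range volatile {
			delete(doc, key)
		}
		docs[id] = doc
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	return docs
}

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("usage: %s before.ndjson after.ndjson", os.Args[0])
	}
	before := load(os.Args[1]) // documents ingested without the PR
	after := load(os.Args[2])  // documents ingested with the PR

	for id, doc := range before {
		if !reflect.DeepEqual(doc, after[id]) {
			fmt.Printf("document %s differs\n", id)
		}
	}
	fmt.Printf("compared %d documents\n", len(before))
}
```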

> I'm probably just being paranoid

Better a little paranoia than an escalation :-)

@pierrehilbert (Collaborator)

> Better a little paranoia than an escalation :-)

I really like this

@leehinman (Contributor) left a comment:

LGTM

@faec faec merged commit c9e768d into elastic:main Apr 10, 2024
205 of 215 checks passed
Labels: enhancement, Team:Elastic-Agent (Label for the Agent team)

4 participants