
Investigate changing the pipeline order to improve performance #44

Open · Tracked by #16
cmacknz opened this issue May 27, 2022 · 0 comments
Labels: 8.7-candidate, Team:Elastic-Agent-Data-Plane

cmacknz commented May 27, 2022

Let's consider the case where the shipper is configured to use a disk queue with the Elasticsearch output. Let's also assume we use the default protobuf encoding over gRPC. If we reuse the existing structure of the beats publishing pipeline, the data flow will look like:

```mermaid
flowchart LR
    A[Input] -->|Protobuf| B[Server]
    B --> C[Processors]
    C -->|CBOR| D[Disk Queue]
    D -->|JSON| E[Elasticsearch]
```

The diagram shows that the data must be serialized multiple times:

  1. To the protobuf wire format when the input sends events to the shipper using gRPC. This could optionally be replaced with JSON, but we would likely still need to deserialize it regardless.
  2. To CBOR when writing to the disk queue.
  3. To JSON when writing to Elasticsearch.

It seems extremely worthwhile to restructure the pipeline to reduce the number of times the data must be serialized:

```mermaid
flowchart LR
    A[Input] -->|Protobuf| B[Server]
    B -->|Protobuf| C[Disk Queue]
    C --> D[Processors]
    D -->|JSON| E[Elasticsearch]
```

In this case we would change the disk queue's serialization format to protobuf, deferring deserialization until after the data has been read from the queue. This leaves us with a single transformation: from protobuf to the shipper's internal data format, and then to JSON (or whatever encoding the output requires).
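
For concreteness, here is a rough sketch of what that single transformation step might look like after the queue. It uses `structpb.Struct` as a stand-in for the shipper's protobuf event schema and models processors as plain functions over a map; both are assumptions for illustration, not the shipper's actual API.

```go
// Sketch only: structpb.Struct stands in for the shipper's real protobuf
// event schema, and processors are modeled as plain functions over a map.
package pipeline

import (
	"encoding/json"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/structpb"
)

// encodeForElasticsearch decodes the protobuf payload read from the queue,
// applies processors to the decoded fields, and produces the JSON document
// body for the Elasticsearch bulk request. This is the only decode/encode
// pair left in the reordered pipeline.
func encodeForElasticsearch(encoded []byte, processors []func(map[string]interface{})) ([]byte, error) {
	var fields structpb.Struct
	if err := proto.Unmarshal(encoded, &fields); err != nil {
		return nil, err
	}
	doc := fields.AsMap()
	for _, p := range processors {
		p(doc)
	}
	return json.Marshal(doc)
}
```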

If the memory queue were used instead of the disk queue, we could use the same strategy of storing serialized events in the memory queue and only decoding them when they are read from the queue. This would give us a way to deterministically calculate the number of bytes stored in the memory queue. Currently the memory queue size must be specified in events.
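
A minimal sketch of that idea is below, assuming a hypothetical byte-limited queue type (none of these names come from the shipper or libbeat). Because entries stay encoded, the queue's size limit can be expressed directly in bytes rather than in events.

```go
// Hypothetical byte-limited memory queue: entries are still-serialized event
// payloads, so the total size in bytes is known exactly at all times.
package memqueue

import (
	"errors"
	"sync"
)

var ErrQueueFull = errors.New("memory queue is full")

type ByteQueue struct {
	mu       sync.Mutex
	entries  [][]byte
	curBytes int
	maxBytes int
}

func NewByteQueue(maxBytes int) *ByteQueue {
	return &ByteQueue{maxBytes: maxBytes}
}

// Push stores an encoded event, rejecting it if the configured byte limit
// would be exceeded so the caller can retry or apply backpressure.
func (q *ByteQueue) Push(encoded []byte) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.curBytes+len(encoded) > q.maxBytes {
		return ErrQueueFull
	}
	q.entries = append(q.entries, encoded)
	q.curBytes += len(encoded)
	return nil
}

// Pop returns the oldest encoded event; it is deserialized only after it
// leaves the queue, on its way to the processors and the output.
func (q *ByteQueue) Pop() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.entries) == 0 {
		return nil, false
	}
	e := q.entries[0]
	q.entries = q.entries[1:]
	q.curBytes -= len(e)
	return e, true
}
```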

The output of this issue should be a proof of concept demonstrating that this reordering of the pipeline is possible and has the expected benefits. At minimum the work will need to include:

  1. Modify the gRPC server in the shipper to stop deserializing messages so they can be passed directly to the queue. The ideal option would be to keep the existing RPC definitions but implement a no-op codec (see the gRPC encoding documentation and the codec sketch after this list). We may need to write a custom set of RPC handlers instead of generating them, since the generated handlers decode into typed messages:

    ```go
    func _Producer_PublishEvents_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
    ```

    A fallback option is to use messages that just wrap a bytes payload, with the required message type and serialization documented in the RPC call.
  2. Modify the disk queue to use protobuf serialization. At minimum this depends on "Libbeat disk queue should support custom serialization" (#41) and possibly some of the work for "[Meta][Feature] Implement encrypted disk queue" (#33) to use the new disk queue headers.
  3. Ensure we can still return errors back to clients (after deserialization or processing, for example). "[Meta][Feature] Implement end to end acknowledgement" (#9) should provide a mechanism for this.
  4. Benchmark the performance of the modified pipeline and compare it to the original configuration. We do not have a set of repeatable performance tests yet, so we may choose to defer this work until we do.
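
To illustrate the no-op codec idea from item 1, here is a rough sketch of a server-side codec that hands the raw payload to the handler instead of decoding it. The `rawMessage` wrapper is hypothetical, and the generated handlers would still need to be replaced with handlers that ask the codec for a `rawMessage`; only the grpc-go `encoding.Codec` interface and `RegisterCodec` call are real APIs.

```go
// Sketch of a passthrough ("no-op") codec for the shipper's gRPC server.
// rawMessage is a hypothetical wrapper, not a type from the shipper or
// grpc-go. Decoding is skipped only when the target is a *rawMessage; all
// other message types fall back to the standard proto codec behavior.
package rawcodec

import (
	"fmt"

	"google.golang.org/grpc/encoding"
	"google.golang.org/protobuf/proto"
)

// rawMessage carries an undecoded protobuf payload that can be written to
// the disk or memory queue as-is.
type rawMessage struct {
	data []byte
}

type passthroughCodec struct{}

// Name returns "proto" so this codec replaces the default proto codec on
// the server.
func (passthroughCodec) Name() string { return "proto" }

func (passthroughCodec) Marshal(v interface{}) ([]byte, error) {
	if m, ok := v.(*rawMessage); ok {
		return m.data, nil
	}
	msg, ok := v.(proto.Message)
	if !ok {
		return nil, fmt.Errorf("cannot marshal %T", v)
	}
	return proto.Marshal(msg)
}

func (passthroughCodec) Unmarshal(data []byte, v interface{}) error {
	if m, ok := v.(*rawMessage); ok {
		// Copy the payload; gRPC may reuse the underlying buffer.
		m.data = append([]byte(nil), data...)
		return nil
	}
	msg, ok := v.(proto.Message)
	if !ok {
		return fmt.Errorf("cannot unmarshal into %T", v)
	}
	return proto.Unmarshal(data, msg)
}

func init() {
	encoding.RegisterCodec(passthroughCodec{})
}
```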