GCP PubSub Input Performance #35029

Closed
1 of 2 tasks
kcreddy opened this issue Apr 5, 2023 · 6 comments
Labels: enhancement, Filebeat, Team:Security-Service Integrations

kcreddy (Contributor) commented Apr 5, 2023

The GCP PubSub input has certain bottlenecks that need to be addressed:

  • Save CPU time while publishing events to the internal queue by disabling EventNormalization.
  • Test if creating multiple beat pipeline clients improves performance.
    • There is a single beat.Client for the input instance. The input creates many goroutines to read pubsub messages, but the beat.Client becomes a bottleneck because the Publish() call acquires a lock.
    • Need to test the effect of having multiple beat.Clients in a pool for each pubsub input instance (e.g. one client per configured num_goroutines). This is similar to a change made in the AWS S3 input, which massively increased input performance.
    • Use multiple pub/sub subscriptions (parallel calls to Receive()), each with its own beat.Client (see the sketch below this list).
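
For illustration, a minimal sketch of that last idea (parallel Receive() calls, each with its own pipeline client), assuming a beat.Pipeline and a *pubsub.Client are already available; startReceivers and handle are illustrative names, not existing Filebeat code:

```go
package gcppubsub

import (
	"context"
	"sync"

	"cloud.google.com/go/pubsub"

	"github.com/elastic/beats/v7/libbeat/beat"
)

// startReceivers sketches the idea above: n parallel Receive() calls, each
// with its own subscription handle and its own pipeline client, so concurrent
// publishes never contend on a single shared Publish lock.
// handle stands in for the input's message-to-event conversion and acking.
func startReceivers(ctx context.Context, pipeline beat.Pipeline, gcpClient *pubsub.Client,
	subID string, n int, handle func(beat.Client, *pubsub.Message)) error {

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		client, err := pipeline.ConnectWith(beat.ClientConfig{})
		if err != nil {
			wg.Wait()
			return err
		}
		sub := gcpClient.Subscription(subID) // independent handle per goroutine
		wg.Add(1)
		go func(c beat.Client) {
			defer wg.Done()
			defer c.Close()
			// Each Receive call publishes only through its own client.
			_ = sub.Receive(ctx, func(_ context.Context, msg *pubsub.Message) {
				handle(c, msg)
			})
		}(client)
	}
	wg.Wait()
	return nil
}
```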
elasticmachine (Collaborator) commented:

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

kcreddy changed the title from GCP PubSub Input Performance Enhancements to GCP PubSub Input Performance on Apr 19, 2023
elasticmachine (Collaborator) commented:

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

kcreddy (Contributor, Author) commented Apr 17, 2024

Closed #37657, which was aimed at creating multiple pubsub clients rather than beat pipeline clients. Having multiple beat pipeline clients helps reduce lock contention, but as seen in the mutex profiles attached to that PR, multiple pubsub clients don't really reduce it.

The solution we need is similar to AWS S3's SQS event processor:

// Create a pipeline client scoped to this goroutine.
client, err := p.pipeline.ConnectWith(beat.ClientConfig{

Here, the S3 input creates one pipeline client for each SQS message to process all S3 events within that SQS message.
In this case, the GCP PubSub input must instead maintain a sync.Pool of pipeline clients, with each pipeline client processing one pubsub message. Although there is a CPU cost associated with fetching and releasing clients from the pool, it might reduce lock contention; the benefits and drawbacks should be tested.

kcreddy (Contributor, Author) commented Jun 24, 2024

Hey @andrewkroh,

I tried the 2 variations we discussed:

  1. adding pipeline clients inside an array
  2. using sync.Pool.

As per the results below, no performance improvement was observed in either variation.

Both variations are based on this commit, which is close to the v8.14.0 tag: 4c4b2f8
The base Filebeat version I used for comparison is the v8.14.0 tag (which differs from commit 4c4b2f8 only by minor docs- and CI-related changes).

Variation 1: adding pipeline clients inside an array

v8.14.0...kcreddy:beats:variation1-array: 2 files changed: x-pack/filebeat/input/gcppubsub/config.go and x-pack/filebeat/input/gcppubsub/input.go

  1. Create a configurable number of Outleter instances (pipeline clients) and store them in an array.
  2. Choose a pipeline client randomly from the array during sub.Receive and use it to publish messages (see the sketch after this list).
    PR: x-pack/filebeat/input/gcppubsub: Add configurable outlets #39999
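
A rough sketch of that selection logic, assuming a pre-built slice of outlets; outlets and pickOutlet are illustrative names, not the PR's actual code:

```go
package gcppubsub

import (
	"math/rand"

	"github.com/elastic/beats/v7/filebeat/channel"
)

// outlets would hold the pipeline clients created at input start-up; its
// length corresponds to the outlet.num_pipelines setting.
var outlets []channel.Outleter

// pickOutlet returns a random outlet. The package-level rand functions are
// safe for concurrent use, which matters because sub.Receive invokes its
// callback from many goroutines. (The "seed" variant in the results below
// additionally seeds this source with time.Now().UnixNano().)
func pickOutlet() channel.Outleter {
	return outlets[rand.Intn(len(outlets))]
}
```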

Results:
default -> subscription.num_goroutines: 1, subscription.max_outstanding_messages: 1600
ocl -> outlet.num_pipelines (pipeline clients) config setting.
seed -> seeding the random client selection with time.Now().UnixNano() (inside the PR).
g2 -> subscription.num_goroutines: 2
m2000 -> subscription.max_outstanding_messages: 2000
r30s -> Test ran for 30 seconds

  1. Events Per 30 seconds, measured from Non-zero metrics log.
| Run | 8.14.0 | 8.14.1 (2 ocl) | 8.14.1 (5 ocl) | 8.14.1 (5 ocl+seed) |
| --- | --- | --- | --- | --- |
| default-r30s | 86k | 90k | 88k | 89k |
| g2m2000r30s | 186k | 197k | 170k | 177k |
  2. Contentions: from running the pprof command: go tool pprof -seconds 30 http://localhost:5066/debug/pprof/mutex
    pprof.filebeat.contentions.delay.001.8.14.0-default-r30s.pb.gz
    pprof.filebeat.contentions.delay.001-8.14.0-g2m2000-r30s.pb.gz
    pprof.filebeat.contentions.delay.001-8.14.1-default-ocl5-seed-r30s.pb.gz
    pprof.filebeat.contentions.delay.001-8.14.1-g2m2000-ocl5-seed-r30s.pb.gz

Variation 2: using sync.Pool.

v8.14.0...kcreddy:beats:variation2-syncpool: 1 file changed: x-pack/filebeat/input/gcppubsub/input.go

  1. Create a sync.Pool that returns an Outleter (pipeline client).
  2. In sub.Receive, get a client with pool.Get(), use it to publish messages, and return it with pool.Put() (see the sketch after this list).
    PR: x-pack/filebeat/input/gcppubsub: Add syncpool on outlet #39998
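
A minimal sketch of that flow, using beat.Client directly as a stand-in for the input's Outleter and assuming hypothetical connect and makeEvent helpers (neither is the PR's actual code):

```go
package gcppubsub

import (
	"context"
	"sync"

	"cloud.google.com/go/pubsub"

	"github.com/elastic/beats/v7/libbeat/beat"
)

// Placeholders for the input's own helpers.
var (
	connect   func() (beat.Client, error)      // wraps the pipeline connection
	makeEvent func(*pubsub.Message) beat.Event // converts a message to an event
)

// clientPool hands a pipeline client to each concurrent Receive callback.
var clientPool = sync.Pool{
	New: func() interface{} {
		c, err := connect()
		if err != nil {
			return nil
		}
		return c
	},
}

// handleMessage has the shape of the sub.Receive callback: Get a client,
// publish, then Put the client back so the next callback can reuse it.
func handleMessage(_ context.Context, msg *pubsub.Message) {
	v := clientPool.Get()
	if v == nil {
		msg.Nack() // could not connect; let the message be redelivered
		return
	}
	client := v.(beat.Client)
	defer clientPool.Put(client)

	client.Publish(makeEvent(msg))
	msg.Ack()
}
```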

Results:
default -> subscription.num_goroutines: 1, subscription.max_outstanding_messages: 1600
g2 -> subscription.num_goroutines: 2
m2000 -> subscription.max_outstanding_messages: 2000
r30s -> Test ran for 30 seconds

  1. Events Per 30 seconds, measured from Non-zero metrics log.
| Run | 8.14.0 | 8.14.1 |
| --- | --- | --- |
| default-r30s | 86k | 86k |
| g2m2000r30s | 186k | 165k |
  2. Contentions: from running the pprof command: go tool pprof -seconds 30 http://localhost:5066/debug/pprof/mutex
    pprof.filebeat.contentions.delay.001.8.14.0-default-r30s.pb.gz
    pprof.filebeat.contentions.delay.001-8.14.0-g2m2000-r30s.pb.gz
    pprof.filebeat.contentions.delay.001.8.14.1-default-syncpool-r30s.pb.gz
    pprof.filebeat.contentions.delay.001.8.14.1-g2m2000-syncpool-r30s.pb.gz

andrewkroh (Member) commented:

The contention for the lock within the beat.Client Publish() will be most noticeable when there is some edge-processing being used (like dissecting a message in the Beat). For our integrations we have shifted most processing out of this area where the lock is being held and into Elasticsearch ingest node. The exceptions are for the default processors that are always present in Agent.

If there is little edge processing, then optimizing to avoid contention on the publish lock probably isn't worthwhile; the sync.Pool experiment is showing that. And if we did pursue this, we would need to use something other than sync.Pool to provide a pool of clients, because we would want some upper bound on the number of clients and we need to be able to Close() the clients.
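
A rough sketch of such a bounded, closeable alternative, handing clients out through a buffered channel instead of sync.Pool; boundedPool and its methods are illustrative names only:

```go
package gcppubsub

import "github.com/elastic/beats/v7/libbeat/beat"

// boundedPool sketches the alternative hinted at above: a fixed number of
// clients handed out through a buffered channel, so there is an upper bound
// on the client count and every client can be Close()d on shutdown
// (sync.Pool can silently drop items, so they could never be closed reliably).
type boundedPool struct {
	free chan beat.Client
	all  []beat.Client
}

func newBoundedPool(pipeline beat.Pipeline, size int) (*boundedPool, error) {
	p := &boundedPool{free: make(chan beat.Client, size)}
	for i := 0; i < size; i++ {
		c, err := pipeline.ConnectWith(beat.ClientConfig{})
		if err != nil {
			p.Close()
			return nil, err
		}
		p.all = append(p.all, c)
		p.free <- c
	}
	return p, nil
}

// Acquire blocks until a client is free; Release returns one to the pool.
func (p *boundedPool) Acquire() beat.Client  { return <-p.free }
func (p *boundedPool) Release(c beat.Client) { p.free <- c }

// Close closes every client exactly once.
func (p *boundedPool) Close() {
	for _, c := range p.all {
		c.Close()
	}
}
```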

Let's verify that we can achieve 20k EPS with no code changes, and record the settings we used. We can refer to those in the future if we are doing tuning.

kcreddy (Contributor, Author) commented Jun 30, 2024

I have run a few more tests on the existing Filebeat (no code changes) to check whether we can reach much higher throughput just by tuning the existing settings. The following tests were run with the output.file configuration and no changes to the default internal queue settings.

default -> subscription.num_goroutines: 1, subscription.max_outstanding_messages: 1600
g -> subscription.num_goroutines. Example: g2 => subscription.num_goroutines: 2
m -> subscription.max_outstanding_messages. Example: m2000 => subscription.max_outstanding_messages: 2000
r30s -> Test ran for 30 seconds
| Run | 8.14.0 (current GA), events/30s | EPS |
| --- | --- | --- |
| default-r30s | 86k | 3k |
| g2m2000r30s | 186k | 6k |
| g5m2000r30s | 408k | 14k |
| g5m6000r30s | 580k | 19k |
| g10m2000r30s | 496k | 17k |
| g10m6000r30s | 630k | 21k |

So, just by tuning subscription.num_goroutines and subscription.max_outstanding_messages, we can push past 20k events per second.
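
For the record, an illustrative filebeat.yml snippet along the lines of the g10m6000 run; the project, topic, and subscription names are placeholders, and the values are examples rather than recommendations:

```yaml
filebeat.inputs:
  - type: gcp-pubsub
    project_id: my-gcp-project          # placeholder
    topic: my-topic                     # placeholder
    subscription.name: my-subscription  # placeholder
    subscription.num_goroutines: 10
    subscription.max_outstanding_messages: 6000
```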

Given the lack of throughput improvement, optimising only for contention isn't worth it, so this investigation into adding multiple outlet (pipeline) clients won't be pursued further.
