
Ensure that a team's events always end up in the same Kafka partition #3303

Closed

Conversation

@Twixes (Member) commented Feb 11, 2021

Changes

This should greatly improve our batching of events at scale by ensuring that a team's events always end up in one partition. Otherwise, when the load is spread across all partitions, batches easily end up just one event long, which is what we see on Cloud.
On the other hand, my worry is that on private deployments (with few teams, maybe even only one in serious use) this could cause a significant imbalance in partitioning. How best to proceed?
@fuziontech
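To illustrate the idea (this is a minimal sketch, not PostHog's actual producer code): keying Kafka messages by team ID means the partition is `hash(key) % num_partitions`, so a given team's events always land on the same partition. The hash function and partition count below are stand-ins; Kafka's default partitioner actually uses murmur2.

```python
# Sketch only: stable "hash mod N" partitioning, the property this PR
# relies on. md5 stands in for Kafka's murmur2 key partitioner.
import hashlib

NUM_PARTITIONS = 5  # assumed partition count, matching the 5 consumers on Cloud

def partition_for(team_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    # Hash the key deterministically, then take it modulo the partition count.
    digest = hashlib.md5(str(team_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same team id -> same partition, on every call:
assert partition_for(42) == partition_for(42)
```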

@Twixes Twixes requested a review from fuziontech February 11, 2021 10:53
@macobo (Contributor) commented Feb 11, 2021

In practice I think this would make things worse if a large client's events all get sent to one partition. We're very top-heavy on Cloud.

Also, on ClickHouse deployments we wouldn't really be able to scale out Kafka consumers because of this.
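The skew concern can be made concrete with a toy example (the team sizes are hypothetical, chosen only to illustrate a top-heavy distribution):

```python
# Hypothetical top-heavy load: team 1 sends 80 of 100 events.
# Keying by team pins all of that team's traffic to one partition.
from collections import Counter

NUM_PARTITIONS = 5
events = [1] * 80 + [2] * 10 + [3] * 10  # team ids, made up for illustration

# Simple "team_id mod N" keying as a stand-in for Kafka's key partitioner.
load = Counter(team % NUM_PARTITIONS for team in events)

# One partition carries 80% of the traffic; the rest sit near-idle.
assert max(load.values()) == 80
```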

@timgl timgl temporarily deployed to posthog-better-event-ba-ddp55z February 11, 2021 10:59
@Twixes (Member, Author) commented Feb 11, 2021

Yeah, that's exactly the problem, but the current approach also cripples batching, so we need a smarter way.

@macobo (Contributor) commented Feb 11, 2021

Let's start with a common baseline: what are the core assumptions batching relies on to achieve performance, and how does the current setup run counter to them?

@mariusandra (Collaborator)

I'd say let's enable plugin ingestion, measure the performance and see if we need this.

Looking at it differently, the more teams in a batch, the more things we can do in parallel.

@Twixes (Member, Author) commented Feb 11, 2021

The current setup sends events to partitions essentially at random (to be consumed by the plugin server), so with 5 consumers, as we have on Cloud right now, each gets a roughly even share of every project's events. That's good for parallelization, but it makes batch processing with plugins about 5x less effective, since batching is inherently project-based.

@mariusandra (Collaborator)

I'd say right now we should prioritise overall throughput over individual team batch size. More teams in a batch means more things we can run in parallel, maximising resource use. In the future we will have to do clever things with team routing anyway, up to and including separate Kafka topics for different organisations, so I propose leaving this optimisation for then.

@mariusandra mariusandra deleted the better-event-batching branch February 11, 2021 19:13
@mariusandra mariusandra mentioned this pull request Nov 3, 2021