Kafka is optimised around storing small messages. I'm currently using goka (very effectively) to aggregate a stream of transactions; however, I'm concerned that the amount of data I store in my compacted topic will keep growing if I start aggregating historical data too.
A working example:
I have a stream of transactions for users spending money.
I'm aggregating how much a user spends over all time.
I want to aggregate how much a user has spent per week.
I'm currently using a single compacted topic of userid -> user object (storing aggregated values).
To solve this, in my mind I need to split my dataset into smaller pieces and use compound keys to store and retrieve historical data (e.g. useridX.weekY).
Is there a way I can do this with goka? It looks to me like there is no way to change the key stored in the compacted topic from the event key that comes from a stream.
The only thing I can think of is to emit from my first processor into a new stream, and then have another processor aggregate that into a weekly compacted topic. That seems like a lot of overhead, and I'm not sure whether there's a simpler way.
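For concreteness, a rough sketch of that two-processor layout (the topic and group names, the int64 amounts, and the weekKey helper are made-up assumptions; imports from github.com/lovoo/goka and its codec package are omitted here):

```go
// Processor 1: re-key each transaction onto an intermediate stream.
var rekeyGraph = goka.DefineGroup("rekey-transactions",
	goka.Input("transactions", new(codec.Int64), func(ctx goka.Context, msg interface{}) {
		// weekKey is a hypothetical helper building "useridX.weekY".
		ctx.Emit("weekly-transactions", weekKey(ctx.Key(), time.Now()), msg)
	}),
	goka.Output("weekly-transactions", new(codec.Int64)),
)

// Processor 2: aggregate the re-keyed stream into its own compacted table.
var weeklyGraph = goka.DefineGroup("weekly-spend",
	goka.Input("weekly-transactions", new(codec.Int64), func(ctx goka.Context, msg interface{}) {
		var total int64
		if v := ctx.Value(); v != nil {
			total = v.(int64)
		}
		ctx.SetValue(total + msg.(int64))
	}),
	goka.Persist(new(codec.Int64)),
)
```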
There is no automatic way to do such aggregation (we are still not sure how to implement a time window, see #99). But you could do the following:
When a message is received, the processor calls ctx.Loopback("userX.weekY", message).
Add a Loop edge to DefineGroup with a callback to handle it (see the sketch below).
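Roughly, a minimal sketch of such a group (the topic name, the use of codec.Int64 for amounts, and the weekKey helper deriving the week from the processing time are just assumptions for illustration; in practice you'd take the week from the transaction itself):

```go
package spend

import (
	"fmt"
	"time"

	"github.com/lovoo/goka"
	"github.com/lovoo/goka/codec"
)

// weekKey builds the compound "useridX.weekY" key; the ISO week is used
// here purely for illustration.
func weekKey(userID string, t time.Time) string {
	year, week := t.ISOWeek()
	return fmt.Sprintf("%s.%d-W%02d", userID, year, week)
}

var graph = goka.DefineGroup("user-spend",
	// Input edge: messages arrive keyed by userid; forward them to the
	// loopback under the compound key.
	goka.Input("transactions", new(codec.Int64), func(ctx goka.Context, msg interface{}) {
		ctx.Loopback(weekKey(ctx.Key(), time.Now()), msg)
	}),
	// Loop edge: here ctx.Key() is "useridX.weekY", so ctx.Value() and
	// ctx.SetValue() read and write the per-week entry of the group table.
	goka.Loop(new(codec.Int64), func(ctx goka.Context, msg interface{}) {
		var total int64
		if v := ctx.Value(); v != nil {
			total = v.(int64)
		}
		ctx.SetValue(total + msg.(int64))
	}),
	// The compacted group table, holding one aggregate per compound key.
	goka.Persist(new(codec.Int64)),
)
```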
You can also store some bookkeeping information under the original key userX, e.g., since which week the user has values. With that you can later do some cleanup by sending a cleanup message into the loopback.
Just remember that a table has a single type for its values, so if you store one type in userX and another type in userX.weekY, you'd actually need to wrap both in a type that can contain either one.
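A rough sketch of such a wrapper type (the field names are made up; you'd also need a codec that serialises it, e.g. JSON-based):

```go
// UserSpend lives under the plain "userX" key and carries the bookkeeping
// mentioned above, e.g. since which week weekly entries exist.
type UserSpend struct {
	Total     int64  // all-time spend
	FirstWeek string // earliest week with a "userX.weekY" entry, for cleanup
}

// WeekSpend lives under the compound "userX.weekY" key.
type WeekSpend struct {
	Total int64
}

// TableValue is the single value type of the group table; exactly one of
// the fields is set, depending on the key.
type TableValue struct {
	User *UserSpend `json:"user,omitempty"`
	Week *WeekSpend `json:"week,omitempty"`
}
```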
@db7 Thanks for the prompt response. I think I originally discounted using loopback because it requires the same type, but on second thought it's not the end of the world to use the structure only partially.