
Question: How to implement compound keys (segment my data) #124

Closed
andrewmunro opened this issue Apr 23, 2018 · 2 comments

@andrewmunro

Kafka is optimised around storing small messages. I'm currently using goka (very effectively) to aggregate a stream of transactions, however I'm concerned that the amount of data I store in my compacted topic will grow larger and larger if I start aggregating historical data too.

A working example:

I have a stream of transactions for users spending money.
I'm aggregating how much a user spends over all time.
I want to aggregate how much a user has spent per week.

I'm currently using a single compacted topic of userid -> user object (storing aggregated values).

To solve this, I think I need to split my dataset into smaller pieces and use compound keys to store and retrieve historical data (e.g. useridX.weekY).

Is there a way I can do this with goka? It looks to me like there is no way to change the key stored in the compacted topic from the event key that comes from a stream.

The only thing I can think is to emit from my first processor into a new stream, and then have another processor that aggregates this into a weekly compacted topic. Seems like a lot of overhead and I'm not sure if there's a simpler way.

@db7 (Collaborator) commented Apr 23, 2018

Awesome! Happy to hear you're trying goka.

There is no automatic way to do such an aggregation (we are still not sure how to implement a time window, see #99). But you could do the following:

  • When a message is received, the processor calls ctx.Loopback("userX.weekY", message).
  • Add a Loop edge to DefineGroup with a callback to handle the looped-back messages.

You can also store some bookkeeping information under the original key userX, e.g., the earliest week for which the user has values. With that you can later do cleanup by sending a cleanup message into the loopback.

Just remember that a table has a single type for its values: if you store one type under userX and another under userX.weekY, you'd need to wrap both in another type that can contain one or the other.
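A sketch of such a wrapper, assuming JSON encoding. The type names (`UserTotal`, `WeekTotal`, `TableValue`) are invented for illustration; the codec mirrors the Encode/Decode-over-bytes shape a goka table codec would need, without depending on goka itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UserTotal is the all-time aggregate stored under "userX".
type UserTotal struct {
	Spent int64 `json:"spent"`
}

// WeekTotal is the weekly aggregate stored under "userX.weekY".
type WeekTotal struct {
	Spent int64 `json:"spent"`
}

// TableValue wraps both so a single table type can hold either;
// exactly one field is non-nil for any given key.
type TableValue struct {
	User *UserTotal `json:"user,omitempty"`
	Week *WeekTotal `json:"week,omitempty"`
}

// TableValueCodec encodes/decodes TableValue as JSON.
type TableValueCodec struct{}

func (c *TableValueCodec) Encode(value interface{}) ([]byte, error) {
	return json.Marshal(value)
}

func (c *TableValueCodec) Decode(data []byte) (interface{}, error) {
	var v TableValue
	err := json.Unmarshal(data, &v)
	return &v, err
}

func main() {
	c := new(TableValueCodec)
	b, _ := c.Encode(&TableValue{Week: &WeekTotal{Spent: 1250}})
	fmt.Println(string(b)) // {"week":{"spent":1250}}
}
```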

Does that help?

@db7 db7 added the question label Apr 23, 2018
@andrewmunro (Author)

@db7 Thanks for the prompt response. I think I originally discounted using loopback because it requires the same value type, but on second thought it's not the end of the world to use the wrapper structure partially.
