
Question: How to implement compound keys (segment my data) #124

Closed
andrewmunro opened this issue Apr 23, 2018 · 2 comments

@andrewmunro

Kafka is optimised around storing small messages. I'm currently using goka (very effectively) to aggregate a stream of transactions, however I'm concerned that the amount of data I store in my compacted topic will grow larger and larger if I start aggregating historical data too.

A working example:

I have a stream of transactions for users spending money.
I'm aggregating how much a user spends over all time.
I want to aggregate how much a user has spent per week.

I'm currently using a single compacted topic of userid -> user object (storing aggregated values).

To solve this, I think I need to split my dataset into smaller pieces and use compound keys to store and retrieve historical data (e.g. useridX.weekY).

Is there a way I can do this with goka? It looks to me like there is no way to change the key stored in the compacted topic from the event key that comes from a stream.

The only thing I can think is to emit from my first processor into a new stream, and then have another processor that aggregates this into a weekly compacted topic. Seems like a lot of overhead and I'm not sure if there's a simpler way.

@db7 (Collaborator) commented Apr 23, 2018

Awesome! Happy to hear you're trying goka.

There is no automatic way to do such an aggregation (we are still not sure how to implement a time window, see #99). But you could do the following:

  • When a message is received, the processor calls ctx.Loopback("userX.weekY", message).
  • Add a Loop edge to DefineGroup with a callback to handle the looped-back messages.

You can also store some bookkeeping information under the original key userX, e.g., the earliest week for which the user has values. With that you can later do cleanup by sending a cleanup message into the loopback.

Just remember that a table has a single type for its values: if you store one type under userX and another under userX.weekY, you'd need to wrap both in another type that can contain one or the other.
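A sketch of such a wrapper, assuming JSON encoding. The type names (`UserTotal`, `WeekTotal`, `TableValue`) are invented for illustration; the codec mirrors the Encode/Decode-over-bytes shape a goka table codec would need, without depending on goka itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UserTotal is the all-time aggregate stored under "userX".
type UserTotal struct {
	Spent int64 `json:"spent"`
}

// WeekTotal is the weekly aggregate stored under "userX.weekY".
type WeekTotal struct {
	Spent int64 `json:"spent"`
}

// TableValue wraps both so a single table type can hold either;
// exactly one field is non-nil for any given key.
type TableValue struct {
	User *UserTotal `json:"user,omitempty"`
	Week *WeekTotal `json:"week,omitempty"`
}

// TableValueCodec encodes/decodes TableValue as JSON.
type TableValueCodec struct{}

func (c *TableValueCodec) Encode(value interface{}) ([]byte, error) {
	return json.Marshal(value)
}

func (c *TableValueCodec) Decode(data []byte) (interface{}, error) {
	var v TableValue
	err := json.Unmarshal(data, &v)
	return &v, err
}

func main() {
	c := new(TableValueCodec)
	b, _ := c.Encode(&TableValue{Week: &WeekTotal{Spent: 1250}})
	fmt.Println(string(b)) // {"week":{"spent":1250}}
}
```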

Does that help?

@db7 db7 added the question label Apr 23, 2018
@andrewmunro (Author)

@db7 Thanks for the prompt response. I think I originally discounted using loopback because it requires the same value type, but on second thought it's not the end of the world to use the wrapper structure partially.
