Switch to a binary format for internal record keys #724
Fixing this will require using some sort of binary format for the internal record key. Some options: This may also require that we track key schemas per topic. It may not be reasonable to expect source topics to be keyed according to this internal format. (We currently have that expectation, but feel that's OK because it's reasonable to expect a string key.) This means we would need to support both some external format and our internal format.
Related ticket: #824
When processing a query with a GROUP BY clause containing multiple columns, KSQL generates a new key for each record so that streams can repartition and aggregate according to the GROUP BY clause. The new key is the values of the GROUP BY columns concatenated, separated by the string "|+|". If the values themselves contain this separator, the resulting key may be ambiguous.
For example, consider the following query:
CREATE TABLE AS SELECT A, B, count(*) FROM STREAM FOO GROUP BY A, B
And the following two records:
{..."A":"foo|+|bar", "B":"baz"...}
{..."A":"foo", "B":"bar|+|baz"...}
Both records will take the key "foo|+|bar|+|baz", and the result of the aggregate will be incorrect.
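To make the collision concrete, here is a minimal Python sketch. It is not KSQL's actual code: the function names are hypothetical, and the length-prefixed layout (a 4-byte big-endian length before each UTF-8 value) is only one assumed example of what a binary key format could look like.

```python
import struct

SEPARATOR = "|+|"


def separator_key(values):
    """Join GROUP BY values with the string separator, as described above."""
    return SEPARATOR.join(values)


def length_prefixed_key(values):
    """Hypothetical binary key: each value is written as a 4-byte
    big-endian length followed by its UTF-8 bytes."""
    parts = []
    for value in values:
        data = value.encode("utf-8")
        parts.append(struct.pack(">I", len(data)))
        parts.append(data)
    return b"".join(parts)


def decode_length_prefixed_key(key):
    """Recover the original values from a length-prefixed key."""
    values = []
    offset = 0
    while offset < len(key):
        (length,) = struct.unpack_from(">I", key, offset)
        offset += 4
        values.append(key[offset:offset + length].decode("utf-8"))
        offset += length
    return values


# The (A, B) values of the two records from the example.
record_1 = ["foo|+|bar", "baz"]
record_2 = ["foo", "bar|+|baz"]

# The separator-based keys collide.
print(separator_key(record_1) == separator_key(record_2))

# The length-prefixed keys stay distinct and decode back to the inputs.
print(length_prefixed_key(record_1) == length_prefixed_key(record_2))
print(decode_length_prefixed_key(length_prefixed_key(record_1)))
```

Because each value carries its own length, no byte sequence inside a value can be mistaken for a field boundary, which is the property the separator-based key lacks.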