KSQL joins will fail silently if a table's topic message key does not match the declared KSQL Table key #749
Comments
Related: #804 |
@rmoff, thanks for explaining the issue. I came across this while trying to find a solution for this: |
Does this mean that joins through the reference tables (N to N) are impossible in KSQL?
    CREATE STREAM UserRoles_Repartitioned AS
    SELECT UserId + '-' + RoleId AS Id, UserId, RoleId
    FROM UserRoles_FromDebezium
    PARTITION BY Id;
and then use it to create table
    CREATE STREAM UserRoles_ReadModel AS
    SELECT u.UserName, o.RoleName, ...
    FROM Users AS u
    JOIN Roles AS o ON 1=1
    JOIN UserRoles ur ON ur.Id = u.Id + '-' + r.Id;
Is this the way to go?
EDIT: changed customer/order to user/role since it is a more realistic example...
@pavel-agarkov You can't have more than one |
The WHERE condition is not working properly in KSQL.
    ksql> select * from TBL_PLN_PRO_DIV_SDIV;
    ksql> SELECT * FROM TBL_MS_TARGET_GROUP;
The join is working properly:
    ksql> SELECT A.PRIMARY_DEMO_ID,B.DEMO_ID FROM TBL_PLN_PRO_DIV_SDIV A LEFT JOIN TBL_MS_TARGET_GROUP B ON (A.PRIMARY_DEMO_ID=B.DEMO_ID);
But the WHERE condition is not working with the left join:
    ksql> SELECT * FROM TBL_MS_TARGET_GROUP WHERE OP_TYPE = 'I';
    ksql> SELECT A.PRIMARY_DEMO_ID,B.DEMO_ID FROM TBL_PLN_PRO_DIV_SDIV A LEFT JOIN TBL_MS_TARGET_GROUP B ON (A.PRIMARY_DEMO_ID=B.DEMO_ID) where B.OP_TYPE = 'I';
    ksql> SELECT A.PRIMARY_DEMO_ID,B.DEMO_ID FROM TBL_PLN_PRO_DIV_SDIV A LEFT JOIN TBL_MS_TARGET_GROUP B ON (A.PRIMARY_DEMO_ID=B.DEMO_ID);
    3085 | null
@karthikeyanrd27 please open a new issue with details, and you can reference this one to link it if you think they are related. It makes it easier to track and debug specific problems. When you raise the issue, please can you include your schema ( |
@rmoff This example should work now, given that ksqlDB now supports "structured keys". I.e., if you define the
Can you verify so we can close this ticket?
KSQL supports joining streams to tables. However, for this to work, the key of the table's underlying Kafka topic must be the column on which the join is made. Currently, when the message key does not match, KSQL silently fails to make the join, and it is not obvious to the user (particularly one from the database world who is familiar with SQL) why it doesn't work.
Consider a simple stream/table (event/reference, a.k.a. fact/dimension) join:
RENTAL is a stream of rental events, with various foreign key relationships including a CUSTOMER_ID
CUSTOMER is a table of customer information, with a primary key of CUSTOMER_ID
The data in this example comes from MySQL, connected into Kafka using Debezium.
MySQL:
The event data:
The reference data:
The executed join in MySQL:
Now the same in KSQL:
Stream:
Table:
Join:
Here is the problem. The key that we declared for the table (KEY='customer_id') does not match the key for the Kafka message:
Examining the underlying Kafka topic:
Same data, Avro deserialised:
So technically KSQL is evaluating the join correctly, but in practice this is going to suck for the end user, particularly one who is not familiar with Kafka's key/value message structure.
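To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model and not KSQL's actual implementation: it assumes the table is materialised by raw message key and that the join looks up the string form of the stream's join column. The data, the function names, and the JSON-looking key (a stand-in for the serialised key on the real topic) are all hypothetical.

```python
# Simplified model only, not KSQL internals.

def materialise(messages):
    """Build table state: the latest value seen for each raw message key."""
    table = {}
    for key, value in messages:
        table[key] = value
    return table


def left_join(stream_rows, table, join_column):
    """Left join each stream row against the table, looking up by message key."""
    joined = []
    for row in stream_rows:
        # A miss returns None; nothing is raised, which is the "silent" part.
        match = table.get(str(row[join_column]))
        joined.append((row, match))
    return joined


# Hypothetical topic: the message key is a serialised struct, not the bare ID.
customer_topic = [
    ('{"CUSTOMER_ID": 3}', {"CUSTOMER_ID": 3, "NAME": "example"}),
]
rentals = [{"RENTAL_ID": 1, "CUSTOMER_ID": 3}]

result = left_join(rentals, materialise(customer_topic), "CUSTOMER_ID")
print(result)  # the customer side is None although CUSTOMER_ID 3 exists in the value
```

The value of the message contains the right CUSTOMER_ID, but the lookup only ever consults the message key, so the join yields no match and reports no error.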
The workaround is to manually rekey the topic:
The resulting topic is keyed correctly (i.e. the key is the CUSTOMER_ID):
Now in the KSQL table the ROWKEY matches CUSTOMER_ID:
and the desired join succeeds:
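Separately from the output referred to above, the effect of the rekey workaround can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not the KSQL statements used in this issue: rekeying is modelled as re-emitting every message with the string form of a value column as its key. The function name `rekey` and the data are hypothetical.

```python
# Simplified model only, not KSQL internals.

def rekey(messages, key_column):
    """Re-emit each message keyed by the string form of one of its value columns."""
    return [(str(value[key_column]), value) for _, value in messages]


# Hypothetical source topic whose key is not the bare CUSTOMER_ID.
source_topic = [
    ('{"CUSTOMER_ID": 3}', {"CUSTOMER_ID": 3, "NAME": "example"}),
]
rekeyed_topic = rekey(source_topic, "CUSTOMER_ID")

# Materialise the table from the rekeyed topic (latest value per key).
customer_table = dict(rekeyed_topic)

# The stream-side lookup by CUSTOMER_ID now finds the row.
rental = {"RENTAL_ID": 1, "CUSTOMER_ID": 3}
match = customer_table.get(str(rental["CUSTOMER_ID"]))
print(rekeyed_topic[0][0], match)
```

Once the message key equals the declared key column, the same lookup that previously missed returns the customer row.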
How do we make this less painful for the user? Several ideas:
Some kind of rekey operation that takes a CREATE TABLE declaration and explicitly rekeys the source topic (i.e. implements the above workaround). This could be done:
'NOREKEY' syntax
REKEY option in the CREATE TABLE declaration (requires users to know to look for the option)
Evaluate a sample of messages and warn the user if the message key doesn't match the declared table key
Something that would also help reduce instances of this, but not avoid the problem entirely, would be to support different Key formats (in this case, the Key is the declared CUSTOMER_ID, but is serialised as Avro, not the String that KSQL currently assumes)
Interestingly, a side-effect of KSQL only using STRING Keys is that the message on the derived topic cannot be read using avro-console-consumer if print.key=true:
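Returning to the idea in the list above of sampling messages and warning the user: the check could look roughly like the following Python sketch. It is only an illustration of the proposal, not an existing KSQL feature. The function names, the warning text, and the sample data are hypothetical, and it assumes keys are compared as plain strings.

```python
# Illustration of the "sample and warn" proposal; not an existing KSQL feature.

def check_declared_key(sample, key_column):
    """Return the sampled (key, value) pairs whose key differs from the declared key column."""
    return [
        (key, value)
        for key, value in sample
        if key != str(value.get(key_column))
    ]


def warn_if_mismatched(sample, key_column):
    """Return a warning string if any sampled message key mismatches, else None."""
    mismatches = check_declared_key(sample, key_column)
    if mismatches:
        return (
            f"WARNING: {len(mismatches)} of {len(sample)} sampled messages "
            f"have a key that does not match {key_column}"
        )
    return None


# Hypothetical sample: one badly keyed message and one correctly keyed message.
sample = [
    ('{"CUSTOMER_ID": 3}', {"CUSTOMER_ID": 3, "NAME": "example"}),
    ("4", {"CUSTOMER_ID": 4, "NAME": "another"}),
]
warning = warn_if_mismatched(sample, "CUSTOMER_ID")
print(warning)
```

A check like this could only warn, since a sample cannot prove that every message on the topic is keyed correctly.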