Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka tombstone message support #16962

Merged
merged 1 commit into from
Apr 14, 2023

Conversation

vlad-lyutenko
Copy link
Contributor

@vlad-lyutenko vlad-lyutenko commented Apr 11, 2023

Description

Kafka tombstone message support

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Kafka
* Fix query failure when a Kafka topic contains tombstone messages, that is messages with a ``NULL`` value. ({issue}`16962`)

@cla-bot cla-bot bot added the cla-signed label Apr 11, 2023
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/kafka-tombstone branch from 80b8a40 to 4404bb3 Compare April 11, 2023 18:28
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


waitUntilTableExists(topicName);

// tombstone message should have all null values except key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think a test which verifies that tombstones can be differentiated from messages with all fields set to null would be useful? It would help us "document" the contract which we talked about so that users can differentiate between tombstones and actual nulls in their messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added, we can differentiate these 2 cases by internal message_corrupt field

@hashhar hashhar requested a review from Praveen2112 April 12, 2023 05:26
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/kafka-tombstone branch 2 times, most recently from c05dcd9 to 7353537 Compare April 12, 2023 10:06
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/kafka-tombstone branch from 7353537 to 9ae48c4 Compare April 12, 2023 16:36
@hashhar
Copy link
Member

hashhar commented Apr 14, 2023

@mosabua this would be nice to document on how users can inspect/filter out tombstone messages if they want to. Before this change it lead to query failure.

The TL;DR is:

  • A tombstone message is a message in Kafka which has a null value with a non-null key k, often emitted to indicate that the record k was deleted
  • When querying a Kafka topic which contains tombstone messages and messages which have all their fields set to null you can differentiate between them by inspecting the _message_corrupt field. It's set to true for tombstone messages. So you can find out tombstone messages by doing something like WHERE (field1 IS NULL AND field2 IS NULL .. AND field_n IS NULL AND _message_corrupt = true). Just _message_corrupt = true will also match actually corrupt messages so by itself it's not a sufficient filter.

@hashhar hashhar merged commit c0dfd88 into trinodb:master Apr 14, 2023
@github-actions github-actions bot added this to the 414 milestone Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants