-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kafka insert prep #18485
Kafka insert prep #18485
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please improve PR title and description.
LGTM % comments
lib/trino-record-decoder/src/main/java/io/trino/decoder/RowDecoderFactory.java
Outdated
Show resolved
Hide resolved
@@ -34,9 +34,9 @@ public DispatchingRowEncoderFactory(Map<String, RowEncoderFactory> factories) | |||
this.factories = ImmutableMap.copyOf(requireNonNull(factories, "factories is null")); | |||
} | |||
|
|||
public RowEncoder create(ConnectorSession session, String dataFormat, Optional<String> dataSchema, List<EncoderColumnHandle> columnHandles) | |||
public RowEncoder create(ConnectorSession session, String dataFormat, Optional<String> dataSchema, List<EncoderColumnHandle> columnHandles, String topic, boolean isKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extract some entity that would have things related to the row? That would have parameters like: String dataFormat, Optional<String> dataSchema, List<EncoderColumnHandle> columnHandles, String topic, boolean isKey
. I don't want to create anything artificial, but something that represents some logical thing.
Also please use an enum
instead of boolean
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, lmk what you think when you have a chance. Thanks for all the suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
...in/trino-kafka/src/main/java/io/trino/plugin/kafka/schema/confluent/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaSplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaSplitManager.java
Outdated
Show resolved
Hide resolved
...test/java/io/trino/plugin/kafka/schema/confluent/TestAvroConfluentContentSchemaProvider.java
Outdated
Show resolved
Hide resolved
...test/java/io/trino/plugin/kafka/schema/confluent/TestAvroConfluentContentSchemaProvider.java
Outdated
Show resolved
Hide resolved
cbc97db
to
00a8843
Compare
@@ -31,9 +32,9 @@ public DispatchingRowDecoderFactory(Map<String, RowDecoderFactory> factories) | |||
this.factories = ImmutableMap.copyOf(factories); | |||
} | |||
|
|||
public RowDecoder create(String dataFormat, Map<String, String> decoderParams, Set<DecoderColumnHandle> columns) | |||
public RowDecoder create(String dataFormat, Map<String, String> decoderParams, Set<DecoderColumnHandle> columns, ConnectorSession session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should session be the first parameter here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks! We can extend the suggestion for encoder parameters and create a DecoderRowSpec
. Update: modified the commit, lmk when you have a chance.
{ | ||
return readSchema(tableHandle.getKeyDataSchemaLocation(), tableHandle.getKeySubject()); | ||
} | ||
|
||
@Override | ||
public final Optional<String> readValueContentSchema(KafkaTableHandle tableHandle) | ||
public final Optional<String> getValue(KafkaTableHandle tableHandle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getValue -> getMessage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do! Just for context: schema registry uses the term "value" for kafka "message" subject, not sure why since they're both confluent products. I'll change it shortly.
efad910
to
80d421c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
% single comment
@@ -58,7 +58,9 @@ | |||
|
|||
public class AvroSchemaConverter | |||
{ | |||
public static final String DUMMY_FIELD_NAME = "dummy"; | |||
public static final String DUMMY_FIELD_NAME = "$cdekvwhddlgixmua"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not use random hardcoded characters. I liked $dummy
more. Maybe $empty_field_marker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What concerns do you have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should field have random part and $dummy part, e.g. cdekvwhddlgixmua$dummy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It leaks to users. It is even documented. So if it is documented it cannot be so cryptic, and it has to be more user (human) friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will update, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with $empty_field_marker
- seems more descriptive than $dummy
, lmk.
No need to redefine the parameter in TestAvroDecoder
Remove hard coded value.
80d421c
to
10f7399
Compare
@@ -433,7 +433,7 @@ Inserts are not supported, and the only data format supported is AVRO. | |||
* ``IGNORE`` - Ignore structs with no fields. This propagates to parents. | |||
For example, an array of structs with no fields is ignored. | |||
* ``FAIL`` - Fail the query if a struct with no fields is defined. | |||
* ``DUMMY`` - Add a dummy boolean field called ``dummy``, which is null. | |||
* ``DUMMY`` - Add a dummy boolean field name consisting of random characters ``$empty_field_marker``, which is null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now, should replace DUMMY
to something more meaningful? Maybe MARK
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, thanks!
Change the dummy field name to something less likely to be used.
This is needed for a subsequent commit adding functionality for handling records with no fields.
This will be needed for encoding avro records that use schema registry for providing the schema.
10f7399
to
c5fde04
Compare
@elonazoulay Please provide a release notes. |
Description
Extracted commits from #16828
These are preparatory commits laying the groundwork for implementing insert into kafka.
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: