-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapter,interchange,parser: Support avro sink comments #22019
adapter,interchange,parser: Support avro sink comments #22019
Conversation
589aa1c
to
ecfbee8
Compare
cd4e56f
to
a626ba7
Compare
column name is taken to be a name of a column in the top level of the | ||
materialized view. Object names are looked up according to usual SQL name | ||
resolution rules for the search path and active database. | ||
format `[[db.]schema.]object.column`. Object names are looked up according to usual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the implementatoin to keep it consistent with the naming resolution for the type below [[db.]schema.]object
and with the COMMENT ON
sql which expects the object to be specified. Also, sinks can't be queried and they don't have a relation desc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
# Test Avro UPSERT sinks doc comments | ||
|
||
$ postgres-execute connection=postgres://mz_system:materialize@${testdrive.materialize-internal-sql-addr} | ||
ALTER SYSTEM SET enable_comment = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, the avro comments from materialize comments behaviour is also gated by this flag. If this flag is not set, users can't create comments in materialize and there will be no automatic documentation added in the generated avro schema.
This does not prevent the user from explicitly adding a DOC ON
options in their create sink sql though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not prevent the user from explicitly adding a DOC ON options in their create sink sql though.
We should add a separate feature flag for the use of DOC ON
comments, so that we can properly follow the feature lifecycle for this feature. It should start out in private preview while we're validating its design and implementation with our preview customers. That way we can make breaking changes if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Let me add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Put it behind enable_sink_doc_on_option
This PR has higher risk. Make sure to carefully review the file hotspots. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag. What's This?
Buggy File Hotspots:
|
FWIW, this works fine in Postgres:
|
@umanwizard Yeah (updated the text). It's just not implemented yet in materialize. I will do a follow up to add both (Tracking https://github.com/MaterializeInc/database-issues/issues/6696) |
src/sql-parser/src/parser.rs
Outdated
|
||
fn parse_column_name(&mut self) -> Result<(RawItemName, Ident), ParserError> { | ||
let start = self.peek_pos(); | ||
let mut item_name = self.parse_raw_name()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This differs from
materialize/src/sql-parser/src/parser.rs
Line 7764 in 8b47d4c
let mut identifiers = self.parse_identifiers()?; |
(AvroValueFullname, String), | ||
(NullDefaults, bool, Default(false)) | ||
); | ||
/// Creating this by hand instead of using generate_extracted_config! macro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open an issue about this so we don't forget? I think keeping everything in the macro is good, so we should eventually add enum support to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure 👍. I did not find any other similar use cases, so this could be an outlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filed https://github.com/MaterializeInc/materialize/issues/22213, will add it in the comment as well.
08ba4bf
to
33453a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to Avro schema generation look fine to me. Someone from Adapter should review the rest.
I think we could use an end-to-end test that verifies that other Avro-speaking tools (e.g. the official Java or Python Avro libraries) can properly interpret these schemas and see the docs on them. But I wouldn't think that should block this PR as we have nothing like that currently. @philip-stoev or @def- , what do you think?
Makes sense to have that. I'll try implementing that test on Friday, probably using Python Avro lib, since that's easiest from our side. I'd prefer if you can wait for the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapter parts lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning to take a look at this tomorrow as well! Might slip into the weekend but will get you a review by Monday morning at the latest.
Instead of bringing another tool into the mix, can we simply issue a GET request against the schema service and fish for the required parts in the response? |
Moving to draft to look into the tests |
The failure is:
Maybe I'm holding it wrong? |
e761c56
to
2464117
Compare
@def- I had run into a similar issue locally because for column/fields the "doc" field wasn't getting serialized properly, I fixed that in this PR. Running mzcompose to check this out (taking a while). Btw, is this running for a previous version? What scenario should I run this with? |
@def- Oh I think it's the envelope |
@def- It should be fixed now. |
165e417
to
83f3437
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for waiting for my tests. I postponed the Python-avro-based check until we reach a conclusion on how we want to test it.
|
||
$ schema-registry-verify schema-type=avro subject=sink-sink-comments2-value | ||
{"type":"record","name":"envelope","doc":"comment on view sink_source_comments_view","fields":[{"name":"before","type":["null",{"type":"record","name":"row","fields":[{"name":"l_k","type":"string"},{"name":"l_v1","type":["null","string"],"default":null,"doc":"doc on l_v1"},{"name":"l_v2","type":["null","long"],"default":null,"doc":"value doc on l_v1"},{"name":"c","type":"long"}]}],"default":null},{"name":"after","type":["null","row"],"default":null}]} | ||
{"type":"record","name":"envelope","fields":[{"name":"before","type":["null",{"type":"record","name":"row","doc":"comment on view sink_source_comments_view","fields":[{"name":"l_k","type":"string"},{"name":"l_v1","type":["null","string"],"default":null,"doc":"doc on l_v1"},{"name":"l_v2","type":["null","long"],"default":null,"doc":"value doc on l_v2"},{"name":"c","type":"long"}]}],"default":null},{"name":"after","type":["null","row"],"default":null}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it expected to have the comment on the row instead of on the envelope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see discussion here: https://materializeinc.slack.com/archives/CUFU852KT/p1696642943413989
Dennis Felsing ***@***.***> writes:
@def- commented on this pull request.
>
$ schema-registry-verify schema-type=avro subject=sink-sink-comments2-value
- {"type":"record","name":"envelope","doc":"comment on view sink_source_comments_view","fields":[{"name":"before","type":["null",{"type":"record","name":"row","fields":[{"name":"l_k","type":"string"},{"name":"l_v1","type":["null","string"],"default":null,"doc":"doc on l_v1"},{"name":"l_v2","type":["null","long"],"default":null,"doc":"value doc on l_v1"},{"name":"c","type":"long"}]}],"default":null},{"name":"after","type":["null","row"],"default":null}]}
+ {"type":"record","name":"envelope","fields":[{"name":"before","type":["null",{"type":"record","name":"row","doc":"comment on view sink_source_comments_view","fields":[{"name":"l_k","type":"string"},{"name":"l_v1","type":["null","string"],"default":null,"doc":"doc on l_v1"},{"name":"l_v2","type":["null","long"],"default":null,"doc":"value doc on l_v2"},{"name":"c","type":"long"}]}],"default":null},{"name":"after","type":["null","row"],"default":null}]}
Is it expected to have the comment on the row instead of on the
envelope?
Yes. Discussed in Slack:
https://materializeinc.slack.com/archives/CUFU852KT/p1696642943413989
…
--
Reply to this email directly or view it on GitHub:
#22019 (review)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
|
Addressing review comments updated error message Moar test
test updating test
platform-checks: Explicit sink comment check (currently fails) platform-checks: Added comments to Identifiers check parallel-workload: Enabled COMMENT ON testdrive: Added failure case on NULL and escaping sqlsmith: will come separately in MaterializeInc/sqlsmith#3
83f3437
to
388e025
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only had time to quickly skim, but looks really solid. Basically no substantive comments. 👌🏽
# Test Avro UPSERT sinks doc comments | ||
|
||
$ postgres-execute connection=postgres://mz_system:materialize@${testdrive.materialize-internal-sql-addr} | ||
ALTER SYSTEM SET enable_comment = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not prevent the user from explicitly adding a DOC ON options in their create sink sql though.
We should add a separate feature flag for the use of DOC ON
comments, so that we can properly follow the feature lifecycle for this feature. It should start out in private preview while we're validating its design and implementation with our preview customers. That way we can make breaking changes if necessary.
column name is taken to be a name of a column in the top level of the | ||
materialized view. Object names are looked up according to usual SQL name | ||
resolution rules for the search path and active database. | ||
format `[[db.]schema.]object.column`. Object names are looked up according to usual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
@@ -69,6 +69,7 @@ | |||
"persist_streaming_snapshot_and_fetch_enabled": "true", | |||
"enable_unified_clusters": "true", | |||
"enable_jemalloc_profiling": "true", | |||
"enable_comment": "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also add "enable_sink_doc_on_option": "true"
here so we can use it easily in testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
23e61d8
to
1bdcd5c
Compare
All the comments are addressed, enabling auto-merge. |
Enabling flag for tests
1bdcd5c
to
e393529
Compare
Addressed the feedback, enabled flag for tests
Added implementation to
COMMENT ON
)Currently,
COMMENT ON
(in materialize) does not support adding comments for a column in a custom type and this PR also skips that. I will do a follow up to add support for that https://github.com/MaterializeInc/database-issues/issues/6696.Motivation
Partially fixes https://github.com/MaterializeInc/database-issues/issues/6480, does not allow comments on columns in a custom type yet.
Tips for reviewer
Added inline comments.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.cc: @benesch @dseisun-materialize