doc/developer: add design doc for adding docs to Avro sink schemas #21564
@benesch Sure, thank you!
It's unclear to me from the design doc: can you set a top-level doc comment for the full schema?
@sjwiesman Avro does support a top-level doc for the record. We should be able to allow it as well.
Ugh, just thought of a very big wrinkle. Nested records don't retain their nice names! Given e.g.
CREATE TYPE point AS (x integer, y integer);
CREATE MATERIALIZED VIEW v AS SELECT ROW(1, 1)::point AS c1, 'text' AS c2;
CREATE SINK FROM v INTO KAFKA ... FORMAT AVRO ...;
the generated schema won't have a |
I wonder how hard it'd be to change sink schema generation to use the Materialize names of the nested record types rather than auto-generating names. Alternatively, perhaps the |
This is a design for #21557.
I just pushed up a big change that uses SQL names rather than Avro field specifiers to indicate on which fields to attach the comments. It's quite a bit more verbiage in the design document, but I think it is not really any harder to implement, and insulates the |
+1 to specifying by SQL names instead of Avro field names, because those are already known to the user.
Overall looks good to me. For implementation, I think it would make sense to do this in a test-driven way, since the rules for choosing which doc string to use are rather complex. I.e., it might be better to first land some failing tests (and obviously disable them in CI) that encode the desired behavior, before writing the implementation.
Co-authored-by: umanwizard <[email protected]>
Cool!
This all feels Complicated, and it makes me nervous that every future addition of support for new avro things will leave us with an unmanageable sink syntax. I appreciate why we haven't chosen the other options, though, so this still seems worthwhile to unlock the functionality.
If anything, though, I'd be happy to see some unpacking of why we support comment-on-type... since I don't quite understand the motivation and it ~doubles the surface area.
#### Planner

The planner will "freeze" any comments that have been promoted to documentation
🆒
* It is not ergonomic. The provided schema must exactly match the schema
  Materialize generates, *except* for the `doc` fields. Minor errors in
  constructing the schema (e.g., using a `long` where an `int` is required, or
  ordering fields wrong) will result in hard to debug failures.
One idea, which came up recently in another context, was to add some `avro_schema_for(<relation>)` function that would output our generated Avro schema as a string.
That would mitigate this concern a bit, since users could just edit the generated schema instead of having to get the types right a priori. I don't think it helps with the other concern, however.
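To make the ergonomics concern concrete, here is a rough sketch of the kind of check the user-supplied-schema alternative would imply: the provided schema has to equal the generated one once every `doc` attribute is ignored. Everything here is an assumption for illustration. The helper names (`strip_docs`, `differs_only_in_docs`) and the example schemas are hypothetical and are not Materialize code or output.

```python
def strip_docs(node):
    """Return a copy of a parsed Avro schema with every `doc` key removed.

    Simplification: a `doc` key nested inside a field's default value would
    also be dropped, which a real implementation would need to avoid.
    """
    if isinstance(node, dict):
        return {k: strip_docs(v) for k, v in node.items() if k != "doc"}
    if isinstance(node, list):
        return [strip_docs(item) for item in node]
    return node


def differs_only_in_docs(generated, provided):
    """True if the two schemas are identical apart from `doc` attributes."""
    return strip_docs(generated) == strip_docs(provided)


# Hypothetical schema for a two-column relation (names are made up).
GENERATED = {
    "type": "record",
    "name": "envelope",
    "fields": [
        {"name": "c1", "type": "int"},
        {"name": "c2", "type": "string"},
    ],
}

# The same schema, hand-edited to add documentation only.
WITH_DOCS = {
    "type": "record",
    "name": "envelope",
    "doc": "One row of the view.",
    "fields": [
        {"name": "c1", "type": "int", "doc": "First column."},
        {"name": "c2", "type": "string", "doc": "Second column."},
    ],
}

# Two easy mistakes: `long` where `int` is required, and reordered fields.
WRONG_TYPE = {
    **WITH_DOCS,
    "fields": [
        {"name": "c1", "type": "long", "doc": "First column."},
        {"name": "c2", "type": "string", "doc": "Second column."},
    ],
}
WRONG_ORDER = {**WITH_DOCS, "fields": list(reversed(WITH_DOCS["fields"]))}

print(differs_only_in_docs(GENERATED, WITH_DOCS))    # True
print(differs_only_in_docs(GENERATED, WRONG_TYPE))   # False
print(differs_only_in_docs(GENERATED, WRONG_ORDER))  # False
```

Field lists are compared in order, so a reordered schema is rejected just like a wrong type. Starting from a generated schema string and only adding `doc` entries would keep a user on the passing path.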
Yes, we had a similar request to see the schema without creating the sink: https://github.com/MaterializeInc/materialize/issues/21661
Co-authored-by: Ben Kirwin <[email protected]>
Basically just because Avro supports it. I don't think we support enums at all, but the first two correspond directly to comment-on-type and comment-on-column here.
@umanwizard - Yeah! That's true now but it was not originally specced that way -- they were both ways of writing field-level docs: see ece1b20. Agree that with the updated semantics all looks good!
Thanks for the reviews folks! Enabled auto-merge.
@moulimukherjee — here's an initial sketch of a design and implementation for Avro field documentation. Could I hand this off to you to address and resolve any resulting discussion?
This is a design for MaterializeInc/database-issues#6480.
Motivation
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.