
cdc: avro changefeeds inconsistently output hex encoding for BYTES #79995

Closed
HonoreDB opened this issue Apr 15, 2022 · 3 comments
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc

Comments

@HonoreDB
Contributor

HonoreDB commented Apr 15, 2022

When a column type is BYTES or one of its aliases, our avro encoder just passes the raw datum to goavro, but goavro then tries to detect what kind of byte array this is and convert it into a printable string. This makes it harder to use changefeeds with BYTES data that isn't meant to be printable. Since we can't change this now without breaking backwards compatibility, we should document and unit-test the behavior, and probably also add an option to override it.
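For illustration, here is a minimal Go sketch of one way BYTES values can end up as printable text. The Avro specification's JSON encoding represents a `bytes` value as a JSON string in which each byte 0-255 maps to the Unicode code point with the same value. The helper `avroJSONBytes` is hypothetical: it is not CockroachDB's or goavro's code and only applies that spec rule with the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// avroJSONBytes is a hypothetical helper that applies the Avro JSON-encoding
// rule for "bytes": each byte value 0-255 becomes the Unicode code point with
// the same number, and the result is written out as a JSON string.
func avroJSONBytes(b []byte) ([]byte, error) {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c) // 0x00-0xFF map to U+0000-U+00FF
	}
	return json.Marshal(string(runes))
}

func main() {
	raw := []byte{0x00, 0xff, 'a'}
	out, err := avroJSONBytes(raw)
	if err != nil {
		panic(err)
	}
	// The JSON text no longer carries the original bytes verbatim:
	// 0x00 becomes the escape \u0000, and 0xff becomes the two-byte
	// UTF-8 sequence for U+00FF.
	fmt.Println(string(out))
}
```

A consumer that reads this JSON form back as text sees escapes and multi-byte characters rather than the original binary payload, which matches the "printable string" symptom described above.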

Jira issue: CRDB-15698

@HonoreDB HonoreDB added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cdc Change Data Capture T-cdc labels Apr 15, 2022
@blathers-crl

blathers-crl bot commented Apr 15, 2022

cc @cockroachdb/cdc

HonoreDB added a commit to HonoreDB/cockroach that referenced this issue May 17, 2022
cockroachdb#79995
was filed because standard avro deserialization libraries,
including the one some users were using and the one we use in
tests, get clever with byte sequences to make them printable.
This PR just adds a few tests to demonstrate that we're not
including escape sequences in the raw binary data we output.

Release note: None
craig bot pushed a commit that referenced this issue May 18, 2022
80849: docs: update cluster-to-cluster streaming RFC r=gh-casper a=gh-casper

Previously it used the producer-job design with stateful outboxes;
it is now updated to the consumer-tracked state design.

Release note: None

81411: changefeedccl: tests for non-ascii bytes in avro r=[miretskiy] a=HonoreDB

#79995
was filed because standard avro deserialization libraries,
including the one some users were using and the one we use in
tests, get clever with byte sequences to make them printable.
This PR just adds a few tests to demonstrate that we're not
including escape sequences in the raw binary data we output.

Release note: None

Co-authored-by: Casper <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
@amruss
Contributor

amruss commented May 24, 2022

We're not going to add an option, since the decoding process is external to us and we have no control over it.

@HonoreDB
Contributor Author

Closing in favor of a docs issue cockroachdb/docs#13934
