changefeedccl: support the various timestamp column types in Avro changefeeds #32472

Closed
rolandcrosby opened this issue Nov 19, 2018 · 2 comments
Labels
A-cdc: Change Data Capture
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@rolandcrosby

We need to support the various timestamp column types (specifically, the ledger workload depends on timestamptz, but we should figure out mappings for all of the SQL timestamp types we support). This likely needs additional research about common conventions used in the Avro/Confluent/Kafka ecosystem, since AFAIK Avro doesn't have a native/straightforward timestamp representation.

@rolandcrosby rolandcrosby added the A-cdc Change Data Capture label Nov 19, 2018
@rolandcrosby
Author

Here's what BottledWater, a (now-unmaintained) Postgres-to-Kafka connector, does.

And here's the Debezium Postgres-to-Avro type mapping.

I'm starting to think ISO 8601-formatted timestamp strings might be the rational thing to do here, since they natively let you specify time zone offsets and various levels of precision, but am open to persuasion in any direction.

@knz knz added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Nov 26, 2018
@rolandcrosby
Author

@danhhz I see we now support TIMESTAMP (with and without time zone, since internally "with time zone" is converted to and stored as UTC) in Avro. Both of these are output as an Avro long with a logicalType of timestamp-micros. We do not yet support TIME or DATE in Avro. I think it makes sense to represent those types the same way (i.e. as a count of microseconds, just with no day-fraction part for DATE and only a day-fraction part for TIME), since that more or less matches what we do elsewhere in SQL. Do you have any objections to that proposed encoding?

Also, can we use the same encoding for INTERVAL? Our docs say that "intervals are stored internally as months, days, and nanoseconds", so I suspect the same encoding should work, but wanted to check if there are any gotchas you're aware of.

danhhz added a commit to danhhz/cockroach that referenced this issue Feb 14, 2019
This adds support for sql DATE, TIME, UUID, INET, and JSONB columns with
the `experimental_avro` CHANGEFEED format.

It also adds a set of "golden" tests for what avro schema each of our
sql column types map to.

INTERVAL has a corresponding avro logical type, but is left
unimplemented for now because our INTERVALs have a larger domain of
values than avro's: 8 bytes vs 4.

ARRAY, BIT, and COLLATEDSTRING are a little trickier, so we'll wait for
customer demand. These are expected to be safe enough to backport if we
need them.

Closes cockroachdb#32472
Closes cockroachdb#34421
Closes cockroachdb#34417

Release note (enterprise change): the `CHANGEFEED` `experimental_avro`
format now supports sql columns of type DATE, TIME, UUID, INET, and
JSONB.
danhhz added a commit to danhhz/cockroach that referenced this issue Feb 15, 2019
This adds support for sql DATE, TIME, UUID, INET, and JSONB columns with
the `experimental_avro` CHANGEFEED format.

It also adds a set of "golden" tests for what avro schema each of our
sql column types map to. Ditto for some example values for each type.

INTERVAL has a corresponding avro logical type, but is left
unimplemented for now because our INTERVALs have a larger domain of
values than avro's: 8 bytes vs 4.

ARRAY, BIT, and COLLATEDSTRING are a little trickier, so we'll wait for
customer demand. These are expected to be safe enough to backport if we
need them.

Closes cockroachdb#32472
Closes cockroachdb#34421
Closes cockroachdb#34417

Release note (enterprise change): the `CHANGEFEED` `experimental_avro`
format now supports sql columns of type DATE, TIME, UUID, INET, and
JSONB.
craig bot pushed a commit that referenced this issue Feb 15, 2019
34918: changefeedccl: support more columns types with avro r=nvanbenschoten a=danhhz

Co-authored-by: Daniel Harrison <[email protected]>
@craig craig bot closed this as completed in #34918 Feb 15, 2019