Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: CockroachDB's extended literal formats are not compatible with postgres #26128

Open
knz opened this issue May 26, 2018 · 14 comments
Open

sql: CockroachDB's extended literal formats are not compatible with postgres #26128

knz opened this issue May 26, 2018 · 14 comments
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-nostale Marks an issue/pr that should be ignored by the stale bot

Comments

@knz
Copy link
Contributor

knz commented May 26, 2018

  • format x'aaaaaaa'

    • in PostgreSQL: BIT literal, defines a bit string and allows any number of hexadecimal digits (i.e. the smallest value is 4 bits).
    • in CockroachDB: BYTES literal, defines a byte string and only allows pairs of hexadecimal digits.
  • format b'aaaaaaa'

    • in PostgreSQL: BIT literal, defines a bit string and allows any number of binary digits (i.e. the smallest value is 1 bit).
    • in CockroachDB: BYTES literal, defines a byte string using escapes.

In PostgreSQL, byte array literals use a different syntax, either:

  • e'ab\nc'::BYTEA (escaped string, the same is achieved currently with b'...' in crdb)
  • '\xAAAA'::BYTEA (i.e. string starting with \x; the same is achieved currently with x'...' in crdb)

@nvanbenschoten @bdarnell what do you think? We need to support the BIT type properly ultimately. I am tempted to set up a plan to deprecate the current literal format in 2.1 so that it becomes available for BIT in 2.2. Thoughts?

Jira issue: CRDB-5687

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect labels May 26, 2018
@bdarnell
Copy link
Contributor

Sigh. Yeah, I guess we'll need to change this (over a couple of release cycles).

@nvanbenschoten
Copy link
Member

We need to support the BIT type properly ultimately. I am tempted to set up a plan to deprecate the current literal format in 2.1 so that it becomes available for BIT in 2.2.

No objection from me, although I'm interested in the migration plan.

@knz
Copy link
Contributor Author

knz commented May 29, 2018

@nvanbenschoten basically add a new syntax and a big flashy warning with the old one in 2.1; in 2.2 make the old syntax opt-in via a compat flag, and in 2.3 remove.

@bdarnell
Copy link
Contributor

I think it might be nice to go through a cycle in which the b and x syntaxes are disallowed by default, instead of going straight from one meaning to another (to take advantage of our no-version-skip policy).

  • 2.1: warn on all uses of this syntax (how do we issue this warning? will it be noticed by users?)
  • 2.2: remove it
  • 2.3: reintroduce the syntax with the new meaning

This ensures that even if users don't notice the warning they'll get a clear breakage instead of a change in behavior. We could make the new syntax available on an opt-in basis earlier than 2.3 if it's high enough priority.

@knz
Copy link
Contributor Author

knz commented May 29, 2018

I was more thinking of this:

  • 2.1: warn on all uses of this syntax (how do we issue this warning? will it be noticed by users?)
  • 2.2: make the old syntax opt-in (i.e. not remove entirely)
  • 2.3: change the syntax with the new meaning

@nvanbenschoten
Copy link
Member

Isn't make the old syntax opt-in the warning we want for 2.1? That will certainly be noticed by users.

@knz
Copy link
Contributor Author

knz commented May 29, 2018

A warning can be ignored. Changing the thing to become opt-in is intrusive. I think we want a version in-between the current behavior and the one where the change is intrusive to warn users. Unless you think we can skip that step?

@nvanbenschoten
Copy link
Member

It seems to me that opting in (through a cluster setting, I presume) is roughly equivalent to acknowledging the warning, so it really isn't much more intrusive.

Unless you think we can skip that step?

I'll defer to Ben's judgment on that.

@knz
Copy link
Contributor Author

knz commented Aug 16, 2018

Ok we have a way forward which I am likely to explore for 2.1:

  • we support and document b'...' (small b) for byte arrays; we happen to support B'....' for them but that's an accident and not documented.

  • postgres supports and documents B'....' (capital B) for bit literals; they do also support b'...' for them but this is nowhere to be found in their docs.

So we can distinguish on the case of the "b" and properly fix #20991 and this one in one fell swoop.

@knz
Copy link
Contributor Author

knz commented Aug 16, 2018

Basically what I am proposing is to distinguish on the case of b in 2.1 to add proper support for pg's BIT, but also deprecate the small b for byte arrays in 2.1 (to be removed fully in 2.2).

@knz
Copy link
Contributor Author

knz commented Aug 19, 2018

I have tried it and it works pretty well! 🎉

see #28807.

craig bot pushed a commit that referenced this issue Sep 4, 2018
28807: sql: fix (really: add) the handling of bit arrays  r=knz a=knz

Fixes #20991.
Informs #26128.

As described in #28814 there was a mismatch between the original
implementation of BIT in CockroachDB, which was inspired by MySQL, and
that required by the PostgreSQL dialect. That prior PR #28814 thus
disabled the functionality pending a replacement.

This patch provides this replacement.

PostgreSQL bit arrays are arbitrary long strings of bits, possibly
longer (or shorter) than 64 bits, with odd number of bits. They also
support concatenation, which integers do not. They are also different
from strings, because they support bitwise logic operators, which
strings do not. Their literal values are also different from both
integer and string values, noted with `B'....'` (note the capital B;
and see discussion below).

This patch provides the correct handling of bit arrays in
CockroachDB. It does this as follows:

- a new literal syntax `B'....'` for bit array literals.
- a new coltype TBitArray.
- a new datum type DBitArray.
- new operator overloads specialized for bit arrays.
- a new ColumnType BIT (see discussion below).
- a new KV encoding type BitArray. Complete with separate
  value and key encodings.

The new bit array type is conditional on a cluster version, so that
lower version nodes in a cluster do not get confused by the new key
encoding.

Regarding the new ColumnType BIT: this now becomes a first class
SemanticType instead of a VisibleType on INT. As discussed in #28814
any column previously created with an INT semantic type and BIT in the
VisibleType is untouched.

With regards to sorting, bit arrays sort lexicographically from the
leftmost bit. In memory the bits are stored in words of 64 bits, with
the leftmost bit in the MSB of the word, so as to enable efficient
comparisons.

To ensure the ordering is preserved in the key encoding we encode as
follows:

- each word in turn using varint encoding, then
- a word list terminator, which is guaranteed to not occur in the
  words before that, then
- a varint that indicates the number of used bits in the last word.

To decode, we must first scan through the word list until we find the
terminator, to determine the number of words; then we scan again to
decode the words in memory, then we use the last varint together with
the number of decoded words to compute the final bit array size.

Release note (bug fix): CockroachDB now supports the BIT and VARBIT (BIT
VARYING) data types like PostgreSQL: this is a bit array. See the
PostgreSQL documentation for details. Only the bit array literal
notation with a capital B (e.g. `B'10001'`) is currently supported;
the syntax with a small b (e.g. `b'abcd'`) continues to denote *byte*
arrays as in previous versions of CockroachDB.

Release note (backward-incompatible change): CockroachDB happened to
support the notation `B'abcde'` previously to express byte array
literals, although this was not documented. This is not supported any
more; the notation `B'100011'` will now express *bit* array literals
like in PostgreSQL. The notation `b'...'` remains for byte array
literals.

b273321 

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@jordanlewis
Copy link
Member

@knz after #28807 does this issue change?

@knz
Copy link
Contributor Author

knz commented May 7, 2019

So here's the current status:

Syntax PostgreSQL CockroachDB Advertised feature in CockroachDB?
B'...' bit array using binary digits bit array using binary digits yes (since 19.1 - NB this used to have a different meaning prior to 2.1)
X'...' bit array using hex digits byte array using hex digits yes (since 1.1)
b'...' bit array using binary digits byte array literal, literal bytes yes (since 1.0)
x'...' bit array using hex digits byte array literal, literal bytes no

My opinions:

  1. I am sad that we have a divergence.
  2. I think it would be good to have at least one literal syntax for bit arrays using hex digits
  3. that syntax should be using either x'...' or X'...' to have at least partial overlap with pg compat
  4. a naive choice would be to say "since X'...' is documented already but x'...' is not, it's better to use x'...' this way we inconvenience fewer crdb users". The choice is naive because I actually think neither X'...' nor x'...' is actually used.
  5. Let's add telemetry to find out.
  6. I'd like to point out that for consistency it would be nice to have some symmetry, I would like to say for example "Use the capitals B and X for bit arrays, and use the lowercase b and x for byte arrays." After all we're all about making data easy.

@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz knz added the X-nostale Marks an issue/pr that should be ignored by the stale bot label Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-nostale Marks an issue/pr that should be ignored by the stale bot
Projects
None yet
Development

No branches or pull requests

5 participants