Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: tuples not properly preserved during serialization #26624

Closed
knz opened this issue Jun 12, 2018 · 10 comments · Fixed by #28143
Closed

sql: tuples not properly preserved during serialization #26624

knz opened this issue Jun 12, 2018 · 10 comments · Fixed by #28143
Assignees
Labels
A-sql-semantics C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Milestone

Comments

@knz
Copy link
Contributor

knz commented Jun 12, 2018

Found while investigating #26621.
The expression ROW(x) gets serialized after normalization (format with FmtParsable) to (x) which has a completely different meaning. This causes the following error when distribution is enabled, but not with local execution:

> select row(1) in (row(2), row(1), row(v)) from kv;
pq: (1:::INT) IN ((2:::INT), (1:::INT), ROW("$0")): unsupported comparison operator: (1:::INT) IN ((2:::INT), (1:::INT), ROW("$0")): expected ROW("$0") to be of type int, found type tuple{int}

Also all the labeled tuples lose their labels during serialization. This likely breaks distribution of queries with labeled tuples.

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-semantics labels Jun 12, 2018
@knz knz assigned RaduBerinde and BramGruneir and unassigned RaduBerinde Jun 12, 2018
@knz
Copy link
Contributor Author

knz commented Jun 12, 2018

cc @RaduBerinde

@knz
Copy link
Contributor Author

knz commented Jun 12, 2018

I tried to always pretty-print using the syntax row(...) but that breaks pretty-printing of a IN b because the syntax a IN ROWS(...) is not valid.

@RaduBerinde
Copy link
Member

Could we introduce syntax to annotate as tuple? Something like (1):::tuple{int} or (1):::(int)

@knz
Copy link
Contributor Author

knz commented Jun 12, 2018

Yes that might be a solution.

knz added a commit to knz/cockroach that referenced this issue Jun 12, 2018
This patch adds more complete support for composite types (labeled
tuples) by ensuring the following:

- tuple labels are preserved during constant folding of tuple
  expressions.
- tuple labels are preserved during expression transformations
  in optimizations.
- subqueries in scalar contexts receive a composite type with labels.

Note that there is currently a bug in DTuple serialization which break
composite literals in distributed execution, so the composite type
support is not fully ready yet.
This is tracked as separate bug cockroachdb#26624.

Release note: None
knz added a commit to knz/cockroach that referenced this issue Jun 12, 2018
This patch adds more complete support for composite types (labeled
tuples) by ensuring the following:

- tuple labels are preserved during constant folding of tuple
  expressions.
- tuple labels are preserved during expression transformations
  in optimizations.
- subqueries in scalar contexts receive a composite type with labels.

Note that there is currently a bug in DTuple serialization which break
composite literals in distributed execution, so the composite type
support is not fully ready yet.
This is tracked as separate bug cockroachdb#26624.

Release note: None
knz added a commit to knz/cockroach that referenced this issue Jun 13, 2018
This patch adds more complete support for composite types (labeled
tuples) by ensuring the following:

- tuple labels are preserved during constant folding of tuple
  expressions.
- tuple labels are preserved during expression transformations
  in optimizations.
- subqueries in scalar contexts receive a composite type with labels.

Note that there is currently a bug in DTuple serialization which break
composite literals in distributed execution, so the composite type
support is not fully ready yet.
This is tracked as separate bug cockroachdb#26624.

Release note: None
craig bot pushed a commit that referenced this issue Jun 13, 2018
26621:  sql,opt: propagate composite types (labeled tuples) r=knz a=knz

Needed to resolve #24866.

This patch adds more complete support for composite types (labeled
tuples) by ensuring the following:

- tuple labels are preserved during constant folding of tuple
  expressions.
- tuple labels are preserved during expression transformations
  in optimizations.
- subqueries in scalar contexts receive a composite type with labels.

Note that there is currently a bug in DTuple serialization which break
composite literals in distributed execution, so the composite type
support is not fully ready yet.
This is tracked as separate bug #26624.

Release note: None

26683: opt: don't push limit through project when ordering on synthesized column r=RaduBerinde a=RaduBerinde

It is invalid to push limit/offset through a projection if the
ordering depends on a synthesized column. Fix the rules to check for
this.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
craig bot pushed a commit that referenced this issue Jun 14, 2018
26628: sql: Support tuple column access and tuple stars r=knz a=knz

First commits from #26621.
Completes the fix to #24866 by re-activating disabled tests.
This work is yet another step towards #16971. It would actually fix #16971 if it were not for #26627, #26624 and #26629.

This work is yet another step towards #16971.

The labeled tuples introduced in #25283 can now be accessed using
their labels or using a star, e.g. `(E).*`.

Release note (sql change): Labeled tuples can now be accessed using
their labels (e.g. `SELECT (x).word FROM (SELECT pg_expand_keywords()
AS x)` or a star (e.g. `SELECT (x).* FROM (SELECT pg_expand_keywords()
AS x)`).

Fixes #26720.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Bram Gruneir <[email protected]>
@rjnn rjnn assigned rjnn and unassigned BramGruneir Jul 19, 2018
@knz knz added this to the 2.1 milestone Jul 23, 2018
@knz
Copy link
Contributor Author

knz commented Jul 27, 2018

some brainstorming:

  1. the syntax (...):::(...) is interesting conceptually but it introduces a grammar problem: when parsing an expression of the form (x+y):::(...), the parser must determine by the time it encounters the 1st closing parenthesis that it must conclude parsing a TupleExpr and not ParenExpr, whereas only the token after that (the :::) indicates how to make this choice. Remember a LALR parser usually works with just 1 token lookahead. It's doable - it requires introducing a special "extra lookahead" token (we've done that in a few places - that's the _LA trick). But it's a bit complex.

  2. maybe we should add the ROW keyword always for every 1-valued tuple, but only 1-valued tuples (not for 2+-valued tuples), to keep the pretty-printing compact in the general case. Then the only remaining case where there might be a problem is an expression like 123 IN (456) (1-valued list of values on the RHS of IN. Then we can handle IN with a special pretty-printing case for ComparisonExpr.

  3. another approach is to support the syntax (x,) to indicate a 1-valued tuple. This avoids the lookahead problem and has precedents in other languages. If we recognize this syntax we can do so everywhere, including on the RHS of IN and then there is never a problem.

Then after that for tuples with labels we need to find a new syntax which doesn't use parentheses like the current one. We can look at that after we've solved the no-label case.

@knz
Copy link
Contributor Author

knz commented Jul 27, 2018

regarding the 2nd point above. The problem of adding row just for 1-valued tuples is like this:

  • (1,2) in ((1,2),(3,4)) remains without row because these are 2+-valued - all good
  • row(1) in (row(2),row(3)) gets row and remains correct - all good
  • 123 in (456) becomes 123 in row(456) which is incorrect - hence the problem.

So in that last case I was proposing to special case the pretty-print of ComparisonExpr to detect the combination tuple RHS + length 1 and have ComparisonExpr spell out the parentheses without ROW itself, instead of delegating to (Tuple).Format().

@knz
Copy link
Contributor Author

knz commented Jul 27, 2018

although technically perhaps we could allow X IN Y for any Y with a tuple type, so 123 in row(456) would be recognized (as a crdb extension) too.

That might be another way to do solution 2.

@rjnn
Copy link
Contributor

rjnn commented Jul 31, 2018

So I went down path (2), and one problem is that there are many special cases. It is not just ComparisonExpr, it is also PARTITION, VALUES, and other such representations that store their internal expressions as Tuples.

I would prefer allowing ROW in the syntax anywhere where we allow (foo). So Values, Partition, any clause that takes a parenthesized list of values, can be prefixed with ROW. This would be purely optional with respect to users, but would clearly disambiguate things for our internal parseable format.

@knz
Copy link
Contributor Author

knz commented Aug 1, 2018

for PARTITION, VALUES and all other places the syntax is unambiguous (nothing else than a tuple can be placed there). Therefore it never fails to roundtrip even if you dont use row there.

The ambiguity really exists only for comparisons, IIRC.

@knz
Copy link
Contributor Author

knz commented Aug 1, 2018

Ok I just changed my mind and I now recognize path (2) is non-viable, but for a different reason.

The problem is not the incompatibility of syntax between IN and other things. This could be indeed solved like you did in #28100.

The problem is that a DTuple does not contain the "row" attribute (like Tuple does), so after constant folding, as soon as there are DTuples in the expression tree, the pretty-print creates broken expressions.

As long as we try to solve this with the ROW keyword, things break like pushing down the pieces of a jigsaw puzzle:

  • if you try to add ROW for 0- and 1-valued DTuples, that will make the grammar happy for the tuples themselves, but things like x in (1) will pretty-print to x in row(1:::int) and that becomes invalid.

  • if you you never add row in DTuple and instead attempt to add it from other expressions (e.g. ComparisonExpr), then 0- and 1- valued DTuples that appear in other contexts start to break.

So really the entire ROW approach is bankrupt and cannot be saved. Therefore I have changed my mind and I think that indeed the path (3) is better. There is also a simple grammar solution, which I will post shortly.

craig bot pushed a commit that referenced this issue Aug 1, 2018
28128: sql: casting to timestamp should strip tz info r=jordanlewis a=jordanlewis

And binary operations between timestamp and timestamptz should strip the
tz from the timestamptz.

Confirmed that all of this behavior matches postgres (and that it didn't
before).

Release note (bug fix): correct casts and binary operators between
timestamptz and timestamp in some cases.

28149: sql: avoid using Tuple in RangePartition r=knz a=knz

Forked off  #28143, needed for #25522 / #26624.

The `Tuple` AST node is really for scalar tuples. RangePartition is
not using a scalar tuple. So split them.

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz knz assigned knz and unassigned jordanlewis and rjnn Aug 2, 2018
craig bot pushed a commit that referenced this issue Aug 2, 2018
28191:  sql: avoid using Tuple in ValuesClause r=knz a=knz

Forked off from #28143.
Needed for #26624.

The `Tuple` AST node is really about scalar tuples. The `VALUES`
clause operands are not really scalar tuples. So instead use
`Exprs` for `ValuesClause`.

This will simplify later patch to fix the handling of scalar tuples.

Release note: None



Co-authored-by: Raphael 'kena' Poss <[email protected]>
craig bot pushed a commit that referenced this issue Aug 2, 2018
28192: sql/parser: parsing ROW produces *Tuple, not Tuple r=knz a=knz

Forked off from #28143.
Needed for #26624.

The tree.Tuple result of parsing tuples was immediately re-allocated
on the heap in the parent action rule. This patch simplifies this by
allocating on the heap upfront.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
craig bot pushed a commit that referenced this issue Aug 2, 2018
28143: sql: fix the handling of tuples in distsql r=knz a=knz

Fixes #26624.
Supersedes #28100.

This patch fixes the bug where 0-valued and 1-valued tuples were not
properly serialized in distsql processor/flow specs, causing these
special tuples to not be properly supported for distributed execution.

It also fixes the bug where labeled DTuples were not serialized with
their tuples, causing labeled tuples to be broken with distributed
execution.

Release note: None

28217: storage: dissallow empty HeartbeatRequest.Now r=andreimatei a=andreimatei

Release note: None

28219: distsql: fix a missing context in log message r=arjunravinarayan a=arjunravinarayan

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Arjun Narayan <[email protected]>
@craig craig bot closed this as completed in #28143 Aug 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-semantics C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants