sql: tuples not properly preserved during serialization #26624

knz · 2018-06-12T11:49:38Z

Found while investigating #26621.
The expression ROW(x) gets serialized after normalization (format with FmtParsable) to (x) which has a completely different meaning. This causes the following error when distribution is enabled, but not with local execution:

> select row(1) in (row(2), row(1), row(v)) from kv;
pq: (1:::INT) IN ((2:::INT), (1:::INT), ROW("$0")): unsupported comparison operator: (1:::INT) IN ((2:::INT), (1:::INT), ROW("$0")): expected ROW("$0") to be of type int, found type tuple{int}

Also all the labeled tuples lose their labels during serialization. This likely breaks distribution of queries with labeled tuples.

The text was updated successfully, but these errors were encountered:

knz · 2018-06-12T11:58:13Z

cc @RaduBerinde

knz · 2018-06-12T11:58:53Z

I tried to always pretty-print using the syntax row(...) but that breaks pretty-printing of a IN b because the syntax a IN ROWS(...) is not valid.

RaduBerinde · 2018-06-12T12:08:59Z

Could we introduce syntax to annotate as tuple? Something like (1):::tuple{int} or (1):::(int)

knz · 2018-06-12T12:43:19Z

Yes that might be a solution.

This patch adds more complete support for composite types (labeled tuples) by ensuring the following: - tuple labels are preserved during constant folding of tuple expressions. - tuple labels are preserved during expression transformations in optimizations. - subqueries in scalar contexts receive a composite type with labels. Note that there is currently a bug in DTuple serialization which break composite literals in distributed execution, so the composite type support is not fully ready yet. This is tracked as separate bug cockroachdb#26624. Release note: None

26621: sql,opt: propagate composite types (labeled tuples) r=knz a=knz Needed to resolve #24866. This patch adds more complete support for composite types (labeled tuples) by ensuring the following: - tuple labels are preserved during constant folding of tuple expressions. - tuple labels are preserved during expression transformations in optimizations. - subqueries in scalar contexts receive a composite type with labels. Note that there is currently a bug in DTuple serialization which break composite literals in distributed execution, so the composite type support is not fully ready yet. This is tracked as separate bug #26624. Release note: None 26683: opt: don't push limit through project when ordering on synthesized column r=RaduBerinde a=RaduBerinde It is invalid to push limit/offset through a projection if the ordering depends on a synthesized column. Fix the rules to check for this. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Radu Berinde <[email protected]>

26628: sql: Support tuple column access and tuple stars r=knz a=knz First commits from #26621. Completes the fix to #24866 by re-activating disabled tests. This work is yet another step towards #16971. It would actually fix #16971 if it were not for #26627, #26624 and #26629. This work is yet another step towards #16971. The labeled tuples introduced in #25283 can now be accessed using their labels or using a star, e.g. `(E).*`. Release note (sql change): Labeled tuples can now be accessed using their labels (e.g. `SELECT (x).word FROM (SELECT pg_expand_keywords() AS x)` or a star (e.g. `SELECT (x).* FROM (SELECT pg_expand_keywords() AS x)`). Fixes #26720. Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Bram Gruneir <[email protected]>

knz · 2018-07-27T18:58:12Z

some brainstorming:

the syntax (...):::(...) is interesting conceptually but it introduces a grammar problem: when parsing an expression of the form (x+y):::(...), the parser must determine by the time it encounters the 1st closing parenthesis that it must conclude parsing a TupleExpr and not ParenExpr, whereas only the token after that (the :::) indicates how to make this choice. Remember a LALR parser usually works with just 1 token lookahead. It's doable - it requires introducing a special "extra lookahead" token (we've done that in a few places - that's the _LA trick). But it's a bit complex.
maybe we should add the ROW keyword always for every 1-valued tuple, but only 1-valued tuples (not for 2+-valued tuples), to keep the pretty-printing compact in the general case. Then the only remaining case where there might be a problem is an expression like 123 IN (456) (1-valued list of values on the RHS of IN. Then we can handle IN with a special pretty-printing case for ComparisonExpr.
another approach is to support the syntax (x,) to indicate a 1-valued tuple. This avoids the lookahead problem and has precedents in other languages. If we recognize this syntax we can do so everywhere, including on the RHS of IN and then there is never a problem.

Then after that for tuples with labels we need to find a new syntax which doesn't use parentheses like the current one. We can look at that after we've solved the no-label case.

knz · 2018-07-27T19:01:45Z

regarding the 2nd point above. The problem of adding row just for 1-valued tuples is like this:

(1,2) in ((1,2),(3,4)) remains without row because these are 2+-valued - all good
row(1) in (row(2),row(3)) gets row and remains correct - all good
123 in (456) becomes 123 in row(456) which is incorrect - hence the problem.

So in that last case I was proposing to special case the pretty-print of ComparisonExpr to detect the combination tuple RHS + length 1 and have ComparisonExpr spell out the parentheses without ROW itself, instead of delegating to (Tuple).Format().

knz · 2018-07-27T19:02:46Z

although technically perhaps we could allow X IN Y for any Y with a tuple type, so 123 in row(456) would be recognized (as a crdb extension) too.

That might be another way to do solution 2.

rjnn · 2018-07-31T20:22:42Z

So I went down path (2), and one problem is that there are many special cases. It is not just ComparisonExpr, it is also PARTITION, VALUES, and other such representations that store their internal expressions as Tuples.

I would prefer allowing ROW in the syntax anywhere where we allow (foo). So Values, Partition, any clause that takes a parenthesized list of values, can be prefixed with ROW. This would be purely optional with respect to users, but would clearly disambiguate things for our internal parseable format.

knz · 2018-08-01T11:07:34Z

for PARTITION, VALUES and all other places the syntax is unambiguous (nothing else than a tuple can be placed there). Therefore it never fails to roundtrip even if you dont use row there.

The ambiguity really exists only for comparisons, IIRC.

knz · 2018-08-01T11:29:10Z

Ok I just changed my mind and I now recognize path (2) is non-viable, but for a different reason.

The problem is not the incompatibility of syntax between IN and other things. This could be indeed solved like you did in #28100.

The problem is that a DTuple does not contain the "row" attribute (like Tuple does), so after constant folding, as soon as there are DTuples in the expression tree, the pretty-print creates broken expressions.

As long as we try to solve this with the ROW keyword, things break like pushing down the pieces of a jigsaw puzzle:

if you try to add ROW for 0- and 1-valued DTuples, that will make the grammar happy for the tuples themselves, but things like x in (1) will pretty-print to x in row(1:::int) and that becomes invalid.
if you you never add row in DTuple and instead attempt to add it from other expressions (e.g. ComparisonExpr), then 0- and 1- valued DTuples that appear in other contexts start to break.

So really the entire ROW approach is bankrupt and cannot be saved. Therefore I have changed my mind and I think that indeed the path (3) is better. There is also a simple grammar solution, which I will post shortly.

28128: sql: casting to timestamp should strip tz info r=jordanlewis a=jordanlewis And binary operations between timestamp and timestamptz should strip the tz from the timestamptz. Confirmed that all of this behavior matches postgres (and that it didn't before). Release note (bug fix): correct casts and binary operators between timestamptz and timestamp in some cases. 28149: sql: avoid using Tuple in RangePartition r=knz a=knz Forked off #28143, needed for #25522 / #26624. The `Tuple` AST node is really for scalar tuples. RangePartition is not using a scalar tuple. So split them. Release note: None Co-authored-by: Jordan Lewis <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>

28191: sql: avoid using Tuple in ValuesClause r=knz a=knz Forked off from #28143. Needed for #26624. The `Tuple` AST node is really about scalar tuples. The `VALUES` clause operands are not really scalar tuples. So instead use `Exprs` for `ValuesClause`. This will simplify later patch to fix the handling of scalar tuples. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>

28192: sql/parser: parsing ROW produces *Tuple, not Tuple r=knz a=knz Forked off from #28143. Needed for #26624. The tree.Tuple result of parsing tuples was immediately re-allocated on the heap in the parent action rule. This patch simplifies this by allocating on the heap upfront. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>

28143: sql: fix the handling of tuples in distsql r=knz a=knz Fixes #26624. Supersedes #28100. This patch fixes the bug where 0-valued and 1-valued tuples were not properly serialized in distsql processor/flow specs, causing these special tuples to not be properly supported for distributed execution. It also fixes the bug where labeled DTuples were not serialized with their tuples, causing labeled tuples to be broken with distributed execution. Release note: None 28217: storage: dissallow empty HeartbeatRequest.Now r=andreimatei a=andreimatei Release note: None 28219: distsql: fix a missing context in log message r=arjunravinarayan a=arjunravinarayan Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Arjun Narayan <[email protected]>

knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-semantics labels Jun 12, 2018

knz assigned RaduBerinde and BramGruneir and unassigned RaduBerinde Jun 12, 2018

knz mentioned this issue Jun 12, 2018

sql,opt: propagate composite types (labeled tuples) #26621

Merged

This was referenced Jun 12, 2018

sql: distsql does not support labeled tuples #26627

Closed

sql: Support tuple column access and tuple stars #26628

Merged

justinj mentioned this issue Jun 12, 2018

sql: support Hibernate's use of _pg_expandarray #16971

Closed

11 tasks

rjnn assigned rjnn and unassigned BramGruneir Jul 19, 2018

knz added this to the 2.1 milestone Jul 23, 2018

knz assigned jordanlewis Jul 26, 2018

This was referenced Aug 1, 2018

sql: fix the handling of tuples in distsql #28143

Merged

sql: avoid using Tuple in RangePartition #28149

Merged

This was referenced Aug 2, 2018

sql: avoid using Tuple in ValuesClause #28191

Merged

sql/parser: parsing ROW produces *Tuple, not Tuple #28192

Merged

knz assigned knz and unassigned jordanlewis and rjnn Aug 2, 2018

craig bot closed this as completed in #28143 Aug 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: tuples not properly preserved during serialization #26624

sql: tuples not properly preserved during serialization #26624

knz commented Jun 12, 2018 •

edited

Loading

knz commented Jun 12, 2018

knz commented Jun 12, 2018

RaduBerinde commented Jun 12, 2018

knz commented Jun 12, 2018

knz commented Jul 27, 2018 •

edited

Loading

knz commented Jul 27, 2018

knz commented Jul 27, 2018

rjnn commented Jul 31, 2018

knz commented Aug 1, 2018

knz commented Aug 1, 2018

sql: tuples not properly preserved during serialization #26624

sql: tuples not properly preserved during serialization #26624

Comments

knz commented Jun 12, 2018 • edited Loading

knz commented Jun 12, 2018

knz commented Jun 12, 2018

RaduBerinde commented Jun 12, 2018

knz commented Jun 12, 2018

knz commented Jul 27, 2018 • edited Loading

knz commented Jul 27, 2018

knz commented Jul 27, 2018

rjnn commented Jul 31, 2018

knz commented Aug 1, 2018

knz commented Aug 1, 2018

knz commented Jun 12, 2018 •

edited

Loading

knz commented Jul 27, 2018 •

edited

Loading