-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: enable referring to columns symbolically #10729
Conversation
I just realized I forgot to add the tests. Please comment on the general direction, I'll add the tests in parallel. |
Nice, I like this idea! Seems like something we should use for distsql (I'm currently using Review status: 0 of 7 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
LGTM so far, but as you mentioned this needs lots of testing. Also, this is introducing new syntax into SQL, so I'd like to think through the syntax as a user-facing feature here as well. A full RFC isn't necessary, but do you mind writing up a brief description of these new Ordinal References (something like what you'd find in our docs). I get the motivation for their introduction in regard to #10632, but talking through their introduction as a user-facing feature would help justify this approach over others. Reviewed 7 of 7 files at r1. pkg/sql/parser/expr.go, line 548 at r1 (raw file):
Looks like this assertion got separated from its declaration. pkg/sql/parser/sql.y, line 32 at r1 (raw file):
Comments from Reviewable |
Didn't we conclude that it makes more sense to make all node types pointers? E.g. Anyway, I was thinking that it would be great if we didn't need to modify trees to convert between |
@RaduBerinde thanks for your suggestions. Regarding using IndexedVar I will try it and see if it fits. I like the idea! @nvanbenschoten: I will add some more text in the commit message to explain with examples. Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. pkg/sql/parser/expr.go, line 548 at r1 (raw file):
|
b98aa48
to
1be19c9
Compare
@RaduBerinde I have simplified to use only IndexedVar as you suggested. PTAL. @nvanbenschoten I have extended the commit message (and the header of this PR) with your suggestions. PTAL. |
Note that there are two commits now but the first one will disappear after #10799 gets merged. |
Review status: 0 of 22 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. Comments from Reviewable |
Review status: 0 of 23 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. pkg/sql/parser/indexed_vars.go, line 93 at r3 (raw file):
IndexedVars pkg/sql/parser/indexed_vars.go, line 101 at r3 (raw file):
It feels a bit off that we need to allocate a set of Comments from Reviewable |
Review status: 0 of 23 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/sql/parser/expr.go, line 548 at r1 (raw file): Previously, knz (kena) wrote…> Done.Comments from Reviewable |
24399fc
to
0929a77
Compare
TFYRs! Review status: 0 of 23 files reviewed at latest revision, 3 unresolved discussions. pkg/sql/parser/expr.go, line 548 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…> The move isn't necessary now that you removed the new expression type. I'd revert the change.pkg/sql/parser/indexed_vars.go, line 93 at r3 (raw file): Previously, RaduBerinde wrote…> IndexedVarspkg/sql/parser/indexed_vars.go, line 101 at r3 (raw file): Previously, RaduBerinde wrote…> It feels a bit off that we need to allocate a set of `IndexedVar`s (as part of `IndexedVarHelper`) just to bind a container to existing `IndexedVars`. I don't have a better suggestion though (given that we need to support IndexedVarUsed)..Comments from Reviewable |
b147b8b
to
ee6d44a
Compare
This patch introduces a new semantic concept that is useful for pretty-printing the AST back to valid SQL and again back to AST; this will be useful during development work on optimizations. The motivation is detailed below. A side effect is that it also introduces a user-visible feature. Because it is visible to user it will need to be documented, but some caveats apply. More details are given below too. **Motivation** During optimizations, column references (IndexedVar) can move around and be substituted in ways that may lose track of the original name used in the query to name the column. More specifically, we can move a reference to a position where there is no name known for the column *yet*. For example we want to go from: `SELECT * FROM (SELECT * FROM (VALUES (1), (2))) AS a(x) WHERE x > 2` to: `SELECT * FROM (SELECT * FROM ... WHERE ??? > 2) AS a(x)` Now although we can move the IndexedVar to the position in the `???` it is not so clear which *name* to give this indexed var if we pretty-print the resulting tree -- in this example the name "x" does not exist at that position. This patch makes it possible to side-step this issue, by introducing a new SQL syntax that refers to columns at the "current level" by ordinal position: "@n" where N is an integer literal. With this patch the example above can then be reliably serialized to: `SELECT * FROM (SELECT * FROM ... WHERE @1 > 2) AS a(x)` **User-visible changes** SQL already has a traditional, limited way to do refer to column numerically in some syntactic positions, specifically ORDER BY and GROUP BY. For example, in `SELECT a + b FROM foo ORDER BY 1`, "1" refers to the first value rendered, i.e. `a + b`. These are called "column ordinals"; they are supported in *some* SQL engines, sometimes for backward compatibility, sometimes because of historical reasons. The feature added in this patch complements and extends this mechanism. **However, the use of column ordinals by client applications is also customarily strongly discouraged.** The use of the new column ordinal references added in this patch should be equally discouraged. The reason why is that they are not robust against schema updates. Say, a table is initially created with columns `a, b, c` in this order. Then a query is designed to refer to column `a` by position, with number 1. Then later, independently a DB admin changes the schema and removes column `a`, and adds a new version of column `a` with e.g. a different type. Now the schema is `b, c, a`, and all the queries that expect to refer to `a` by position 1 are now broken. The new feature in this patch is also subject to this limitation. It is intended primarily for use during development when the schema updates are tightly controlled by the operator manipulating the query. Meanwhile, since the feature is visible to users it should still be (minimally) documented. The salient aspects that should be communicated are: 1) don't use this feature in client applications unless you 100% understand the limitation described above. 2) **the @ notation refers to a column number in the data source, not in the rendered columns**. The data source is the thing named after FROM. For example, suppose a table `foo` has columns `a` and `b` in this order. Then the query `SELECT b, a FROM foo WHERE @2 = 123` is equivalent to `SELECT b, a FROM foo WHERE b = 123`. 3) point 2 above means that there is a difference between the new column ordinal references and the traditional SQL ordinals, which can be illustrated as follows. With SQL ordinals, the query `SELECT b, a FROM foo ORDER BY 1` sorts with column `b`, because this is the first value rendered (columns after SELECT); whereas `SELECT b, a FROM foo ORDER BY @1` sorts with column `a`, because this is the first column in the data source (columns after FROM).
This commit disallows ordinal column references by default. Any statements using them will result in an error. The session setting `allow_ordinal_column_references` can be set to true to revert to previous behavior. Note that numeric tuple indexing (e.g., `SELECT ((1,2,3)).@2`) is still allowed. This deprecation is motivated by the inconsistent and surprising behavior that ordinal column references provide. Ordinal column references (e.g., `SELECT @1 FROM t`) are a vestigial and undocumented feature originally motivated to aid the heuristic planner. The PR that added them, cockroachdb#10729, discourages their use by users because "they are not robust against schema changes". It also points out a subtle difference between ordinal column references and SQL standard ordinals that could confuse users. As an example, the following statements are not equivalent: ```sql SELECT @2 FROM t ORDER BY @1; SELECT @2 FROM t ORDER BY 1; ``` In the current implementation, an ordinal column reference `@n` resolves to the n-th column in the current scope's slice of columns. This has several implications: 1. Ordinal columns can refer to any column, not just columns of data sources in `FROM` clauses, as was originally intended. This makes it hard to reason about the resolution of an ordinal column reference. For example, it's somewhat surprising that the result of the `SELECT` below is not `(10, 1)`. ``` CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 10); SELECT @2, @1 FROM (SELECT @2, @1 FROM t); ?column? | ?column? -----------+----------- 1 | 10 (1 row) ``` 2. The ordering of columns in the scope's slice is not guaranteed to be consistent, so any reasonable ordinal column resolution occurs more-or-less by chance. As an example of unexpected behavior, consider: ``` CREATE TABLE t (a INT PRIMARY KEY, INDEX ((a+10))); INSERT INTO t(a) VALUES (1); ALTER TABLE t ADD COLUMN b INT; SELECT @1, @2 FROM t; ?column? | ?column? -----------+----------- 1 | 11 (1 row) ``` Most users would expect the result of the `SELECT` statement to be `(1, NULL)` because `@1` should resolve to `a` and `@2` should resolve to `b`. Instead, the ordinal column reference `@2` circumvents logic that keeps inaccessible columns from being referenced, and resolves to the virtual computed column backing the secondary index. Epic: None Release note (sql change): Ordinal column references (e.g., `SELECT @1, @2 FROM t`) are now deprecated. By default, statements using this syntax will now result in an error. They can be allowed using the session setting `SET allow_ordinal_column_references=true`. Support for ordinal column references is scheduled to be removed in version 23.2.
93754: sql: deprecate ordinal column references r=mgartner a=mgartner #### sql: deprecate ordinal column references This commit disallows ordinal column references by default. Any statements using them will result in an error. The session setting `allow_ordinal_column_references` can be set to true to revert to previous behavior. Note that numeric tuple indexing (e.g., `SELECT ((1,2,3)).`@2`)` is still allowed. This deprecation is motivated by the inconsistent and surprising behavior that ordinal column references provide. Ordinal column references (e.g., `SELECT `@1` FROM t`) are a vestigial and undocumented feature originally motivated to aid the heuristic planner. The PR that added them, #10729, discourages their use by users because "they are not robust against schema changes". It also points out a subtle difference between ordinal column references and SQL standard ordinals that could confuse users. As an example, the following statements are not equivalent: ```sql SELECT `@2` FROM t ORDER BY `@1;` SELECT `@2` FROM t ORDER BY 1; ``` In the current implementation, an ordinal column reference ``@n`` resolves to the n-th column in the current scope's slice of columns. This has several implications: 1. Ordinal columns can refer to any column, not just columns of data sources in `FROM` clauses, as was originally intended. This makes it hard to reason about the resolution of an ordinal column reference. For example, it's somewhat surprising that the result of the `SELECT` below is not `(10, 1)`. ``` CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 10); SELECT `@2,` `@1` FROM (SELECT `@2,` `@1` FROM t); ?column? | ?column? -----------+----------- 1 | 10 (1 row) ``` 2. The ordering of columns in the scope's slice is not guaranteed to be consistent, so any reasonable ordinal column resolution occurs more-or-less by chance. As an example of unexpected behavior, consider: ``` CREATE TABLE t (a INT PRIMARY KEY, INDEX ((a+10))); INSERT INTO t(a) VALUES (1); ALTER TABLE t ADD COLUMN b INT; SELECT `@1,` `@2` FROM t; ?column? | ?column? -----------+----------- 1 | 11 (1 row) ``` Most users would expect the result of the `SELECT` statement to be `(1, NULL)` because ``@1`` should resolve to `a` and ``@2`` should resolve to `b`. Instead, the ordinal column reference ``@2`` circumvents logic that keeps inaccessible columns from being referenced, and resolves to the virtual computed column backing the secondary index. Epic: None Release note (sql change): Ordinal column references (e.g., `SELECT `@1,` `@2` FROM t`) are now deprecated. By default, statements using this syntax will now result in an error. They can be allowed using the session setting `SET allow_ordinal_column_references=true`. Support for ordinal column references is scheduled to be removed in version 23.2. Co-authored-by: Marcus Gartner <[email protected]>
This patch introduces a new semantic concept that is useful for
pretty-printing the AST back to valid SQL and again back to AST; this
will be useful during development work on optimizations. The
motivation is detailed below.
A side effect is that it also introduces a user-visible feature.
Because it is visible to user it will need to be documented, but some
caveats apply. More details are given below too.
Motivation
During optimizations, column references (IndexedVar) can move around
and be substituted in ways that may lose track of the original name
used in the query to name the column. More specifically, we can move a
reference to a position where there is no name known for the column
yet. For example we want to go from:
SELECT * FROM (SELECT * FROM (VALUES (1), (2))) AS a(x) WHERE x > 2
to:
SELECT * FROM (SELECT * FROM ... WHERE ??? > 2) AS a(x)
Now although we can move the IndexedVar to the position in the
???
it is not so clear which name to give this indexed var if we
pretty-print the resulting tree -- in this example the name "x" does
not exist at that position.
This patch makes it possible to side-step this issue, by introducing a
new SQL syntax that refers to columns at the "current level" by ordinal
position: "@n" where N is an integer literal.
With this patch the example above can then be reliably serialized to:
SELECT * FROM (SELECT * FROM ... WHERE @1 > 2) AS a(x)
User-visible changes
SQL already has a traditional, limited way to do refer to column
numerically in some syntactic positions, specifically ORDER BY and
GROUP BY. For example, in
SELECT a + b FROM foo ORDER BY 1
, "1"refers to the first value rendered, i.e.
a + b
.These are called "column ordinals"; they are supported in some SQL
engines, sometimes for backward compatibility, sometimes because of
historical reasons.
The feature added in this patch complements and extends this mechanism.
However, the use of column ordinals by client applications is also
customarily strongly discouraged. The use of the new column ordinal
references added in this patch should be equally discouraged. The
reason why is that they are not robust against schema updates. Say, a
table is initially created with columns
a, b, c
in this order. Thena query is designed to refer to column
a
by position, with number 1.Then later, independently a DB admin changes the schema and removes
column
a
, and adds a new version of columna
with e.g. a differenttype. Now the schema is
b, c, a
, and all the queries that expect torefer to
a
by position 1 are now broken. The new feature in thispatch is also subject to this limitation. It is intended primarily for
use during development when the schema updates are tightly controlled
by the operator manipulating the query.
Meanwhile, since the feature is visible to users it should still be
(minimally) documented. The salient aspects that should be
communicated are:
don't use this feature in client applications unless you 100%
understand the limitation described above.
the @ notation refers to a column number in the data source, not
in the rendered columns. The data source is the thing named after
FROM. For example, suppose a table
foo
has columnsa
andb
in this order. Then the query
SELECT b, a FROM foo WHERE @2 = 123
is equivalent to
SELECT b, a FROM foo WHERE b = 123
.point 2 above means that there is a difference between the new
column ordinal references and the traditional SQL ordinals, which
can be illustrated as follows. With SQL ordinals, the query
SELECT b, a FROM foo ORDER BY 1
sorts with column
b
, because this is the first value rendered(columns after SELECT); whereas
SELECT b, a FROM foo ORDER BY @1
sorts with column
a
, because this is the first column in the datasource (columns after FROM).
This feature is needed for #10632.
cc @nvanbenschoten.
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)