Commit
26550: sql: auto-generate column names like in PostgreSQL r=knz a=knz

Fixes cockroachdb#26236.

Prior to this patch, CockroachDB would generate names for results not explicitly named with AS by pretty-printing the entire rendered expression.

Example before:

```
> SELECT cos(x + y)::INT FROM xy;
+-----------------+
| cos(x + y)::INT |
+-----------------+
| 0               |
+-----------------+

> SELECT 'abc'::BYTES;
+--------------+
| 'abc'::BYTES |
+--------------+
| abc          |
+--------------+

> SELECT 1 + 2
+-------+
| 1 + 2 |
+-------+
| 3     |
+-------+
```

This approach had multiple drawbacks:

- the generated column names were incompatible with PostgreSQL. This showed up in edge cases like `select nullif(a,b,c)` where the column name is expected to be "`nullif`", not the pretty-printed full expression `NULLIF(a, b, c)`.
- the algorithm incurred many mandatory memory allocations (to assemble the string) for every single column rendered, in every query.
- the tree recursion and string interpolations needed to construct the string used CPU cycles that do not contribute to the query results.

Instead, this patch implements PostgreSQL's algorithm, which has the following properties:

- it only produces strings that already exist in memory, either static string constants or the text of the SQL query (parsed tokens). No additional memory allocations or string computations are needed.
- it traverses fewer AST nodes in memory to compute its result.
- it produces the same results as PostgreSQL for the same input queries.

Example after:

```
> SELECT cos(x + y)::INT FROM xy;
+-----+
| cos |
+-----+
| 0   |
+-----+

> SELECT 'abc'::BYTES;
+-------+
| bytes |
+-------+
| abc   |
+-------+

> SELECT 1 + 2
+----------+
| ?column? |
+----------+
| 3        |
+----------+
```

The algorithm produces a string and a confidence level (0-2) and is recursively defined as follows:

- things that get constant strings with confidence 2:
  - column references get named after the unqualified column name.
  - function applications get named after the name of the function, e.g. `cos(x)` gets named `cos`.
  - function-like operators get named after the operator. For example `coalesce(a,b)` gets named `coalesce`, `row(a,b)` gets named `row`, `nullif(a,b)` is `nullif`, `exists(subquery...)` is `exists`, etc.
  - `ARRAY(E)` and `ARRAY[...]` get named `array`.
  - boolean literals (`true`/`false`) get named `bool` because pg internally implements them as `'t'::BOOL` and `'f'::BOOL`.
  - `(SELECT E AS x)` gets named just `x`.
  - `(E).x` gets named just `x`.
  - `((E) AS x)` gets named just `x`.
  - `(VALUES (E))` gets named just `column1` (for consistency with the standalone `VALUES` clause).
- things that are named recursively:
  - `(E)` gets named after E.
  - `E[N]` gets named after E.
  - `E COLLATE X` gets named after E.
  - `(SELECT E)` (no `AS` in subquery) gets named after E.
  - `E::T` and `E:::T` get named after E, and if the confidence is <= 1 then after T with confidence 1. e.g. `t::INT` is named `t`, `123::INT` is named `int`.
  - `CASE WHEN A THEN B ELSE C END` gets named after C, and if the confidence is <= 1 then just `case` with confidence 1.
  - `CASE WHEN A THEN B END` (no ELSE) gets named just `case` with confidence 1.
- everything else receives no name with confidence 0.

When the algorithm completes, if no name was produced the final name becomes `"?column?"`.

The original code can be found in pg's source in `src/backend/parser/parse_target.c`.

Release note (sql change): CockroachDB now computes automatic column names for SELECT expressions that do not use AS using a simpler and more efficient algorithm, which also produces names more compatible with PostgreSQL.
Release note (backward-incompatible change): CockroachDB now uses a different algorithm to generate column names for complex expressions in SELECT clauses when AS is not used. The results are more compatible with PostgreSQL but may appear different to client applications. This does not impact most uses of SQL, where the rendered expressions are sufficiently simple (simple function applications, reuses of existing columns) or when AS is used explicitly.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
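
For illustration only, the naming rules described in this commit message can be sketched in a few lines of Python. This is not the code in this patch (CockroachDB is written in Go) nor PostgreSQL's C implementation. The node classes (`Column`, `FuncCall`, `Literal`, `Paren`, `Cast`, `Case`, `BinaryOp`) and the functions `compute_name` and `column_name` are hypothetical names made up for the sketch, only a subset of the rules is covered, and lowercasing the type name in a cast is an assumption inferred from the `bytes` and `int` examples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical miniature AST; these node types are illustrative only.
@dataclass
class Column:        # column reference; `name` is the unqualified name
    name: str

@dataclass
class FuncCall:      # function application or function-like operator
    name: str
    args: tuple = ()

@dataclass
class Literal:       # constant such as 123, 'abc', true
    value: object

@dataclass
class Paren:         # (E)
    expr: object

@dataclass
class Cast:          # E::T
    expr: object
    type_name: str

@dataclass
class Case:          # CASE ... [ELSE C] END; only the ELSE branch matters here
    else_expr: Optional[object] = None

@dataclass
class BinaryOp:      # e.g. 1 + 2; falls under "everything else"
    op: str
    left: object
    right: object


def compute_name(e) -> Tuple[int, Optional[str]]:
    """Return (confidence, name) following the rules listed in the message."""
    if isinstance(e, (Column, FuncCall)):
        return 2, e.name                      # existing string, confidence 2
    if isinstance(e, Literal) and isinstance(e.value, bool):
        return 2, "bool"                      # boolean literals
    if isinstance(e, Paren):
        return compute_name(e.expr)           # named after E
    if isinstance(e, Cast):
        conf, name = compute_name(e.expr)
        if conf <= 1:
            # Assumption: the type name is reported in lowercase.
            return 1, e.type_name.lower()
        return conf, name
    if isinstance(e, Case):
        if e.else_expr is not None:
            conf, name = compute_name(e.else_expr)
            if conf > 1:
                return conf, name
        return 1, "case"
    return 0, None                            # everything else: no name


def column_name(e) -> str:
    """Final column name, with the "?column?" fallback."""
    _, name = compute_name(e)
    return name if name is not None else "?column?"


if __name__ == "__main__":
    cos_cast = Cast(FuncCall("cos", (BinaryOp("+", Column("x"), Column("y")),)), "INT")
    print(column_name(cos_cast))                                 # cos(x + y)::INT
    print(column_name(Cast(Literal("abc"), "BYTES")))            # 'abc'::BYTES
    print(column_name(BinaryOp("+", Literal(1), Literal(2))))    # 1 + 2
```

Every name returned by the sketch is either a constant or a string already stored in the tree, which mirrors the "no additional allocation" property claimed for the algorithm; the confidence level exists only so that casts and CASE can decide whether to override a weaker name from their sub-expression.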