Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

graup · 2025-01-22T11:39:13Z

First off, I'm relatively new to both this codebase and Rust in general. But I thought it might be useful to start this topic anyway.

Epic: #1548

There are a bunch of places where parentheses can appear in SQL. Currently these are not handled consistently in the AST. This makes it difficult or impossible to attach Spans to all of them. @Nyrox who did much of the work introducing Spanned mentioned using AttachedToken to encode opening and closing parenthesis tokens in the AST. I think it would be good to come up with a standard way that can be applied to all of those nodes. Also it seems in many instances this might be a breaking change.

Attaching parentheses tokens to the AST is important for accurate spans: The span for the Function node CALL(something) should go all the way from C to ), not just CALL or even worse, CALL(something. Currently there's no way to encode this information.

On the other hand, parentheses are not semantically meaningful for the AST, so I don't want to needlessly complicate the structure.

One thing that's a bit vague to me is if the parentheses should be attached to the 'thing' itself or its container? E.g. there is Function -> FunctionArguments -> FunctionArgumentsList, to which of those do the parens belong? Similarly, Expr::InSubquery has a subquery field; do the parens belong to the subquery or to the InSubquery? (Reading Fmt::Display for Expr::InSubquery would imply the parens belong to it since it renders them and not the subquery which is just an Expr – but the subquery field could be a new Subquery type, which then would be able to render its own parens... 🤔)

To make matters even more complicated, some of these parentheses are optional (see FunctionArguments and also the somewhat related OneOrManyWithParens).

Here are some nodes that need to know about their parentheses (I might have missed some; to be systematic I'd probably go through every mention of LParen/RParen in the parser).

in ast/mod.rs

Expr::Subquery
Expr::InSubquery
Expr::InUnnest
Expr::Exists
Expr::AllOp.right
FunctionArguments
FunctionArgumentList
Statement::Execute.has_parentheses

in ast/query.rs

Cte.closing_paren_token

Here's an initial idea: instead of adding opening_paren_token and closing_paren_token everywhere, how about creating a new struct ParensPair { opening: AttachedToken, closing: AttachedToken }? Or might we augment OneOrManyWithParens for this?

The text was updated successfully, but these errors were encountered:

graup changed the title ~~Consistent AST encoding of parentheses of function and subquery to enable correct spans~~ Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. Jan 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

graup commented Jan 22, 2025 •

edited

Loading

Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

Comments

graup commented Jan 22, 2025 • edited Loading

graup commented Jan 22, 2025 •

edited

Loading