Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. #1676

Open
graup opened this issue Jan 22, 2025 · 0 comments

Comments

@graup
Copy link
Contributor

graup commented Jan 22, 2025

First off, I'm relatively new to both this codebase and Rust in general. But I thought it might be useful to start this topic anyway.

Epic: #1548

There are a bunch of places where parentheses can appear in SQL. Currently these are not handled consistently in the AST. This makes it difficult or impossible to attach Spans to all of them. @Nyrox who did much of the work introducing Spanned mentioned using AttachedToken to encode opening and closing parenthesis tokens in the AST. I think it would be good to come up with a standard way that can be applied to all of those nodes. Also it seems in many instances this might be a breaking change.

Attaching parentheses tokens to the AST is important for accurate spans: The span for the Function node CALL(something) should go all the way from C to ), not just CALL or even worse, CALL(something. Currently there's no way to encode this information.

On the other hand, parentheses are not semantically meaningful for the AST, so I don't want to needlessly complicate the structure.

One thing that's a bit vague to me is if the parentheses should be attached to the 'thing' itself or its container? E.g. there is Function -> FunctionArguments -> FunctionArgumentsList, to which of those do the parens belong? Similarly, Expr::InSubquery has a subquery field; do the parens belong to the subquery or to the InSubquery? (Reading Fmt::Display for Expr::InSubquery would imply the parens belong to it since it renders them and not the subquery which is just an Expr – but the subquery field could be a new Subquery type, which then would be able to render its own parens... 🤔)

To make matters even more complicated, some of these parentheses are optional (see FunctionArguments and also the somewhat related OneOrManyWithParens).

Here are some nodes that need to know about their parentheses (I might have missed some; to be systematic I'd probably go through every mention of LParen/RParen in the parser).

in ast/mod.rs

  • Expr::Subquery
  • Expr::InSubquery
  • Expr::InUnnest
  • Expr::Exists
  • Expr::AllOp.right
  • FunctionArguments
  • FunctionArgumentList
  • Statement::Execute.has_parentheses

in ast/query.rs

  • Cte.closing_paren_token

Here's an initial idea: instead of adding opening_paren_token and closing_paren_token everywhere, how about creating a new struct ParensPair { opening: AttachedToken, closing: AttachedToken }? Or might we augment OneOrManyWithParens for this?

@graup graup changed the title Consistent AST encoding of parentheses of function and subquery to enable correct spans Consistent AST encoding of parentheses to enable correct spans for function arguments, subqueries etc. Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant