Add Span to Tokens, AST nodes #161
I hope @andygrove and @benesch would comment. As I'm also interested in analysis, I'd love to see something like this implemented. Do you think this can be implemented without adding too much boilerplate? (In the tests, in particular, they're rather verbose as it is.) A tangentially related issue I've been mulling over is the "AST" types we have now and how they can support dialects. For a while now I've wanted to experiment with adopting rust-analyzer's design of having a separate untyped/homogeneous syntax tree (similar to your …)
I don't think it'll be too bad. I've hacked on this a little, and the initial approach I took was to exclude …
Their design goal of …
I've pushed up a 'span' branch on my fork that implements a small slice of this, to serve as discussion fodder: https://github.com/mullr/sqlparser-rs/commits/span Adding Spans to the tokenizer output was very easy. Getting them into the guts of the parser was a bit more work, but straightforward. I added it to a single AST node (Ident), and I think I arrived at a pattern that isn't too disruptive, especially to the tests.
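The branch itself isn't reproduced in this thread, but the low-disruption pattern described above might look something like the following sketch (hypothetical types, not the actual branch code): a span-carrying `Ident` that still compares equal to span-less test fixtures, so existing assertions keep working.

```rust
// Hypothetical sketch only -- not code from the 'span' branch.
// Byte-offset span attached to a leaf AST node.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Span {
    pub start: usize, // byte offset of the first character
    pub end: usize,   // byte offset one past the last character
}

#[derive(Debug, Clone, Eq)]
pub struct Ident {
    pub value: String,
    pub span: Span,
}

// Compare idents by value only, so span-less test fixtures still
// match parser output that carries real spans.
impl PartialEq for Ident {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value
    }
}

// Cheap construction for tests: a dummy (default) span.
impl From<&str> for Ident {
    fn from(s: &str) -> Self {
        Ident { value: s.to_string(), span: Span::default() }
    }
}

fn main() {
    let parsed = Ident { value: "users".into(), span: Span { start: 14, end: 19 } };
    // Spans differ, values match, so the assertion holds.
    assert_eq!(parsed, Ident::from("users"));
}
```

The value-only `PartialEq` is the part that keeps the test suite untouched; whether that trade-off is acceptable is exactly what the thread goes on to debate.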
Sorry, I probably won't have time to look at this until the weekend, but I will review this.
The tests aren't affected, because of the …. A few quick thoughts:
Yeah, it felt pretty bad writing that, but I don't have a better idea either.
It's a tricky question, especially because the AST here is quite abstract. I would probably put it in places where there's already a struct that clearly corresponds to a piece of syntax, but wouldn't create new pieces of structure just to staple a Span onto. In your examples: … A truly convincing answer to the question would probably require an intermediate representation, as discussed above.
Yep, that's likely where the bulk of the work is. "A pain" is a good way to describe it, as it should be fairly rote.
One thing to note: only leaf AST nodes actually require the span to be attached to them as a field. Intermediate nodes can have an implementation if the …
I think that would only work for the rowan-like design, where concatenating the leaf nodes' text returns the original source. Coming back to the …
You're absolutely right, I hadn't thought of it that way.
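The exchange above can be sketched as follows -- a hypothetical `Spanned` trait (all names invented for illustration) where leaf nodes store a span and intermediate nodes derive theirs by unioning their children's spans instead of storing one:

```rust
// Hypothetical sketch of "only leaves store a span" -- not crate code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    // Smallest span covering both inputs.
    fn union(self, other: Span) -> Span {
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

pub trait Spanned {
    fn span(&self) -> Span;
}

// Leaf node: stores its span as a field.
struct Ident {
    span: Span,
}

impl Spanned for Ident {
    fn span(&self) -> Span { self.span }
}

// Intermediate node: its span is computed, not stored.
struct CompoundIdent {
    parts: Vec<Ident>,
}

impl Spanned for CompoundIdent {
    fn span(&self) -> Span {
        self.parts
            .iter()
            .map(|p| p.span())
            .reduce(Span::union)
            .expect("compound ident has at least one part")
    }
}

fn main() {
    let c = CompoundIdent {
        parts: vec![
            Ident { span: Span { start: 0, end: 3 } },
            Ident { span: Span { start: 4, end: 9 } },
        ],
    };
    assert_eq!(c.span(), Span { start: 0, end: 9 });
}
```

As the reply above points out, this only gives faithful spans when the tree actually covers the source text (the rowan-style property); where the AST elides or normalizes syntax, the derived span can miss tokens.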
Some way of accessing spans would also enable better error reporting in e.g. LocustDB, since it makes it possible to point to specific spans within the original query string.
I wanted to propose an approach that may be verbose but could help at least in testing, and in deriving a way to output SQL from the AST per #18 (which I think would require spans on tokens to implement). My goal is to see how each portion of SQL is parsed and eventually rewritten, and to choose how comments are maintained or removed during the rewrite. I'd like to be able to visualize the changes in a git-diff-like view comparing old to new, stored within the AST somehow, with the opportunity to omit unnecessary extra comments if needed.
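A sketch of the comment-preservation idea above, with entirely hypothetical types (sqlparser-rs has no such API today): comments carried as trivia attached to tokens, so a rewriter can decide at emit time whether to keep or drop them.

```rust
// Hypothetical sketch only -- these types do not exist in sqlparser-rs.
#[derive(Debug, Clone)]
struct Token {
    text: String,
    // Comments that appeared immediately before this token.
    leading_comments: Vec<String>,
}

// Re-emit the token stream, optionally preserving comment trivia.
fn emit(tokens: &[Token], keep_comments: bool) -> String {
    let mut out = String::new();
    for t in tokens {
        if keep_comments {
            for c in &t.leading_comments {
                out.push_str(c);
                out.push(' ');
            }
        }
        out.push_str(&t.text);
        out.push(' ');
    }
    out.trim_end().to_string()
}

fn main() {
    let tokens = vec![
        Token { text: "SELECT".into(), leading_comments: vec!["/* keep me */".into()] },
        Token { text: "1".into(), leading_comments: vec![] },
    ];
    assert_eq!(emit(&tokens, true), "/* keep me */ SELECT 1");
    assert_eq!(emit(&tokens, false), "SELECT 1");
}
```

The same trivia attachment is what a diff-style before/after view would read from: each rewrite step re-emits with `keep_comments` set per the user's choice.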
Thanks @aochagavia -- the issue is definitely still relevant. It is an often requested feature; it came up recently here, in fact: #839 (comment)
I started collecting related work / tickets in the description of this ticket so we can track it more easily |
I'm also interested in being able to provide better/pretty errors, e.g. with https://lib.rs/crates/ariadne or https://github.com/zkat/miette. Adding text spans to:

```rust
pub struct Location {
    /// Line number, starting from 1
    pub line: u64,
    /// Line column, starting from 1
    pub column: u64,
    pub start: u64,
    pub end: u64,
}
```

is probably a first step.
We are trying to provide better error messages at least, so I forked and added:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParserError {
    TokenizerError(String),
    ParserError(String),
    ParserErrorWithToken {
        msg: String,
        token: TokenWithLocation,
    },
    RecursionLimitExceeded,
}
```

Then I overrode the major parser methods, and now I can produce something like:

```
Error: SQL parser error: unsupported binary operator LIKE
   ╭─[SQL:1:1]
   │
 1 │ SELECT * FROM c WHERE a LIKE 1
   │                         ──┬─
   │                           ╰─── Error here
───╯
```

I locate the spans with the info of the …. I think if the …
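The report above appears to be rendered with ariadne; to avoid guessing at that crate's API, here is a dependency-free sketch of the underlying idea -- given a line, column, and token width, underline the offending token in the source:

```rust
// Minimal, dependency-free sketch of span-based error rendering.
// line and column are 1-based, matching the Location fields above.
fn point_at(sql: &str, line: usize, column: usize, width: usize) -> String {
    // Fetch the offending source line (empty if out of range).
    let src_line = sql.lines().nth(line - 1).unwrap_or("");
    // Pad to the column, then underline `width` characters.
    let caret = format!("{}{}", " ".repeat(column - 1), "^".repeat(width));
    format!("{src_line}\n{caret} error here")
}

fn main() {
    // "LIKE" starts at column 25 and is 4 characters wide.
    println!("{}", point_at("SELECT * FROM c WHERE a LIKE 1", 1, 25, 4));
}
```

A real reporter (ariadne, miette, or hand-rolled) adds line numbering, gutters, and colors, but all of them need exactly the span information this issue asks for.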
Thank you @Nyrox -- we certainly need a hero to make this happen. The key in my mind is to find a way to incrementally introduce this feature over a few releases, so downstream users have time to integrate the changes and we minimize disruption.
I think this is a good idea. I have implemented source locations in my fork for a good subset of AST nodes, in a way which minimally breaks downstream (essentially by avoiding the …). @alamb, should I open a draft PR where we can discuss details?
I'd like to use this library for some refactoring and analysis tools. It would be very helpful to have source line/column information available on the AST nodes, something along the lines of `Span` from `syn` (https://docs.rs/syn/1.0.17/syn/spanned/trait.Spanned.html). I see that line/column is already tracked in the tokenizer. I'm thinking something along the lines of:
The main objection I can think of is that it adds some bulk to `Token`s, and to each AST node. This could be alleviated, if desired, by hiding them behind a feature flag. I looked at `syn` to see how they do it; the approach there is very similar. AFAICT, their feature flagging doesn't actually affect the node/token representation -- they just carry it all the time. Since the implementation here doesn't appear to be hyper-optimized, and since it's good enough for `syn`, it seems like this approach should be okay.
If I submitted a PR along these lines, would you consider accepting it?
Related work: