Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tree-sitter-sql and improve highlight queries #5683

Merged
merged 6 commits into from
Jan 28, 2023

Conversation

@pascalkuthe
Copy link
Member

Does this also fix #5071? I would guess so since it was solved just by updating the SQL parser when that issue was first opened

@LeoniePhiline
Copy link
Contributor Author

I tested the SQL dump linked at #5071 (https://gist.github.com/tkellogg/bbf57f5c43489a1e122b4f1a8d0daa9b), which

  • opened fine (taking a long second),
  • was highlighted more or less correctly,
  • and I was able to scroll through it from top to bottom. (40851 lines)

The reporter of #5071 was on MacOS, while I am using Linux, with currently no access to a MacOS device to test on.

Since you were able to reproduce the issue on Linux on 9 December, I believe we can assume the crash issue might be fixed.

Note that typing is still unbearably slow in this 13 MB file. (No matter if inserting at the top or bottom of the file.)

@pascalkuthe
Copy link
Member

I tested the SQL dump linked at #5071 (gist.github.com/tkellogg/bbf57f5c43489a1e122b4f1a8d0daa9b), which

* opened fine (taking a long second),

* was highlighted more or less correctly,

* and I was able to scroll through it from top to bottom. (40851 lines)

The reporter of #5071 was on MacOS, while I am using Linux, with currently no access to a MacOS device to test on.

Since you were able to reproduce the issue on Linux on 9 December, I believe we can assume the crash issue might be fixed.

Note that typing is still unbearably slow in this 13 MB file. (No matter if inserting at the top or bottom of the file.)

Thanks for testing that! I think we can close that issue then once this PR is merged.

The performance can likely only be improved by speeding up the grammar although around those file-sizes TS sadly starts to hit its limit. We have enough open issues about that already and the specific crash seems to be solved

@pascalkuthe pascalkuthe added E-easy Call for participation: Experience needed to fix: Easy / not much S-waiting-on-review Status: Awaiting review from a maintainer. A-language-support Area: Support for programming/text languages labels Jan 26, 2023
@LeoniePhiline
Copy link
Contributor Author

I added "Fixes #5071" to the PR.

Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeoniePhiline
Copy link
Contributor Author

LeoniePhiline commented Jan 27, 2023

What do you mean? The PR already updates to the latest commit.

Edit: Might be that GitHub doesn't show me what you are seeing, as I checked your link on mobile.

@the-mikedavis
Copy link
Member

Oh yeah that diff is not expanding properly. I meant this line https://github.com/DerekStride/tree-sitter-sql/blob/4fe05b2d81565ddb689d2f415e07afdacc515c52/grammar.js#L1438

It could be added to the big [ ... ] @operator list or for future-proofing we could mark all binary and unary operators as @operator:

(binary_expression
  operator: _ @operator)

(unary_expression
  operator: _ @operator)

@LeoniePhiline
Copy link
Contributor Author

LeoniePhiline commented Jan 27, 2023

I've added "!" (bang) to the end of the @operator list.

(bang) at work:

Before: After:
image image

However, I am unfamiliar with the tree-sitter query syntax and was unsure how to implement your second request:

Do I understand correctly, that it would extract the operators from grammar.js's unary and binary expressions and add them to @operator, effectively replacing the current manual @operator list?
If so, then this sounds like it should be done. I'll try to figure this out - tree-sitter syntax and S-expressions in general are new to me.

@the-mikedavis
Copy link
Member

I mean that you could replace the block here:

[
"*"
"+"
"-"
"/"
"%"
"^"
"||"
"="
"<"
"<="
"!="
">="
">"
] @operator

with:

(binary_expression
  operator: _ @operator)

(unary_expression
  operator: _ @operator)

as-is: _ is a wildcard so it captures whatever node is there. operator is a field for both binary_expression and unary_expression (here and here).

Fields make it easy to capture some specific node in a pattern and can capture arbitrary things. For example you could use the right field from binary_expression to capture just the right-hand-side:

(binary_expression right: _ @something)

So they're nice for a case like this where we want to tag everything that could show up in that operator field as @operator.

@LeoniePhiline
Copy link
Contributor Author

LeoniePhiline commented Jan 27, 2023

Thanks! Makes perfect sense to me.

The difference here is that * is no longer highlighted (right-most column):

Original Explicit operators Extracted operators
image image image

Should I add the following?

[
    (all_fields)
] @operator

Not sure if * should really be marked as an operator or, rather as some other node type class (e.g. simply @keyword).

As a user, I would actually like * to be highlighted, but not in the same color as types and keywords (here: pink) or literals (here: yellow), as it is more of a "special value". This theme uses orange for table aliases (and nothing else).

After working quite expansively with SQL in hx in the recent weeks, I believe a lot more different colors should be used for various SQL node types. Maybe this shouldn't be solved at the level of individual themes, but rather by applying more fine-grained node names, so not (almost) everything is a @keyword any more.

Edit: This also appears to be the theme's problem - other themes apply far more colors, e.g. to table and field names:
image

Where can I see which node types helix uses to apply highlight?

For all_fields, see: https://github.com/DerekStride/tree-sitter-sql/blob/4fe05b2d81565ddb689d2f415e07afdacc515c52/grammar.js#L908-L922

Example:

(all_fields) @special

Yields:
image

@ivktac
Copy link
Contributor

ivktac commented Jan 27, 2023

When I was updating the queries I chose the semantically appropriate scopes. You can see it here:

(invocation
name: (identifier) @function.builtin
parameter: [(field)]? @variable.other.member)
(count
name: (identifier) @function.builtin
parameter: [(field)]? @variable.other.member)
(table_reference
name: (identifier) @namespace)
(relation
table_alias: (identifier) @variable.parameter)
(field
name: (identifier) @variable.other.member)
(field
table_alias: (identifier) @variable.parameter
name: (identifier) @variable.other.member)

I didn't know that some themes haven't @variable.other.member which I added. So if @all_fields may fix it, we probably need just remove all lines that have parameter: [(field)]? @variable.other.member and add (all_fields) @special

@ivktac
Copy link
Contributor

ivktac commented Jan 27, 2023

Also what about #5708, I'm not sure how helix loads tree-sitter grammars (line by line or in some other way), but I just swapped these lines and it seems the numbers are highlighted correctly now
зображення

@the-mikedavis
Copy link
Member

Yeah, the original SQL queries were most likely written for nvim-treesitter which has the reverse precedence: in nvim-treesitter, later queries override earlier ones and in helix earlier queries override later queries. So swapping those two clauses should fix those highlights.

@the-mikedavis
Copy link
Member

That @special highlight looks good for the all_fields node: that * is different from the * arithmetic operator so it will be nice to be able to distinguish between them 👍

@the-mikedavis
Copy link
Member

You can check the grammar.js for the full list of node definitions. You can also use :tree-sitter-subtree within helix to see the nodes pretty-printed in a popup which can be useful for writing queries.

@LeoniePhiline
Copy link
Contributor Author

LeoniePhiline commented Jan 28, 2023

That @special highlight looks good for the all_fields node: that * is different from the * arithmetic operator so it will be nice to be able to distinguish between them +1

I added that to the PR in 332ad7f

You can check the grammar.js for the full list of node definitions. You can also use :tree-sitter-subtree within helix to see the nodes pretty-printed in a popup which can be useful for writing queries.

Thank you, that was SO helpful! Together with the scopes at https://docs.helix-editor.com/themes.html#scopes and :tree-sitter-scopes I have all I need.

Also what about #5708, I'm not sure how helix loads tree-sitter grammars (line by line or in some other way), but I just swapped these lines and it seems the numbers are highlighted correctly now
Yeah, the original SQL queries were most likely written for nvim-treesitter which has the reverse precedence: in nvim-treesitter, later queries override earlier ones and in helix earlier queries override later queries. So swapping those two clauses should fix those highlights.

I flipped the two directives in 7e41845
(Looks like GitHub hasn't yet updated the PR's commits. You can view the commit in branch context though: LeoniePhiline@7e41845) Edit: The latest commit (LeoniePhiline@7e41845) is still not found in the PR even 8 minutes after pushing to the PR branch? How long does this usually take? I thought GitHub updates every minute.

Also added "Fixes #5708" to PR description.

Edit: Force-pushed to force GitHub to update the PR after it failed to import the last commit added to the branch. Also did a master rebase in the process. (22b3d3d..2c6bf6f)

Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for working on this!

@the-mikedavis the-mikedavis changed the title fix(tree-sitter-sql): aggregate functions, insert set syntax, & update join syntax Update tree-sitter-sql and improve highlight queries Jan 28, 2023
@the-mikedavis the-mikedavis merged commit 482cc22 into helix-editor:master Jan 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-language-support Area: Support for programming/text languages E-easy Call for participation: Experience needed to fix: Easy / not much S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
4 participants