querying 'not in' #9

Closed
the-mikedavis opened this issue Oct 14, 2021 · 5 comments · Fixed by #10

Comments

@the-mikedavis
Member

Hi again 👋

I'm pretty sure this is a bug in the tree-sitter query mechanism: I can't get a query to match the 'not in' binary operator.

I see in the docs (which are lovely, by the way :) that 'not in' is parsed with the external scanner, so my guess is that something in the lexer (either in scanner.cc or in tree-sitter itself) is not getting the full information it needs to understand the 'not in' token (byte/codepoint starts/stops, maybe?). I also suspect it might be possible to fix this with extra rules in grammar.js.

Comparing the query results for a standard binary_operator such as 'in' against those for 'not in', we see:

$ echo "a in b" > in.exs
$ echo "a not in b" > not-in.exs
$ echo "(binary_operator operator: _ @operator)" > query.scm
$ tree-sitter query query.scm in.exs
in.exs
  pattern: 0
    capture: operator, start: (0, 2), text: "in"
$ tree-sitter query query.scm not-in.exs
not-in.exs

# no match! (╯°□°)╯︵ ┻━┻

(The in.exs can be replaced by other binary_operators such as ++ to the same effect.)

Looking at the parse trees, there's some peculiar behavior: 'not in' doesn't show up, but other binary operators do:

$ tree-sitter parse -x in.exs
(source [0, 0] - [1, 0]
  (binary_operator [0, 0] - [0, 6]
    left: (identifier [0, 0] - [0, 1])
    right: (identifier [0, 5] - [0, 6])))
<source>
  <binary_operator>
    <identifier type="left">a</identifier>
in
    <identifier type="right">b</identifier>
</binary_operator>
</source>

$ tree-sitter parse -x not-in.exs
(source [0, 0] - [1, 0]
  (binary_operator [0, 0] - [0, 10]
    left: (identifier [0, 0] - [0, 1])
    right: (identifier [0, 9] - [0, 10])))
<source>
  <binary_operator>
    <identifier type="left">a</identifier>

    <identifier type="right">b</identifier>
</binary_operator>
</source>

I think it's odd that 'not in' is not there between the <identifier>s 🤔. I think the code in the tree-sitter CLI that writes that text is this block:

let start = node.start_byte();
let end = node.end_byte();
let value =
    std::str::from_utf8(&source_code[start..end]).expect("has a string");
write!(&mut stdout, "{}", html_escape::encode_text(value))?;

Which is why I suspect there might be some missing information about the start and stop bytes of the $._not_in rule. (Although as we'll see below, the parser does seem to know the start/stop bytes when $._not_in is changed to be a non-hidden rule.)
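To make the suspicion concrete, here is a toy sketch of that dump logic. It assumes text is only written for leaf nodes that actually exist in the tree, sliced out of the source by byte range. The `Node` type and the `leaf`, `render` and `dump` helpers are hypothetical and are not tree-sitter's API; the byte offsets are the ones from the parse output above.

```rust
// Toy model only: these types are hypothetical, not tree-sitter's API.
struct Node {
    kind: &'static str,
    start: usize, // start byte in the source
    end: usize,   // end byte (exclusive)
    children: Vec<Node>,
}

fn leaf(kind: &'static str, start: usize, end: usize) -> Node {
    Node { kind, start, end, children: Vec::new() }
}

/// Writes an XML-like dump. Only leaf nodes emit text, and that text is
/// taken from the source by byte range, as in the CLI snippet above.
fn render(node: &Node, source: &[u8], out: &mut String) {
    out.push_str(&format!("<{}>", node.kind));
    if node.children.is_empty() {
        let text = std::str::from_utf8(&source[node.start..node.end])
            .expect("has a string");
        out.push_str(text);
    } else {
        for child in &node.children {
            render(child, source, out);
        }
    }
    out.push_str(&format!("</{}>", node.kind));
}

/// Builds the tree for "a not in b", with or without a node for the operator.
fn dump(source: &str, with_operator_node: bool) -> String {
    let mut children = vec![leaf("identifier", 0, 1)];
    if with_operator_node {
        children.push(leaf("not_in", 2, 8));
    }
    children.push(leaf("identifier", 9, 10));
    let root = Node { kind: "binary_operator", start: 0, end: 10, children };
    let mut out = String::new();
    render(&root, source.as_bytes(), &mut out);
    out
}

fn main() {
    // No operator node in the tree: bytes 2..8 are never written.
    println!("{}", dump("a not in b", false));
    // With an operator node, the same bytes show up in the dump.
    println!("{}", dump("a not in b", true));
}
```

In this model the missing text does not need wrong byte offsets to explain it: if no node covers bytes 2..8, nothing ever asks for that slice.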

a possible but not-great workaround

One workaround that lets query.scm match (and therefore lets 'not in' be queried and highlighted the same as any other binary_operator) is to rename the grammar.js rule $._not_in to $.not_in so as to unhide it. The parse result then becomes:

$ tree-sitter generate
$ tree-sitter parse -x not-in.exs
(source [0, 0] - [1, 0]
  (binary_operator [0, 0] - [0, 10]
    left: (identifier [0, 0] - [0, 1])
    operator: (not_in [0, 2] - [0, 8])
    right: (identifier [0, 9] - [0, 10])))
<source>
  <binary_operator>
    <identifier type="left">a</identifier>

    <not_in type="operator">not in</not_in>

    <identifier type="right">b</identifier>
</binary_operator>
</source>

And the query matches!

$ tree-sitter query query.scm not-in.exs
not-in.exs
  pattern: 0
    capture: operator, start: (0, 2), text: "not in"

It also works as expected with arbitrary whitespace like a not in b.
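As a rough illustration of why unhiding the rule changes the query result, here is a toy sketch. It assumes that a hidden rule leaves no node in the tree, so there is nothing for the operator field to attach to. The `Child` type and both functions are hypothetical and do not reflect how tree-sitter's query engine is actually implemented.

```rust
// Toy model only: hypothetical types, not tree-sitter's query engine.
struct Child {
    field: Option<&'static str>,
    text: &'static str,
    hidden: bool, // assumption: a hidden rule produces no node in the tree
}

/// Mimics `(binary_operator operator: _ @operator)`: returns the text of
/// every child that is present in the tree and carries the `operator` field.
fn capture_operator(children: &[Child]) -> Vec<&'static str> {
    children
        .iter()
        .filter(|c| !c.hidden && c.field == Some("operator"))
        .map(|c| c.text)
        .collect()
}

/// Children of the binary_operator for "a not in b".
fn not_in_expression(operator_hidden: bool) -> Vec<Child> {
    vec![
        Child { field: Some("left"), text: "a", hidden: false },
        Child { field: Some("operator"), text: "not in", hidden: operator_hidden },
        Child { field: Some("right"), text: "b", hidden: false },
    ]
}

fn main() {
    // Hidden operator rule: nothing to capture.
    println!("{:?}", capture_operator(&not_in_expression(true)));
    // Unhidden operator rule: the capture finds "not in".
    println!("{:?}", capture_operator(&not_in_expression(false)));
}
```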

This seems pretty hacky to me, though: it throws an extra operator node in there just for the sake of making the query match.

I haven't dug too deep into the tree-sitter codebase yet to hunt down exactly what's going on. I thought I'd write out my findings here first in hopes that you might already know why this parses strangely and can't be queried.

@jonatanklosko
Member

Hey, thanks for the detailed report! I was playing with this for a bit and managed to narrow it down. In the end it's not even related to the use of the external scanner, just to the hidden node. I will submit an issue to tree-sitter with a minimal reproduction shortly :)

@jonatanklosko
Member

Reported in tree-sitter/tree-sitter#1441! I actually found a reasonable solution, so will go ahead and use it :)

@jonatanklosko
Member

@the-mikedavis it should work as expected now, let me know if you encounter any issues :)

@the-mikedavis
Member Author

Ah you're awesome, great find!

I just merged those changes in to the helix PR and it works!

[image: not-in-compare]

@jonatanklosko
Member

Beautiful! 🐱
