Allow capturing multiple nodes in textobject queries #1611

Merged

@sudormrfbin (Member)

Tree-sitter captures can contain multiple nodes, like so:

```
(line_comment)+ @comment
```

This would match each line in a comment as a separate `@comment` capture, when what we actually want is for the whole set of contiguous `line_comment` nodes to be captured under the `@comment` capture. This commit enables that behavior. It is also required for the `comment.around` textobject (#1605) to work.
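To make "one capture spanning several nodes" concrete, here is a minimal Rust sketch. It is not Helix's actual implementation: it assumes each captured node is reduced to a byte range, and the `NodeRange` type and `merge_capture` function are hypothetical names. All nodes captured under one name within a single match are collapsed into one range, from the earliest start to the latest end.

```rust
// Hypothetical stand-in for a tree-sitter node: only its byte range matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeRange {
    start_byte: usize,
    end_byte: usize,
}

/// Collapse all nodes captured under one name in a single match into one
/// span, from the earliest start byte to the latest end byte.
/// Returns `None` when the capture holds no nodes.
fn merge_capture(nodes: &[NodeRange]) -> Option<NodeRange> {
    let start = nodes.iter().map(|n| n.start_byte).min()?;
    let end = nodes.iter().map(|n| n.end_byte).max()?;
    Some(NodeRange { start_byte: start, end_byte: end })
}

fn main() {
    // Three contiguous `// line N` comments: 9 bytes of text plus a newline each.
    let comments = [
        NodeRange { start_byte: 0, end_byte: 9 },
        NodeRange { start_byte: 10, end_byte: 19 },
        NodeRange { start_byte: 20, end_byte: 29 },
    ];
    let span = merge_capture(&comments).unwrap();
    println!("{}..{}", span.start_byte, span.end_byte); // prints "0..29"
}
```

With the merged range, the three line comments would be treated as one block instead of three separate `@comment` captures.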

Note that some cases of multi-node capture are not implemented yet; see the commented-out test cases for details.

/cc @EpocSquadron

@sudormrfbin force-pushed the tree-sitter-capture-multiple-nodes branch 2 times, most recently from 5b130a0 to 76081db on February 1, 2022 13:44
@EpocSquadron (Contributor)

Does this handle grouped alternations? For example, the following hypothetical query:

```
[
  (line_comment)
  (doc_comment)
]+
```

I think Perl might have a case where this is needed.

@sudormrfbin (Member, Author) commented Feb 2, 2022

Only quantifiers after nodes are supported right now, like `(comment)+ @capture`, so not yet. On another note, I get weird results when testing a similar query with Rust:

query.scm

```
[
  (line_comment)
  (block_comment)
]+ @cap
```

test.rs

```
// line 1
/* line a */
/* line b */
// line 2
/* line c */
// line 3
```

Running it with the tree-sitter CLI:

```
tree-sitter query query.scm test.rs
```

```
test.rs
  pattern: 0
    capture: 0 - cap, start: (0, 0), end: (0, 9), text: `// line 1`
  pattern: 0
    capture: 0 - cap, start: (1, 0), end: (1, 12), text: `/* line a */`
  pattern: 0
    capture: 0 - cap, start: (2, 0), end: (2, 12), text: `/* line b */`
    capture: 0 - cap, start: (3, 0), end: (3, 9), text: `// line 2`
  pattern: 0
    capture: 0 - cap, start: (4, 0), end: (4, 12), text: `/* line c */`
    capture: 0 - cap, start: (5, 0), end: (5, 9), text: `// line 3`
```

The captures are split across separate matches (one "match" being a single, unique match of the pattern), whereas I was expecting one match with multiple `@cap` captures. If the output had looked like this:

```
test.rs
  pattern: 0
    capture: 0 - cap, start: (0, 0), end: (0, 9), text: `// line 1`
    capture: 0 - cap, start: (1, 0), end: (1, 12), text: `/* line a */`
    capture: 0 - cap, start: (2, 0), end: (2, 12), text: `/* line b */`
    capture: 0 - cap, start: (3, 0), end: (3, 9), text: `// line 2`
    capture: 0 - cap, start: (4, 0), end: (4, 12), text: `/* line c */`
    capture: 0 - cap, start: (5, 0), end: (5, 9), text: `// line 3`
```

the nodes would have been correctly grouped together.
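Why the split into separate matches matters can be shown with a small Rust sketch. It uses the row numbers from the CLI output above and assumes, as a simplification, that a textobject built from one match spans from the first start row to the last end row of that match's captures; `RowSpan` and `match_span` are hypothetical names, not part of tree-sitter or Helix. Applied per match, the actual output yields four fragments, while the expected output yields one span covering all six comments.

```rust
/// A capture's position as (start_row, end_row), taken from the CLI output.
type RowSpan = (usize, usize);

/// Span covered by all `@cap` captures in one match:
/// earliest start row to latest end row, or `None` for an empty match.
fn match_span(captures: &[RowSpan]) -> Option<RowSpan> {
    let start = captures.iter().map(|c| c.0).min()?;
    let end = captures.iter().map(|c| c.1).max()?;
    Some((start, end))
}

fn main() {
    // Matches as actually reported: six comments spread over four matches.
    let actual: Vec<Vec<RowSpan>> = vec![
        vec![(0, 0)],
        vec![(1, 1)],
        vec![(2, 2), (3, 3)],
        vec![(4, 4), (5, 5)],
    ];
    // The single match that was expected.
    let expected: Vec<Vec<RowSpan>> =
        vec![vec![(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]];

    let actual_spans: Vec<RowSpan> = actual.iter().filter_map(|m| match_span(m)).collect();
    let expected_spans: Vec<RowSpan> = expected.iter().filter_map(|m| match_span(m)).collect();
    println!("{:?}", actual_spans); // [(0, 0), (1, 1), (2, 3), (4, 5)]
    println!("{:?}", expected_spans); // [(0, 5)]
}
```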

@sudormrfbin (Member, Author) commented Feb 2, 2022

Also, it seems like

```
[
  (line_comment)
]+ @comment
```

works perfectly fine.

@EpocSquadron (Contributor) commented Feb 2, 2022

@sudormrfbin I took a quick look and didn't see any issues related to this in the tree-sitter repo. I'm on my phone and about to start my day; can you take the lead on opening an issue there? We could also ping Max directly here, but this seems like a good candidate for an issue.

Edit: I don't mean to be bossy; I'll handle it if you can't get to it.

@EpocSquadron (Contributor)

Another place this might be useful is a hypothetical implementation of `parameter.around` that captures the entire parameter list.

@archseer (Member) commented Feb 3, 2022

/cc @the-mikedavis regarding the comment queries

@the-mikedavis (Member)

Hmm, those query results are strange 🤔

In the grammar, line comments are parsed with regular grammar DSL rules, but block comments are handled by an external scanner. Maybe queries treat external rules differently? I know there are some optimizations that let captures be emitted eagerly (see here), but I'm not sure that logic is relevant to this case.

It's probably worth opening an issue upstream, that seems like a good reproduction case.

@sudormrfbin (Member, Author)

I have opened an upstream issue: tree-sitter/tree-sitter#1639

@sudormrfbin force-pushed the tree-sitter-capture-multiple-nodes branch from 76081db to 645ab4f on February 26, 2022 14:53
@sudormrfbin requested a review from @archseer on February 28, 2022 19:05
@archseer merged commit e83cdf3 into helix-editor:master on Mar 1, 2022
@sudormrfbin deleted the tree-sitter-capture-multiple-nodes branch on March 1, 2022 02:31