Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix initial highlight layer sort order #5196

Merged
merged 1 commit into from
Feb 1, 2023

Conversation

the-mikedavis
Copy link
Member

The purpose of the RefCell in this change is to remove the mutable self borrow on HighlightIterLayer::sort_key so that we can sort layers with the correct ordering using the Vec::sort function family. HighlightIterLayer::sort_key needs &mut self since it calls Peekable::peek which needs &mut self. Vec::sort functions only give immutable borrows of the elements to ensure the correctness of the sort.

I thought we could use an eager Peekable but it won't work in practice...

I tried an eager version of Peekable (reads the peeked element eagerly on new and next):

struct EagerPeekable<I: Iterator> {
    iter: I,
    peeked: Option<I::Item>,
}

impl<I: Iterator> EagerPeekable<I> {
    fn new(mut iter: I) -> Self {
        let peeked = iter.next();
        Self { iter, peeked }
    }

    fn peek(&self) -> Option<&I::Item> {
        self.peeked.as_ref()
    }
}

impl<I: Iterator> Iterator for EagerPeekable<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        std::mem::replace(&mut self.peeked, self.iter.next())
    }
}

This would be a cleaner solution (notice how EagerPeekable::peek takes &self rather than &mut self), however this doesn't work in practice because the Items emitted by the tree_sitter::QueryCaptures Iterator must be consumed before the next Item is returned. Iterator::next on tree_sitter::QueryCaptures modifies the QueryMatch returned by the last call of next. This behavior is not currently reflected in the lifetimes/structure of the QueryCaptures type: tree-sitter/tree-sitter#608.


Previously the layers were initially sorted by the first range in the layer and then the layer depth descending as an approximation of HighlightIterLayer::sort_key (see the old "HAXXX" code). This could cause disappearing highlights when using combined injections:

Before After
before after

The purpose of this change is to remove the mutable self borrow on
`HighlightIterLayer::sort_key` so that we can sort layers with the
correct ordering using the `Vec::sort` function family.
`HighlightIterLayer::sort_key` needs `&mut self` since it calls
`Peekable::peek` which needs `&mut self`. `Vec::sort` functions
only give immutable borrows of the elements to ensure the
correctness of the sort.

We could instead approach this by creating an eager Peekable and using
that instead of `std::iter::Peekable` to wrap `QueryCaptures`:

```rust
struct EagerPeekable<I: Iterator> {
    iter: I,
    peeked: Option<I::Item>,
}

impl<I: Iterator> EagerPeekable<I> {
    fn new(mut iter: I) -> Self {
        let peeked = iter.next();
        Self { iter, peeked }
    }

    fn peek(&self) -> Option<&I::Item> {
        self.peeked.as_ref()
    }
}

impl<I: Iterator> Iterator for EagerPeekable<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        std::mem::replace(&mut self.peeked, self.iter.next())
    }
}
```

This would be a cleaner approach (notice how `EagerPeekable::peek`
takes `&self` rather than `&mut self`), however this doesn't work in
practice because the Items emitted by the `tree_sitter::QueryCaptures`
Iterator must be consumed before the next Item is returned.
`Iterator::next` on `tree_sitter::QueryCaptures` modifies the
`QueryMatch` returned by the last call of `next`. This behavior is
not currently reflected in the lifetimes/structure of `QueryCaptures`.

This fixes an issue with layers being out of order when using combined
injections since the old code only checked the first range in the
layer. Layers being out of order could cause missing highlights for
combined-injections content.
@the-mikedavis the-mikedavis added C-bug Category: This is a bug A-tree-sitter Area: Tree-sitter S-waiting-on-review Status: Awaiting review from a maintainer. labels Dec 17, 2022
@archseer archseer requested a review from pascalkuthe December 18, 2022 02:55
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow that is just horrific API there in treesitter rust.
The query iterator should have never been an iterator to begin with. In my opinion the behavior breaks the API contract for an Iterator or is atleast a really bad antipattern. instead they should have used a normal method so they can return the right lifetimes.
You would have loose some ergomoics (like using while let instead of for) but correctness is waz more important here. Maybe we should send a PR for that upstream_They are talking about Streaming Iterators/GATs in the issue but if you use a non trait function you do not need GAT and there is (and likely won't be anytime soon) no standard StreamingIterator trait anyways

Usually I am not a huge fan of RefCell and prefer Cell with Cell::take. However that wouldn't work in this case as QueryCaptures does not implement Default. You are able to use get_mut in the highlight Iterator anyway to it should have no noticeable performance impact.

Just a sidenote: I think it would be nice to document the sort order a bit more at some point.
Using the bool for sorting is a tiny bit subtle and I had to think for a second why we would want layers where a capture ends before those where a capture starts (altough it makes perfect sense so I might just be dense).

This LGTM and is nice fix that required very little code changes but probably took some time to find 👍 Unless @archseer has further concerns we can merge this.

@pascalkuthe
Copy link
Member

pascalkuthe commented Dec 20, 2022

While looking at TS in more detail again while reviewing rainbow brackets I noticed that this is not just an unclear API but straight up undefined behavior. They are modifying a backing buffer while handing out an immutable pointer to rust without any interior mutability that is always UB! If the compiler had moved the query matches into registers while you called next you would have gotten the correct results here. I will open an upstream PR to fix this and consider filing a rustsec advisory.

@pascalkuthe pascalkuthe added this to the next milestone Jan 6, 2023
@archseer archseer merged commit d5f17d3 into helix-editor:master Feb 1, 2023
@the-mikedavis the-mikedavis deleted the md-sort-key-immut-ref branch February 1, 2023 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tree-sitter Area: Tree-sitter C-bug Category: This is a bug S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants