transition to nucleo for fuzzy matching #7814
Conversation
I added the git dependency, and I also made some further improvements upstream in nucleo (even further performance improvements and improved matching quality in a few cases). I also found/fixed the regression I had accidentally introduced to the filename completer. I want to improve some more things upstream in nucleo in the future, but those shouldn't significantly affect helix (or usage), so I think this PR is ready for testing/review now. It would be nice if people could give this a spin so I can weed out any remaining upstream issues (if there are any).
Getting a really weird panic when searching in my dev/home directory (which has a bunch of files) and then deleting the search; asciinema attached. It's not been reproducible, but I wanted to report it just in case, and I'll dig a bit more into it later tonight (all the panics are the same though: full backtrace).
Also, does this fix #7645 as well?
Thanks! I probably missed this earlier because I was running release builds (where this silently wraps, which shouldn't really cause too many problems at that particular subtraction). I will try to reproduce this. Might need to add a fuzzer for the high-level API though, since that's probably hard to reproduce 😅 I will see if I can find a quick fix tomorrow morning; I will be travelling the rest of the weekend, so it may take a while if I don't find it quickly.
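As an aside on why release builds hid this: integer overflow panics in debug builds but wraps silently in release builds by default. A tiny illustration of that behavior (not code from this PR):

```rust
fn main() {
    let a: u32 = 1;
    let b: u32 = 2;
    // `a - b` would panic with "attempt to subtract with overflow" in a
    // debug build but silently wrap to u32::MAX in a release build,
    // which is how this class of bug hides behind `--release`.
    println!("{}", a.wrapping_sub(b)); // 4294967295: wrapping made explicit
    println!("{}", a.saturating_sub(b)); // 0: clamp at the boundary instead
}
```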
Yeah, I forgot about that 👍
I found and fixed the bug; it was a pretty easy-to-fix oversight.
Let's get a couple days of testing in before this gets merged. I'm going to go ahead and test-run this branch.
The file list has a fair bit of shuffling around as more files are scanned; it seems like the new files are added at the top of the list and then sorted, so the files with the top score move slightly downwards and are then moved back up the list.
Is there also some kind of total limit? I can't see a file tree if I open it in
Hmm, no, there is no hard limit, but the nix store may be hidden by some filter? I have tried with my entire root directory (which has about 3 million files). We do some filtering around symlinks, so it may be related to that I guess, or you may have an ignore file somewhere?
Hmm, I noticed that too this morning, but weirdly enough only in debug mode, not in release mode; I did want to investigate that further. Did you run in debug mode, or did you also see this in release mode? I actually was pretty careful to ensure this doesn't happen, so I am a bit surprised where this is coming from. I am travelling this weekend, so I didn't have time to investigate further; I will take a closer look next week.
I'm always building in release mode, but I needed to execute on a very large tree so that the file list would take some time to load. I searched for
Ah right, I think
Hmm, I don't really remember having that issue when I tried it (streaming about 3 million files from my root directory; I searched for changelog, so there were quite a few matches streaming in). I will try to test on my laptop later. Maybe I can find a quick fix. Nucleo should always resort if there are new matches (in fact I have a dedicated flag in the matcher that controls whether the match snapshot will be updated, and that is only set to true if sorting happened), so it's hopefully just a small oversight.
I was getting this as well; it was easier to see when I was in my home or root directory and I turned off not showing hidden files (
Could this be because the files all end up with the same score and the sort isn't using a stable sort?
While I am using an unstable (parallel) sort, that shouldn't theoretically cause these problems because of the key I am sorting by. I suspect that the same items end up in the list of matches (which is then sorted) twice somehow during streaming.
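For illustration, a minimal sketch of the tie-breaking idea being discussed here (hypothetical types, not the actual nucleo code): adding a stable secondary key such as the stream index makes even an unstable sort deterministic, because no two entries compare equal.

```rust
use std::cmp::Reverse;

struct Match {
    score: u32,
    index: u32, // position at which the item streamed in
}

fn sort_matches(matches: &mut [Match]) {
    // With (Reverse(score), index) as the key, no two entries compare
    // equal, so an unstable (or parallel) sort cannot shuffle ties:
    // equal scores are always ordered by stream index.
    matches.sort_unstable_by_key(|m| (Reverse(m.score), m.index));
}

fn main() {
    let mut matches = vec![
        Match { score: 10, index: 2 },
        Match { score: 42, index: 1 },
        Match { score: 10, index: 0 },
    ];
    sort_matches(&mut matches);
    for m in &matches {
        println!("score={} index={}", m.score, m.index);
    }
}
```

Of course, if the real cause is the same item being inserted twice during streaming, a deterministic sort order would only mask the symptom rather than fix it.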
Ah, seems like I found a bit of a memory leak after leaving the file picker open in the root directory. Steps look something like this:
Amazing work as always 🙂
Co-authored-by: Skyler Hawthorne <[email protected]>
d77029c
I have noticed that the RAM consumption is pretty high. Opening the file picker on my home directory takes about 1GB of RAM. When it's closed, the memory is not freed, and when you reopen the picker, it rescans everything and takes another extra GB. @pascalkuthe
Can you share how you made these measurements? If you're looking at
I'm using bottom. It seems to be in use. I'm on NixOS, so the binary name is hx-wrapped.
Hmm, the RAM usage does seem to be a lot higher for me now (a couple hundred megabytes), and I can replicate this type of memory usage growth.
I still think bottom/top don't report this properly. Looking at
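For what it's worth, one way to cross-check what a process actually holds resident, independent of what bottom/top display, is to read VmRSS straight out of procfs. A Linux-only sketch (defaults to the current process if no pid is given):

```rust
use std::fs;

/// Parse the resident set size (in KiB) out of /proc/<pid>/status.
fn resident_kib(pid: u32) -> Option<u64> {
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kib| kib.parse().ok())
}

fn main() {
    // e.g. `./rss $(pidof hx)` to sample a running editor
    let pid = std::env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or_else(std::process::id);
    println!("VmRSS of {pid}: {:?} KiB", resident_kib(pid));
}
```

Note that even VmRSS includes memory the allocator has cached but not yet returned to the OS, which is one reason memory freed when a picker is dropped may not show up as an immediate drop in these tools.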
This might be fixed by #7891; can you try changing your toolchain to 1.68 and retrying?
That's only scoped to /proc/; I tested by opening under /nix/store.
I can reproduce this, and I found the reason too. In helix we keep the last picker around so that we can reopen it with . For the file picker it might make more sense to recrawl the FS, I guess, and just save the current query and position. That will be a bit of a pain to implement though; I am not quite sure if it's the right approach, and it would require larger changes. I will add a workaround for now that immediately disables any old (a rough sketch of the idea is below). We could also add a special case to not save pickers with more than 50k elements in last_picker as a temporary workaround. The good news is that there is no actual memory leak in nucleo or the picker, nor any new bug introduced in this PR. It's just more noticeable now, since we capped the number of files to 10k in the past.
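A rough sketch of that workaround idea (hypothetical names, not the actual helix/nucleo types): the picker hands out a cloneable injector handle, and invalidating it makes a still-running crawler stop pushing, and therefore stop retaining, items.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the handle a background crawler uses to
/// stream items into the matcher.
#[derive(Clone)]
struct Injector {
    alive: Arc<AtomicBool>,
}

impl Injector {
    fn new() -> Self {
        Injector { alive: Arc::new(AtomicBool::new(true)) }
    }

    /// Hands the item back if the injector was invalidated, signalling
    /// the crawler to shut down instead of feeding a dead picker.
    fn push(&self, item: String) -> Result<(), String> {
        if !self.alive.load(Ordering::Relaxed) {
            return Err(item);
        }
        // ...forward the item to the matcher here...
        Ok(())
    }

    /// Called when the picker is closed or replaced so old crawls stop early.
    fn invalidate(&self) {
        self.alive.store(false, Ordering::Relaxed);
    }
}

fn main() {
    let injector = Injector::new();
    let worker = injector.clone();
    assert!(worker.push("src/main.rs".into()).is_ok());
    injector.invalidate(); // picker replaced: the old crawler should bail
    assert!(worker.push("src/lib.rs".into()).is_err());
}
```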
Note that @gabydd is also right here: a bug in the standard library that was fixed in 1.68 can also cause the crawler to become stuck, which would lead to an actual memory leak. That is fixed just by compiling with a newer rustc version (we can probably raise the MSRV soon anyway).
#8127 should fix this for the most part.
* transition to nucleo for fuzzy matching
* drop flaky test case: since the picker streams in results now, any test that relies on the picker containing results is potentially flaky
* use crates.io version of nucleo
* Fix typo in commands.rs

Co-authored-by: Skyler Hawthorne <[email protected]>
Closes #1707
Closes #1987
Closes #1543
Closes #5871
Closes #7645
Closes #7652
NOTE: This PR is currently still experimental. I want to tweak the fuzzy matching scoring calculations in nucleo a bit more, add more tests, and clean up the nucleo codebase. There might also be some issues with the completers.

This PR fully replaces the skim matcher, currently used for all fuzzy matching in helix (completions, picker, ...), with nucleo (https://github.com/helix-editor/nucleo). I have written nucleo to be as performant as possible (heavily referencing fzf). The matching algorithm itself can often be 8 times faster than skim's.

Furthermore, nucleo offers a high-level API that essentially replaces the entire picker logic in helix. Compared to the current helix picker, nucleo's high-level API comes with some nice features. I think the following demonstrates the practical use case well:
From my root directory, I am able to fuzzy match to the helix changelog very quickly:
https://asciinema.org/a/MnRf8JGPFll1jw1GpHwNGGCXU
On the current helix version I have to wait for all (3 million) files to be crawled (with the editor frozen in the meantime) and matching is slowed to a crawl afterwards.
Just for completeness, another capture of actually waiting for all files to stream in and fuzzy matching on the result (still just as fast, with no noticeable lag):
https://asciinema.org/a/v1RZmq5wV80hf2BWSM1RUqaod
Nucleo is faster than fzf (see the comparison in the README) while yielding the same results (and even better results in some cases).
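For a sense of what the matcher looks like from the consumer side, here is a minimal sketch against the standalone nucleo-matcher crate as published on crates.io (the exact API surface of the git dependency used while this PR was in flight may differ slightly):

```rust
use nucleo_matcher::pattern::{CaseMatching, Normalization, Pattern};
use nucleo_matcher::{Config, Matcher};

fn main() {
    // match_paths() tweaks the default scoring config for path input.
    let mut matcher = Matcher::new(Config::DEFAULT.match_paths());
    let paths = [
        "helix-term/src/ui/picker.rs",
        "CHANGELOG.md",
        "book/src/install.md",
    ];
    // Parse the query roughly the way a picker parses user input.
    let pattern = Pattern::parse("chlog", CaseMatching::Smart, Normalization::Smart);
    // Score every candidate; results come back sorted best-first.
    for (path, score) in pattern.match_list(paths, &mut matcher) {
        println!("{score:>4}  {path}");
    }
}
```

The high-level nucleo crate builds on this: it runs matching on a background thread pool and lets items stream in while a query is active, which is what removes the "wait for the full crawl" stall described above.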