Sorting search results #21

corbanbrook · 2014-03-10T15:50:36Z

Multiple schemes can be employed to achieve results which are most relevant to the user. Predicting which file a user wants out of a long list of possible matches and presenting it first can help speed up development time/maintain flow and train of thought. Here are some ideas to discuss:

Sort the least fuzzy results to the top, with fuzziness determined by the run length of the search term in the result. E.g. a search for 'src' would be a fuzzier match for a file like .jshintrc than for a file called app_src.js or files in the src/ directory: the run length for the first file is 2, while the second has a run length of 3.
Filename match or directory match. A match on a filename should be sorted at a higher priority than a match on a full path, but what about matches which have a higher run length on a full path versus a lower run length on a filename? E.g. a search for 'src ap' currently displays images/destroy_discard.png higher than src/app.coffee. Although destroy_discard technically has more matched characters within the filename than src/app.coffee, src/app has the longer run length. (Something to think about.)
Dot files. Hidden files are less important than non-hidden files of equal score.
Ignored files (.gitignore) can be sorted at a lower priority than files with equal score. E.g. I want the index.html in my templates/ dir, not the one in my build/ dir, because build/ is in .gitignore. Editing a temporary file by mistake can cause confusion and lost work.
Sort files with most recent last modified date at a higher priority than those of equal score.
Filter out/sort to bottom files that are already open.
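
The run-length idea from the first item could be sketched roughly as below. This is an illustration in Python, not fuzzaldrin's code: `longest_run` is a hypothetical helper, the path `src/main.js` is made up, and the measure is simplified to "longest contiguous piece of the query found in the candidate", ignoring whether the rest of the query still matches around that piece.

```python
def longest_run(query: str, candidate: str) -> int:
    """Length of the longest contiguous piece of `query` found in `candidate`."""
    q, c = query.lower(), candidate.lower()
    # Try the longest pieces first so the first hit is the answer.
    for length in range(len(q), 0, -1):
        for start in range(len(q) - length + 1):
            if q[start:start + length] in c:
                return length
    return 0


# "src/main.js" is a hypothetical path added for illustration.
paths = [".jshintrc", "app_src.js", "src/main.js"]

# Longer runs (less fuzzy matches) sort first; sorted() is stable,
# so candidates with equal run length keep their incoming order.
ranked = sorted(paths, key=lambda p: -longest_run("src", p))
```

With the query 'src', .jshintrc only contains the piece 'rc' (run length 2), so it ends up below the two candidates that contain 'src' in full.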

hkdobrev · 2014-03-10T16:03:46Z

@corbanbrook AFAIK this issue should be filed against the fuzzaldrin library which is used by fuzzy-finder to filter and score results.

corbanbrook · 2014-03-10T17:15:42Z

fuzzaldrin simply does scoring and sorting of arrays of strings or objects. Some of my above recommendations might be outside the scope of the project.

One solution would be for fuzzaldrin to add an option for custom filter/sorting callbacks.
Another solution would be to simply use fuzzaldrin to provide the initial score to use in further sorting schemes within this project like dot file priority, ignored file priority, and last modified priority.
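
A minimal sketch of that second approach, assuming the string matcher has already produced a score for each path. The `Candidate` type, its fields, and the order of the tie-breakers are all assumptions made for illustration; nothing here is fuzzy-finder's actual code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    score: float   # assumed to come from the string matcher
    ignored: bool  # assumed: True if the path is matched by .gitignore
    mtime: float   # assumed: last-modified time, seconds since the epoch


def is_dotfile(path: str) -> bool:
    """True if the file name (last path segment) starts with a dot."""
    return path.rsplit("/", 1)[-1].startswith(".")


def sort_key(c: Candidate):
    # Tuples compare left to right: matcher score first, then the
    # tie-breakers (ignored last, dot files later, newer files earlier).
    return (-c.score, c.ignored, is_dotfile(c.path), -c.mtime)


candidates = [
    Candidate("build/index.html", 10.0, True, 300.0),
    Candidate("templates/index.html", 10.0, False, 100.0),
    Candidate(".index.html", 10.0, False, 200.0),
    Candidate("docs/index.html", 10.0, False, 200.0),
    Candidate("src/app.coffee", 12.0, False, 50.0),
]
ranked = [c.path for c in sorted(candidates, key=sort_key)]
```

Because the matcher score is the first element of the key, the metadata only ever reorders results that the matcher considers equally good.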

jamby · 2014-03-12T11:38:40Z

Would also be nice if the fuzzy finder could have the results filtered in terms of importance for type of project. For instance, a Ruby on Rails project, if I start typing a model's name, have the first result usually be '/app/models/model_name.rb', instead of having the first result be 'spec/models/model_name_spec.rb'.

Most times I want to deal with the model, not the spec.

miletbaker · 2014-04-10T14:18:28Z

It would also be nice to give recency more weight in the find logic. It mainly suggests the last file accessed, which allows quick switching between files, but this seems intermittent, especially if you switch to another app and back again. It would be good if it always gave precedence to files based on last access, making it easy to work between several files.

A good example of where this works well is Textmate's implementation of cmd-t find file. The sorting there works well.

dmnd · 2014-06-20T22:33:38Z

Here's an example where the order isn't great. The second result is what I want, and it's a much closer match, so I don't know why it's second.

dmnd · 2014-06-24T00:50:28Z

Even worse:

I wanted the last result in this instance.

(I hope these examples are useful, apologies if they're noise)

dmnd · 2014-07-12T00:05:34Z

Another one

adammw · 2014-07-16T06:37:32Z

Coming from ST3, the fuzzy matcher really drives me crazy that it lists the specs before the actual controllers I want.

Is there any config which changes how the fuzzy finder works, or do we need to improve the underlying fuzzy finding library to improve the searching?

lewispb · 2014-11-17T10:56:44Z

+1 for this

davepeck · 2014-11-25T22:19:26Z

I decided to play with Atom for the first time this weekend; I immediately found myself frustrated with the strange fuzzy ordering in Atom's select list views.

If we're going to improve fuzzy matching in Atom, there are lots of things to consider:

The "right" ordering is fundamentally subjective. It's clear from github issues and Atom forums that lots of people would like to see improvement, but it's equally clear that we won't ever fully agree on what's "best." At the very least, changes to fuzzaldrin should continue to respect the current scoring tests; these represent the only codified community judgment we have so far. We'll probably want to augment these tests with examples from real-world projects, too.
The "right" ordering is probably context dependent. There's a hint of this in fuzzaldrin's filter method, which takes the strange queryHasSlashes parameter and invokes the specialized scorer.basenameScore depending. I'd expect any update to filter (a) will need to be parameterizable by the caller — for example, to indicate separators, weights, etc. and (b) will need sensible defaults so it can be invoked without more than the needle and haystack. As an example, we might want path separators to have importance when invoking filter with a list of file names, but we might want the colon-space (:) to have importance when invoking filter from the command palette.
There's great prior art to learn from. TextMate's ranking algorithm is highly regarded, although at first glance I find the implementation hard to grok. (It seems to have a dynamic programming component in matrix but lacks essentially any useful comments.) Command-T also has a well-liked algorithm. Gary Bernhardt's selecta ranking algorithm was based on some interesting discussion that considered this prior art.
Especially in fuzzy-finder's case, there's a lot of metadata we can and probably should use to improve ranking. The venerable PeepOpen ranking algorithm takes into account file modification times, last opened, git status, etc. Probably this more sophisticated ranking belongs strictly in fuzzy-finder, as a new "meta scoring" layer; fuzzaldrin should continue to just be about ranking a needle in a haystack of strings.
My smartscore branch tries to codify some basic intuitions about what makes a match "better". These include: touching the "starts of words" counts for more; some separators are worth more than others (in file contexts, '/' is probably worth more than '-' or ' '); on the whole, we should prefer fewer contiguous runs of longer length; full word matches along the way are always preferable; etc.
Performance is a consideration. Right now every call to filter starts fresh. But it seems to me that (a) it may prove desirable to pre-process each string in the haystack before ever invoking filter, and (b) if the user is simply appending characters to the query string, it might (?) be possible to iteratively re-score the results.
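
As a rough illustration of the "starts of words" and separator-weight intuitions in the list above (not the code from the smartscore branch), the sketch below adds a bonus for every matched character that directly follows a separator. The weights and the helper name `word_start_bonus` are hypothetical, and the matched positions are assumed to be supplied by whatever matcher is in use.

```python
from typing import Sequence

# Hypothetical weights: a match right after '/' is worth more than one
# after '-', '_' or ' ', which in turn is worth more than one after '.'.
SEPARATOR_BONUS = {"/": 3, "-": 2, "_": 2, " ": 2, ".": 1}


def word_start_bonus(candidate: str, positions: Sequence[int]) -> int:
    """Sum the bonuses for matched positions that begin a word."""
    total = 0
    for i in positions:
        if i == 0:
            # The start of the string counts like the strongest separator.
            total += max(SEPARATOR_BONUS.values())
        else:
            # Characters in the middle of a word earn no bonus.
            total += SEPARATOR_BONUS.get(candidate[i - 1], 0)
    return total


# Query "mu": both letters start a path segment here...
segment_starts = word_start_bonus("app/models/user.rb", [4, 11])
# ...but sit in the middle of a word here.
mid_word = word_start_bonus("community.rb", [3, 4])
```

A real scorer would combine a bonus like this with run length and other signals; it is shown in isolation here to keep the example short.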

Alright — hopefully this is useful/interesting to someone. I plan to slowly work on improvements to both fuzzaldrin and fuzzy-finder in my personal branches. Suggestions and feedback are most welcome!

(For fun, I started by replacing fuzzaldrin's current score method with a coffeescript re-implementation of TextMate 2's ranking algorithm; it works and, after a minor tweak, passes all fuzzaldrin tests.)

nj · 2015-04-14T13:57:03Z

👍 as the current solution is rather useless; it can even be faster to find the file manually

matugm · 2015-04-24T21:12:25Z

👍 Would be great if we could get some progress on this.

walles · 2015-07-10T12:57:46Z

Improved sorting / scoring:
atom/fuzzaldrin#22

The above pull request addresses at least some of the issues raised here.

ghost · 2015-07-24T13:38:15Z

Since I'm working on a lot of Rails projects with ActiveAdmin, I'm often annoyed when I end up in an ActiveAdmin file for a particular resource instead of a model file.

I was thinking about improving this by sorting the fuzzy-finder results by usage, i.e. if the files in some folder are worked on more often, they are ranked higher.

I'm happy to implement this experimentally and make a pull request if other people approve of this idea also.
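
To make the idea concrete, here is a small sketch that counts how often files in each folder are opened and uses that count to reorder results. The `FolderUsage` class and the example paths are hypothetical; how and where such counts would be stored in Atom is left open.

```python
import posixpath
from collections import Counter


class FolderUsage:
    """Counts file opens per directory and ranks paths by that count."""

    def __init__(self):
        self.opens = Counter()

    def record_open(self, path: str) -> None:
        self.opens[posixpath.dirname(path)] += 1

    def rank(self, paths):
        # Stable sort: more-used folders first; paths from equally used
        # folders keep the order the matcher gave them.
        return sorted(paths, key=lambda p: -self.opens[posixpath.dirname(p)])


usage = FolderUsage()
for _ in range(3):
    usage.record_open("app/models/user.rb")
usage.record_open("app/admin/user.rb")

ranked = usage.rank(["app/admin/user.rb", "app/models/user.rb"])
```

Here the model file moves ahead of the ActiveAdmin file because app/models has been opened more often.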

Soleone · 2015-07-30T17:35:44Z

👍 for some improvements that make finding commonly used files easier. sublime seemed to have done a better job putting the file i actually want to open at the top (using rails here as well)

dmnd · 2015-09-08T18:39:16Z

Just in case further examples are helpful:

kevinsimper · 2015-12-02T22:28:49Z

@jeancroy Did #22 solve this issue?

jeancroy · 2015-12-02T22:32:54Z

There's now a "Use Alternate Scoring" option in fuzzy-finder that uses it.
It addresses many of the issues with searching by file name / path.

But it does not cover any knowledge about the files themselves, such as a preference for recent / frequent / certain files.

r-owen · 2015-12-03T18:51:35Z

I just tried the Atom Beta with "Use Alternate Scoring" enabled and it's a huge improvement, though still not as good as Sublime Text. I have a project with a huge number of files, including Doxygen generated html files that I rarely want to look at. I tried to find a file named "matchOptimisticB.h". In SublimeText I can type "mob.h" and get the right file as the first suggestion. In Atom Beta it is the ninth choice, preceded by eight html files I have no interest in.

One thing that might help Atom: if the user provides a file type suffix, prefer names that match that suffix exactly over names that use that suffix as a prefix.
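
A minimal sketch of that suffix rule, assuming it is applied as a tie-breaker on top of the existing score. `extension_rank` is a hypothetical helper and the directory names and the Doxygen-style file name are made up for illustration.

```python
import os


def extension_rank(query: str, path: str) -> int:
    """0 if the path's extension equals the query's, 1 if the path's
    extension merely starts with it, 2 otherwise."""
    query_ext = os.path.splitext(query)[1].lower()  # ".h" for "mob.h"
    path_ext = os.path.splitext(path)[1].lower()
    if not query_ext:
        return 0  # no suffix typed, so there is nothing to prefer
    if path_ext == query_ext:
        return 0
    if path_ext.startswith(query_ext):
        return 1
    return 2


# Hypothetical paths; the second mimics a generated HTML page.
paths = ["doc/html/matchOptimisticB_8h.html", "include/matchOptimisticB.h"]
ranked = sorted(paths, key=lambda p: extension_rank("mob.h", p))
```

With the query "mob.h", the header file is ranked ahead of the .html file because ".h" matches its extension exactly rather than as a prefix.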

Another thing that might help (though I really hope it won't come to this, and it's not needed by Sublime) is to allow the user to disable directory patterns. In my case I might eliminate searches of Doxygen-generated html files and would definitely eliminate .os files (why in the world is it showing binary libraries?).

jeancroy · 2015-12-03T18:56:28Z

OK, please open an issue on fuzzaldrin-plus and I can give it a look. I'd need the results that come before it to understand why they are preferred. The full path is also useful; if it's private, a mock-up with the same length and directory depth will do.


r-owen · 2015-12-04T00:29:49Z

I just submitted this issue. I hope it helps.

jeancroy/fuzz-aldrin-plus#12

Thank you very much for trying to improve Atom’s fuzzy search.

— Russell


tnrich · 2016-05-04T00:22:38Z

Does anyone know if there is an equivalent issue open discussing the Cmd-Shift-P search algorithm?

jeancroy · 2016-05-04T01:08:13Z

You're speaking of the command palette? It should already be integrated. If you have a problem you can try opening an issue on the fuzzaldrin-plus repo.


adamreisnz · 2017-05-17T01:13:55Z

This is a pretty ancient issue, so I have little hope of improvement arriving any time soon, but here's my two cents:

Two things are wrong with the way the fuzzy finder currently works, both illustrated by the above example.

Sort order (as already pointed out in this issue): based on my input, I would really expect the file member/edit/edit.html to show up on top. It does for some reason when I remove the l from html:

But with the full html it suddenly drops to second place which gets rather infuriating after having opened the wrong file several times.

So scoring should somehow take into account how close the search terms are to each other in the filename, and prioritize edit.html over edit-payment.html unless I include payment in my search query.

It should really prioritize full word matches rather than scattered letters. If you look at the above example, it actually matches member because it's in the app/components/admin/member folder, instead of simply matching to the member part of the path, because that's a whole matching word.

These two tweaks would make the search algorithm a lot stronger.

jeancroy · 2017-05-17T02:51:45Z

Hi @adamreisnz, out of curiosity, is this happening with alternate scoring turned on? There was a strong preference for word "togetherness" in that version.

From the screenshot I'm guessing it's not, but if it is I'll add a few of those to the test benchmark.
The previous algorithm would take the first occurrence of m, then the first e, then the first m, instead of waiting and trying for member.
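
The difference can be sketched as follows. This is a simplified illustration, not the code of either scorer: `greedy_positions` and `prefer_contiguous` are hypothetical helpers, and the full path is assembled from the folder and file names mentioned in this thread.

```python
def greedy_positions(query: str, candidate: str):
    """Match each query character at its first available position."""
    positions = []
    start = 0
    for ch in query:
        i = candidate.find(ch, start)
        if i == -1:
            return None  # not a subsequence match at all
        positions.append(i)
        start = i + 1
    return positions


def prefer_contiguous(query: str, candidate: str):
    """Use a whole-word occurrence if one exists, else fall back to greedy."""
    i = candidate.find(query)
    if i != -1:
        return list(range(i, i + len(query)))
    return greedy_positions(query, candidate)


# Path assembled from names in this thread (an assumption).
path = "app/components/admin/member/edit/edit.html"

scattered = greedy_positions("member", path)
together = prefer_contiguous("member", path)
```

For the query member, the greedy version picks up the m and e of components and the m of admin before reaching the member folder, while the second version highlights member as one run.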

adamreisnz · 2017-05-17T02:59:10Z

@jeancroy thanks for looking into it, but yes, it's in fact enabled:

The version I'm using is 1.18.0-dev-f4a83b238

jeancroy · 2017-05-17T03:01:21Z

Another possibility is that the alternate score is used for ranking while the classic one is used for highlighting.
The whole component below fuzzy-finder has been rewritten recently. If that's the case, the whole scattered-letter issue is a false trail.

One feature of the new one is a bias toward the file name (vs. the whole path) when the file extension is matched exactly. I think you are battling against that when you use keywords from the path but end with an extension.

To sum up your request: you want the htm behavior to happen in the html case too? I'm not sure what the algorithm does because of how scrambled the highlight is.

adamreisnz · 2017-05-17T03:04:55Z

Well, my use case as you might deduce from my example is that in a large project, there will be many components. Each component might have an edit sub component as in the example, and each of those components will have edit.html template, and edit.js module, and perhaps edit.ctrl.js controller.

So the way I tend to quickly open the file I want, is by specifying the parent component member, then the sub component edit and then extension if I know there's going to be more than one file.

This usually works fine, but in the above case it was messing it up due to the existence of another similar file in the same path (edit-payment.html).

I think my use case is fairly common, so I wouldn't expect to be "battling" against the fuzzy finder's system with it.

edit.html should still be preferred over edit-payment.html if you search for "edit html" imo, on account of it being the shorter and closer match.

jeancroy · 2017-05-17T05:44:15Z

You're right on all counts; in this case it seems the algorithm just likes the m of payment.
I guess the m of html manages to count twice. I'll see how to fix that.

The good news is that the issue is more constrained than, say, a lack of prioritizing "full word matches". (Here - counts as a word boundary.)

I'll open a separate issue for the highlight regression; it should group member appropriately.

adamreisnz · 2017-05-17T08:45:03Z

Yeah that looks better in your screenshot, highlighting member properly. And interesting that it likes the m in payment and paykent is put at the bottom properly. Looks like it's just a few tweaks needed to fix those issues then 👍

adamreisnz · 2017-06-09T23:16:52Z

Looks like in the latest version (just built Atom from master yesterday) there are still some scoring issues. For example this result:

It should not prioritise cards/club-details.js over cards/details.js for the same reason as above, where it shouldn't prioritise the edit-payment.html file. cards/details.js is a closer match, because it has fewer non-matching characters between the matches.

I did not type a c character and it already matched card, so it's a bit baffling why it tries to mark the c of club and give that result a higher score than the more sensible result below that.

Note that when I type cards it does prioritise correctly (but still marks the c in the second result):

I think once a search term has been used/matched in the path, it should not try to match it again for another part of the path. In addition, results with the least amount of non-matching characters between the matches should probably score highest.
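
One way to measure "non-matching characters between the matches" is to find the tightest window of the path that still contains the query as a subsequence, and count the unmatched characters inside it. The sketch below does that; `min_gap` is a hypothetical helper, and the query `carddetails` is an assumption, since the query actually typed is only visible in the screenshot.

```python
def min_gap(query: str, candidate: str):
    """Fewest unmatched characters inside any window of `candidate`
    that contains `query` as a subsequence (None if there is no match)."""
    if not query:
        return 0
    best = None
    for start, ch in enumerate(candidate):
        if ch != query[0]:
            continue
        # From this start, matching each character as early as possible
        # gives the earliest possible end, i.e. the tightest window.
        pos = start
        for q in query[1:]:
            pos = candidate.find(q, pos + 1)
            if pos == -1:
                break
        else:
            gap = (pos - start + 1) - len(query)
            if best is None or gap < best:
                best = gap
    return best


paths = ["cards/club-details.js", "cards/details.js"]
# Both paths match the assumed query, so no None values reach the sort.
ranked = sorted(paths, key=lambda p: min_gap("carddetails", p))
```

Under this measure cards/details.js has 2 unmatched characters inside its window and cards/club-details.js has 7, so the shorter, closer match sorts first.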

adamreisnz · 2017-08-04T02:16:11Z

Another example in Atom 1.20 dev where prioritisation is not what one would expect;

adamreisnz · 2017-10-21T19:54:40Z

Guys, any activity on this issue please? It's infuriating to keep opening the wrong files because the fuzzy finder sorting logic is off.

VSCode manages to do it correctly, why not Atom? Perhaps it would be worthwhile looking at their algorithm.

winstliu · 2017-10-21T23:00:28Z

@adamreisnz looks like this was fixed a month ago by @jeancroy but we're running an outdated version of fuzzaldrin-plus. Will create a PR.

corbanbrook mentioned this issue Mar 10, 2014

Improve the fuzzy results atom/fuzzaldrin#2

Closed

kevinsawicki added the enhancement label Apr 11, 2014

izuzak mentioned this issue Jun 26, 2014

Fuzzy search doesn't remember the last choice made #48

Open

thomasjo mentioned this issue Mar 28, 2015

File Search Result Order Is Unintuitive atom/atom#6148

Closed

winstliu mentioned this issue Apr 21, 2015

Command palette search results are poor atom/atom#6461

Closed

jeancroy mentioned this issue Sep 18, 2015

Rewrite scoring algorithm to support run of consecutive character, fix acronyms and add optimal selection of character. atom/fuzzaldrin#22

Closed

r-owen mentioned this issue Dec 3, 2015

Scoring of initialisms is much worse than sublime, emacs #57

Open

jeancroy mentioned this issue May 22, 2017

Regression in highlight with alternate scoring. #296

Closed

winstliu mentioned this issue Oct 21, 2017

Update fuzzaldrin-plus to 0.6.0 #328

Merged
