Fix scoring to favor exact substring matches #15

brandonwamboldt · 2015-04-25T01:55:51Z

So I made pretty radical changes to scoring and don't expect this to be accepted without tweaks, but I figured it would be a good way to get momentum on the problem.

We have an issue with exact substring matches being scored lower than non-substring matches. My go-to example is:

Find And Replace: Select All
Settings View: Install Packages And Themes

That's the sorted order when searching for "Install", which is clearly not what you'd expect. This changes the scoring system to heavily weight substring matches, so if you type a word exactly as it appears in the string, it will be weighted highly (nearly as high as an exact match).

Should address atom/atom#6461 and atom/command-palette#28.

brandonwamboldt · 2015-04-25T02:08:40Z

FYI, I force pushed a better version just now. My old way didn't work consistently. Now I add a value to the finalized score for substring matches (like we do for basename matches), which yields more accurate results.

winstliu · 2015-04-25T02:09:46Z

spec/filter-spec.coffee

@@ -112,6 +112,9 @@ describe "filtering", ->
    expect(bestMatch(['z_a_b', 'a_b'], 'ab')).toBe 'a_b'
    expect(bestMatch(['a_b_c', 'c_a_b'], 'ab')).toBe 'a_b_c'

+  it "weights matches that are substring matches higher", ->


weighs 🎨

I dunno, I think this actually makes more sense with weights

brandonwamboldt · 2015-04-25T02:13:19Z

Also, what's the etiquette on corrections? I just added a second commit to be safe: if someone comments on a commit rather than on the PR diff, those comments are lost when you force push. But you also don't want to merge in a PR full of unnecessary commits.

Just curious what you guys prefer? Or do you just squash & merge manually?

winstliu · 2015-04-25T02:18:16Z

@brandonwamboldt In most cases, the Atom team doesn't care too much about how many commits are in a PR when merging. A few good examples of this are the recent pane-resize PR (24 commits) and the gutter API PR (53 commits). Your choice in the end though :).

@mnquintana Generally it seems like weighs is to measure something while weights is to hold something down, so I'd lean towards weighs here.

brandonwamboldt · 2015-04-25T02:21:48Z

Roger that, thanks.

Also, to explain why I added 1 to the weight (which is 1 or lower at this point in the code): I think substring matches should always be ranked higher than non-substring matches, and adding 1 is the only way to accomplish that. Now, substring matches range from 1.0 to 2.0, and non-substring matches range from 0.0 to 1.0.

However, I'm also adding a value if:

The substring match is at the beginning of the string (highest bonus)
The substring match is right after a path separator (second highest)
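
As a rough illustration of the tiering described above, here is a minimal sketch in Python (not the CoffeeScript from the PR). Everything named here is hypothetical: `base_score` is a simple stand-in for the existing fuzzy score rather than the library's real algorithm, and the two bonus values are made up, since the thread does not state their sizes. Only the "+1 for substring matches" step and the ordering of the two bonuses come from the comment.

```python
PATH_SEPARATOR = "/"
SUBSTRING_TIER = 1.0    # from the comment: lifts substring matches to 1.0 or above
START_BONUS = 0.5       # hypothetical size; the thread only says "highest bonus"
SEPARATOR_BONUS = 0.25  # hypothetical size; the thread only says "second highest"


def base_score(candidate: str, query: str) -> float:
    """Placeholder fuzzy score in [0.0, 1.0]; NOT the library's real algorithm.

    Returns 0.0 unless the query characters appear in order in the candidate,
    otherwise the ratio of query length to candidate length.
    """
    cand, q = candidate.lower(), query.lower()
    if not cand or not q:
        return 0.0
    pos = 0
    for ch in q:
        pos = cand.find(ch, pos)
        if pos == -1:
            return 0.0
        pos += 1
    return len(q) / len(cand)


def score(candidate: str, query: str) -> float:
    """Base score, plus a full tier and optional position bonus for substring matches."""
    total = base_score(candidate, query)
    cand = candidate.lower()
    index = cand.find(query.lower())
    if total == 0.0 or index == -1:
        return total  # non-substring matches stay in [0.0, 1.0]
    total += SUBSTRING_TIER
    if index == 0:
        total += START_BONUS
    elif cand[index - 1] == PATH_SEPARATOR:
        total += SEPARATOR_BONUS
    return total


def best_match(candidates: list[str], query: str) -> str:
    """Return the highest-scoring candidate for the query."""
    return max(candidates, key=lambda c: score(c, query))
```

With this stand-in, `inst-all.txt` has a higher base score than `/a/b/c/install.txt` for the query `install` because it is shorter, but the added tier puts the exact substring match first. The same holds for the two command palette entries from the opening comment.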

mnquintana · 2015-04-25T02:22:44Z

@50Wliu: Generally it seems like weighs is to measure something while weights is to hold something down, so I'd lean towards weighs here.

From the Apple dictionary:

weight - verb [ with obj. ]
2) attach importance or value to

I've definitely seen weight used in this way to refer to search rankings before.

thomasjo · 2015-04-25T08:18:17Z

It is definitely correct to use weight (and hence weights) in this context.

mnquintana · 2015-05-06T13:40:24Z

/cc @atom/feedback

lee-dohm · 2015-05-06T13:53:08Z

spec/filter-spec.coffee

@@ -112,6 +112,9 @@ describe "filtering", ->
    expect(bestMatch(['z_a_b', 'a_b'], 'ab')).toBe 'a_b'
    expect(bestMatch(['a_b_c', 'c_a_b'], 'ab')).toBe 'a_b_c'

+  it "weighs matches that are substring matches higher", ->
+    expect(bestMatch(['/a/b/c/install.txt', 'inst-all.txt'])).toBe '/a/b/c/install.txt'


All the other examples pass in an array of candidates and then a search string. All you're passing in here are candidates?

walles · 2015-07-06T10:51:51Z

Could you add a test case for your real-world example that you mention in the initial comment? With "Settings View: Install Packages And Themes" being at place ten when searching for "install". Preferably with all intermediate lines in it as well.

That's a good way to know that the patch addresses the right problem.

walles · 2015-07-09T17:45:59Z

Given that PR #22 has test cases for some nice things that this PR does not, I'd prefer #22 being merged over this one.

Or that this PR is updated with the tests for git push and install from #22.

Fix scoring to favor exact substring matches

21e1861

brandonwamboldt force-pushed the fix-scoring-1 branch from c91837a to 21e1861 Compare April 25, 2015 02:07

winstliu reviewed Apr 25, 2015

Fix comment

eca4bb4

winstliu added the needs-review label Apr 25, 2015

lee-dohm reviewed May 6, 2015

kevinsawicki added requires-changes and removed needs-review labels Jun 8, 2015

This was referenced Jul 6, 2015

Wrongly ordered candidates with exact file name match #18

Open

attempt to improve search pertinence #19

Closed

thomasjo mentioned this pull request Jul 8, 2015

Fuzzy file finder should prefer direct match atom/atom#7783

Closed

brandonwamboldt closed this Aug 4, 2015
