Weak character matches given more priority than better acronym match #26

mdahamiwal · 2016-10-10T08:55:00Z

Hi @jeancroy, here is another scenario where I think acronym score is weak:
candidates:
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js

search query:
sft/gisp.js

results:
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js

expected:
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js

GssHelper.js doesn't even contain gisp in order, yet it still scores higher than the better acronym matches.


jeancroy · 2016-10-10T11:47:44Z

Hi, if you look at the code, there's the concept of an "acronym prefix".
Basically it's a heuristic that tries to decide whether you are looking for an acronym or for consecutive letters.

And that heuristic requires you to put the acronym at the start of the query.
Why the start? Because that early there's no backtracking. (The cleanest way out of this, I believe, is multi-word support: try to segment the query into words and restart the acronym prefix search on each word. However, without backtracking there are no guarantees those prefixes won't overlap.)

So in query sft/gisp.js , sft/ is working against you.

Moreover, the path separator character (/ or \ depending on the environment) is special: sft/gisp.js is really interpreted as "sft should be found in the immediate parent folder of gisp.js", and that's another reason sftRobotic ranks so well. I'll agree it's not documented or universally understood that way, so maybe I'll disable the behavior or make it optional.
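To illustrate why the parent-folder reading of the separator favours sftRobotic, here is a minimal Python sketch. It is not the library's code: `is_subsequence` and `parent_folder_matches` are hypothetical helpers, and the rule is assumed to be simply "the query text before the last / must appear, in order, in the candidate's immediate parent folder".

```python
def is_subsequence(needle: str, haystack: str) -> bool:
    """True if the characters of needle occur in haystack in order (case-insensitive)."""
    remaining = iter(haystack.lower())
    return all(ch in remaining for ch in needle.lower())


def parent_folder_matches(query: str, path: str) -> bool:
    """Assumed rule: the folder part of the query must match the immediate parent folder."""
    if "/" not in query or "/" not in path:
        return False
    query_folder = query.rsplit("/", 1)[0]   # "sft" for "sft/gisp.js"
    parent_folder = path.split("/")[-2]      # folder that directly holds the file
    return is_subsequence(query_folder, parent_folder)


candidates = [
    "sft/Tests/Plugins/GulpImportStepPerformer.js",
    "sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js",
    "sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js",
]
for path in candidates:
    print(parent_folder_matches("sft/gisp.js", path), path)
```

Under this assumed rule only the GssHelper.js candidate passes, because its parent folder sftRobotic contains s, f, t in order, while Plugins and Cloud do not.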

For now I'll suggest using something like sgisp.js or stpgisp.js.

Perhaps another way to bias the results toward acronyms is the uppercase-means-uppercase rule, where lowercase can still match either case. It's not implemented, but other fuzzy libraries are successful with it.
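A small Python sketch of the uppercase-means-uppercase rule, since it is not implemented here. The function names `char_matches` and `smart_case_subsequence` are made up for illustration, and plain in-order matching stands in for real scoring.

```python
def char_matches(query_char: str, subject_char: str) -> bool:
    """Uppercase query characters must match exactly; lowercase ones match either case."""
    if query_char.isupper():
        return subject_char == query_char
    return subject_char.lower() == query_char


def smart_case_subsequence(query: str, subject: str) -> bool:
    """True if every query character can be matched in order under the rule above."""
    pos = 0
    for query_char in query:
        while pos < len(subject) and not char_matches(query_char, subject[pos]):
            pos += 1
        if pos == len(subject):
            return False
        pos += 1  # consume the matched subject character
    return True


good = "sft/Tests/Plugins/GulpImportStepPerformer.js"
weak = "sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js"
print(smart_case_subsequence("GISP", good))  # uppercase letters G, I, S, P exist in order
print(smart_case_subsequence("GISP", weak))  # no uppercase I after the G of GssHelper
print(smart_case_subsequence("gisp", weak))  # lowercase query still matches scattered letters
```

Typing GISP in uppercase would filter out the GssHelper.js path entirely in this sketch, whereas lowercase gisp still finds its letters scattered across the folder names.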

mdahamiwal · 2016-10-10T15:46:19Z

Aha! I see it now, thanks @jeancroy for explaining it in detail.
I always thought a scoped acronym query such as sft/gisp.js was better than gisp.js, but it turns out to be the other way round.
The main concern is that GssHelper.js is scored highest even though it in no way relates to the query gisp, and it gets worse with highlighting, which feels like a bug.

I understand that we don't want to backtrack so early, especially just to score an acronym, but how about an LCS (longest common subsequence) between the candidate's StartOfWord characters and the query, something of this sort:
sft/Tests/Plugins/GulpImportStepPerformer.js -> s/T/P/GISP
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js -> s/T/t/R/T/t/d/s/GH

Given these, if we try LCS with the query sft/gisp.js, the first will be scored higher. Currently, the problem is that the scoreAcronym routine gives out ZERO for all three candidates in this case: it exhausts the query even before it hits the actual acronyms.
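The proposal can be tried out with a short Python sketch. It rests on assumptions: `word_starts` and `lcs_length` are hypothetical helpers, a word start is taken to be the first letter after a non-alphanumeric character or an uppercase letter following a lowercase letter or digit, and the comparison is case-insensitive. Unlike the hand-written forms above, this extraction also keeps the R of sftRobotic and the j of the .js extension.

```python
def word_starts(path: str) -> str:
    """Keep '/' separators and the first character of each word (assumed definition)."""
    out = []
    prev = "/"
    for ch in path:
        if ch == "/":
            out.append(ch)
        elif ch.isalnum():
            after_separator = not prev.isalnum()
            camel_hump = ch.isupper() and (prev.islower() or prev.isdigit())
            if after_separator or camel_hump:
                out.append(ch)
        prev = ch
    return "".join(out)


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, classic row-by-row dynamic programming."""
    prev_row = [0] * (len(b) + 1)
    for x in a:
        row = [0]
        for j, y in enumerate(b, 1):
            row.append(prev_row[j - 1] + 1 if x == y else max(prev_row[j], row[j - 1]))
        prev_row = row
    return prev_row[-1]


def acronym_lcs_score(query: str, path: str) -> int:
    return lcs_length(query.lower(), word_starts(path).lower())


for path in [
    "sft/Tests/Plugins/GulpImportStepPerformer.js",
    "sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js",
]:
    print(word_starts(path), acronym_lcs_score("sft/gisp.js", path))
```

With these definitions the GulpImportStepPerformer.js candidates get a longer common subsequence with sft/gisp.js than the GssHelper.js one, which is the ordering asked for in the issue.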

thoughts?

jeancroy · 2016-10-10T16:35:00Z

That may work.

However, the computation of "is something an acronym" (camelCase, snake_case, etc.) is actually super expensive. The trick I have found is to not compute it unless a[i] == b[j], which is infrequent.

In order to compare the query with the acronym form via LCS, we would need to categorize every character of the subject. (There might be some efficient way to store and reuse those; we'll have to think / benchmark.)
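A rough Python illustration of that trick, not taken from the library: the cheap character comparison guards the costly classification, so the classifier runs only on matching pairs. `is_word_start` is a simplified, hypothetical stand-in for the real camelCase/snake_case check, and `scan` just counts how often it gets called.

```python
def is_word_start(subject: str, i: int) -> bool:
    """Simplified stand-in for the costly 'is this an acronym position' check."""
    if i == 0:
        return True
    prev, cur = subject[i - 1], subject[i]
    return not prev.isalnum() or (cur.isupper() and prev.islower())


def scan(query: str, subject: str):
    """Return (classifier calls, word-start hits, total character pairs compared)."""
    q, s = query.lower(), subject.lower()
    classified = 0
    word_start_hits = 0
    for i in range(len(s)):
        for j in range(len(q)):
            if s[i] != q[j]:
                continue  # cheap comparison first; skip the costly check on a mismatch
            classified += 1
            if is_word_start(subject, i):
                word_start_hits += 1
    return classified, word_start_hits, len(s) * len(q)


calls, hits, pairs = scan("gisp", "sft/Tests/Plugins/GulpImportStepPerformer.js")
print(f"classifier ran on {calls} of {pairs} character pairs, {hits} were word starts")
```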

jeancroy · 2016-10-10T20:45:59Z

I'm thinking along the following lines:

on this line
make accro_score an array instead of a constant.

So we would have something like this after segmenting the query into words and then evaluating the acronym of each word:

sft/gisp.js
00004444000
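One way to read the array above, sketched in Python under several assumptions: the query is segmented on non-alphanumeric characters, a query word counts as an acronym when it is an in-order subsequence of the candidate's word-start letters, and each character of such a word receives the word's length as its score. All helper names are hypothetical and none of this is the library's implementation.

```python
import re


def word_start_letters(subject: str) -> str:
    """Lowercased first letters of each word (after a separator or at a camelCase hump)."""
    letters = []
    prev = "/"
    for ch in subject:
        if ch.isalnum() and (not prev.isalnum() or (ch.isupper() and prev.islower())):
            letters.append(ch.lower())
        prev = ch
    return "".join(letters)


def is_subsequence(needle: str, haystack: str) -> bool:
    remaining = iter(haystack)
    return all(ch in remaining for ch in needle)


def acronym_score_per_char(query: str, subject: str) -> list:
    """One score per query character; 0 where the surrounding word is not an acronym match."""
    starts = word_start_letters(subject)
    scores = [0] * len(query)
    for word in re.finditer(r"[A-Za-z0-9]+", query):
        if is_subsequence(word.group().lower(), starts):
            for k in range(word.start(), word.end()):
                scores[k] = len(word.group())
    return scores


scores = acronym_score_per_char("sft/gisp.js", "sft/Tests/Plugins/GulpImportStepPerformer.js")
print("sft/gisp.js")
print("".join(str(s) for s in scores))
```

Each query word restarts its search from the beginning of the word-start letters, so two words could claim the same letters; that is the overlap risk without backtracking mentioned earlier in the thread.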

mdahamiwal · 2016-10-12T06:47:37Z

Agreed, that will be much cleaner and more performant. The only concern I see is that it makes the code less readable, where it is already a bit tough to follow and understand.

mdahamiwal · 2016-10-20T07:42:31Z

@jeancroy, any development for this issue? Thanks.

jeancroy · 2016-10-20T13:35:53Z

I'll put some time into making this work this weekend.

Seeing the other topic about edlo editor localisation, maybe your LCS on acronym space is the most correct idea.
