Weak character matches given more priority than better acronym match #26

Open
mdahamiwal opened this issue Oct 10, 2016 · 7 comments

@mdahamiwal
Collaborator

mdahamiwal commented Oct 10, 2016

Hi @jeancroy, here is another scenario where I think the acronym score is weak:
candidates:
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js

search query:
sft/gisp.js

results:
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js

expected:
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js

GssHelper.js doesn't even contain gisp in-order but still scored higher than better acronym matches.
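
The "in-order" claim can be checked with a plain subsequence test. This is only a minimal sketch for illustration; `isSubsequence` is a hypothetical helper, not a function of the library.

```typescript
// Returns true when every character of `query` appears in `subject`
// in the same order (case-insensitive), not necessarily adjacent.
function isSubsequence(query: string, subject: string): boolean {
  const q = query.toLowerCase();
  const s = subject.toLowerCase();
  let qi = 0;
  for (let si = 0; si < s.length && qi < q.length; si++) {
    if (s.charAt(si) === q.charAt(qi)) qi++;
  }
  return qi === q.length;
}

console.log(isSubsequence("gisp", "GulpImportStepPerformer.js")); // true
console.log(isSubsequence("gisp", "GssHelper.js"));               // false
```

Note that the whole query sft/gisp.js is still a subsequence of the full GssHelper.js path, because the letters can be picked up from the folder names, which is why that candidate shows up in the results at all.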

@jeancroy
Owner

jeancroy commented Oct 10, 2016

Hi, if you look at the code, there's the concept of an "acronym prefix".
Basically it's a heuristic that tries to decide whether you are looking for an acronym or for consecutive letters.

That heuristic requires you to put the acronym at the start of the query.
Why the start? Because that early there's no backtracking. (The cleanest way out of this, I believe, is multi-word support: segment the query into words and restart the acronym prefix search on each word. However, without backtracking there's no guarantee those prefixes won't overlap.)

So in query sft/gisp.js , sft/ is working against you.

Moreover, the path separator character (/ or \ depending on the environment) is special: sft/gisp.js is really interpreted as "sft should be found in the immediate parent folder of gisp.js", and that's another reason sftRobotic ranks so well. I agree it's not documented or universally understood that way, so maybe I'll disable the behavior or make it optional.
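
To illustrate the separator interpretation described above, here is a rough sketch. It is not the library's code; `parentFolderMatches` is a hypothetical helper, and it assumes "/" as the separator and a simple substring test on the folder name.

```typescript
// Hypothetical helper: takes the query segment just before the last "/"
// and checks whether it occurs in the candidate's immediate parent folder.
function parentFolderMatches(query: string, candidate: string): boolean {
  const qParts = query.split("/");
  const cParts = candidate.split("/");
  if (qParts.length < 2 || cParts.length < 2) return false;
  const queryFolder = qParts[qParts.length - 2].toLowerCase();
  const parentFolder = cParts[cParts.length - 2].toLowerCase();
  return parentFolder.indexOf(queryFolder) !== -1;
}

// Parent folder is "sftRobotic", which contains "sft".
console.log(parentFolderMatches(
  "sft/gisp.js",
  "sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js")); // true

// Parent folder is "Plugins", which does not contain "sft".
console.log(parentFolderMatches(
  "sft/gisp.js",
  "sft/Tests/Plugins/GulpImportStepPerformer.js")); // false
```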

For now I'd suggest using something like sgisp.js or stpgisp.js.

Perhaps another way to bias the result toward acronyms is the uppercase-means-uppercase rule, where an uppercase query letter matches only uppercase while a lowercase one can match either case. It's not implemented, but other fuzzy libraries are successful with it.
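
A minimal sketch of the uppercase-means-uppercase rule, assuming a simple in-order match. Both function names are hypothetical and nothing like this exists in the library yet.

```typescript
// An uppercase query character must match exactly;
// a lowercase query character matches either case.
function charMatches(queryChar: string, subjectChar: string): boolean {
  const queryIsUpper = queryChar !== queryChar.toLowerCase();
  if (queryIsUpper) return queryChar === subjectChar;
  return queryChar === subjectChar.toLowerCase();
}

// In-order (subsequence) match that uses the rule above.
function matchesInOrder(query: string, subject: string): boolean {
  let qi = 0;
  for (let si = 0; si < subject.length && qi < query.length; si++) {
    if (charMatches(query.charAt(qi), subject.charAt(si))) qi++;
  }
  return qi === query.length;
}

console.log(matchesInOrder("gi", "Gulping")); // true, lowercase matches either case
console.log(matchesInOrder("GI", "Gulping")); // false, there is no uppercase I
console.log(matchesInOrder("GISP", "GulpImportStepPerformer.js")); // true
```

With this rule, typing GISP would signal that an acronym is wanted without relying on the position of the letters in the query.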

@mdahamiwal
Collaborator Author

Aha! I see it now, thanks @jeancroy for explaining it in detail.
I always thought a scoped acronym query like sft/gisp.js was better than gisp.js, but it turns out to be the other way round.
The main concern is that GssHelper.js is scored highest even though it in no way relates to the query gisp, and it gets worse with highlighting, which feels like a bug.
[image]

I understand that we don't want to backtrack so early, especially just to score acronyms, but how about an LCS between the candidate's StartOfWord characters and the query, something of this sort:
sft/Tests/Plugins/GulpImportStepPerformer.js -> s/T/P/GISP
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js -> s/T/t/R/T/t/d/s/GH

With those, if we try LCS for the query sft/gisp.js, the first will be scored higher. Currently, the problem is that the scoreAcronym routine gives ZERO for all three candidates in this case: it exhausts the query even before it reaches the actual acronyms.

thoughts?
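
The LCS idea above could be sketched roughly as follows. This is an illustration only: `acronymSkeleton` and `lcsLength` are hypothetical names, the word-start rule (first character of a path segment, or an uppercase letter after a lowercase one) is an assumption, and the exact skeleton depends on that rule.

```typescript
function isUpper(ch: string): boolean { return ch !== ch.toLowerCase(); }
function isLower(ch: string): boolean { return ch !== ch.toUpperCase(); }

// Keeps "/" plus every start-of-word character of the path.
function acronymSkeleton(path: string): string {
  let out = "";
  for (let i = 0; i < path.length; i++) {
    const ch = path.charAt(i);
    const prev = i > 0 ? path.charAt(i - 1) : "/";
    if (ch === "/") out += ch;
    else if (prev === "/" || (isUpper(ch) && isLower(prev))) out += ch;
  }
  return out;
}

// Classic dynamic-programming LCS length, case-insensitive.
function lcsLength(a: string, b: string): number {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  let prev: number[] = [];
  for (let j = 0; j <= y.length; j++) prev.push(0);
  for (let i = 1; i <= x.length; i++) {
    const curr: number[] = [0];
    for (let j = 1; j <= y.length; j++) {
      if (x.charAt(i - 1) === y.charAt(j - 1)) curr.push(prev[j - 1] + 1);
      else curr.push(Math.max(prev[j], curr[j - 1]));
    }
    prev = curr;
  }
  return prev[y.length];
}

const query = "sft/gisp.js";
const first = acronymSkeleton("sft/Tests/Plugins/GulpImportStepPerformer.js");
const third = acronymSkeleton(
  "sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js");
console.log(first); // s/T/P/GISP
console.log(lcsLength(query, first) > lcsLength(query, third)); // true
```

Under these assumptions the GulpImportStepPerformer.js candidates share a longer common subsequence with the query than the GssHelper.js one, which is the ordering expected in the report.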

@jeancroy
Owner

That may work.

However, computing "is this character part of an acronym" (camelCase, snake_case, etc.) is actually very expensive. The trick I have found is to not compute it unless a[i] == b[j], which is infrequent.

In order to compare the query with the acronym form using LCS, we would need to categorize every character of the subject. (There might be some efficient way to store and reuse those; we'll have to think about it and benchmark.)
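
One possible way to store and reuse the categorization, shown only as a sketch: classify a subject position the first time a query character equals it, and cache the result. The names and the word-start rule here are assumptions, not the library's implementation.

```typescript
// Assumption: a word start is the first character, a character that follows
// a separator, or an uppercase letter that follows a lowercase one.
function classifyWordStart(subject: string, i: number): boolean {
  if (i === 0) return true;
  const ch = subject.charAt(i);
  const prev = subject.charAt(i - 1);
  if ("/\\._- ".indexOf(prev) !== -1) return true;
  return ch !== ch.toLowerCase() && prev !== prev.toUpperCase();
}

function lazyAcronymHits(
  query: string,
  subject: string
): { hits: number; classified: number } {
  const q = query.toLowerCase();
  const s = subject.toLowerCase();
  // Cache per subject position: 0 = unknown, 1 = word start, 2 = not one.
  const cache: number[] = [];
  for (let j = 0; j < s.length; j++) cache.push(0);
  let hits = 0;
  let classified = 0;
  for (let i = 0; i < q.length; i++) {
    for (let j = 0; j < s.length; j++) {
      if (q.charAt(i) !== s.charAt(j)) continue; // cheap equality test first
      if (cache[j] === 0) {
        cache[j] = classifyWordStart(subject, j) ? 1 : 2;
        classified++;
      }
      if (cache[j] === 1) hits++;
    }
  }
  return { hits, classified };
}

// Only 8 of the 26 characters are ever classified for this pair.
console.log(lazyAcronymHits("gisp", "GulpImportStepPerformer.js"));
// { hits: 4, classified: 8 }
```

Whether such a cache pays off against classifying everything up front would still need benchmarking.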

@jeancroy
Owner

I'm thinking along the following lines:

on this line,
make accro_score an array instead of a constant.

So we would have something like this after segmenting the query into words and then evaluating the acronym score of each word:

sft/gisp.js
00004444000
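
A rough sketch of how such a per-position array could be produced. Everything here is hypothetical: the helper names, the separator set, and the assumption that the value stored for a word is the number of its letters that match start-of-word characters of the candidate. It also ignores the overlap problem mentioned earlier, since each word restarts the search from the beginning.

```typescript
const SEPARATORS = "/\\._- ";

function isSeparator(ch: string): boolean {
  return SEPARATORS.indexOf(ch) !== -1;
}

// Lowercased start-of-word characters of the candidate, separators dropped.
function acronymLetters(subject: string): string {
  let out = "";
  for (let i = 0; i < subject.length; i++) {
    const ch = subject.charAt(i);
    if (isSeparator(ch)) continue;
    const prev = i > 0 ? subject.charAt(i - 1) : "/";
    const camelHump = ch !== ch.toLowerCase() && prev !== prev.toUpperCase();
    if (isSeparator(prev) || camelHump) out += ch.toLowerCase();
  }
  return out;
}

function isSubsequence(word: string, letters: string): boolean {
  let wi = 0;
  for (let li = 0; li < letters.length && wi < word.length; li++) {
    if (letters.charAt(li) === word.charAt(wi)) wi++;
  }
  return wi === word.length;
}

// One score per query character: the word length when the whole word
// matches the candidate's acronym letters in order, otherwise 0.
function acronymScoreArray(query: string, subject: string): number[] {
  const letters = acronymLetters(subject);
  const scores: number[] = [];
  for (let i = 0; i < query.length; i++) scores.push(0);
  let start = 0;
  for (let i = 0; i <= query.length; i++) {
    if (i < query.length && !isSeparator(query.charAt(i))) continue;
    const word = query.slice(start, i).toLowerCase();
    if (word.length > 0 && isSubsequence(word, letters)) {
      for (let k = start; k < i; k++) scores[k] = word.length;
    }
    start = i + 1;
  }
  return scores;
}

console.log(acronymScoreArray(
  "sft/gisp.js",
  "sft/Tests/Plugins/GulpImportStepPerformer.js").join("")); // 00004444000
```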

@mdahamiwal
Collaborator Author

Agreed, that would be much cleaner and more performant. The only concern I see is that it makes the code less readable, where it is already a bit tough to follow and understand.

@mdahamiwal
Collaborator Author

@jeancroy, any development for this issue? Thanks.

@jeancroy
Owner

I'll put some time into making this work this weekend.

Seeing the other topic about edlo editor localisation, maybe your LCS on acronym space is the most correct idea.
