Weak character matches given more priority than better acronym match #26
Comments
Hi, if you look at the code, there's the concept of an "acronym prefix", and that heuristic requires you to put the acronym at the start of the query. So in query Moreover, the path separator character ( For now I'll suggest using something like Perhaps another way to bias the match toward acronyms is the uppercase-means-uppercase rule, where lowercase can match either case. It's not implemented, but other fuzzy libraries use it successfully.
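To make the uppercase-means-uppercase rule concrete, here is a minimal sketch of the per-character comparison it implies. The function name `char_matches` is hypothetical and this is not code from the library, where the rule is not implemented.

```python
def char_matches(query_char: str, subject_char: str) -> bool:
    """Uppercase-means-uppercase rule (illustrative sketch only).

    An uppercase query character must match the same uppercase character
    in the subject; a lowercase (or non-letter) query character matches
    the subject character in either case.
    """
    if query_char.isupper():
        return query_char == subject_char
    return query_char == subject_char.lower()


# A lowercase "g" can hit "G" or "g"; an uppercase "G" only hits "G".
print(char_matches("g", "G"), char_matches("G", "g"))
```

Under this rule, typing the acronym in uppercase restricts matches to uppercase letters in the candidate, which are usually word starts in camelCase names.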
Aha! I see it now, thanks @jeancroy for explaining it in detail. I understand that we don't want to backtrack so early, especially just to score an acronym, but how about an LCS between the candidate's StartOfWord characters and the query, something of this sort: Having them, if we try LCS for query thoughts?
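The start-of-word LCS idea could be sketched roughly as follows. The names `word_starts` and `lcs_length`, the separator set, and the camelCase test are assumptions made for illustration; they are not taken from the library.

```python
SEPARATORS = set("/\\_-. ")  # assumed set of word separators


def word_starts(subject: str) -> str:
    """Collect, lowercased, the characters that begin a word in subject.

    A word starts at index 0, right after a separator, or at a camelCase
    hump (an uppercase letter preceded by a lowercase one).
    """
    out = []
    for i, ch in enumerate(subject):
        if ch in SEPARATORS:
            continue
        prev = subject[i - 1] if i > 0 else "/"
        if prev in SEPARATORS or (ch.isupper() and prev.islower()):
            out.append(ch.lower())
    return "".join(out)


def lcs_length(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) longest-common-subsequence length."""
    prev_row = [0] * (len(b) + 1)
    for ca in a:
        row = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                row.append(prev_row[j - 1] + 1)
            else:
                row.append(max(prev_row[j], row[j - 1]))
        prev_row = row
    return prev_row[len(b)]


for name in ("GulpImportStepPerformer.js", "GssHelper.js"):
    print(name, word_starts(name), lcs_length(word_starts(name), "gisp"))
```

With these assumptions, a candidate whose word starts spell out the query gets a long LCS, while a candidate that only shares scattered letters gets a short one.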
That may work. However, computing "is this character part of an acronym" (camelCase, snake_case, etc.) is actually very expensive. The trick I have found is to not compute it unless a[i] == b[j], which is infrequent. To compare the query against the acronym form with LCS, we would need to categorize every character of the subject. (There might be an efficient way to store and reuse those categories; we'll have to think about it and benchmark.)
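As a rough illustration of that trick, the sketch below runs the character classification only for cells where the subject and query characters are equal, and counts how often that happens. The helper names, the separator set, and the returned tuple are hypothetical; the real scorer is more involved.

```python
SEPARATORS = set("/\\_-. ")  # assumed set of word separators


def is_word_start(subject: str, i: int) -> bool:
    """The (comparatively costly) classification of subject[i]."""
    ch = subject[i]
    if ch in SEPARATORS:
        return False
    if i == 0:
        return True
    prev = subject[i - 1]
    return prev in SEPARATORS or (ch.isupper() and prev.islower())


def lazy_word_start_hits(subject: str, query: str) -> tuple[int, int]:
    """Return (word_start_hits, classifier_calls) over all (i, j) cells.

    The classifier is called only when the lowercased characters are
    equal, mirroring the a[i] == b[j] guard described above.
    """
    subject_lw, query_lw = subject.lower(), query.lower()
    hits = calls = 0
    for i, sc in enumerate(subject_lw):
        for qc in query_lw:
            if sc != qc:
                continue  # the common case: no classification needed
            calls += 1
            if is_word_start(subject, i):
                hits += 1
    return hits, calls


hits, calls = lazy_word_start_hits("GssHelper.js", "gisp")
cells = len("GssHelper.js") * len("gisp")
print(f"{calls} classifier calls for {cells} cells, {hits} word-start hit(s)")
```

A start-of-word LCS would instead need the classification for every subject character up front, which is the cost the comment is weighing.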
I'm thinking along the following lines, so we should have something like that after segmenting the query into words and then evaluating the acronym of each word.
Agreed, that will be much cleaner and more performant. My only concern is that it makes the code less readable, when it is already a bit tough to follow and understand.
@jeancroy, any progress on this issue? Thanks.
I'll put some time into making this work this weekend. Seeing the other topic about
Hi @jeancroy, here is another scenario where I think the acronym score is too weak:
candidates:
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
sft/Tests/tft/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js
search query:
sft/gisp.js
results:
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
expected:
sft/Tests/Plugins/GulpImportStepPerformer.js
sft/Tests/Plugins/Cloud/GulpImportStepPerformer.js
sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js
GssHelper.js doesn't even contain gisp in order, but it is still scored higher than the better acronym matches.
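The claim about ordering can be checked with a small case-insensitive subsequence test. This is only a sketch for verifying the report; `is_subsequence` is a hypothetical helper, not part of the library.

```python
def is_subsequence(query: str, subject: str) -> bool:
    """True if the characters of query appear in subject in order,
    ignoring case (gaps between them are allowed)."""
    remaining = iter(subject.lower())
    # "c in remaining" consumes the iterator up to the first match,
    # so each query character must be found after the previous one.
    return all(c in remaining for c in query.lower())


weak = "sft/Tests/tfat/Reporting/Tools/teams/dev50/sftRobotic/GssHelper.js"

print(is_subsequence("gisp", "GulpImportStepPerformer.js"))  # basename of the expected top result
print(is_subsequence("gisp", "GssHelper.js"))                # basename of the weak match
print(is_subsequence("sft/gisp.js", weak))                   # full path of the weak match
```

The weak candidate matches the query only through letters scattered across its directory names, while the two GulpImportStepPerformer.js candidates match gisp at consecutive word starts in the file name itself.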