Quality of matches: Proper casing #27
Comments
Hi mrkishi , thanks for the report.
I call what you describe smart-case: uppercase means uppercase, lowercase can mean anything. There are several reasons I did not go that way. One is that I try to be agnostic of programming style. My main concern, though, is reachability. Imagine you have a local variable named Once reachability is there, I know that after a bit of a learning curve we have a useful tool.
There was a LOT of pressure for proper casing, often for CamelCase, but there are also some use cases for proper casing as-is. What to do from here? It might be a coincidence, but both of your examples fall into what I call acronym exact match (that is, the acronym of the subject is exactly the query).
In theory it's also a strong bonus, but it grows with acronym length, so I may investigate what to think here. Another possibility is to have an option switch to behave in smartCase mode. It's not that hard to do; in the end, it would come down to testing whether it's too slow to maintain both code paths.
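The smart-case rule described in this comment (an uppercase query letter only matches uppercase, a lowercase query letter matches either case) can be sketched as a plain subsequence test. This is only an illustration under that assumption: `smartCaseCharMatch` and `smartCaseSubsequence` are hypothetical names, not part of fuzzaldrin-plus, and no scoring is attempted.

```javascript
// Hypothetical helpers for illustration; not the fuzzaldrin-plus API.

// Smart-case comparison of one query character against one subject character.
function smartCaseCharMatch(queryChar, subjectChar) {
  const queryIsUpper = queryChar !== queryChar.toLowerCase();
  if (queryIsUpper) {
    // Uppercase in the query means uppercase in the subject.
    return queryChar === subjectChar;
  }
  // Lowercase (or caseless) in the query matches either case.
  return queryChar === subjectChar.toLowerCase();
}

// True when every query character appears in the subject, in order,
// under the smart-case comparison above.
function smartCaseSubsequence(query, subject) {
  let qi = 0;
  for (let si = 0; si < subject.length && qi < query.length; si++) {
    if (smartCaseCharMatch(query[qi], subject[si])) {
      qi++;
    }
  }
  return qi === query.length;
}
```

With this sketch, the query 'pc' still reaches both 'A PROPER CASE' and 'a proper case', while 'PC' reaches only 'A PROPER CASE'. That is the reachability trade-off: an uppercase query narrows the candidate set, a lowercase one does not.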
Thank you for the detailed (and prompt) response, @jeancroy! The reachability argument is extremely convincing, and I didn't think of that. However, I'm not sure I completely understand its impact on these examples. It doesn't seem like smart-case goes against reachability. On the
But even disregarding smart-case, I still come across some odd behaviors.
Let me preface this message with some (made-up, sorry) term definitions to minimize confusion (it's still pretty confusing...):
Literal [pattern]: a pattern of consecutive letters
Acronym [pattern]: a pattern of consecutive start-of-word letters
Match: any combination of sequential literal and/or acronym patterns
Literal [exact] match: a literal pattern that spans 100% of the query
Acronym [exact] match: an acronym pattern that spans 100% of the query
Exact match: a literal or acronym match
Full-length literal [exact] match: a literal match that also spans 100% of the candidate
Full-length acronym [exact] match: an acronym match that also spans 100% of the candidate
Full-length [exact] match: a full-length literal or full-length acronym match
For instance, proper casing is apparently not as influential on literal matches as it is on acronym matches:
(['A PROPER CASE', 'a proper case'], 'pc') => 'a proper case'
(['A PROPER CASE', 'a proper case'], 'PC') => 'A PROPER CASE'
(['A PROPERCASE', 'a propercase'], 'pr') => 'A PROPERCASE'
(['A PROPERCASE', 'a propercase'], 'PR') => 'A PROPERCASE'
// factor in length
(['A PR', 'a pr'], 'pr') => 'A PR'
(['A PR', 'a pr'], 'PR') => 'A PR'
A full-length literal match will "ignore" case errors, while an equivalent full-length acronym will not:
(['PROPER CASE', 'a proper case'], 'pc') => 'a proper case'
(['PROPERCASE', 'a propercase'], 'propercase') => 'PROPERCASE'
(['PR', 'a pr'], 'pr') => 'PR'
--
(['A PROPER CASE', 'proper case'], 'PC') => 'A PROPER CASE'
(['A PROPERCASE', 'propercase'], 'PROPERCASE') => 'propercase'
(['A PR', 'pr'], 'PR') => 'pr'
I have the feeling that acronyms would work better if these behaviors were aligned: either full-length acronyms should be more lenient towards case mismatches (like full-length literals), or literal matches should favor proper casing over being full-length.
Personally, I think giving full-length acronyms the same text-casing tolerance as full-length literal matches would be the more useful approach:
(['PROPER CASE', 'a proper case'], 'pc') => 'PROPER CASE'
(['PROPERCASE', 'a propercase'], 'propercase') => 'PROPERCASE'
Thoughts?
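The made-up terms in this comment can be made concrete with a small classifier. This is a rough sketch, not how fuzzaldrin-plus works internally: `acronymOf` and `classifyExactMatch` are hypothetical names, words are assumed to be separated by whitespace only (CamelCase boundaries are ignored), and case is deliberately disregarded so that only the shape of the match is reported.

```javascript
// Hypothetical helpers illustrating the term definitions; not library code.

// Start-of-word letters of a candidate, e.g. 'a proper case' -> 'apc'.
// Assumption: words are separated by whitespace only.
function acronymOf(candidate) {
  return candidate
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => word[0])
    .join('');
}

// Classify how the whole query relates to a candidate, ignoring case.
function classifyExactMatch(candidate, query) {
  const c = candidate.toLowerCase();
  const q = query.toLowerCase();
  const acronym = acronymOf(c);

  if (c === q) return 'full-length literal';       // literal, spans the candidate
  if (acronym === q) return 'full-length acronym'; // acronym, spans the candidate
  if (c.includes(q)) return 'literal';             // consecutive letters
  if (acronym.includes(q)) return 'acronym';       // consecutive start-of-word letters
  return 'none';
}

console.log(classifyExactMatch('PROPER CASE', 'pc'));        // full-length acronym
console.log(classifyExactMatch('a proper case', 'pc'));      // acronym
console.log(classifyExactMatch('PROPERCASE', 'propercase')); // full-length literal
console.log(classifyExactMatch('A PROPERCASE', 'pr'));       // literal
```

Seen this way, the first pair of examples compares a full-length acronym against a plain acronym match, and the second compares a full-length literal against a plain literal match, which is why it seems reasonable to expect the same tolerance for case mismatches in both.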
Thank you for the report.
Typical definition of smart case I've seen is that proper case on lowercase
For the other finding you may have found a bug. There's express lane for
What I can do is implement smart case. Then lower bonus for case
Your idea of being more lenient with exact acronym seems good. It's hard to
Hello, folks.
I've come across the following situation:
Wouldn't Ten Pizzas make more sense, here? I haven't studied the algorithm too deeply yet, but I think this is caused by the proper casing rules, as evidenced by this version of the same test:
Now, while proper casing is indeed important when choosing matches, I feel like it's not a good indicator on lowercase queries, and it's currently being given too much weight.
A query that contains uppercase characters conveys a proper casing intention quite strongly. The opposite, however, is not true: a lowercase query doesn't mean you'd prefer lowercase matches.
Consider these hypothetical queries:
(['Proper Case', 'A proper case'], 'pc') => 'Proper Case'
(['proper case', 'A Proper Case'], 'PC') => 'A Proper Case'
fuzzaldrin-plus gets the second one, but misses the first. Am I off-base here in what I consider better matches?