Match fails for Turkish Locale due to Turkish "i" problem #33
Comments
I ask because (at least in the case of Atom) the library is focused on programming languages and paths. In case of doubt, I prefer to match slightly too often. In this case, lowercase "i" could match both uppercase "I" and "İ". That would support mixed-language use. I would be OK with a user-defined `lowercaseFunc = (x) => x.toLocaleLowerCase()` and a similar `uppercaseFunc` if you are certain your users won't mix languages.
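A minimal sketch of the two ideas in that comment, assuming a plain substring test stands in for the library's fuzzy scorer. `makeMatcher`, `lenientLowercase`, and the `lowercaseFunc` option are hypothetical names (the option name is borrowed from the comment), not the library's API. The default errs toward matching too often by folding all four Turkish i's to "i"; a strict Turkish mapping can be plugged in instead.

```javascript
// Sketch only: makeMatcher, lenientLowercase and the lowercaseFunc option are
// hypothetical names, not the library's API. A plain substring test stands in
// for the library's fuzzy scorer.

// Lenient default: map dotted capital I (U+0130) and dotless small i (U+0131)
// to ASCII "i", then apply locale-independent lowercasing ("I" -> "i"), so a
// query "i" matches "I", "İ" and "ı" alike.
function lenientLowercase(s) {
  return s.replace(/[\u0130\u0131]/g, "i").toLowerCase();
}

// Build a case-insensitive substring matcher around a pluggable lowercase function.
function makeMatcher({ lowercaseFunc = lenientLowercase } = {}) {
  return (subject, query) => lowercaseFunc(subject).includes(lowercaseFunc(query));
}

const lenient = makeMatcher();
console.log(lenient("İstanbul", "ist")); // true
console.log(lenient("ISTANBUL", "ıst")); // true

// Strict Turkish rules, for users who will not mix languages.
const turkish = makeMatcher({ lowercaseFunc: (x) => x.toLocaleLowerCase("tr") });
console.log(turkish("ISTANBUL", "ist")); // false: Turkish lowercases "I" to "ı"
console.log(turkish("İSTANBUL", "ist")); // true
```

The lenient default trades a few extra matches for never missing one; the Turkish variant is exact but assumes every query and path follows Turkish casing rules.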
Nope, if the current locale is Turkish, the two …
Basically, the question is whether Turkish users have different expectations for Turkish text and English text.
I confirmed with a Turkish user, who said that you should only take care of case-insensitive matching, as we can never know whether the search term is Turkish or English. So, as far as we are using …
I'm not sure I understand that sentence. Do you mean case-sensitive instead? (If one never knows the language, one does not apply a language-specific transform; I think that case already works out of the box.) Are you OK with this being an option? I feel the reasonable thing to do is to let more users interact with it and see if they like it.
That means we should take care of matching for the current locale (…
Sounds good to me.
That brings up a good point. I did a benchmark run on Chrome and Edge and … However, as per our logic, the only place we do a full string …
That's the spirit. Some users will type a "nondescript" query, either as part of a word or by typing the name of a folder with a large sub-hierarchy. In the past I've toyed with the idea of writing my own case-insensitive `indexOf` and getting rid of transforming the subject. Alternatively, a cache of lowercased subjects may make the problem disappear over multiple searches. Per this file of exceptions in Unicode case folding … Since this issue affects only a few languages (Turkish, Azeri, Lithuanian) and is data-dependent, I'll be working on making your changes option-dependent later today or tomorrow.
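The cache idea can be sketched as follows, again assuming a substring filter in place of the library's scorer; `LowercaseCache` and `filterCandidates` are hypothetical names. Keying the cache on the subject string means each path is transformed once, however many queries follow. The trade-off is memory (this `Map` is never evicted), and one cache is needed per case-mapping rule, since a Turkish and a default lowercase of the same path differ.

```javascript
// Hypothetical sketch: memoize lowercased subjects so that repeated searches
// over the same candidates do not re-transform every path on each query.
class LowercaseCache {
  constructor(lowercaseFunc = (s) => s.toLowerCase()) {
    this.lowercaseFunc = lowercaseFunc; // one cache per case-mapping rule
    this.cache = new Map(); // subject -> lowercased subject (never evicted)
  }

  get(subject) {
    let lowered = this.cache.get(subject);
    if (lowered === undefined) {
      lowered = this.lowercaseFunc(subject);
      this.cache.set(subject, lowered);
    }
    return lowered;
  }
}

// Substring filter standing in for the real scorer: the query is lowercased
// once per search, subjects come from the cache.
function filterCandidates(candidates, query, cache) {
  const q = cache.lowercaseFunc(query);
  return candidates.filter((c) => cache.get(c).includes(q));
}

const cache = new LowercaseCache();
const paths = ["src/Index.js", "lib/util.ts", "README.md"];
console.log(filterCandidates(paths, "index", cache)); // [ 'src/Index.js' ]
console.log(filterCandidates(paths, "READ", cache)); // [ 'README.md' ]
console.log(cache.cache.size); // 3: each path was lowercased only once
```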
It's a common problem in programming when comparing strings in Turkish. Turkish has four i's, counting case:

- "ı" (dotless lowercase) and "I" (dotless uppercase)
- "i" (dotted lowercase) and "İ" (dotted uppercase)

The library internally uses `toLowerCase()` and `toUpperCase()`, which results in a mismatch if the query or candidates contain dotless i's, since `"I".toLowerCase()` is `"i"`.
What we need is `"I".toLocaleLowerCase()`, which returns `"ı"` when the current locale is Turkish.
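The difference can be reproduced in a JavaScript runtime with full ICU locale data (recent Node.js releases and browsers ship it). This sketch passes the locale explicitly; called with no argument, `toLocaleLowerCase()` uses the host's default locale, which is why the behavior depends on the user's system settings.

```javascript
// Default (locale-independent) case mapping.
console.log("I".toLowerCase()); // "i"
console.log("i".toUpperCase()); // "I"

// Turkish case mapping; passing the locale explicitly avoids depending on
// the runtime's default locale.
console.log("I".toLocaleLowerCase("tr-TR")); // "ı" (U+0131)
console.log("i".toLocaleUpperCase("tr-TR")); // "İ" (U+0130)
console.log("İ".toLocaleLowerCase("tr-TR")); // "i"

// Without Turkish rules, "İ" lowercases to "i" + U+0307 (combining dot above).
console.log("İ".toLowerCase().length); // 2
```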
Fix: use `toLocaleLowerCase()` to ensure the Turkish locale is honored in lowercase/uppercase conversions.
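A minimal sketch of what the proposed fix amounts to for a case-insensitive comparison. `localeIncludes` is a hypothetical helper, not part of the library, and a substring test again stands in for the real matcher. The last line illustrates the concern raised in the thread: once Turkish rules apply, an English query typed in capitals no longer matches.

```javascript
// Hypothetical helper (not the library's API): case-insensitive substring test
// that honors a locale. When `locale` is undefined, toLocaleLowerCase() uses the
// host's default locale, which is what the proposed fix relies on.
function localeIncludes(subject, query, locale) {
  return subject.toLocaleLowerCase(locale).includes(query.toLocaleLowerCase(locale));
}

// "IŞIK" (Turkish for "light") against a lowercase Turkish query.
console.log(localeIncludes("IŞIK", "ışık", "tr")); // true
console.log(localeIncludes("IŞIK", "ışık", "en")); // false: English maps "I" to "i"

// The flip side: under Turkish rules an English query "INDEX" becomes "ındex"
// and no longer matches "index.js".
console.log(localeIncludes("index.js", "INDEX", "tr")); // false
```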