Support unicode fuzzy search #18
Comments
Seems to work: https://runkit.com/meltuhamy/5b1faf06fba94100126a5bbb
It would become significantly slower with
Example that reports a false positive:
needle = "𠀨";
haystack = "𠀧𡀨";
// returns true
fuzzysearch(needle, haystack)
Explanation of the above code: The
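One way to see why the example above reports a match is to print the UTF-16 code units of both strings. The sketch below is not the library's source: `fuzzysearchByCodeUnit` is a hypothetical name for a matcher assumed to mirror the charCodeAt loop discussed in this issue, and `codeUnits` is a helper made up for illustration. The needle is a single character made of two code units, and each of those units appears to occur in the haystack, just in two different characters.

```javascript
// Hypothetical stand-in for a charCodeAt-based matcher (an assumption,
// modelled on the loop discussed in this issue, not copied from the library).
function fuzzysearchByCodeUnit(needle, haystack) {
  var hlen = haystack.length;
  var nlen = needle.length;
  if (nlen > hlen) return false;
  if (nlen === hlen) return needle === haystack;
  outer: for (var i = 0, j = 0; i < nlen; i++) {
    var nch = needle.charCodeAt(i);
    // Each code unit is searched on its own, so the two halves of a
    // surrogate pair may be found in two different haystack characters.
    while (j < hlen) {
      if (haystack.charCodeAt(j++) === nch) continue outer;
    }
    return false;
  }
  return true;
}

// Hypothetical helper: list a string's UTF-16 code units as hex strings.
function codeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16));
  }
  return units;
}

var needle = "𠀨";
var haystack = "𠀧𡀨";
console.log(codeUnits(needle));   // two code units for one character
console.log(codeUnits(haystack)); // four code units for two characters
console.log(fuzzysearchByCodeUnit(needle, haystack));
```

Comparing the two printed lists shows which haystack units the needle's halves line up with.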
How to fix this (without sacrificing performance): In UTF-16, when a Unicode code point (an actual Unicode character) is encoded using two uint16 values (a UTF-16 surrogate pair), the first uint16 is always in the inclusive range 0xD800 to 0xDBFF. If a charcode in the needle is in that range, it has to be matched together with the charcode that follows it.
EDIT: A fixed version would look like this:
function fuzzysearch (needle, haystack) {
  var hlen = haystack.length;
  var nlen = needle.length;
  if (nlen > hlen) {
    return false;
  }
  if (nlen === hlen) {
    return needle === haystack;
  }
  outer: for (var i = 0, j = 0; i < nlen; i++) {
    var nch = needle.charCodeAt(i);
    // handling a utf-16 surrogate pair: nch is the first half
    if (nch >= 0xD800 && nch <= 0xDBFF) {
      // the pair needs a second half in the needle and something left to scan
      if (++i >= nlen || j >= hlen) {
        return false;
      }
      var nch2 = needle.charCodeAt(i);
      var hch = haystack.charCodeAt(j++);
      // slide a two-unit window over the haystack; both halves must be adjacent
      while (j < hlen) {
        var hch2 = haystack.charCodeAt(j++);
        if (hch === nch && hch2 === nch2) {
          continue outer;
        }
        hch = hch2;
      }
      return false;
    }
    // no utf-16 surrogate pair: a single code unit is the whole character
    while (j < hlen) {
      if (haystack.charCodeAt(j++) === nch) {
        continue outer;
      }
    }
    return false;
  }
  return true;
}
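A slower reference implementation can be handy for cross-checking results from the fixed version. The sketch below is an assumption-laden alternative, not part of the proposed fix: `fuzzysearchByCodePoint` is a hypothetical name, and it relies on `Array.from` splitting a string by code points rather than by UTF-16 code units. It allocates two arrays per call, so it trades the performance the fix preserves for simplicity.

```javascript
// Hypothetical reference variant: compares whole code points, not code units.
function fuzzysearchByCodePoint(needle, haystack) {
  // Array.from iterates a string by code point, so a surrogate pair
  // becomes a single array element.
  var n = Array.from(needle);
  var h = Array.from(haystack);
  if (n.length > h.length) return false;
  var j = 0;
  for (var i = 0; i < n.length; i++) {
    var found = false;
    // Scan forward for the next occurrence of the current needle character.
    while (j < h.length) {
      if (h[j++] === n[i]) {
        found = true;
        break;
      }
    }
    if (!found) return false;
  }
  return true;
}

console.log(fuzzysearchByCodePoint("𠀨", "𠀧𡀨"));   // the example from this thread
console.log(fuzzysearchByCodePoint("𠀨", "a𠀨b"));
console.log(fuzzysearchByCodePoint("car", "cartwheel"));
```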
I just took a look at the source code and saw that str.charCodeAt is used instead of str.codePointAt. So fuzzy searching Unicode characters (multibyte characters, for example) is probably not supported by this algorithm.
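To illustrate the difference between the two methods mentioned above, here is a minimal sketch of walking a string by code point with `codePointAt`. The helper name `codePoints` is made up for this example. The index advances by two code units whenever the code point lies above 0xFFFF, because such a character occupies a surrogate pair.

```javascript
// Hypothetical helper: collect the code points of a string using codePointAt.
function codePoints(str) {
  var out = [];
  for (var i = 0; i < str.length; ) {
    var cp = str.codePointAt(i);
    out.push(cp);
    // Code points above 0xFFFF take two UTF-16 code units.
    i += cp > 0xFFFF ? 2 : 1;
  }
  return out;
}

var sample = "a𠀨b";
console.log(sample.length);             // counts UTF-16 code units
console.log(codePoints(sample).length); // counts code points
```

With `charCodeAt`, the same loop would have to step one code unit at a time and would see the two halves of the middle character as separate values.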