-
-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mishandling of utf8 replacement character #54
Comments
Though it is elegant and composes well, it is possible that moving away from package transform may end up making the code simpler, faster, and more robust. (Need to think about that a bit.) |
{"Ⱦ", "", false, -1}, is another interesting test case because its lowercase form has a different UTF-8 encoded length than its uppercase form. |
josharian
added a commit
to josharian/fuzzysearch
that referenced
this issue
May 4, 2023
This costs some CPU time, but of course it's better than panicking. Fixes lithammer#54 goos: darwin goarch: arm64 pkg: github.com/lithammer/fuzzysearch/fuzzy │ a │ b │ │ sec/op │ sec/op vs base │ Match-8 16.08n ± 2% 15.99n ± 1% ~ (p=0.361 n=10) MatchBigLate-8 1.005µ ± 1% 1.005µ ± 0% ~ (p=0.861 n=10) MatchBigEarly-8 12.60n ± 1% 12.56n ± 1% ~ (p=0.641 n=10) MatchFold-8 136.0n ± 1% 144.5n ± 1% +6.25% (p=0.000 n=10) MatchFoldBigLate-8 7.071µ ± 5% 7.520µ ± 1% +6.36% (p=0.000 n=10) MatchFoldBigEarly-8 6.093µ ± 2% 6.560µ ± 3% +7.67% (p=0.000 n=10) RankMatch-8 17.67n ± 1% 17.56n ± 1% ~ (p=0.319 n=10) RankMatchBigLate-8 1.008µ ± 2% 1.008µ ± 1% ~ (p=0.870 n=10) RankMatchBigEarly-8 1.228µ ± 1% 1.228µ ± 2% ~ (p=0.821 n=10) LevenshteinDistance-8 55.63n ± 1% 55.45n ± 2% ~ (p=0.225 n=10) LevenshteinDistanceBigLate-8 20.44µ ± 2% 20.44µ ± 3% ~ (p=0.739 n=10) LevenshteinDistanceBigEarly-8 20.47µ ± 3% 20.43µ ± 1% ~ (p=0.280 n=10) geomean 539.4n 547.4n +1.48%
This was referenced May 4, 2023
Merged
josharian
added a commit
to josharian/fuzzysearch
that referenced
this issue
May 4, 2023
This is about as minimal as it gets, but it was enough to catch lithammer#54. It's a bit weird to have a file called fuzz_test.go and another called fuzzy_test.go. I guess that's life in a fuzzy matching project.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this test case to
var fuzzyTests
, and run the tests:Result:
This existed prior to #53 (phew!). The root cause is that unicodeFoldTransformer.Transform is returning
n, n, err
, but when utf8.RuneError is present,nSrc
may differ fromnDst
. I'll try to put together a fix sometime soonish.(Found by fuzzing. Once the fuzz tests make it out of the gate without stumbling, I'll PR them.)
The text was updated successfully, but these errors were encountered: