Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mishandling of utf8 replacement character #54

Closed
josharian opened this issue May 4, 2023 · 2 comments · Fixed by #55
Closed

mishandling of utf8 replacement character #54

josharian opened this issue May 4, 2023 · 2 comments · Fixed by #55

Comments

@josharian
Copy link
Contributor

Add this test case to var fuzzyTests, and run the tests:

	{"\xffinvalid UTF-8\xff", "", false, -1},

Result:

--- FAIL: TestFuzzyMatchFold (0.00s)
panic: runtime error: slice bounds out of range [19:15] [recovered]
	panic: runtime error: slice bounds out of range [19:15]

goroutine 35 [running]:
testing.tRunner.func1.2({0x1031423e0, 0x14000114018})
	/Users/josh/go/1.20/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
	/Users/josh/go/1.20/src/testing/testing.go:1529 +0x384
panic({0x1031423e0, 0x14000114018})
	/Users/josh/go/1.20/src/runtime/panic.go:884 +0x204
golang.org/x/text/transform.String({0x103153b28, 0x103297c60}, {0x1030cb139, 0xf})
	/Users/josh/pkg/mod/golang.org/x/[email protected]/transform/transform.go:650 +0x9e4
github.com/lithammer/fuzzysearch/fuzzy.stringTransform({0x1030cb139, 0xf}, {0x103153b28?, 0x103297c60?})
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:242 +0x64
github.com/lithammer/fuzzysearch/fuzzy.match({0x1030cb139?, 0x7?}, {0x0, 0x0}, {0x103153b28, 0x103297c60})
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:55 +0x38
github.com/lithammer/fuzzysearch/fuzzy.MatchFold(...)
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:41
github.com/lithammer/fuzzysearch/fuzzy.TestFuzzyMatchFold(0x1400011cb60)
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy_test.go:65 +0xbc
testing.tRunner(0x1400011cb60, 0x103152088)
	/Users/josh/go/1.20/src/testing/testing.go:1576 +0x10c
created by testing.(*T).Run
	/Users/josh/go/1.20/src/testing/testing.go:1629 +0x368
exit status 2
FAIL	github.com/lithammer/fuzzysearch/fuzzy	0.131s

This existed prior to #53 (phew!). The root cause is that unicodeFoldTransformer.Transform is returning n, n, err, but when utf8.RuneError is present, nSrc may differ from nDst. I'll try to put together a fix sometime soonish.

(Found by fuzzing. Once the fuzz tests make it out of the gate without stumbling, I'll PR them.)

@josharian
Copy link
Contributor Author

Though it is elegant and composes well, it is possible that moving away from package transform may end up making the code simpler, faster, and more robust. (Need to think about that a bit.)

@josharian
Copy link
Contributor Author

{"Ⱦ", "", false, -1},

is another interesting test case because its lowercase form has a different UTF-8 encoded length than its uppercase form.

josharian added a commit to josharian/fuzzysearch that referenced this issue May 4, 2023
This costs some CPU time, but of course it's better than panicking.

Fixes lithammer#54

goos: darwin
goarch: arm64
pkg: github.com/lithammer/fuzzysearch/fuzzy
                              │      a      │                 b                  │
                              │   sec/op    │   sec/op     vs base               │
Match-8                         16.08n ± 2%   15.99n ± 1%       ~ (p=0.361 n=10)
MatchBigLate-8                  1.005µ ± 1%   1.005µ ± 0%       ~ (p=0.861 n=10)
MatchBigEarly-8                 12.60n ± 1%   12.56n ± 1%       ~ (p=0.641 n=10)
MatchFold-8                     136.0n ± 1%   144.5n ± 1%  +6.25% (p=0.000 n=10)
MatchFoldBigLate-8              7.071µ ± 5%   7.520µ ± 1%  +6.36% (p=0.000 n=10)
MatchFoldBigEarly-8             6.093µ ± 2%   6.560µ ± 3%  +7.67% (p=0.000 n=10)
RankMatch-8                     17.67n ± 1%   17.56n ± 1%       ~ (p=0.319 n=10)
RankMatchBigLate-8              1.008µ ± 2%   1.008µ ± 1%       ~ (p=0.870 n=10)
RankMatchBigEarly-8             1.228µ ± 1%   1.228µ ± 2%       ~ (p=0.821 n=10)
LevenshteinDistance-8           55.63n ± 1%   55.45n ± 2%       ~ (p=0.225 n=10)
LevenshteinDistanceBigLate-8    20.44µ ± 2%   20.44µ ± 3%       ~ (p=0.739 n=10)
LevenshteinDistanceBigEarly-8   20.47µ ± 3%   20.43µ ± 1%       ~ (p=0.280 n=10)
geomean                         539.4n        547.4n       +1.48%
josharian added a commit to josharian/fuzzysearch that referenced this issue May 4, 2023
This is about as minimal as it gets, but it was enough to catch lithammer#54.

It's a bit weird to have a file called fuzz_test.go
and another called fuzzy_test.go.
I guess that's life in a fuzzy matching project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant