-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Switch over to managed Marvin implementation for string hashing #17029
Conversation
Interop.Globalization.GetSortKey(_sortHandle, source, source.Length, pSortKey, sortKeyLength, options); | ||
if (Interop.Globalization.GetSortKey(_sortHandle, source, source.Length, pSortKey, sortKeyLength, options) != sortKeyLength) | ||
{ | ||
throw new ArgumentException(SR.Arg_MustBeString, nameof(source)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can source
not be a string? It's got invalid surrogates? Maybe message should say it has invalid content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And possibly attempt to include the value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling for (theoretical?) errors coming out of GetSortKey was fairly inconsistent. It was combination of: Arg_MustBeString
argument exception, FailFast, and ignoring the errors.
I have standardized on Arg_MustBeString
because of it is what the original native implementation used: https://github.com/dotnet/coreclr/pull/17029/files#diff-5eb37e957a9809fb3b4d9574f91b8d12L126
However, I agree that this is really confusing message if/ever we hit it. I am changing it to Arg_ExternalException
that we are using in other places for unexpected errors coming out of globalization APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And possibly attempt to include the value
We are not including the string value for any other errors coming out of globalization. And I am not sure whether including them would be a good idea. If the globalization APIs fail, it will likely be because of they got passed in a weird string. Such strings are likely going to cause bad things to happen when they are printed, and so you end up with worse diagnostic experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
@@ -796,25 +800,29 @@ internal unsafe int GetHashCodeOfStringCore(string source, CompareOptions option | |||
|
|||
int sortKeyLength = Interop.Globalization.GetSortKey(_sortHandle, source, source.Length, null, 0, options); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably we could supply a buffer that would often be big enough and avoid the need to call Getsortkey twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are many performance improvements that can be done in the globalization implementation. Not doing them in this PR to make it scoped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashing is possibly the most perf sensitive path in globalization? Perhaps it's worth an issue opened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks a lot Jan. |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline |
This makes
string.GetHashCode
a bit faster for small strings because of some of the FCall overhead goes away. E.g."Hello world!".GetHashCode()
is about 10% faster on Windows x64. The performance for larger strings is same.