Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Switch over to managed Marvin implementation for string hashing #17029

Merged
merged 3 commits into from
Mar 19, 2018

Conversation

jkotas
Copy link
Member

@jkotas jkotas commented Mar 19, 2018

This makes string.GetHashCode a bit faster for small strings because of some of the FCall overhead goes away. E.g. "Hello world!".GetHashCode() is about 10% faster on Windows x64. The performance for larger strings is same.

Interop.Globalization.GetSortKey(_sortHandle, source, source.Length, pSortKey, sortKeyLength, options);
if (Interop.Globalization.GetSortKey(_sortHandle, source, source.Length, pSortKey, sortKeyLength, options) != sortKeyLength)
{
throw new ArgumentException(SR.Arg_MustBeString, nameof(source));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can source not be a string? It's got invalid surrogates? Maybe message should say it has invalid content.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And possibly attempt to include the value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling for (theoretical?) errors coming out of GetSortKey was fairly inconsistent. It was combination of: Arg_MustBeString argument exception, FailFast, and ignoring the errors.

I have standardized on Arg_MustBeString because of it is what the original native implementation used: https://github.com/dotnet/coreclr/pull/17029/files#diff-5eb37e957a9809fb3b4d9574f91b8d12L126

However, I agree that this is really confusing message if/ever we hit it. I am changing it to Arg_ExternalException that we are using in other places for unexpected errors coming out of globalization APIs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And possibly attempt to include the value

We are not including the string value for any other errors coming out of globalization. And I am not sure whether including them would be a good idea. If the globalization APIs fail, it will likely be because of they got passed in a weird string. Such strings are likely going to cause bad things to happen when they are printed, and so you end up with worse diagnostic experience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

@@ -796,25 +800,29 @@ internal unsafe int GetHashCodeOfStringCore(string source, CompareOptions option

int sortKeyLength = Interop.Globalization.GetSortKey(_sortHandle, source, source.Length, null, 0, options);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably we could supply a buffer that would often be big enough and avoid the need to call Getsortkey twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are many performance improvements that can be done in the globalization implementation. Not doing them in this PR to make it scoped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashing is possibly the most perf sensitive path in globalization? Perhaps it's worth an issue opened.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ViktorHofer
Copy link
Member

Great, thanks a lot Jan.

@jkotas
Copy link
Member Author

jkotas commented Mar 19, 2018

@dotnet-bot test Ubuntu x64 Checked corefx_baseline
@dotnet-bot test Windows_NT x64 Checked corefx_baseline

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants