Performance improvement for Guid.Equals using SSE #53012

bill-poole · 2021-05-20T07:58:17Z

Baseline performance:

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
EqualsSame	1.785 ns	0.0558 ns	0.0494 ns	1.791 ns	1.702 ns	1.849 ns	-	-	-	-
EqualsOperator	1.709 ns	0.0133 ns	0.0125 ns	1.711 ns	1.687 ns	1.730 ns	-	-	-	-
NotEqualsOperator	1.728 ns	0.0188 ns	0.0167 ns	1.727 ns	1.701 ns	1.749 ns	-	-	-	-

Updated performance:

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
EqualsSame	0.3771 ns	0.0073 ns	0.0061 ns	0.3786 ns	0.3650 ns	0.3852 ns	-	-	-	-
EqualsOperator	0.4260 ns	0.0101 ns	0.0095 ns	0.4285 ns	0.4060 ns	0.4413 ns	-	-	-	-
NotEqualsOperator	0.5709 ns	0.0349 ns	0.0327 ns	0.5577 ns	0.5437 ns	0.6605 ns	-	-	-	-

dnfadmin · 2021-05-20T07:58:31Z

All CLA requirements met.

gfoidl · 2021-05-20T09:22:21Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs

+                var result = Sse2.CompareEqual(g1, g2);
+                return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;


Suggested change

-                var result = Sse2.CompareEqual(g1, g2);
-                return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;
+                return g1.Equals(g2);

Results in same codegen and is simpler to read.
If this hinders inlining, etc. keep it as is (except var -> concrete type).

Ideally the JIT should emit code that takes SSE4.1's _mm_testz_si128 into account?
(If this variant is always faster on the supported CPUs.)

bill-poole · 2021-05-20

Vector128<T>.Equals(Vector128<T>) currently does not have a specific/optimized code path for SSE 4.1's _mm_testz_si128, which is why I called out an explicit code path for SSE 4.1 in Guid.EqualsCore.

But yes, under the SSE2 code path, it should be equivalent to invoke g1.Equals(g2) on the assumption that the JIT will inline the call, which it should because Vector128<T>.Equals(Vector128<T>) is decorated with "aggressive inlining".

gfoidl · 2021-05-20

> currently does not have a specific/optimized code path for SSE 4.1's

Yeah, my question was intended for @tannergooding (?) to have a look at this on the JIT side 😃

tannergooding · 2021-05-20

> Ideally the JIT should emit code that takes SSE4.1's _mm_testz_si128 into account?

There are a few improvements that could happen for Vector128.Equals. I've been waiting on #49397 before tackling any of them, however.
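
For readers following this thread, the two x86 strategies being compared can be sketched roughly as below. This is a minimal illustration and not the PR's actual code: the names `GuidEqualitySketch` and `EqualsSse` are hypothetical, and the XOR-then-`TestZ` shape of the SSE4.1 path is an assumption about how `_mm_testz_si128` would be applied here.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not part of the runtime.
static class GuidEqualitySketch
{
    public static bool EqualsSse(in Guid left, in Guid right)
    {
        if (Sse2.IsSupported)
        {
            // Reinterpret each 16-byte Guid as a 128-bit vector.
            Vector128<byte> g1 = Unsafe.As<Guid, Vector128<byte>>(ref Unsafe.AsRef(in left));
            Vector128<byte> g2 = Unsafe.As<Guid, Vector128<byte>>(ref Unsafe.AsRef(in right));

            if (Sse41.IsSupported)
            {
                // Assumed SSE4.1 form: the XOR is all zeros only when the
                // inputs match, and TestZ(x, x) is true only when x is zero.
                Vector128<byte> diff = Sse2.Xor(g1, g2);
                return Sse41.TestZ(diff, diff);
            }

            // SSE2 form from the diff: compare bytewise, then require that
            // all 16 mask bits are set.
            Vector128<byte> result = Sse2.CompareEqual(g1, g2);
            return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;
        }

        // No SSE2 available: defer to the built-in comparison.
        return left == right;
    }
}
```

Whether the SSE4.1 form is actually faster than the SSE2 form on a given CPU is exactly the open question raised above; the sketch makes no claim either way.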

gfoidl · 2021-05-20T09:22:55Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs

-            ref int rB = ref Unsafe.AsRef(in right._a);
+            if (Sse2.IsSupported)
+            {
+                var g1 = Unsafe.As<Guid, Vector128<byte>>(ref Unsafe.AsRef(left));


Note: use explicit types, not var, to follow the coding guidelines in this repo.

bill-poole · 2021-05-20

I have updated accordingly.

gfoidl · 2021-05-20T09:33:23Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs


-            // Compare each element
+            static bool SoftwareFallback(in Guid left, in Guid right)


Re in: just to double-check the discussion in the issue.

bill-poole · 2021-05-20

Happy to take direction on this either way. In my performance tests so far, passing a Guid by reference outperforms passing it by value. I suspect this is because the JIT has difficulty keeping the Guid parameters in registers, since Guid has 11 fields.

gfoidl · 2021-05-20T09:37:16Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs

-                && Unsafe.Add(ref rA, 3) == Unsafe.Add(ref rB, 3);
+                // Compare each element
+
+                return rA == rB


Should we special-case 64-bit processes by comparing with longs?
E.g. on ARM64 this code will be hit.
Cf. #35654

if (Environment.Is64BitProcess)
{
    // code with long
}
else
{
    // code with int
}

The "fast out" behavior should be preserved with longs too, but I'm not sure about alignment on ARM64.

bill-poole · 2021-05-20

I'm happy to update with a special case for 64-bit software fallback if desired. The current Guid.EqualsCore method doesn't cater for this, but that's no reason to not cater for it now.
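
One way the 64-bit special case suggested above might look is sketched here. It is an illustration under stated assumptions, not code from the PR: the class name `GuidFallbackSketch` is hypothetical, and `Unsafe.ReadUnaligned` is used as one way to sidestep the ARM64 alignment question rather than answer it. No performance claim is implied.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration only; not part of the runtime.
static class GuidFallbackSketch
{
    public static bool SoftwareFallback(in Guid left, in Guid right)
    {
        // View both Guids as raw bytes so offsets below are in bytes.
        ref byte a = ref Unsafe.As<Guid, byte>(ref Unsafe.AsRef(in left));
        ref byte b = ref Unsafe.As<Guid, byte>(ref Unsafe.AsRef(in right));

        if (Environment.Is64BitProcess)
        {
            // Two 8-byte comparisons; && keeps the early exit on the
            // first mismatch.
            return Unsafe.ReadUnaligned<long>(ref a) == Unsafe.ReadUnaligned<long>(ref b)
                && Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref a, 8))
                   == Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref b, 8));
        }

        // 32-bit processes: four 4-byte comparisons.
        return Unsafe.ReadUnaligned<int>(ref a) == Unsafe.ReadUnaligned<int>(ref b)
            && Unsafe.ReadUnaligned<int>(ref Unsafe.Add(ref a, 4))
               == Unsafe.ReadUnaligned<int>(ref Unsafe.Add(ref b, 4))
            && Unsafe.ReadUnaligned<int>(ref Unsafe.Add(ref a, 8))
               == Unsafe.ReadUnaligned<int>(ref Unsafe.Add(ref b, 8))
            && Unsafe.ReadUnaligned<int>(ref Unsafe.Add(ref a, 12))
               == Unsafe.ReadUnaligned<int>(ref Unsafe.Add(ref b, 12));
    }
}
```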

jkotas · 2021-05-20T17:42:30Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs

        private static bool EqualsCore(in Guid left, in Guid right)
        {
-            ref int rA = ref Unsafe.AsRef(in left._a);
-            ref int rB = ref Unsafe.AsRef(in right._a);
+            if (Sse2.IsSupported)


What about ARM64? We care about ARM64 as much as we care about x64.

FilipToth · 2021-05-21

Well, if SSE2 is not supported, we can just use a method without SSE2, much like what we have been doing so far. Still, in most cases SSE2 is supported.

jkotas · 2021-05-20T17:43:41Z

FWIW, I think that this is way too complex a change for the benefit that it provides. I think it would be better to wait for the general Vector128 improvements.

ghost · 2021-06-14T13:21:42Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #52296.

Author:	billpoole-mi
Assignees:	-
Labels:	`* NO MERGE *`, `area-System.Runtime`
Milestone:	-

jeffhandley · 2021-07-23T21:08:53Z

> FWIW, I think that this is way too complex a change for the benefit that it provides. I think it would be better to wait for the general Vector128 improvements.

Based on this comment and the lack of activity on this PR since, I'm going to go ahead and close it. Thanks for submitting the PR @billpoole-mi; it led to an interesting and valuable discussion here.

Update Guid.EqualsCore with SSE2/4.1 intrinsics. Fixes #52296.

851ccd9

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 20, 2021

gfoidl reviewed May 20, 2021

Updated to use explicit types rather than var as per coding guidelines.

efcc9c7

jkotas reviewed May 20, 2021

runfoapp bot mentioned this pull request May 20, 2021

InvokeCodeThatShouldFirEvents_EnsureEventsFired fails on OSX #52710

Closed

sandreenko removed the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 24, 2021

stephentoub added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jun 11, 2021

marek-safar added the area-System.Runtime label Jun 14, 2021

terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021

jeffhandley assigned tannergooding Jul 23, 2021

jeffhandley closed this Jul 23, 2021

ghost locked as resolved and limited conversation to collaborators Aug 22, 2021
