Improve Guid equality checks on 64-bit platforms #35654

BruceForstall · 2020-04-30T06:58:40Z

Current code does four 32-bit comparisons. Instead,
do two 64-bit comparisons. On x86, the JIT-generated
code is slightly different, but equally fast.

This will be even better on arm64 which passes everything
in 64-bit registers, after #35622 is addressed in the JIT.

Perf results:

x64:

Method	Tool	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD
EqualsSame	base	2.322 ns	0.0210 ns	0.0187 ns	2.324 ns	2.294 ns	2.351 ns	1.00	0.00
EqualsSame	diff	1.547 ns	0.0092 ns	0.0071 ns	1.547 ns	1.535 ns	1.559 ns	0.67	0.01
EqualsOperator	base	2.890 ns	0.3896 ns	0.4169 ns	3.030 ns	1.722 ns	3.074 ns	1.00	0.00
EqualsOperator	diff	1.346 ns	0.0160 ns	0.0150 ns	1.346 ns	1.331 ns	1.380 ns	0.49	0.12
NotEqualsOperator	base	1.738 ns	0.0306 ns	0.0255 ns	1.730 ns	1.712 ns	1.805 ns	1.00	0.00
NotEqualsOperator	diff	1.401 ns	0.0425 ns	0.0355 ns	1.389 ns	1.360 ns	1.476 ns	0.81	0.02

x86:

Method	Tool	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD
EqualsSame	base	3.164 ns	0.0234 ns	0.0208 ns	3.159 ns	3.136 ns	3.203 ns	1.00	0.00
EqualsSame	diff	3.079 ns	0.0327 ns	0.0306 ns	3.074 ns	3.041 ns	3.146 ns	0.97	0.01
EqualsOperator	base	2.736 ns	0.0252 ns	0.0236 ns	2.726 ns	2.710 ns	2.783 ns	1.00	0.00
EqualsOperator	diff	2.613 ns	0.0262 ns	0.0245 ns	2.600 ns	2.589 ns	2.662 ns	0.95	0.01
NotEqualsOperator	base	2.708 ns	0.0096 ns	0.0080 ns	2.705 ns	2.699 ns	2.723 ns	1.00	0.00
NotEqualsOperator	diff	2.573 ns	0.0666 ns	0.0591 ns	2.552 ns	2.526 ns	2.709 ns	0.95	0.02

Current code does four 32-bit comparisons. Instead, do two 64-bit comparisons. On x86, the JIT-generated code is slightly different, but equally fast. This will be even better on arm64 which passes everything in 64-bit registers, after dotnet#35622 is addressed in the JIT. Perf results: x64: | Method | Tool | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | |------------------ |------|-----------:|----------:|----------:|-----------:|-----------:|-----------:|------:|--------:| | EqualsSame | base | 2.322 ns | 0.0210 ns | 0.0187 ns | 2.324 ns | 2.294 ns | 2.351 ns | 1.00 | 0.00 | | EqualsSame | diff | 1.547 ns | 0.0092 ns | 0.0071 ns | 1.547 ns | 1.535 ns | 1.559 ns | 0.67 | 0.01 | | EqualsOperator | base | 2.890 ns | 0.3896 ns | 0.4169 ns | 3.030 ns | 1.722 ns | 3.074 ns | 1.00 | 0.00 | | EqualsOperator | diff | 1.346 ns | 0.0160 ns | 0.0150 ns | 1.346 ns | 1.331 ns | 1.380 ns | 0.49 | 0.12 | | NotEqualsOperator | base | 1.738 ns | 0.0306 ns | 0.0255 ns | 1.730 ns | 1.712 ns | 1.805 ns | 1.00 | 0.00 | | NotEqualsOperator | diff | 1.401 ns | 0.0425 ns | 0.0355 ns | 1.389 ns | 1.360 ns | 1.476 ns | 0.81 | 0.02 | x86: | Method | Tool | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | |------------------ |------|-----------:|----------:|----------:|-----------:|-----------:|-----------:|------:|--------:| | EqualsSame | base | 3.164 ns | 0.0234 ns | 0.0208 ns | 3.159 ns | 3.136 ns | 3.203 ns | 1.00 | 0.00 | | EqualsSame | diff | 3.079 ns | 0.0327 ns | 0.0306 ns | 3.074 ns | 3.041 ns | 3.146 ns | 0.97 | 0.01 | | EqualsOperator | base | 2.736 ns | 0.0252 ns | 0.0236 ns | 2.726 ns | 2.710 ns | 2.783 ns | 1.00 | 0.00 | | EqualsOperator | diff | 2.613 ns | 0.0262 ns | 0.0245 ns | 2.600 ns | 2.589 ns | 2.662 ns | 0.95 | 0.01 | | NotEqualsOperator | base | 2.708 ns | 0.0096 ns | 0.0080 ns | 2.705 ns | 2.699 ns | 2.723 ns | 1.00 | 0.00 | | NotEqualsOperator | diff | 2.573 ns | 0.0666 ns | 0.0591 ns | 2.552 ns | 2.526 ns | 2.709 ns | 0.95 | 0.02 |

BruceForstall · 2020-04-30T07:00:23Z

Additional benchmarks: dotnet/performance#1302

EgorBo · 2020-04-30T10:19:15Z

Is it safe for arm32 as well (accessing misaligned long)?

gfoidl · 2020-04-30T11:26:27Z

Did you consider a vectorized approach?
Only shown for SSE2, needs some specialization for Arm, and I don't know if it's worth it...

EgorBo · 2020-04-30T11:30:49Z

Did you consider a vectorized approach?
Only shown for SSE2, needs some specialization for Arm, and I don't know if it's worth it...

Last time I tried it was slower,
I think in most cases guids are different and early out check is enough.

tannergooding · 2020-04-30T15:09:23Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs

-                Unsafe.Add(ref g._a, 1) == Unsafe.Add(ref _a, 1) &&
-                Unsafe.Add(ref g._a, 2) == Unsafe.Add(ref _a, 2) &&
-                Unsafe.Add(ref g._a, 3) == Unsafe.Add(ref _a, 3);
+            return Unsafe.As<int, long>(ref g._a) == Unsafe.As<int, long>(ref _a) &&


Does this logic need to be repeated in multiple places rather than have everything forward to == for example?

I'd hope the JIT would properly inline the check in all the cases and the code would be equivalent.

Agreed, although that's a separate issue.

stephentoub · 2020-04-30T15:12:22Z

src/libraries/System.Private.CoreLib/src/System/Guid.cs

-                Unsafe.Add(ref g._a, 2) == Unsafe.Add(ref _a, 2) &&
-                Unsafe.Add(ref g._a, 3) == Unsafe.Add(ref _a, 3);
+            return Unsafe.As<int, long>(ref g._a) == Unsafe.As<int, long>(ref _a) &&
+                Unsafe.As<byte, long>(ref g._d) == Unsafe.As<byte, long>(ref _d);


I was under the impression that Int64-based operations would be slower than Int32-based operations on 32-bit, especially if the values weren't aligned (and that that's why @jkotas used Int32 here initially when switching these operations to be based on Unsafe). Is that not the case?

jkotas · 2020-04-30T15:23:23Z

Is it safe for arm32 as well (accessing misaligned long)?

+1. It would be better to use Unsafe.ReadUnaligned here to make the code portable.

BruceForstall · 2020-04-30T17:45:31Z

Did you consider a vectorized approach?

I didn't. I was specifically looking to improve arm64, but x64 as well (and be simple and cross-platform).

Is it safe for arm32 as well (accessing misaligned long)?

RyuJIT converts a 64-bit long access to two 32-bit int accesses on 32-bit platforms, so there is no difference in alignment for 32-bit. There could be a difference in alignment for 64-bit since presumably Guid could be 4-byte aligned.

fyi, here's the x86 code difference. It appears RyuJIT doesn't handle the Unsafe.Add calls well.

current x86 assembly

G_M51749_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
                        ;; bbWeight=1    PerfScore 1.25
G_M51749_IG02:
       8B4518       mov      eax, dword ptr [ebp+18H]
       3B4508       cmp      eax, dword ptr [ebp+08H]
       7532         jne      SHORT G_M51749_IG05
                        ;; bbWeight=1    PerfScore 3.00
G_M51749_IG03:
       8D4518       lea      eax, bword ptr [ebp+18H]
       8B4004       mov      eax, dword ptr [eax+4]
       8D5508       lea      edx, bword ptr [ebp+08H]
       3B4204       cmp      eax, dword ptr [edx+4]
       7524         jne      SHORT G_M51749_IG05
       8D4518       lea      eax, bword ptr [ebp+18H]
       8B4008       mov      eax, dword ptr [eax+8]
       8D5508       lea      edx, bword ptr [ebp+08H]
       3B4208       cmp      eax, dword ptr [edx+8]
       7516         jne      SHORT G_M51749_IG05
       8D4518       lea      eax, bword ptr [ebp+18H]
       8B400C       mov      eax, dword ptr [eax+12]
       8D5508       lea      edx, bword ptr [ebp+08H]
       3B420C       cmp      eax, dword ptr [edx+12]
       0F94C0       sete     al
       0FB6C0       movzx    eax, al
                        ;; bbWeight=0.50 PerfScore 9.13
G_M51749_IG04:
       5D           pop      ebp
       C22000       ret      32
                        ;; bbWeight=0.50 PerfScore 1.25
G_M51749_IG05:
       33C0         xor      eax, eax
                        ;; bbWeight=0.50 PerfScore 0.13
G_M51749_IG06:
       5D           pop      ebp
       C22000       ret      32

new x86 assembly

G_M51749_IG02:
       8B442414     mov      eax, dword ptr [esp+14H]
       8B542418     mov      edx, dword ptr [esp+18H]
       33442404     xor      eax, dword ptr [esp+04H]
       33542408     xor      edx, dword ptr [esp+08H]
       0BC2         or       eax, edx
       751B         jne      SHORT G_M51749_IG05
                        ;; bbWeight=1    PerfScore 5.25
G_M51749_IG03:
       8B44241C     mov      eax, dword ptr [esp+1CH]
       8B542420     mov      edx, dword ptr [esp+20H]
       3344240C     xor      eax, dword ptr [esp+0CH]
       33542410     xor      edx, dword ptr [esp+10H]
       0BC2         or       eax, edx
       0F94C0       sete     al
       0FB6C0       movzx    eax, al
                        ;; bbWeight=0.50 PerfScore 2.75
G_M51749_IG04:
       C22000       ret      32
                        ;; bbWeight=0.50 PerfScore 1.00
G_M51749_IG05:
       33C0         xor      eax, eax
                        ;; bbWeight=0.50 PerfScore 0.13
G_M51749_IG06:
       C22000       ret      32

stephentoub · 2020-10-05T13:12:07Z

@BruceForstall, are you still working on this? Thanks.

BruceForstall · 2020-10-05T22:27:17Z

@BruceForstall, are you still working on this? Thanks.

I haven't, and likely won't get back to it. I'll go ahead and close the PR. I'd be happy if someone picked it up. The "only thing" is to try Jan's suggestion about using Unsafe.ReadUnaligned, and verifying performance on all platforms still improves.

Dotnet-GitSync-Bot added the area-System.Runtime label Apr 30, 2020

tannergooding reviewed Apr 30, 2020

View reviewed changes

stephentoub reviewed Apr 30, 2020

View reviewed changes

BruceForstall closed this Oct 5, 2020

jkotas mentioned this pull request Nov 13, 2020

Make System.Guid readonly #44629

Merged

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

BruceForstall deleted the ImproveGuidEqualityChecks branch February 28, 2021 03:08

BruceForstall restored the ImproveGuidEqualityChecks branch February 28, 2021 03:08

BruceForstall deleted the ImproveGuidEqualityChecks branch December 28, 2022 01:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Guid equality checks on 64-bit platforms #35654

Improve Guid equality checks on 64-bit platforms #35654

BruceForstall commented Apr 30, 2020

BruceForstall commented Apr 30, 2020

EgorBo commented Apr 30, 2020

gfoidl commented Apr 30, 2020

EgorBo commented Apr 30, 2020

tannergooding Apr 30, 2020

BruceForstall Apr 30, 2020

stephentoub Apr 30, 2020

jkotas commented Apr 30, 2020

BruceForstall commented Apr 30, 2020

stephentoub commented Oct 5, 2020

BruceForstall commented Oct 5, 2020

Improve Guid equality checks on 64-bit platforms #35654

Improve Guid equality checks on 64-bit platforms #35654

Conversation

BruceForstall commented Apr 30, 2020

BruceForstall commented Apr 30, 2020

EgorBo commented Apr 30, 2020

gfoidl commented Apr 30, 2020

EgorBo commented Apr 30, 2020

tannergooding Apr 30, 2020

Choose a reason for hiding this comment

BruceForstall Apr 30, 2020

Choose a reason for hiding this comment

stephentoub Apr 30, 2020

Choose a reason for hiding this comment

jkotas commented Apr 30, 2020

BruceForstall commented Apr 30, 2020

stephentoub commented Oct 5, 2020

BruceForstall commented Oct 5, 2020