[mono] Basic SIMD support for System.Numerics.Vector2 on arm64 #91659

matouskozak · 2023-09-06T08:03:28Z

Re-created PR that adds basic SIMD support for System.Numerics.Vector2 on arm64, matching the current support for System.Numerics.Vector4. It also renames the vector2_methods table to vector_2_3_4_methods to better reflect its usage.

Current SIMD support for Vector2 with mini/llvm:

SN_ctor
SN_Abs
SN_Add
SN_Clamp
SN_Divide (the Vector2 / float scenario is currently disabled; it will be enabled in the next PR)
SN_Dot
SN_Max
SN_Min
SN_Multiply (same restriction as SN_Divide)
SN_Negate
SN_SquareRoot
SN_Subtract
SN_get_Item
SN_get_One
SN_get_UnitX
SN_get_UnitY
SN_get_Zero
SN_op_Addition
SN_op_Division
SN_op_Equality
SN_op_Inequality
SN_op_Multiply
SN_op_Subtraction
SN_op_UnaryNegation
SN_set_Item

Future work on the missing intrinsics is tracked in #91394.
Contributes to: #73462

P.S. These getters currently use the 128-bit code path for emitting constant values (emit_xconst_v128), even though Vector2 is a 64-bit vector:

SN_get_UnitX
SN_get_UnitY
SN_get_One

Comment from @jandupej on the original PR:
You can use a fmov to flood the lower two floats with 1.0f. This gives you the fastest SN_get_One possible (there is a 64-bit variant of this, with q=0). To make SN_get_UnitX/Y you can shift the vector left or right as doubles by 32. Zeros are shifted in, so this will give you a (0.0f, 1.0f) or reverse. This will destroy the upper 64 bits of the register, but it shouldn't be a problem as only the lower 64 bits are of importance.
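
The bit-level effect of that suggestion can be sketched in portable C. This is only an illustration, not Mono code: it models the low 64 bits of the register as a uint64_t with X in bits 0..31 and Y in bits 32..63, and the names vec2_bits, vec2_lane, vec2_one, vec2_unit_x and vec2_unit_y are hypothetical helpers introduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of 1.0f in IEEE 754 single precision. */
#define F32_ONE_BITS UINT32_C(0x3F800000)

/* Models the low 64 bits of a SIMD register holding two floats:
 * lane 0 (X) is bits 0..31, lane 1 (Y) is bits 32..63. */
typedef uint64_t vec2_bits;

/* Hypothetical helper: read one float lane out of the 64-bit pattern. */
static float vec2_lane (vec2_bits v, int lane)
{
	assert (lane == 0 || lane == 1);
	uint32_t bits = (uint32_t)(v >> (32 * lane));
	float f;
	memcpy (&f, &bits, sizeof f);
	return f;
}

/* Vector2.One: both lanes hold 1.0f, as after flooding the lanes with 1.0f. */
static vec2_bits vec2_one (void)
{
	return ((uint64_t)F32_ONE_BITS << 32) | F32_ONE_BITS;
}

/* Shift right by 32 as a single 64-bit element: the old Y lane lands in X
 * and zeros are shifted into Y, giving (1.0f, 0.0f). */
static vec2_bits vec2_unit_x (void)
{
	return vec2_one () >> 32;
}

/* Shift left by 32 as a single 64-bit element: the old X lane lands in Y
 * and zeros are shifted into X, giving (0.0f, 1.0f). */
static vec2_bits vec2_unit_y (void)
{
	return vec2_one () << 32;
}
```

Since an all-zero bit pattern is 0.0f, the zeros shifted in produce exactly the missing component, which is why no second constant load is needed in this model.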

This reverts commit 27d244b.

matouskozak · 2023-09-06T12:56:23Z

/azp run runtime-extra-platforms

azure-pipelines · 2023-09-06T12:56:51Z

Azure Pipelines successfully started running 1 pipeline(s).

matouskozak · 2023-09-06T18:42:24Z

Perf_Vector2 microbenchmarks on osx arm64 JIT-mini:

Benchmark	before	after	speed-up
CreateFromScalar	1.95	1.25	36%
OneBenchmark	1.96	0.88	55%
UnitXBenchmark	1.92	0.86	55%
UnitYBenchmark	1.91	0.89	54%
ZeroBenchmark	1.19	3.60	-202%
AddOperatorBenchmark	4.03	0.80	80%
DivideByVector2OperatorBenchmark	4.26	0.85	80%
DivideByScalarOperatorBenchmark	6.124	2.3218	62%
EqualityOperatorBenchmark	1.411	0.0359	97%
InequalityOperatorBenchmark	2.202	0.0221	99%
MultiplyOperatorBenchmark	4.081	0.7258	82%
MultiplyByScalarOperatorBenchmark	5.829	1.8867	68%
SubtractOperatorBenchmark	4.058	0.7137	82%
NegateOperatorBenchmark	4.143	0.7393	82%
AbsBenchmark	19.989	0.6328	97%
AddFunctionBenchmark	4.055	0.6135	85%
ClampBenchmark	15.859	0.8157	95%
DivideByVector2Benchmark	4.273	0.7721	82%
DivideByScalarBenchmark	6.453	2.4231	62%
DotBenchmark	2.085	0.1414	93%
MaxBenchmark	6.354	0.7172	89%
MinBenchmark	6.214	0.6811	89%
MultiplyFunctionBenchmark	4.261	0.8311	80%
NegateBenchmark	4.287	0.8247	81%
SquareRootBenchmark	10.355	0.7057	93%
SubtractFunctionBenchmark	4.239	0.7174	83%
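
The speed-up column is consistent with (before - after) / before expressed as a percentage; that formula is inferred from the numbers and is not stated in the PR. A minimal sketch, with speed_up_percent as a hypothetical helper name, shows how a slower "after" time yields a negative value such as the Vector2.Zero row:

```c
#include <assert.h>

/* Relative improvement in percent: positive when "after" is faster,
 * negative when it is slower. The time unit does not matter as long as
 * both arguments use the same one. */
static double speed_up_percent (double before, double after)
{
	assert (before > 0.0);
	return (before - after) / before * 100.0;
}
```

For the ZeroBenchmark row, speed_up_percent (1.19, 3.60) is roughly -202.5, and for AddOperatorBenchmark, speed_up_percent (4.03, 0.80) is roughly 80.1.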

Vector2.Zero reports a 202% regression even though the emitted code looks correct:

0000000000000000        stp     x29, x30, [sp, #-0x50]!
0000000000000004        mov     x29, sp
0000000000000008        eor.8b  v0, v0, v0
000000000000000c        str     d0, [x29, #0x10]
0000000000000010        ldr     s0, [x29, #0x10]
0000000000000014        ldr     s1, [x29, #0x14]
0000000000000018        mov     sp, x29
000000000000001c        ldp     x29, x30, [sp], #0x50
0000000000000020        ret

jandupej · 2023-09-07T08:17:53Z

Perf_Vector2 microbenchmarks on osx arm64 JIT-mini:

These are some impressive speedups, nice!

As for Vector2.Zero, what you did is correct. However, our register allocator tends to spill and reload every value you create, especially in FP/SIMD (see the instructions at 0x0c and 0x10). If you load a constant instead, maybe it will forgo spilling and only load from memory (?). Still, I'd keep what you did. If there are future improvements to constant folding or the register allocator, this will likely go away.

matouskozak · 2023-09-07T08:59:23Z

/azp run runtime-extra-platforms

azure-pipelines · 2023-09-07T08:59:46Z

Azure Pipelines successfully started running 1 pipeline(s).

SamMonoRT · 2023-09-07T12:15:23Z

@LoopedBard3 - if the aot-llvm arm64 local testing script is ready, please add a link to the documentation. @matouskozak, you should also try to get numbers for aot-llvm arm64 via that script, if possible.

matouskozak · 2023-09-18T11:53:14Z

The test failures are already tracked and unrelated to this PR.

jandupej · 2023-09-19T09:21:57Z

src/mono/mono/mini/mini-arm64.c

@@ -3966,7 +3969,10 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
 		case OP_EXPAND_R4:
 		case OP_EXPAND_R8: {
 			const int t = get_type_size_macro (ins->inst_c1);
-			arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0);
+			if (ins->opcode == OP_EXPAND_R8)
+				arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0);


OP_EXPAND_R8 can be simplified to a mov dreg, sreg1 or nothing if dreg == sreg1.
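
One way to read this comment, assuming it refers to a 64-bit vector of 8-byte elements: such a vector has a single lane, so broadcasting a double into it degenerates to a plain copy. The sketch below is plain C rather than Mono's code emitter, and lane_count and expand_r8 are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of lanes in a vector of the given byte width and element size. */
static size_t lane_count (size_t vector_bytes, size_t elem_bytes)
{
	assert (elem_bytes > 0 && vector_bytes % elem_bytes == 0);
	return vector_bytes / elem_bytes;
}

/* Broadcast value into every lane of a vector of doubles stored at dst.
 * Returns the number of lanes written. */
static size_t expand_r8 (double *dst, size_t vector_bytes, double value)
{
	size_t n = lane_count (vector_bytes, sizeof (double));
	for (size_t i = 0; i < n; i++)
		dst [i] = value;
	return n;
}
```

With vector_bytes set to 8 the loop runs once, which is a single move (or no work at all when source and destination are the same location); only the 16-byte case actually duplicates the value.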

fanyang-mono

LGTM!

matouskozak added 15 commits August 30, 2023 14:28

enable Vector2 on arm64

a1ead5f

Vector2 fabs arm64 instrinsics

429bd7f

Vector2 fmin and fmax arm64 intrinsics

cfee109

Vector 2 fdiv arm64 intrinsics

13e08d0

Vector2 Dot product intrinsics arm64

f223d29

Vector2 sub and sqrt intrinsics support for arm64

23c0003

Vector2 op_equality fix or arm64

c12429a

Support

e29c5ef

OP_XZERO 64-bit vector support arm64

9cab6af

OP_EXPAND/EXTRACT_R4/R8 for 64 bit vector arm64

27d244b

Revert "OP_EXPAND/EXTRACT_R4/R8 for 64 bit vector arm64"

9493095

This reverts commit 27d244b.

fix get/set_Item for Vector2

b191c22

remove Vector2 SIMD support from amd64 and wasm if

0691b2b

faddv 64-bit, extract VREG_FULL only

fe66c97

Merge branch 'main' into arm64-vector2-intrinsics

cf43f87

ghost assigned matouskozak Sep 6, 2023

dotnet-issue-labeler bot added the area-Codegen-JIT-mono label Sep 6, 2023

jandupej reviewed Sep 6, 2023

fix incorrect 64bit support for 64bit elements

763eb0c

rename methods table

a2873b4

matouskozak marked this pull request as ready for review September 6, 2023 19:46

matouskozak requested review from fanyang-mono, vargaz, lambdageek and SamMonoRT as code owners September 6, 2023 19:46

This was referenced Sep 7, 2023

[browser][MT] The WebSocket is not connected. #88084

Closed

[browser] Fragile WebSocket CI tests #90257

Closed

build-analysis bot mentioned this pull request Sep 7, 2023

Networking certificate test failures #91705

Closed

jandupej approved these changes Sep 19, 2023

fanyang-mono approved these changes Sep 19, 2023

matouskozak merged commit 09e796a into dotnet:main Sep 21, 2023

matouskozak mentioned this pull request Oct 3, 2023

JIT/SIMD/CircleInConvex_[r]/[ro] are failing on arm64 Mono interpreter #92925

Closed

ghost locked as resolved and limited conversation to collaborators Oct 21, 2023

matouskozak deleted the arm64-vector2-intrinsics branch October 3, 2024 13:15
