[mono] Basic SIMD support for System.Numerics.Vector2 on arm64 #91659
This reverts commit 27d244b.
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
Perf_Vector2 microbenchmarks on osx arm64 JIT-mini:
These are some impressive speedups, nice! As for …
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
@LoopedBard3 - if the aot-llvm arm64 local testing script is ready, please add a link to the documentation. @matouskozak, you should also try to get numbers for aot-llvm arm64 via that script, if possible.
The test failures are already tracked and unrelated to this PR.
```
@@ -3966,7 +3969,10 @@ mono_arch_output_basic_block (MonoCompile *cfg, MonoBasicBlock *bb)
		case OP_EXPAND_R4:
		case OP_EXPAND_R8: {
			const int t = get_type_size_macro (ins->inst_c1);
			arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0);
			if (ins->opcode == OP_EXPAND_R8)
				arm_neon_fdup_e (code, VREG_FULL, t, dreg, sreg1, 0);
```
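The `arm_neon_fdup_e` call in the hunk above broadcasts element 0 of `sreg1` across `dreg`. As a hedged illustration (not Mono code), the C model below mimics AArch64 `DUP` (element) on a plain byte array. `neon_reg` and `dup_elem0` are hypothetical names, and treating `t` as the element size (4 bytes for `R4`, 8 bytes for `R8`) is an assumption based on `get_type_size_macro (ins->inst_c1)`. The `reg_bytes` parameter stands in for the full 128-bit versus 64-bit register width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit NEON register modeled as raw bytes; lane 0 starts at b[0]. */
typedef struct {
    uint8_t b[16];
} neon_reg;

/*
 * Model of AArch64 "DUP Vd.<T>, Vn.<Ts>[0]": copy element 0 of src into
 * every element of the destination.
 *   elem_bytes: 4 for float (R4) or 8 for double (R8).
 *   reg_bytes:  16 for the full 128-bit form, 8 for the 64-bit form.
 * The 64-bit form leaves the upper 64 bits zeroed, as AArch64 does.
 */
static neon_reg dup_elem0(neon_reg src, size_t elem_bytes, size_t reg_bytes)
{
    assert(elem_bytes == 4 || elem_bytes == 8);
    assert(reg_bytes == 8 || reg_bytes == 16);
    /* A 64-bit register holds a single double, so R8 is only modeled at 128 bits. */
    assert(!(elem_bytes == 8 && reg_bytes == 8));

    neon_reg dst;
    memset(dst.b, 0, sizeof dst.b);
    for (size_t off = 0; off < reg_bytes; off += elem_bytes)
        memcpy(dst.b + off, src.b, elem_bytes);
    return dst;
}
```

With `elem_bytes = 4`, the 128-bit form fills four float lanes, while a `Vector2` only uses the low two; the 64-bit form writes exactly those two lanes.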
`OP_EXPAND_R8` can be simplified to a `mov dreg, sreg1`, or to nothing if `dreg == sreg1`.
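The reviewer's suggestion amounts to a simple move-elision rule. The sketch below is hypothetical: `emit_expand_as_move` is not a Mono function, it produces assembly text instead of machine code, and the choice of a full-width `mov Vd.16B, Vn.16B` (an AArch64 alias of `orr` with both source operands equal) is an assumption about how the move would be encoded.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical emitter for the suggested simplification: copy sreg1 into
 * dreg with a single register move, or emit nothing when both are the same
 * register. Writes assembly text into buf and returns the number of
 * characters written (0 means "emit nothing").
 */
static int emit_expand_as_move(char *buf, size_t len, int dreg, int sreg1)
{
    assert(buf != NULL && len > 0);
    if (dreg == sreg1) {
        buf[0] = '\0'; /* the value is already in place: no instruction needed */
        return 0;
    }
    /* "mov Vd.16b, Vn.16b" is the AArch64 alias of "orr Vd.16b, Vn.16b, Vn.16b". */
    return snprintf(buf, len, "mov v%d.16b, v%d.16b", dreg, sreg1);
}
```

Keeping the `dreg == sreg1` check in the emitter avoids a redundant self-move when the register allocator already assigned the same register to both operands.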
LGTM!
Re-created PR that adds basic SIMD support for `System.Numerics.Vector2` on arm64, matching the current support for `System.Numerics.Vector4`. It also renames the `vector2_methods` table to `vector_2_3_4_methods` to better reflect its usage.

Current SIMD support for `Vector2` with mini/llvm:
… `Vector2` / `float` scenario, will enable in the next PR)

Future work on the missing intrinsic is tracked in #91394.
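Sharing one method table across `Vector2`, `Vector3` and `Vector4` is what the `vector_2_3_4_methods` rename reflects. The sketch below shows one way such a table could be looked up by method name with the C standard `bsearch`; the type, enum values, table contents and `lookup_vector_method` are all hypothetical, and this is not a claim about how Mono's own lookup is implemented. The method names (`Abs`, `Add`, `Dot`, and the `get_One`/`get_UnitX`/`get_Zero` property getters) do exist on all three vector types.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical intrinsic ids for illustration only. */
typedef enum { SN_Abs, SN_Add, SN_Dot, SN_get_One, SN_get_UnitX, SN_get_Zero } simd_id;

typedef struct {
    const char *name;
    simd_id id;
} simd_method;

/*
 * One table shared by Vector2, Vector3 and Vector4. Entries must stay
 * sorted by name in strcmp order, otherwise bsearch gives wrong results.
 */
static const simd_method vector_2_3_4_methods_sketch[] = {
    { "Abs", SN_Abs },
    { "Add", SN_Add },
    { "Dot", SN_Dot },
    { "get_One", SN_get_One },
    { "get_UnitX", SN_get_UnitX },
    { "get_Zero", SN_get_Zero },
};

/* bsearch passes the key first and the table element second. */
static int compare_name(const void *key, const void *elem)
{
    const char *name = key;
    const simd_method *m = elem;
    return strcmp(name, m->name);
}

/* Returns the table entry for name, or NULL if the method is not handled. */
static const simd_method *lookup_vector_method(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof vector_2_3_4_methods_sketch / sizeof vector_2_3_4_methods_sketch[0];
    return bsearch(name, vector_2_3_4_methods_sketch, n,
                   sizeof vector_2_3_4_methods_sketch[0], compare_name);
}
```

A caller would still need to reject entries that are not yet supported for a particular vector size, since a shared table only says the name is known, not that every type accelerates it.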
Contributes to: #73462
P.S. These getters currently use 128-bit code paths for emitting constant values (`emit_xconst_v128`) even for `Vector2`, which is a 64-bit vector. Comment from @jandupej on the original PR:

> You can use an `fmov` to flood the lower two floats with 1.0f. This gives you the fastest `SN_get_One` possible (there is a 64-bit variant of this, with q=0). To make `SN_get_UnitX`/`Y` you can shift the vector left or right as doubles by 32. Zeros are shifted in, so this will give you a (0.0f, 1.0f) or the reverse. This will destroy the upper 64 bits of the register, but it shouldn't be a problem as only the lower 64 bits are of importance.
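A `Vector2` constant fits entirely in the low 64 bits of a register, which is why a 128-bit constant path is more than needed. The C model below checks the suggested trick on a `uint64_t` that stands for those low 64 bits, with lane 0 (X) in bits 0..31 and lane 1 (Y) in bits 32..63 as on little-endian AArch64. The helper names are hypothetical, and the instructions named in the comments (`fmov v0.2s, #1.0`, `ushr d0, d0, #32`, `shl d0, d0, #32`) are one plausible encoding, not necessarily what Mono would emit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits for this lane model");

/* Raw bit pattern of a float, e.g. 1.0f -> 0x3F800000. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Read the X (lane 0) and Y (lane 1) floats from the low 64 bits. */
static float lane_x(uint64_t v) { return bits_float((uint32_t)v); }
static float lane_y(uint64_t v) { return bits_float((uint32_t)(v >> 32)); }

/* Roughly "fmov v0.2s, #1.0": both float lanes of the low half set to 1.0f. */
static uint64_t get_one(void)
{
    uint64_t one = float_bits(1.0f);
    return (one << 32) | one;
}

/* Shift right by 32 as one 64-bit lane (e.g. "ushr d0, d0, #32"):
 * zeros enter at the top, giving (1.0f, 0.0f). */
static uint64_t get_unit_x(void) { return get_one() >> 32; }

/* Shift left by 32 as one 64-bit lane (e.g. "shl d0, d0, #32"):
 * zeros enter at the bottom, giving (0.0f, 1.0f). */
static uint64_t get_unit_y(void) { return get_one() << 32; }
```

Because 0x00000000 is the bit pattern of 0.0f, the zeros shifted in become valid zero lanes, so each getter needs only the `fmov` plus at most one shift.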