-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Fix over-shifting logic when constant-folding AdvSimd.ShiftRight*
and AdvSimd.ShiftLeft*
#105847
Conversation
cc @dotnet/jit-contrib, @tannergooding PTAL, thanks! |
if (AdvSimd.IsSupported) | ||
{ | ||
var vr6 = Vector64.Create<long>(1); | ||
var vr7 = AdvSimd.ShiftRightLogicalScalar(vr6, 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also check an out of range value like 65
? With the other change it should now be executing the ShiftLogical
path rather than throwing, so we want to ensure its computing the right result between the unoptimized vs optimized as well.
It may be interesting to check it for a non-constant or a value that only becomes constant after importation as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. #105777 only added the fallback implementation for ShiftRightLogical
, so ShiftRightLogicalScalar
, ShiftRightArithmetic
, and ShiftRightArithmeticScalar
still throw in Debug. Are you ok with me adding their fallback implementations in this PR, or would you prefer that work in a separate one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't using the fallback APIs for non-const immediates either, so I added the HW_Flag_NoJmpTableIMM
flag to the ShiftRight*
APIs so we always use the fallback instead of a jump table.
Using the fallback intrinsics instead of jump tables has some nice diffs. |
if (AdvSimd.IsSupported) | ||
{ | ||
var vr3 = Vector64.Create<long>(128); | ||
var vr4 = AdvSimd.ShiftRightArithmeticScalar(vr3, getLongImmOOB()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add equivalent tests for ShiftLeftLogical
and ShiftLeftLogicalScalar
?
It should help ensure we don't have +1
more issues to handle from some missed edge case and will help ensure correct behavior in any future refactorings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, looks like ShiftLeft*
suffers from the same issues we ran into initially with ShiftRight*
. In Debug, we don't use ShiftLogical
as a fallback intrinsic, and we end up throwing an exception for out-of-range immediates. In Release, we constant-fold GT_LSH
operands, but we don't allow over-shifting on ARM64, so the constant-folded behavior doesn't match what ShiftLogical
would do. I'll add the fallback implementation in this PR to fix this in one go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. Would be nice to explicitly cover ShiftLeftLogical
in the tests as well
AdvSimd.ShiftRight*
AdvSimd.ShiftRight*
and AdvSimd.ShiftLeft*
@tannergooding I added fallback intrinsics for the shift-left APIs, and added test coverage. Can you PTAL? Thanks! |
Fixes #105817.
AdvSimd.ShiftRightLogical
,AdvSimd.ShiftRightLogicalScalar
,AdvSimd.ShiftRightArithmetic
, andAdvSimd.ShiftRightArithmeticScalar
all have immediate bounds of[1, sizeof(T) * 8]
, meaning they support "over-shifting" values bysizeof(T) * 8
bits such that they become zero. The constant folding logic for these shift operations erroneously assumes over-shifting is unique to xarch intrinsics; since the above intrinsics are the only ARM64 intrinsics that leverage this folding logic at the moment, allowing over-shifting on these paths for ARM64 intrinsics seems to be the simplest solution.