-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RyuJIT: Fix logic to test bits in a constant vector (Vector256) #47385
Conversation
🤔 the code in my PR is not used on arm64 but arm64 legs crash with
on my tests. So I guess there is a different bug there. Repro: using System;
using System.Runtime.Intrinsics;
public static class Program
{
public static void Main()
{
Console.WriteLine(Vector64.Create(0, 0));
Console.ReadKey();
}
} |
src/coreclr/jit/lowerxarch.cpp
Outdated
if ((argCnt == 1) || | ||
((vecCns.i64[0] == vecCns.i64[1]) && ((simdSize <= 16) || (vecCns.i64[2] == vecCns.i64[3])))) | ||
bool i64ComponentsEqual = true; | ||
for (UINT32 i = 0; i < simdSize / 8; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually a small bug here. simdSize / 8
will be 1
for simdSize == 12
and so it will miss the upper 4 bytes.
Likewise, doing a full secondary 8-byte comparison will fail for simdSize == 12
because the unused bytes 12-15 will always be zero.
The general intent of the first group is to catch the case where all elements are functionally equal and therefore replacing it with get_Zero
or get_AllBitsSet
is possible and so future replacements can likewise take place, if appropriate (e.g. it might be better to optimize to be broadcast
or shuffle
in some cases, like Clang
does).
In practice, it handles this for argCnt == 1
by ensuring all elements are filled in here: https://github.com/dotnet/runtime/pull/47385/files#diff-8fe1c38ca575e327ad3227c06d009135287a438045cecbe3301d9a633a46cf0cR1465
However, I think it misses the mark when the args are explicitly defined or when simdSize == 12
and so the logic here likely needs adjusting.
The bytewise comparison that was being done before also misses the mark on checking "are all elements the same"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this probably needs to be backported, it might be good to have a simple fix there and to have a slightly more complex fix for .NET 6+?
In particular, the logic to determine "are all elements equal?" could be put into the HandleArgForHWIntrinsicCreate
logic here: https://github.com/dotnet/runtime/pull/47385/files#diff-8fe1c38ca575e327ad3227c06d009135287a438045cecbe3301d9a633a46cf0cR1430-R1470, so we aren't looping or checking values unnecessarily.
We could then trivially check:
if (allElementsEqual)
{
if (allBitsZero)
{
}
else if (allBitsSet)
{
}
// future logic handling other cases
}
Thoughts?
…1 into fix-vector256-create-cns
src/coreclr/jit/lowerxarch.cpp
Outdated
if (vecCns.i64[0] == 0) | ||
bool allBitsAreUnset = true; | ||
bool allBitsAreSet = true; | ||
int i32ComponentsToInspect = argCnt == 1 ? 2 : simdSize / 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we assert that simdSize % 4 == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah sure, I was about to comment on your feedback but wanted to fix the arm64 issue first which is unrelated to my changes but fails on CI on my new tests.
@tannergooding PTAL new approach, IMO should be the most transparent in case of back-porting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct and like it matches the existing functionality without the bug minus the handling for simdSize == 12
on x86 when all elements are 0
.
I think we want to visit this again for .NET 6 and better handle the intended use-case for this scenario and so we can ensure all the relevant sizes are handled.
@dotnet/jit-contrib PTAL and do we want to backport it? |
@sandreenko does it look good? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, few nits/questions.
Co-authored-by: Sergey Andreenko <[email protected]>
Fixes #47236 bug
The bug was introduced in .NET 5 (doesn't reproduce on
netcoreapp3.1
) - do we want to backport the fix?/cc @dotnet/jit-contrib