Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: Fix logic to test bits in a constant vector (Vector256) #47385

Merged
merged 13 commits into from
Feb 16, 2021

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jan 24, 2021

Fixes #47236 bug

The bug was introduced in .NET 5 (doesn't reproduce on netcoreapp3.1) - do we want to backport the fix?

/cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 24, 2021
@EgorBo
Copy link
Member Author

EgorBo commented Jan 24, 2021

🤔 the code in my PR is not used on arm64 but arm64 legs crash with

Assert failure(PID 6060 [0x000017ac], Thread: 7556 [0x1d84]): Assertion failed 'isValidArrangement(size, opt)' in 'Program:TestVector64()' during 'Generate code' (IL size 1221)

    File: F:\workspace\_work\1\s\src\coreclr\jit\emitarm64.cpp Line: 380

on my tests. So I guess there is a different bug there.

Repro:

using System;
using System.Runtime.Intrinsics;

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Vector64.Create(0, 0));
        Console.ReadKey();
    }
}

if ((argCnt == 1) ||
((vecCns.i64[0] == vecCns.i64[1]) && ((simdSize <= 16) || (vecCns.i64[2] == vecCns.i64[3]))))
bool i64ComponentsEqual = true;
for (UINT32 i = 0; i < simdSize / 8; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually a small bug here. simdSize / 8 will be 1 for simdSize == 12 and so it will miss the upper 4 bytes.
Likewise, doing a full secondary 8-byte comparison will fail for simdSize == 12 because the unused bytes 12-15 will always be zero.

The general intent of the first group is to catch the case where all elements are functionally equal and therefore replacing it with get_Zero or get_AllBitsSet is possible and so future replacements can likewise take place, if appropriate (e.g. it might be better to optimize to be broadcast or shuffle in some cases, like Clang does).

In practice, it handles this for argCnt == 1 by ensuring all elements are filled in here: https://github.com/dotnet/runtime/pull/47385/files#diff-8fe1c38ca575e327ad3227c06d009135287a438045cecbe3301d9a633a46cf0cR1465

However, I think it misses the mark when the args are explicitly defined or when simdSize == 12 and so the logic here likely needs adjusting.
The bytewise comparison that was being done before also misses the mark on checking "are all elements the same"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this probably needs to be backported, it might be good to have a simple fix there and to have a slightly more complex fix for .NET 6+?

In particular, the logic to determine "are all elements equal?" could be put into the HandleArgForHWIntrinsicCreate logic here: https://github.com/dotnet/runtime/pull/47385/files#diff-8fe1c38ca575e327ad3227c06d009135287a438045cecbe3301d9a633a46cf0cR1430-R1470, so we aren't looping or checking values unnecessarily.

We could then trivially check:

if (allElementsEqual)
{
    if (allBitsZero)
    {
    }
    else if (allBitsSet)
    {
    }
    // future logic handling other cases
}

Thoughts?

if (vecCns.i64[0] == 0)
bool allBitsAreUnset = true;
bool allBitsAreSet = true;
int i32ComponentsToInspect = argCnt == 1 ? 2 : simdSize / 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert that simdSize % 4 == 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sure, I was about to comment on your feedback but wanted to fix the arm64 issue first which is unrelated to my changes but fails on CI on my new tests.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 2, 2021

@tannergooding PTAL new approach, IMO should be the most transparent in case of back-porting.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct and like it matches the existing functionality without the bug minus the handling for simdSize == 12 on x86 when all elements are 0.

I think we want to visit this again for .NET 6 and better handle the intended use-case for this scenario and so we can ensure all the relevant sizes are handled.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 4, 2021

@dotnet/jit-contrib PTAL and do we want to backport it?

@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Feb 8, 2021
@JulieLeeMSFT JulieLeeMSFT added the Priority:2 Work that is important, but not critical for the release label Feb 8, 2021
@sandreenko sandreenko self-requested a review February 10, 2021 07:59
@EgorBo
Copy link
Member Author

EgorBo commented Feb 11, 2021

@sandreenko does it look good?

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, few nits/questions.

src/coreclr/jit/lower.h Outdated Show resolved Hide resolved
src/coreclr/jit/lowerarmarch.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lowerxarch.cpp Outdated Show resolved Hide resolved
Co-authored-by: Sergey Andreenko <[email protected]>
@EgorBo EgorBo merged commit 4823e7c into dotnet:master Feb 16, 2021
EgorBo added a commit to EgorBo/runtime-1 that referenced this pull request Feb 22, 2021
Anipik pushed a commit that referenced this pull request Mar 10, 2021
…#48613)

* [5.0] Fix logic to test bits in a constant vector (Vector256)
Backport of #47385

* rename folder

* rename test files
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Incorrect codegen for Vector256.Create(0, 0, 0, 0, 1, 1, 1, 1)
4 participants