Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs #34204

Merged
merged 15 commits into from
Apr 18, 2020

Conversation

tannergooding
Copy link
Member

This resolves #34094 by updating several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs when their isn't a direct comparison mode supported by the underlying hardware.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 27, 2020
@tannergooding
Copy link
Member Author

CC. @CarolEidt, @echesakovMSFT

As called out in #34094 this is a breaking change if one of the inputs was NaN, but it was also a bug and caused a difference in behavior if you were using FloatComparisonMode directly on AVX enabled hardware.

@tannergooding tannergooding added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Mar 27, 2020

if (compSupports(InstructionSet_AVX))
{
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, op2, gtNewIconNode(14), NI_AVX_Compare, baseType, simdSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On AVX hardware, we can just use the hardware supported comparison mode. On non-AVX enabled hardware, we have to fallback to doing a different operation with the operands swapped.

nullptr DEBUGARG("Clone op1 for Sse.CompareScalarGreaterThan"));

retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op2, op1, NI_SSE_CompareScalarLessThan, baseType, simdSize);
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, clonedOp1, retNode, NI_SSE_MoveScalar, baseType, simdSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the scalar versions, the non-AVX path needs to ensure that CopiesUpperBits is still respected, so we have to do an additional MoveScalar operation.

@tannergooding
Copy link
Member Author

@BruceForstall, @jeffhandley, @terrajobst; What is the current process for getting sign-off on breaking changes like this?

@tannergooding
Copy link
Member Author

Just a note, this is the kind of change that could have been easily handled in managed land (rather than in importation) if we weren't blocked due to the more complex trees that it produces (#956).

@terrajobst
Copy link
Member

@BruceForstall, @jeffhandley, @terrajobst; What is the current process for getting sign-off on breaking changes like this?

I don't believe we have centralized sign-off process for this anymore. @ericstj @PriyaPurkayastha what are your thoughts?

@PriyaPurkayastha
Copy link

Correct, there is no compat council that reviews/approves breaking changes. Review happens during Tactics. General guidelines that are given to teams is to determine what are the driving factors for making the breaking change (is this a customer reported issue etc.). What is the cost to make the change in a compatible way - can opt-in/opt-out switches be provided? Should additional data be gathered to understand impact of change? (e.g. contact .NET Technical Insights team). Other action items are to add/update functional tests for code paths changed and documenting the breaking change by using the https://github.com/dotnet/docs/issues/new?template=dotnet-breaking-change.md issue template.
@marklio do you have additional comments/feedback for this proposed breaking change?

@marklio
Copy link

marklio commented Mar 30, 2020 via email

@tannergooding
Copy link
Member Author

@marklio, see below

  • It sounds like whether this is breaking depends slightly on what kind of hardware you’re on a) is that correct, and b) do we have data or an intuition about what the split is on the hardware distribution?

Not quite. Basically there exists two instruction sets for the purposes of this discussion. SSE2 (which has been around since 2000 and is a baseline requirement for .NET Core) and AVX (which has been around since 2011 and which, to my knowldege, is available in all Intel/AMD based VMs on Azure).

Today, if using the 8x Sse2.Compare* APIs that do GreaterThan, GreaterThanOrEqual, NotGreaterThan or NotGreaterThanOrEqual comparisons you get a behavior that doesn't correctly handle NaN (not-a-number) inputs. These APIs are available on any x86 (Intel/AMD) machine running .NET Core 3.0 or later. This is because the correct implementation of these functions must be "emulated" as there doesn't exist direct hardware support and the overloads were provided for completeness and parity with the equivalent feature in C/C++, Rust, and other langugages. floating-point behaves a bit differently form normal math, especially around NaN inputs, and so the normal inversion rules don't quite apply (that is NotGreaterThan can't be implemented as LessThanOrEqual). This was missed in review, hence the bug.

If using the related Avx.Compare* APIs, you get a different behavior that does correctly handle NaN because there is direct hardware support. This is only available on machines from 2011 and later, but is still fairly prominent being nearly 10 years old and with it being available on machines in Azure, AWS, etc.

  • Are there APIs we can look for in the ecosystem to get an idea of the impact of a change?

These are new APIs only introduced in .NET Core 3.0 (Sep 2019, already end of support) and available in .NET Core 3.1 (Dec 2019, supported until Dec 2022).

They are also extremely low level/advanced APIs that are designed to be (and were documented as being) essentially a 1-to-1 mapping with certain instructions exposed by the underlying hardware. You can only use certain APIs if your hardware supports it and they are meant to be used in high-performance/unsafe scenarios.

The usages, as such, would likely be limited (new and advanced use-case API) and hard to find. We aren't using them in the framework ourselves.

  • How would customers encounter this as a break? Different output from floating point comparisons? Exceptions? How do these behaviors compare to the documentation? (are we standardizing on documented behavior? Or do the docs not cover this)

You would get a different result as part of the comparison if either input contained a NaN floating-point value. As for documented behavior, we document ourselves to be compatible with the native equivalents of the intrinsics and with the underlying hardware instructions.

  • How does the behavior compare to .NET Framework? (if relevant)

This is not supported on .NET Framework, it is only available on .NET Core 3.0 and later.

@marklio
Copy link

marklio commented Mar 30, 2020

Cool. So what scenarios would lead folks to calling the problematic overloads on Sse2 over the AVX ones? For the sake of argument, are those scenarios worth fixing? Should they just be deprecated? Should we handle this with an analyzer/fixer that calls the "working" API?

I assume that fixing the bug in 3.x wouldn't meet the servicing bar?

To be clear, you're under no obligation to convince me of anything. I'm just trying to help build a case for taking the fix and deciding how help customers through any pain. It seems like anyone who knows what they're doing will expect appropriate behavior from these APIs, and we haven't gotten feedback probably because very few people are using these APIs, and those who are probably aren't pushing NaN's through them. In which case, documenting the breaking change and making it probably makes the most sense.

@tannergooding
Copy link
Member Author

Cool. So what scenarios would lead folks to calling the problematic overloads on Sse2 over the AVX ones? For the sake of argument, are those scenarios worth fixing? Should they just be deprecated? Should we handle this with an analyzer/fixer that calls the "working" API?

The primary reason would be wanting to support downlevel hardware while also accelerating further (such as by operating on 256-bits per iteration, rather than 128-bits) on newer hardware.
We could fix it with an analyzer/obsoleting the existing API, but seems like a poor alternative to just fixing it, given how new the API is and that the alternatives aren't as straightforward to use.

That is, the Avx overloads are primarily for 256-bit operations (vs the 128-bit operations that Sse2 provides) and the Avx overload that does operate on 128-bits takes a much more complex enum where the operation mapping isn't as straightforward (CompareGreaterThan(left, right) would be Compare(left, right, FloatComparisonMode.OrderedGreaterThanSignaling);). Likewise, on hardware without Avx support, you have to know that CompareGreaterThan(left, right) needs to be implemented as CompareLessThan(right, left) and that you need an additional MoveScalar(result, left) to ensure bit propagation remains for the Scalar variants of the APIs.

I assume that fixing the bug in 3.x wouldn't meet the servicing bar?

I'm unsure whether or not this would meet the bar and would defer to @jeffhandley and/or @BruceForstall.

@marklio
Copy link

marklio commented Mar 30, 2020

I definitely wouldn't want to break folks in servicing for this. It was more of a rhetorical question about whether it had been considered. I definitely could have phrased that more clearly.

@jeffhandley
Copy link
Member

I definitely wouldn't want to break folks in servicing for this.

So is fixing it only in .NET 5.0+ the best approach here, @marklio? @tannergooding, do you know of any reason to lobby for it to be fixed in 3.x, or would 5.0 be OK with you?

@marklio
Copy link

marklio commented Mar 31, 2020

So is fixing it only in .NET 5.0+ the best approach here, @marklio?

Yes, that would be the position I'd take to tactics. Take the fix in 5.0 and document through the breaking change process.

We wouldn't bring this for 3.x because:

  • It is a breaking change.
  • We don't have any customers hitting this. (likely few folks using it, and they may not be passing NaN
  • We think people using these features are advanced folks who will likely be moving to bleeding edge to get more of these features, and won't mind moving to 5 to get this fix (this is conjecture on my part, so we have any data to back this up? I'll look for some tomorrow).

We want the fix in 5.0 because:

  • It is "correct" behavior WRT specification, and moreover represents a consolidation of similar behaviors being added to the product.
  • We have time to get feedback, and a feature audience that is likely adopting new versions quickly (again, conjecture)

I'd also watch for signal in previews of anyone playing with 5.0 that encounters this as a break in their code. That might lead us to other mitigations.

@BruceForstall
Copy link
Member

A minor point is that if we do NOT fix it in servicing, and people start using this relatively new API in their code, we could end up with MORE user code that needs to be updated in the long run.

I wonder (and this is a general query, not just for this case) if there is a way we can (or should) annotate the 3.1 documentation now indicating a post-3.1 breaking change has been made to a particular API, to try and encourage people not to take a dependency on the subsequently-broken behavior.

@saucecontrol
Copy link
Member

In this case, there's a clean and forward-compatible workaround, so it's not a big deal if it isn't fixed in 3.1. But if it weren't as simple, the argument that the breakage footprint will never be smaller than now is really important.

@jeffhandley
Copy link
Member

We solidified the decision today, @tannergooding:

  1. Make the fix in .NET 5.0
  2. Do not apply the fix to 3.x
  3. Update documentation to call out the bug and the upcoming breaking change, illustrating a forward-compatible workaround

@tannergooding
Copy link
Member Author

I've rebased ontop of current master and this should be ready for review. I'll get a change up for the docs-repo that calls out the breaking change before merging.

@PriyaPurkayastha
Copy link

Here's the breaking change issue template to be used https://github.com/dotnet/docs/issues/new?template=dotnet-breaking-change.md

@tannergooding
Copy link
Member Author

Had to update to use the new compExactlyDependsOn method.

@tannergooding
Copy link
Member Author

Test failures are unrelated and tracked by #34905

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I just had a question about whether it could be simplified.


// The Prefetch and StoreFence intrinsics don't take any SIMD operands
// and have a simdSize of 0
assert((simdSize == 16) || (simdSize == 0));

switch (intrinsic)
{
case NI_SSE_CompareGreaterThan:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these cases are all basically the same, with different constants, it would seem that you could add a function to determine the constant to use for the AVX intrinsic, based on the constant associated with the SSE intrinsic. Does that make sense or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we probably could define a getInverseFloatingComparison method or something similar. A similar function could also be useful in lowering as it would allow either operand to be contained.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new function HWIntrinsicInfo::lookupFloatingComparisonForSwappedArgs and cleaned up the importation logic.

@tannergooding

This comment has been minimized.

@BruceForstall
Copy link
Member

You can disable the formatter around this section if that makes the most sense.

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (minus comment about C++ comments :) )

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI breaking-change Issue or PR that represents a breaking API or functional change over a prerelease.
Projects
None yet