Fixing several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs #34204
CC @CarolEidt, @echesakovMSFT. As called out in #34094, this is a breaking change if one of the inputs was NaN, but it was also a bug and caused a difference in behavior if you were using
if (compSupports(InstructionSet_AVX))
{
    retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, op2, gtNewIconNode(14), NI_AVX_Compare, baseType, simdSize);
On AVX hardware, we can just use the hardware-supported comparison mode. On non-AVX hardware, we have to fall back to doing a different operation with the operands swapped.
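The reason swapping the operands is safe can be shown with plain scalar floats. The sketch below is illustrative only and is not the JIT code: the helper names are hypothetical, and it assumes the problematic pattern is a negated comparison such as "not less-than-or-equal" standing in for "greater-than". It shows that `b < a` agrees with `a > b` for every input, while `!(a <= b)` diverges as soon as either input is NaN.

```cpp
#include <cassert>
#include <limits>

// Greater-than written directly. False whenever either input is NaN.
inline bool greater_than(float a, float b) { return a > b; }

// Swapped-operand form: "a > b" rewritten as "b < a". Also false for NaN.
inline bool greater_than_swapped(float a, float b) { return b < a; }

// Negated form: "not (a <= b)". True when either input is NaN,
// so it is not a drop-in replacement for greater-than.
inline bool not_less_than_or_equal(float a, float b) { return !(a <= b); }

// Hypothetical self-check comparing the three forms.
inline void check_greater_than_forms()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();

    // For ordinary values all three forms agree.
    assert(greater_than(2.0f, 1.0f) == greater_than_swapped(2.0f, 1.0f));
    assert(greater_than(2.0f, 1.0f) == not_less_than_or_equal(2.0f, 1.0f));
    assert(greater_than(1.0f, 2.0f) == not_less_than_or_equal(1.0f, 2.0f));

    // With a NaN input the swapped form still matches greater-than...
    assert(greater_than(nan, 1.0f) == greater_than_swapped(nan, 1.0f));
    assert(greater_than(1.0f, nan) == greater_than_swapped(1.0f, nan));

    // ...but the negated form does not.
    assert(greater_than(nan, 1.0f) != not_less_than_or_equal(nan, 1.0f));
}
```

Calling `check_greater_than_forms()` from a test or `main` exercises all of the cases; note that the NaN cases rely on IEEE 754 comparison semantics, so the sketch should not be built with fast-math style compiler options.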
nullptr DEBUGARG("Clone op1 for Sse.CompareScalarGreaterThan"));

retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op2, op1, NI_SSE_CompareScalarLessThan, baseType, simdSize);
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, clonedOp1, retNode, NI_SSE_MoveScalar, baseType, simdSize);
For the scalar versions, the non-AVX path needs to ensure that CopiesUpperBits is still respected, so we have to do an additional MoveScalar operation.
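To see why the extra MoveScalar is needed, here is a small model of the scalar path using a four-lane array in place of a 128-bit register. This is a sketch under stated assumptions, not the JIT implementation: `Vec4` and every function name are hypothetical stand-ins, and the model assumes a scalar compare writes its mask to lane 0 and copies lanes 1 to 3 from its first operand. After the swap, those upper lanes come from `op2`, so the final step restores them from `op1`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

using Vec4 = std::array<float, 4>; // hypothetical stand-in for a 128-bit register

// Builds a lane holding an all-ones or all-zeros bit pattern.
inline float mask_lane(bool condition)
{
    const std::uint32_t bits = condition ? 0xFFFFFFFFu : 0u;
    float lane;
    std::memcpy(&lane, &bits, sizeof(lane));
    return lane;
}

// Reads back the raw bits of a lane so masks can be compared exactly.
inline std::uint32_t lane_bits(float lane)
{
    std::uint32_t bits;
    std::memcpy(&bits, &lane, sizeof(bits));
    return bits;
}

// Models a scalar less-than: lane 0 is the mask, lanes 1-3 come from `left`.
inline Vec4 compare_scalar_less_than(const Vec4& left, const Vec4& right)
{
    Vec4 result = left;
    result[0] = mask_lane(left[0] < right[0]);
    return result;
}

// Models MoveScalar(upper, value): lane 0 from `value`, lanes 1-3 from `upper`.
inline Vec4 move_scalar(const Vec4& upper, const Vec4& value)
{
    Vec4 result = upper;
    result[0] = value[0];
    return result;
}

// Greater-than built from the swapped less-than plus the fix-up move.
inline Vec4 compare_scalar_greater_than(const Vec4& op1, const Vec4& op2)
{
    // Swapping the operands leaves op2's upper lanes in the result.
    const Vec4 swapped = compare_scalar_less_than(op2, op1);
    // Restore op1's upper lanes while keeping the lane 0 mask.
    return move_scalar(op1, swapped);
}
```

Without the `move_scalar` step, a caller relying on the upper lanes of the first operand being preserved would instead observe the second operand's values.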
@BruceForstall, @jeffhandley, @terrajobst; What is the current process for getting sign-off on breaking changes like this? |
Just a note, this is the kind of change that could have been easily handled in managed land (rather than in importation) if we weren't blocked due to the more complex trees that it produces (#956). |
I don't believe we have centralized sign-off process for this anymore. @ericstj @PriyaPurkayastha what are your thoughts? |
Correct, there is no compat council that reviews/approves breaking changes. Review happens during Tactics. The general guidance given to teams is to determine the driving factors for making the breaking change (is this a customer-reported issue, etc.). What is the cost of making the change in a compatible way: can opt-in/opt-out switches be provided? Should additional data be gathered to understand the impact of the change (e.g. by contacting the .NET Technical Insights team)? Other action items are to add/update functional tests for the changed code paths and to document the breaking change using the https://github.com/dotnet/docs/issues/new?template=dotnet-breaking-change.md issue template.
Things I would be interested in:
* It sounds like whether this is breaking depends slightly on what kind of hardware you’re on: a) is that correct, and b) do we have data or an intuition about how the hardware distribution is split?
* Are there APIs we can look for in the ecosystem to get an idea of the impact of a change?
* How would customers encounter this as a break? Different output from floating point comparisons? Exceptions? How do these behaviors compare to the documentation? (are we standardizing on documented behavior? Or do the docs not cover this)
* How does the behavior compare to .NET Framework? (if relevant)
It definitely sounds like the new behavior is “more correct” and that we definitely want it to be the behavior going forward. So, the discussion is likely going to be about how we help customers absorb the change (rather than whether we would “approve” the change).
[Edited to remove the gunk that ended up here due to responding to the github email]
@marklio, see below
Not quite. Basically, there exist two instruction sets for the purposes of this discussion: SSE2 (which has been around since 2000 and is a baseline requirement for .NET Core) and AVX (which has been around since 2011 and which, to my knowledge, is available in all Intel/AMD based VMs on Azure). Today, if using the 8x If using the related
These are new APIs only introduced in .NET Core 3.0 (Sep 2019, already end of support) and available in .NET Core 3.1 (Dec 2019, supported until Dec 2022). They are also extremely low level/advanced APIs that are designed to be (and were documented as being) essentially a 1-to-1 mapping with certain instructions exposed by the underlying hardware. You can only use certain APIs if your hardware supports it and they are meant to be used in high-performance/unsafe scenarios. The usages, as such, would likely be limited (new and advanced use-case API) and hard to find. We aren't using them in the framework ourselves.
You would get a different result as part of the comparison if either input contained a
This is not supported on .NET Framework, it is only available on .NET Core 3.0 and later. |
Cool. So what scenarios would lead folks to call the problematic overloads on Sse2 over the AVX ones? For the sake of argument, are those scenarios worth fixing? Should they just be deprecated? Should we handle this with an analyzer/fixer that calls the "working" API? I assume that fixing the bug in 3.x wouldn't meet the servicing bar? To be clear, you're under no obligation to convince me of anything. I'm just trying to help build a case for taking the fix and deciding how to help customers through any pain. It seems like anyone who knows what they're doing will expect appropriate behavior from these APIs, and we probably haven't gotten feedback because very few people are using these APIs, and those who are probably aren't pushing NaNs through them. In that case, making the breaking change and documenting it probably makes the most sense.
The primary reason would be wanting to support downlevel hardware while also accelerating further (such as by operating on 256-bits per iteration, rather than 128-bits) on newer hardware. That is, the
I'm unsure whether or not this would meet the bar and would defer to @jeffhandley and/or @BruceForstall. |
I definitely wouldn't want to break folks in servicing for this. It was more of a rhetorical question about whether it had been considered. I definitely could have phrased that more clearly. |
So is fixing it only in .NET 5.0+ the best approach here, @marklio? @tannergooding, do you know of any reason to lobby for it to be fixed in 3.x, or would 5.0 be OK with you? |
Yes, that would be the position I'd take to tactics. Take the fix in 5.0 and document through the breaking change process. We wouldn't bring this for 3.x because:
We want the fix in 5.0 because:
I'd also watch for signal in previews of anyone playing with 5.0 that encounters this as a break in their code. That might lead us to other mitigations. |
A minor point is that if we do NOT fix it in servicing, and people start using this relatively new API in their code, we could end up with MORE user code that needs to be updated in the long run. I wonder (and this is a general query, not just for this case) if there is a way we can (or should) annotate the 3.1 documentation now indicating a post-3.1 breaking change has been made to a particular API, to try and encourage people not to take a dependency on the subsequently-broken behavior. |
In this case, there's a clean and forward-compatible workaround, so it's not a big deal if it isn't fixed in 3.1. But if it weren't as simple, the argument that the breakage footprint will never be smaller than now is really important. |
We solidified the decision today, @tannergooding:
I've rebased on top of current master and this should be ready for review. I'll get a change up for the docs repo that calls out the breaking change before merging.
Here's the breaking change issue template to be used https://github.com/dotnet/docs/issues/new?template=dotnet-breaking-change.md |
2774dc6 to 0e2130f
Had to update to use the new |
0e2130f to 3991a2c
Test failures are unrelated and tracked by #34905 |
3991a2c to e68d7ce
LGTM - I just had a question about whether it could be simplified.
// The Prefetch and StoreFence intrinsics don't take any SIMD operands
// and have a simdSize of 0
assert((simdSize == 16) || (simdSize == 0));

switch (intrinsic)
{
    case NI_SSE_CompareGreaterThan:
Since these cases are all basically the same, with different constants, it would seem that you could add a function to determine the constant to use for the AVX intrinsic, based on the constant associated with the SSE intrinsic. Does that make sense or am I missing something?
Yes, we probably could define a getInverseFloatingComparison method or something similar. A similar function could also be useful in lowering as it would allow either operand to be contained.
I added a new function HWIntrinsicInfo::lookupFloatingComparisonForSwappedArgs and cleaned up the importation logic.
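For readers unfamiliar with the idea, a swapped-argument lookup might look roughly like the sketch below. It is not the body of HWIntrinsicInfo::lookupFloatingComparisonForSwappedArgs: the `CmpPredicate` enum and `predicate_for_swapped_args` function are hypothetical names, and only the first 16 AVX comparison predicate encodings are modeled. Directional predicates map to their mirror image, while symmetric ones (equal, unordered, and so on) map to themselves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum mirroring the first 16 AVX comparison predicate encodings.
enum class CmpPredicate : std::uint8_t
{
    EQ_OQ = 0,  LT_OS = 1,   LE_OS = 2,   UNORD_Q = 3,
    NEQ_UQ = 4, NLT_US = 5,  NLE_US = 6,  ORD_Q = 7,
    EQ_UQ = 8,  NGE_US = 9,  NGT_US = 10, FALSE_OQ = 11,
    NEQ_OQ = 12, GE_OS = 13, GT_OS = 14,  TRUE_UQ = 15,
};

// Returns the predicate P' such that cmp(a, b, P) == cmp(b, a, P').
inline CmpPredicate predicate_for_swapped_args(CmpPredicate predicate)
{
    switch (predicate)
    {
        // Directional predicates flip to their mirror image.
        case CmpPredicate::LT_OS:  return CmpPredicate::GT_OS;
        case CmpPredicate::GT_OS:  return CmpPredicate::LT_OS;
        case CmpPredicate::LE_OS:  return CmpPredicate::GE_OS;
        case CmpPredicate::GE_OS:  return CmpPredicate::LE_OS;
        case CmpPredicate::NLT_US: return CmpPredicate::NGT_US;
        case CmpPredicate::NGT_US: return CmpPredicate::NLT_US;
        case CmpPredicate::NLE_US: return CmpPredicate::NGE_US;
        case CmpPredicate::NGE_US: return CmpPredicate::NLE_US;

        // Symmetric predicates are unaffected by operand order.
        default: return predicate;
    }
}
```

Because swapping twice restores the original operand order, applying the function twice should return the input predicate, which makes for an easy sanity check. For example, `predicate_for_swapped_args(CmpPredicate::GT_OS)` yields `CmpPredicate::LT_OS`, matching the value 14 used for the greater-than case on the AVX path.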
fc167f4 to a7467da
You can disable the formatter around this section if that makes the most sense. |
LGTM (minus comment about C++ comments :) )
This resolves #34094 by updating several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs when there isn't a direct comparison mode supported by the underlying hardware.