Saturating casts for integers, like _mm_packus_epi16 and _mm_packs_epi16
#369
Comments
Is there a saturating cast for regular (scalar) integers? Not that that's a requirement, but it's a good guide as to what to include. I'm not opposed to adding it, though. It's worth noting that |
Not that I'm aware of in Part of what motivated this issue was that I was using |
That's definitely an LLVM issue, I'm not sure why it only happens with the 4-length vectors. I opened the linked issue to track this. I reviewed the LLVM source and
/// Detect patterns of truncation with unsigned saturation:
///
/// 1. (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
/// Return the source value x to be truncated or SDValue() if the pattern was
/// not matched.
///
/// 2. (truncate (smin (smax (x, C1), C2)) to dest_type),
/// where C1 >= 0 and C2 is unsigned max of destination type.
///
/// (truncate (smax (smin (x, C2), C1)) to dest_type)
/// where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
///
/// These two patterns are equivalent to:
/// (truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)
/// So return the smax(x, C1) value to be truncated or SDValue() if the
/// pattern was not matched.
static SDValue detectUSatPattern(SDValue In, EVT VT, SelectionDAG &DAG,
const SDLoc &DL) { |
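For readers less familiar with the DAG notation, the patterns named in that LLVM comment can be written out per lane in scalar Rust. This is only an illustrative sketch: the function names are hypothetical, the destination type is fixed to u8, and C1 is assumed to be 0. It is not how LLVM or any Rust library implements the match.

```rust
/// Pattern 1: truncate(umin(x, unsigned_max_of_dest_type)).
/// Hypothetical helper; the source is unsigned, so only the upper bound matters.
fn usat_pattern1(x: u16) -> u8 {
    x.min(u8::MAX as u16) as u8
}

/// Pattern 2, first form: truncate(smin(smax(x, C1), C2)),
/// assuming C1 = 0 and C2 = 255 (unsigned max of the destination type).
fn usat_pattern2a(x: i16) -> u8 {
    x.max(0).min(u8::MAX as i16) as u8
}

/// Pattern 2, second form: truncate(smax(smin(x, C2), C1)),
/// with the same assumed constants. Because C1 <= C2, both orders agree.
fn usat_pattern2b(x: i16) -> u8 {
    x.min(u8::MAX as i16).max(0) as u8
}
```

Both forms of pattern 2 give the same result for every i16 input, which is why the comment treats them as equivalent.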
Thanks for that. I've possibly come across another issue with 4-vectors and casting from Feel free to move this to its own issue if needed. |
Both optimization patterns have now been fixed upstream. The saturating truncating cast fix is in LLVM 18, available on nightly |
My most common use case is for saturating narrowing casts, such as from i16 to u8. It would be great to take advantage of native instructions without first having to use simd_clamp. saturating_cast would reduce code and chances of errors from clamping with incorrect values.

I'm not sure about the availability of this feature on other architectures. It might make sense to split this for narrowing and widening features based on availability. I'm not familiar with other architectures, but it looks like ARM has narrowing casts with VQMOVN/VQMOVUN.

Perhaps there could be a fallback which does simd_clamp().cast() (or other optimal construction) for architectures that don't have these instructions.