Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect UQRSHL implementation. #1272

Merged
merged 1 commit into from
Jan 31, 2025
Merged

Conversation

Syonyk
Copy link
Contributor

@Syonyk Syonyk commented Jan 30, 2025

UQRSHL was incorrect in several ways, and did not match hardware. Shift checks for saturating were not modified from the signed version, leading to incorrect behavior at (untested) edge cases.

Further, for the 32-bit and 64-bit versions, the rounding math was incorrect with a -32 or -64 shift, and did not match hardware. This has been special cased to do the proper thing at the edge case. The core issue is overflowing the stock 32-bit or 64-bit types used at the limit.

Tests have been modified with substantially increased test cases to properly exercise these changes. The new tests should cause failures in the unsigned cases without the modified code, and should pass with the changes. Additional signed tests were added to handle the edge cases, though no incorrect behavior was observed on those.

Test cases generated on a Google C4a ARMv8.4 machine.

Copy link
Collaborator

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Syonyk !

UQRSHL was incorrect in several ways, and did not match hardware.
Shift checks for saturating were not modified from the signed
version, leading to incorrect behavior at (untested) edge cases.

Further, for the 32-bit and 64-bit versions, the rounding math was
incorrect with a -32 or -64 shift, and did not match hardware.  This
has been special cased to do the proper thing at the edge case.  The
core issue is overflowing the stock 32-bit or 64-bit types used at
the limit.

Tests have been modified with substantially increased test cases to
properly exercise these changes.  The new tests should cause failures
in the unsigned cases without the modified code, and should pass with
the changes.  Additional signed tests were added to handle the edge
cases, though no incorrect behavior was observed on those.

Test cases generated on a Google C4a ARMv8.4 machine.
@mr-c mr-c enabled auto-merge (rebase) January 31, 2025 13:12
@mr-c mr-c merged commit 2c6adb6 into simd-everywhere:master Jan 31, 2025
113 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants