
AVX128: Prescale addresses in gathers if possible #3825

Merged: 2 commits merged on Jul 6, 2024

Sonicadvance1 (Member) commented Jul 5, 2024

If the host supports SVE128, the address element size and data size are both 64-bit, and the scale is not one of the two that SVE supports (i.e. a scale of 2 or 4), then prescale the addresses.
64-bit address overflow wraps by masking off the top bits, so it is well defined for us to scale the vector elements up front and still execute the SVE code path in that case. This removes the ASIMD code paths from a lot of gathers.
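A minimal scalar sketch of the condition and the address arithmetic follows. The helper names (`SVEScaleSupported`, `ShouldPrescale`, `X86Address`, `PrescaledAddress`) are hypothetical and not FEX's actual code, and it assumes the two SVE-supported scales are 1 and 8, matching what SVE's 64-bit-offset `LD1D` addressing offers (an unscaled offset, or one shifted by `LSL #3`). Because unsigned 64-bit arithmetic wraps, multiplying the index by the scale first and adding the base afterwards gives the same address as the x86 `base + index * scale` computation, even for negative or overflowing indices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers sketching the condition from the PR description;
// these names are illustrative and are not FEX's actual API.

// Assumption: SVE's 64-bit-offset gather of 64-bit data (LD1D with Zm.D
// offsets) accepts either an unscaled offset or one shifted left by 3
// (the 8-byte element size), so only x86 scales 1 and 8 map on directly.
constexpr bool SVEScaleSupported(uint8_t scale) {
  return scale == 1 || scale == 8;
}

// Prescale only when SVE128 is available, both the address elements and
// the data elements are 64-bit, and the scale is one SVE cannot encode.
constexpr bool ShouldPrescale(bool hostSupportsSVE128, unsigned addrElemBits,
                              unsigned dataElemBits, uint8_t scale) {
  return hostSupportsSVE128 && addrElemBits == 64 && dataElemBits == 64 &&
         !SVEScaleSupported(scale);
}

// Address the x86 gather computes for one element. Unsigned arithmetic
// wraps modulo 2^64: the top bits of an overflowing result are discarded.
constexpr uint64_t X86Address(uint64_t base, uint64_t index, uint8_t scale) {
  return base + index * scale;
}

// Prescaled path: multiply the index up front, then let the SVE gather use
// it as an unscaled offset from the base.
constexpr uint64_t PrescaledAddress(uint64_t base, uint64_t index, uint8_t scale) {
  const uint64_t prescaled = index * scale;  // wraps exactly like X86Address
  return base + prescaled;
}

// Both forms agree, including for a negative index (-3 in two's complement).
static_assert(ShouldPrescale(true, 64, 64, 4));
static_assert(!ShouldPrescale(true, 64, 64, 8));
static_assert(X86Address(0x1000, uint64_t(-3), 4) ==
              PrescaledAddress(0x1000, uint64_t(-3), 4));
```

The `static_assert`s check the equivalence at compile time; the point is that no information is lost by moving the multiply ahead of the SVE load.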

Fixes #3805

It's kind of cute that, since the AVX256 implementation falls back to the AVX128 implementation, it naturally gains some of the uplift as well.

Sonicadvance1 merged commit 47d077f into FEX-Emu:main Jul 6, 2024 (11 checks passed)
Sonicadvance1 deleted the scale_64bit_gather branch July 6, 2024 02:10
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Jul 6, 2024
When loading 256 bits of data with only 128 bits of address indices, we
can sign-extend the source indices to 64-bit. This takes the ideal SVE
path, where each 128-bit lane loads its data from addresses in a 1:1
element ratio (one 64-bit index per 64-bit data element).
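To illustrate the 1:1 element ratio, here is a scalar model assuming x86 `VPGATHERDQ`-style semantics with a YMM destination: four 32-bit indices held in an XMM register select four 64-bit data elements. The function names are hypothetical and this is not FEX code. After sign extension, the two 64-bit indices in each 128-bit lane address exactly the two 64-bit data elements in the matching 128-bit lane of the result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model (an illustrative assumption, not FEX code) of a 256-bit
// gather whose indices come from a single 128-bit register: four 32-bit
// indices selecting four 64-bit data elements.

// Sign-extend the four 32-bit indices to 64 bits. Afterwards each 128-bit
// lane of indices (two 64-bit values) lines up 1:1 with a 128-bit lane of
// data (two 64-bit elements).
std::array<int64_t, 4> SignExtendIndices(const std::array<int32_t, 4>& idx32) {
  std::array<int64_t, 4> idx64{};
  for (std::size_t i = 0; i < 4; ++i) {
    idx64[i] = static_cast<int64_t>(idx32[i]);  // preserves negative indices
  }
  return idx64;
}

// Effective addresses for one 128-bit lane (lane 0 = elements 0-1,
// lane 1 = elements 2-3), computed with wrapping 64-bit arithmetic.
std::array<uint64_t, 2> LaneAddresses(uint64_t base,
                                      const std::array<int64_t, 4>& idx64,
                                      unsigned lane, uint8_t scale) {
  std::array<uint64_t, 2> addrs{};
  for (std::size_t e = 0; e < 2; ++e) {
    const uint64_t index = static_cast<uint64_t>(idx64[lane * 2 + e]);
    addrs[e] = base + index * scale;
  }
  return addrs;
}
```

Zero-extending instead of sign-extending would turn an index of -1 into 0xFFFFFFFF and produce an address roughly 32 GiB past the base for a scale of 8, which is why the widening must be signed.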

As a result, the SVE path is used more often.

This is based on top of FEX-Emu#3825 because the prescaling behaviour was
introduced there. It implements its own prescaling when the sign extension
occurs, because ARM's SSHLL{,2} instruction sign-extends and shifts left by
an immediate in one step, giving us that for free.
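Why the prescale comes for free can be shown with a scalar emulation of the `.2D, .4S` form of `SSHLL`/`SSHLL2`: each instruction sign-extends two 32-bit elements (the low pair for `SSHLL`, the high pair for `SSHLL2`) and shifts them left by an immediate, so a shift of log2(scale) folds the x86 scale into the widening. This is a sketch, not FEX's emitter code; the names `SSHLL` and `ScaleToShift` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar emulation of AArch64 SSHLL/SSHLL2 for the .2D <- .4S arrangement.
// Function names are hypothetical; this only models the instruction's effect.

using Vec4S = std::array<int32_t, 4>;  // 128-bit register viewed as 4 x 32-bit
using Vec2D = std::array<int64_t, 2>;  // 128-bit register viewed as 2 x 64-bit

// upper == false models SSHLL (elements 0-1), upper == true models SSHLL2
// (elements 2-3). The shift is an immediate in [0, 31] for 32-bit sources.
Vec2D SSHLL(const Vec4S& src, unsigned shift, bool upper) {
  Vec2D dst{};
  const std::size_t first = upper ? 2 : 0;
  for (std::size_t i = 0; i < 2; ++i) {
    // Sign-extend first, then shift as unsigned so shifting negative values
    // is well defined in C++.
    const uint64_t widened =
        static_cast<uint64_t>(static_cast<int64_t>(src[first + i]));
    dst[i] = static_cast<int64_t>(widened << shift);
  }
  return dst;
}

// x86 gather scales are 1, 2, 4 or 8, i.e. a left shift by 0-3.
unsigned ScaleToShift(uint8_t scale) {
  switch (scale) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
  }
  return 0;  // not a valid x86 scale; callers should not reach this
}
```

For example, with indices `{3, -1, 5, -2}` and a scale of 4 (shift 2), the low pair becomes `{12, -4}` and the high pair `{20, -8}`: sign-extended, prescaled 64-bit offsets ready for an unscaled SVE gather.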

This also fixes a bug where we were unnecessarily loading the top
128-bit half of the addresses for gathers, and where the AVX256 side
was duplicating work and doing additional work that it shouldn't have.

It's worth walking through the commits individually when reviewing this
one, as a couple of incremental changes are easier to follow that way.

Fixes FEX-Emu#3806
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Jul 7, 2024
Linked issue: AVX128: Prescale gathers with 64-bit index and scale factor of 2 or 4
3 participants