AVX128: Prescale addresses in gathers if possible #3825

Sonicadvance1 · 2024-07-05T23:26:23Z

If the host supports SVE128, the address element size and data size are both 64-bit, and the scale is not one of the two that SVE supports, then prescale the addresses.
64-bit address overflow masks the top bits, so it is well defined that we
can scale the vector elements and still execute the SVE code path in
that case. This removes the ASIMD code paths from a lot of gathers.

Fixes #3805

It's kind of cute that, since the AVX256 implementation falls back to the AVX128 implementation, it naturally gains some of the uplift as well.
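To illustrate why the prescale is safe, here is a minimal scalar sketch in C++ (not FEX's actual code). It assumes the two scales SVE handles natively for 64-bit elements are 1 and 8, so x86 scales of 2 and 4 are the ones that need prescaling. The names `Indices`, `DirectAddresses`, `PrescaledAddresses` and `CheckPrescale` are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the index vector of a 2-lane, 64-bit gather.
using Indices = std::array<uint64_t, 2>;

// x86-style addressing: address = base + index * scale, wrapping at 64 bits.
inline Indices DirectAddresses(uint64_t Base, const Indices& Index, uint64_t Scale) {
  Indices Out{};
  for (std::size_t i = 0; i < Index.size(); ++i) {
    Out[i] = Base + Index[i] * Scale;
  }
  return Out;
}

// Prescaled form: shift the index vector first, then gather with a scale of 1.
// Unsigned shifts discard the top bits, which matches the wrapping multiply.
inline Indices PrescaledAddresses(uint64_t Base, const Indices& Index, unsigned Log2Scale) {
  Indices Out{};
  for (std::size_t i = 0; i < Index.size(); ++i) {
    const uint64_t Scaled = Index[i] << Log2Scale;
    Out[i] = Base + Scaled;
  }
  return Out;
}

// Self-check: both formulations agree, even when the multiply overflows.
inline void CheckPrescale(uint64_t Base, const Indices& Index, unsigned Log2Scale) {
  assert(DirectAddresses(Base, Index, uint64_t{1} << Log2Scale) ==
         PrescaledAddresses(Base, Index, Log2Scale));
}
```

Calling `CheckPrescale(0x1000, Indices{0xFFFFFFFFFFFFFFFF, 5}, 2)` exercises a lane that overflows and one that does not; both forms produce the same addresses because the arithmetic is modulo 2^64 in either case.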


When loading 256 bits of data with only 128 bits of address indices, we can sign extend the source indices to be 64-bit. This falls down the ideal path for SVE, where each 128-bit lane loads its data with addresses in a 1:1 element ratio, so we use the SVE path more often.

Based on top of FEX-Emu#3825 because the prescaling behaviour was introduced there. This implements its own prescaling when the sign extension occurs because ARM's SSHLL{,2} instruction gives us that for free.

This additionally fixes a bug where we were accidentally loading the top 128-bit half of the addresses for gathers when it was unnecessary, and on the AVX256 side it was duplicating and doing some additional work when it shouldn't have.

It'll be good to walk the commits when looking at this one, as there are a couple of incremental changes that are easier to follow that way.

Fixes FEX-Emu#3806
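As a rough illustration of the combined sign extension and prescale, the following C++ sketch models what SSHLL and SSHLL2 do to a vector of four 32-bit indices: each selected lane is sign extended to 64 bits and shifted left by an immediate. This is a scalar model, not FEX's code; `WidenAndScale` and its parameters are hypothetical names chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of SSHLL (lower half) and SSHLL2 (upper half):
// sign extend two of the four 32-bit indices to 64 bits, then shift left by
// Log2Scale so the result can feed a gather that uses a scale of 1.
inline std::array<uint64_t, 2> WidenAndScale(const std::array<int32_t, 4>& Index32,
                                             bool UpperHalf, unsigned Log2Scale) {
  std::array<uint64_t, 2> Out{};
  const std::size_t First = UpperHalf ? 2 : 0;
  for (std::size_t i = 0; i < Out.size(); ++i) {
    // Sign extension happens in the conversion to int64_t.
    const int64_t Wide = Index32[First + i];
    // Shift as unsigned so discarding the top bits is well defined.
    Out[i] = static_cast<uint64_t>(Wide) << Log2Scale;
  }
  return Out;
}
```

A negative index such as -1 with a scale of 4 becomes `0xFFFFFFFFFFFFFFFC`, which added to a base address wraps around to `base - 4`, matching what a signed 32-bit index should address.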

Sonicadvance1 added 2 commits July 5, 2024 16:47

InstcountCI: Update for gather prescaling

6e8ca3b

Sonicadvance1 force-pushed the scale_64bit_gather branch from 0f1eab6 to 6e8ca3b Compare July 5, 2024 23:47

alyssarosenzweig approved these changes Jul 6, 2024


lioncash approved these changes Jul 6, 2024


Sonicadvance1 merged commit 47d077f into FEX-Emu:main Jul 6, 2024
11 checks passed

Sonicadvance1 deleted the scale_64bit_gather branch July 6, 2024 02:10

Sonicadvance1 mentioned this pull request Jul 6, 2024

AVX128: Extend 32-bit address indices when possible #3826

Merged
