Generate SVE for 80-bit loads/stores when possible #4166
base: main
Probably not a big win in practice. A single 80-bit store used to require two stores (64-bit + 16-bit). Now it requires three instructions: mov + whilelt + st1b. You need three 80-bit stores in a block to break even instruction-wise. The next step, which I will do next, is to avoid rebuilding the predicate register for every store, but even then we'll still need at least three stores per block in practice for this to "win" instruction-wise.
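To make the mov + whilelt + st1b sequence concrete, here is a minimal C++ emulation sketch, assuming a 128-bit SVE vector length (the architectural minimum) and modelling each instruction as a plain function. The names `WhileLtB`, `St1B`, and `Store80` are hypothetical and do not come from FEX; the point is only that a WHILELT over byte lanes with a limit of 10 activates exactly the first 10 lanes, so a single predicated ST1B writes exactly 80 bits and leaves the following bytes of memory untouched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a 128-bit SVE Z register (the architectural minimum) as 16 byte lanes.
using VectorBytes = std::array<uint8_t, 16>;
// One predicate bit per byte lane, as used by .B element operations.
using Predicate = std::array<bool, 16>;

// Models WHILELT Pd.B: lane i is active while (Start + i) < Limit.
// Simplified: ignores overflow and the NZCV flags the real instruction sets.
Predicate WhileLtB(int64_t Start, int64_t Limit) {
  Predicate P{};
  for (std::size_t i = 0; i < P.size(); ++i) {
    P[i] = (Start + static_cast<int64_t>(i)) < Limit;
  }
  return P;
}

// Models ST1B {Zt.B}, Pg, [Xn]: only active byte lanes are written;
// inactive lanes leave memory untouched.
void St1B(const VectorBytes& Zt, const Predicate& Pg, uint8_t* Base) {
  for (std::size_t i = 0; i < Zt.size(); ++i) {
    if (Pg[i]) {
      Base[i] = Zt[i];
    }
  }
}

// Hypothetical helper: store the low 80 bits (10 bytes) of a value held in a vector.
void Store80(const VectorBytes& Value, uint8_t* Dst) {
  const int64_t ByteCount = 10;                 // mov: materialize the byte count in a GPR
  const Predicate Pg = WhileLtB(0, ByteCount);  // whilelt: lanes 0..9 active
  St1B(Value, Pg, Dst);                         // st1b: write exactly 10 bytes
}
```

The emulation only illustrates semantics; it says nothing about how FEX actually emits or schedules these instructions.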
Converting to draft, so it's not merged by mistake.
In preparation for FEX-Emu#4166 which should improve on these results.
This is almost ready: the missing piece is that the predicate register is not yet being cached. I expected the register allocator (RA) to take care of this, but apparently some magic is still missing.
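Why predicate caching matters can be seen with a back-of-the-envelope instruction count. The sketch below assumes a simplified model taken from the discussion above: the old path costs two stores per 80-bit store, and the SVE path costs mov + whilelt once per block (once the predicate is cached) plus one st1b per store. It ignores address arithmetic, register moves, and anything else the real JIT emits, so the exact break-even point may differ from what FEX sees; the function names are hypothetical.

```cpp
#include <cassert>

// Simplified per-block instruction-count model (assumptions, not measured FEX output).
constexpr int kLegacyPerStore = 2;  // 64-bit store + 16-bit store
constexpr int kPredicateSetup = 2;  // mov + whilelt, emitted once when the predicate is cached
constexpr int kSvePerStore = 1;     // st1b

// Cost of N 80-bit stores in one block on the legacy path.
constexpr int LegacyCost(int Stores) { return Stores * kLegacyPerStore; }

// Cost of N 80-bit stores in one block with the predicate built once and reused.
constexpr int SveCachedCost(int Stores) {
  return Stores > 0 ? kPredicateSetup + Stores * kSvePerStore : 0;
}

// Smallest store count per block where the cached SVE path is strictly shorter.
constexpr int FirstWinningStoreCount() {
  for (int N = 1; N < 64; ++N) {
    if (SveCachedCost(N) < LegacyCost(N)) {
      return N;
    }
  }
  return -1;  // never wins within the searched range
}

static_assert(FirstWinningStoreCount() == 3, "model: cached SVE wins from three stores");
```

Under this model the two paths tie at two stores per block (4 vs 4 instructions) and the SVE path only comes out ahead from three stores (5 vs 6), which is consistent with the "at least three stores per block" estimate above.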
Fixes #4126