Convert Dot33 to SSE2 #17584
Merged
This might crash MSVC builds on x86, especially Debug. Not really a huge issue, but `vec` is not guaranteed to be aligned if passed on the stack (the stack is only aligned to 16 on x86_64). I guess we could add a loadu and hope the compiler optimizes it out, but otherwise our "solve" has been to avoid intrinsics for x86_32 when they access XMM args directly.
-[Unknown]
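To make the "avoid intrinsics for x86_32" option concrete, here is a minimal sketch, not the actual PPSSPP code. `Vec3Sketch`, `Dot33Sketch` and the `SKETCH_SSE2_BY_VALUE` macro are hypothetical names invented for illustration. The overload that takes XMM values directly is only compiled on x86_64, and every other target gets the scalar path.

```cpp
#if defined(_M_X64) || defined(__x86_64__)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_SSE2_BY_VALUE 1  // hypothetical macro: by-value XMM args are allowed
#else
#define SKETCH_SSE2_BY_VALUE 0  // 32-bit x86 and non-x86 targets: scalar only
#endif

// Hypothetical stand-in for the project's vector type.
struct Vec3Sketch {
    float x, y, z;
};

// Scalar path: available everywhere and has no alignment requirement.
static inline float Dot33Sketch(const Vec3Sketch &a, const Vec3Sketch &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

#if SKETCH_SSE2_BY_VALUE
// SSE2 path: takes the vectors as XMM values. Only lanes 0..2 contribute.
static inline float Dot33Sketch(__m128 a, __m128 b) {
    __m128 prod = _mm_mul_ps(a, b);                                    // lane-wise products
    __m128 y = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(1, 1, 1, 1));    // broadcast lane 1
    __m128 z = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 2, 2, 2));    // broadcast lane 2
    return _mm_cvtss_f32(_mm_add_ss(_mm_add_ss(prod, y), z));          // lane0 + lane1 + lane2
}
#endif
```

The trade-off is that 32-bit builds lose the SIMD path entirely, which matches the "solve" described above rather than fixing the alignment itself.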
Oof. Didn't realize that MSVC had problems with it, so things like
https://stackoverflow.com/questions/10484422/msvc-cannot-send-function-parameters-of-16byte-alignment-on-x86
https://stackoverflow.com/questions/28488986/formal-parameter-with-declspecalign16-wont-be-aligned
are news to me. Sounds kind of bad.
Sorry about that.
Yeah, I forgot about it too when merging this. Funnily enough, it might not even be noticeable on recent CPUs, since Intel relaxed the alignment requirement for SSE memory operands a while back. But it will affect old ones, which are the more likely ones to run in 32-bit.
I'll revert just the ifdef change.
I could be wrong, as I've heard other people say this same thing recently. But I was pretty sure this was ONLY when a VEX prefix is used (i.e. encoded as AVX/AVX2, regardless of using YMMs or not), which none of the non-jit code in PPSSPP is, at least on Windows. At least that's the case on Coffee Lake for sure, which isn't that old.
-[Unknown]
Huh, ok. I guess I remember that one wrong then. I've avoided those accesses whenever I can anyway, of course.
I'll just mention that `movups` is as fast as `movaps` for aligned data starting with Nehalem (Core i7). In fact, compilers often use `movups` even when implementing `_mm_load_ps` or equivalent (copying, dereferencing). I think the guarantee is even stronger: you would need to cross a cache line boundary to get a slowdown.
See e.g.:
https://stackoverflow.com/questions/52147378/choice-between-aligned-vs-unaligned-x86-simd-instructions
https://stackoverflow.com/questions/42697118/visual-studio-2017-mm-load-ps-often-compiled-to-movups
Adding `_mm_loadu_ps` can also change `addps reg,mem` into `movups reg,mem` + `addps reg,reg`, which... might not actually be slower. MSVC (and GCC) seems rather decent at eliminating `movups reg,reg`, from the simple examples I tried.
So, 'pepper everything with `_mm_loadu_ps`' approach doesn't seem to have significant drawbacks, other than visual litter.
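As a rough illustration of the `_mm_loadu_ps` approach, here is a sketch that assumes an x86 target with SSE2 enabled. `Dot33LoaduSketch` is a hypothetical name, not a function from the PR. It reads both inputs with unaligned loads, so it does not care whether the data sits at a 16-byte boundary.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE1 ones used here)

// Hypothetical helper: dot product of the first three floats of a and b.
// Each pointer must refer to at least four readable floats, because
// _mm_loadu_ps always reads 16 bytes. No alignment is required.
static inline float Dot33LoaduSketch(const float *a, const float *b) {
    __m128 va = _mm_loadu_ps(a);  // typically compiles to movups
    __m128 vb = _mm_loadu_ps(b);
    __m128 prod = _mm_mul_ps(va, vb);                                  // lane-wise products
    __m128 y = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(1, 1, 1, 1));    // broadcast lane 1
    __m128 z = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 2, 2, 2));    // broadcast lane 2
    return _mm_cvtss_f32(_mm_add_ss(_mm_add_ss(prod, y), z));          // lane0 + lane1 + lane2
}
```

Whether the compiler folds the explicit loads back into memory operands or keeps values in registers depends on the compiler and build settings, so checking the generated assembly for hot functions is still worthwhile.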
Definitely compilers often optimize out loadu, but I have trust issues.
For example, here. On x86_64, ideally, I don't want this on the stack/memory, I want it from a register. An example is `Dot33<useSSE4>(H.NormalizedOr001(useSSE4), worldnormal)`. Realistically, worldnormal is probably spilled, but at least the normalized H (and H itself) could've been XMMs directly. I've seen where a loadu convinces MSVC that it ought to spill the normalized H (although to aligned) just to load it later. I mean, the code DID say to load it from memory, so I guess it's not "wrong".
But some of that may have improved. I just feel like every time I look at a hot func that uses SIMD which MSVC is optimizing poorly, it's because it spills things it should NOT spill like crazy, or even unvectorizes things nonsensically. So I don't want to give it excuses to do that to me.
-[Unknown]
Makes sense.
I suppose it's possible to
assuming we care enough.