-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge CheckAlpha into texture decoding #15474
Conversation
int w, int h, int bufw, bool reverseColors, bool useBGRA, | ||
u32 *alphaSum, u32 *fullAlphaMask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe we should use a struct. My general rule of thumb is if I have so many arguments that I'm motivated to put some on another line, then I have too many arguments.
-[Unknown]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's probably time to clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not gonna bother for now.. I got rid of the new parameters from the calling function DecodeTextureLevel
though.
9c51930
to
fe035dc
Compare
OK, this is probably nearly ready to go, barring any more silly bugs. Greatly speeds up decoding of large textures in Vulkan, though there's a bit more to eke out here and there. |
bde5737
to
a5ee188
Compare
Thanks for looking, fixed the issues! This currently also includes #15481 as the first commit, rebased on that. But let's merge that one before this one. |
Fixes a recently discovered performance issue with CheckAlpha - on Vulkan and D3D we write decoded texture data directly to video memory, which can be super slow to read from, depending on driver, platform, phase of moon, etc.
This changes the code to "sum up" (AND together) the colors, and then check against the full alpha mask at the end.
Since we can remove CheckAlpha entirely this should improve performance despite the slight extra checking. And even so, it doesn't increase memory traffic, just some extra arithmetic which rarely is the bottleneck in modern out-of-order CPUs.
Disabled the optimization to deswizzle raw textures directly to VRAM, but no big deal as they're not that common and it's pretty fast anyway. Also, I check the input separately in a few cases to avoid writing more special case functions, so there's still room for minor optimizations.