Refactor VRAM download, optimize rendered CLUTs #8389

Merged: 10 commits, Jan 5, 2016

unknownbrackets (Collaborator, Author)

This fixes #8252. I set out to fix that, and I ended up greatly improving the performance of CLUT downloads during Brave Story's battles.

Turns out, we were downloading too much, and doing it over and over. This optimization is huge for Brave Story, pretty much as good as #8246... but note that, without a bunch more work, that one won't work with texture scaling.

Anyway, the only real downside of this is that it adds another flag that needs to be reset on every render. I think the driver overhead it avoids is larger, and the flag is hopefully fairly cheap.

Improvement:
Boss battle (fairly complicated enemies): 192% -> 696%
Rabbit battle: 469% -> 745%

These changes don't prevent #8246, but for reference, the same numbers with that branch:

Boss battle: 640%
Rabbit battle: 800%

Because of the way the game renders (using a few different CLUTs for a set of textures that draw the enemy), we end up re-rendering several textures many times, especially in the boss battle. This is why the changes in this pull are actually faster for that example.

-[Unknown]

This way the GPU doesn't think it needs to load anything; it's all being overwritten. If we're only using part of the framebuffer, the other parts don't matter.

This avoids triggering logic that tries to get the sizing right or optimize frequent copies. CLUTs often get estimated wrong, so it's better to always copy just the correct range.

Sometimes we don't need the full width, such as when we're downloading a CLUT. In Brave Story, the CLUTs overlap in detected width, so this is a real improvement.
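Downloading only the CLUT's exact byte range, instead of the estimated full width, can be sketched as below. This is a hypothetical helper, not PPSSPP's actual code: `DownloadRect` and `ClutDownloadRect` are invented names, and the sketch assumes the CLUT's byte offset into the framebuffer, its length in bytes, the framebuffer's bytes per pixel, and its row pitch in bytes are already known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: a rectangle (in pixels) to read back from a framebuffer.
struct DownloadRect {
    int x, y, w, h;
};

// Sketch: smallest rectangle covering `clutBytes` bytes that start `byteOffset`
// bytes into a framebuffer with `bpp` bytes per pixel and `strideBytes` per row.
DownloadRect ClutDownloadRect(uint32_t byteOffset, uint32_t clutBytes,
                              uint32_t bpp, uint32_t strideBytes) {
    assert(bpp != 0 && strideBytes != 0 && clutBytes != 0);
    const uint32_t firstRow = byteOffset / strideBytes;
    const uint32_t lastRow = (byteOffset + clutBytes - 1) / strideBytes;

    DownloadRect r;
    r.y = static_cast<int>(firstRow);
    r.h = static_cast<int>(lastRow - firstRow + 1);
    if (r.h > 1) {
        // The range crosses a row boundary: fall back to whole rows.
        r.x = 0;
        r.w = static_cast<int>(strideBytes / bpp);
    } else {
        const uint32_t xBytes = byteOffset % strideBytes;
        r.x = static_cast<int>(xBytes / bpp);
        // Round the end up so a partially covered trailing pixel is included.
        r.w = static_cast<int>((xBytes + clutBytes + bpp - 1) / bpp) - r.x;
    }
    return r;
}
```

For example, a 64-byte CLUT starting 0x100 bytes into a 32-bit framebuffer with a 2048-byte row gives x = 64, w = 16, h = 1, rather than a whole 512-pixel row.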

In Brave Story, the game reloads the CLUT frequently, but doesn't actually render to the CLUT that often. It also switches between a few different rendered CLUTs, so caching the fact that we've already downloaded each one is a HUGE win.

In case someone reading this message is interested, it actually renders these CLUT tables from what appears to be a color wheel. Crazy, huh?
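A minimal sketch of the download caching described above, assuming a per-framebuffer flag that is set whenever the game draws to that framebuffer and cleared after a readback. The struct and function names are hypothetical, and the actual GPU readback is replaced by a counter so the behavior is easy to check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified framebuffer record.
struct VirtualFramebuffer {
    explicit VirtualFramebuffer(uint32_t addr) : address(addr) {}

    uint32_t address;
    // Set when the game renders to this framebuffer, cleared after a download.
    // Starts true because the contents have never been read back.
    bool renderedSinceClutDownload = true;
    int downloadCount = 0;  // Stand-in for real readbacks, for illustration.
};

// Called whenever a draw targets `fb`: this is the flag reset paid per render.
void OnRenderTo(VirtualFramebuffer &fb) {
    fb.renderedSinceClutDownload = true;
}

// Called when the game loads a CLUT from inside `fb`. Only read back if the
// framebuffer may have changed since the last download; otherwise the copy
// already in emulated RAM is still valid.
void OnClutLoadFrom(VirtualFramebuffer &fb, uint32_t clutAddr) {
    assert(clutAddr >= fb.address);
    if (!fb.renderedSinceClutDownload)
        return;
    fb.downloadCount++;  // A real implementation would read pixels back here.
    fb.renderedSinceClutDownload = false;
}
```

Because the flag lives on each framebuffer, alternating between a few rendered CLUTs that are not redrawn costs no extra downloads; the price is setting one flag per render.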

daniel229 (Collaborator)

Great, but it's broken in Kurohyo 2.
[Screenshot: start menu]

unknownbrackets (Collaborator, Author)

Hmm, something's wrong in 28a07c7.

-[Unknown]

unknownbrackets (Collaborator, Author)

Okay, so this probably explains my troubles with this game in the other pull. I assumed it would start the CLUT at x=0. This one is using several CLUTs on the same row.

I already tried adjusting it to load using an offset, but I have a bug somewhere. It only works if I download the full 384 bytes the first time... hmm.

-[Unknown]

unknownbrackets (Collaborator, Author)

D'oh, and right after I posted that I saw the typo.

-[Unknown]

Used by Kurohyo 2. Highly unlikely to be a misestimate within the stride.

daniel229 (Collaborator)

That broke Brave Story.

unknownbrackets (Collaborator, Author)

Sorry, I'm doing a terrible job testing. I just forgot to make sure it took the closest one; it should be good now in both.

-[Unknown]

hrydgard (Owner), commented Jan 5, 2016

Hm, I'm sure I'm missing something, but why not look for an exact match?

unknownbrackets (Collaborator, Author)

Kurohyo loads from an offset X, so an exact match doesn't work. It loads 64 pixels' worth of bytes or so, at the same interval, from a framebuffer that is 1px tall.

In contrast, Brave Story uses multiple framebuffers right next to each other, spaced 0x400 bytes apart.

-[Unknown]
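Picking the source framebuffer by "closest start address within one row" rather than by exact match covers both cases: Kurohyo 2's CLUTs at an X offset inside one buffer, and Brave Story's buffers packed 0x400 bytes apart. The sketch below is hypothetical (names and addresses are illustrative, not taken from PPSSPP) and assumes each framebuffer's row pitch in bytes is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified framebuffer description.
struct Framebuffer {
    uint32_t address;      // start of the framebuffer in VRAM
    uint32_t strideBytes;  // bytes per row (stride * bytes per pixel)
};

struct ClutSource {
    int index = -1;       // index into the framebuffer list, or -1 if none
    uint32_t offset = 0;  // byte offset of the CLUT from that framebuffer's start
};

// Choose the framebuffer that starts at or before clutAddr with the smallest
// offset, limited to its first row (offset < strideBytes) so that a distant
// buffer is never picked by mistake.
ClutSource FindClutSource(const std::vector<Framebuffer> &fbs, uint32_t clutAddr) {
    ClutSource best;
    for (std::size_t i = 0; i < fbs.size(); ++i) {
        const Framebuffer &fb = fbs[i];
        assert(fb.strideBytes > 0);
        if (clutAddr < fb.address)
            continue;
        const uint32_t offset = clutAddr - fb.address;
        if (offset >= fb.strideBytes)
            continue;
        if (best.index < 0 || offset < best.offset) {
            best.index = static_cast<int>(i);
            best.offset = offset;
        }
    }
    return best;
}
```

With buffers at 0x04100000, 0x04100400, and 0x04100800 and a 2048-byte row, a CLUT at 0x04100400 also lies inside the first row of the buffer at 0x04100000; taking the smallest offset picks the buffer that starts exactly there. The returned offset can then be turned into an X position for the download.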

unknownbrackets (Collaborator, Author)

By the way, that's the same reason my on-GPU stuff doesn't work for this game: since it never downloads the CLUT for the offsets, and doesn't match the framebuffer for them, those all render with a garbage CLUT. I'm pretty sure applying this offset will help there, but I haven't done it yet.

-[Unknown]

daniel229 (Collaborator)

Looks good now.

hrydgard (Owner), commented Jan 5, 2016

Ah I see, right.

hrydgard added a commit referencing this pull request on Jan 5, 2016: "Refactor VRAM download, optimize rendered CLUTs"
hrydgard merged commit 8b27bc5 into hrydgard:master on Jan 5, 2016.
unknownbrackets deleted the gpu-memcpy branch on January 5, 2016 at 15:37.

hrydgard (Owner)

@unknownbrackets Strangely, this badly breaks Ridge Racer on my S6: lots of blackness and missing road. (To bisect back here, I had to retroactively fix a bug where we would generate different shader versions for VS and FS.)

unknownbrackets (Collaborator, Author)

What caused that was that CheckGLFeatures() was not called, so all the gstate_c.Supports() flags were wrong, because they assumed you have no GL extensions. I don't think the stuff this touches even happens during Ridge Racer, at least not for me?

-[Unknown]

hrydgard (Owner)

Well, reverting this merge fixes Ridge Racer, so somehow it does. But I agree, I don't think it downloads at all... Oh, and it works fine on desktop; only on my S6 is it broken (I haven't tried other Android devices yet).

Closes issue: Brave Story crashed in boss fight
3 participants