Lens flare effects #15923

Open
hrydgard opened this issue Aug 29, 2022 · 14 comments
@hrydgard
Owner

hrydgard commented Aug 29, 2022

Lens flare issues, categorized:

CPU peeking into the depth buffer to check coverage

Framebuffer->CLUT tricks

Framebuffer alpha accumulation tricks:

Not yet investigated in detail:

References:

https://github.com/hrydgard/ppsspp/commits/c3bb9437669a4a (old PR for framebuffer CLUTs)

Lens flares are a typically problematic effect on GPUs of the PSP's generation. They are supposed to be drawn only when the sun (or other light source) is visible, but there are no occlusion queries you can use to figure that out directly on the GPU. Nor is it practical to copy the texel to an image and then use multitexturing to blend the lens effect texture with the copied texel, since multitexturing is not a thing.

So games make use of a variety of dirty tricks.

Let's start with Wipeout Pure, #13344. I started by hacking the interpreter to log CPU reads from VRAM. For some reason a whole bunch of them happen every frame, but these stand out:

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5ec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5ec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f8

!!! Observation: These are cached addresses, so the game must be doing a cache invalidate at this location, which may be interesting to catch.

In the EUR version of Wipeout, the lhu instruction doing these reads is at 0888c16c (function starting at 0888C0A8). There are also some additional reads being done by 0881e0c0 (function starting at 0881E098, no idea what it's doing).

It's using lhu instructions (16-bit loads), and it looks to me like it's sampling a 4x4 rectangle around the sun's screen position from the depth buffer, skipping every other pixel. The depth buffer is situated at 110000 in VRAM, which starts at 04000000, plus the 600000 deswizzle offset that's needed to linearize the depth buffer in 8888 mode. The game treats a zero value as sky, that is, the sun is not occluded and it will draw the lens flare. As expected, as the sun slides across the image when the camera moves, these addresses, which are read every frame, change accordingly. The game must be synchronized here since the depth buffer is not double-buffered.
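As a sanity check on the log above, here is a small sketch that maps the logged addresses back to depth-buffer pixel coordinates. The base addresses are the ones given above; the 512-pixel row stride and the helper name `depth_addr_to_xy` are assumptions made for illustration, chosen because the 0x800-byte jump between the logged groups is then exactly two rows.

```python
VRAM_BASE = 0x04000000           # start of VRAM, as stated above
DEPTH_MIRROR_OFFSET = 0x600000   # linearizing ("deswizzle") offset, as stated above
DEPTH_BUFFER_OFFSET = 0x110000   # depth buffer location in VRAM, as stated above
STRIDE_PIXELS = 512              # assumption: 512-pixel row stride
BYTES_PER_PIXEL = 2              # 16-bit depth, matching the lhu reads


def depth_addr_to_xy(addr):
    """Map a CPU address in the linearized depth mirror to (x, y)."""
    offset = addr - VRAM_BASE - DEPTH_MIRROR_OFFSET - DEPTH_BUFFER_OFFSET
    if offset < 0 or offset % BYTES_PER_PIXEL:
        raise ValueError(f"not a depth sample address: {addr:#010x}")
    pixel = offset // BYTES_PER_PIXEL
    return pixel % STRIDE_PIXELS, pixel // STRIDE_PIXELS


# First group of reads plus the first and last rows from the log above.
logged = [0x0471bdec, 0x0471bdf0, 0x0471bdf4, 0x0471bdf8,
          0x0471c5ec, 0x0471d5f8]
for addr in logged:
    print(f"{addr:#010x} -> {depth_addr_to_xy(addr)}")
```

Under these assumptions the sixteen logged reads land on x = 246, 248, 250, 252 and y = 47, 49, 51, 53, which matches a 4x4 grid of samples that skips every other pixel in both directions.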

For this to work correctly, we have to read back the depth buffer every frame to emulated PSP VRAM, which introduces a massive sync point between the GPU and CPU. This is not really desirable (although we should implement it as an option), so I've been thinking about ways to get around it:

  • Unsynchronized readbacks (Vulkan)
    • Schedule readbacks on the GPU's timeline, but don't wait for them to complete. Instead, have a background thread wait on a fence from the GPU and copy to the PSP's VRAM when done, whenever that is. For the purposes of lens flare visibility, this might be fine. There's no CPU stall, but the readback might be delayed, which could in theory (and given our luck, probably in practice too) corrupt important data during level transitions and similar.
  • Virtual readbacks with hooks
    • As above, but we don't actually read back to the PSP's VRAM but to an alternative buffer. Then, for each affected game, we hook the PSP function that reads from the depth buffer so that it reads the data from the alternative buffer instead. This has the advantage that we don't even need to copy the whole depth buffer from GPU memory to PSP RAM; instead we can read directly from there (need to be careful about memory types for performance). The downside is that it requires game-specific work.
  • Automatic hooks
    • Same as the last one, but we protect the memory that the depth buffer belongs to. When we get a memory exception, we mark that code block to be recompiled with a special memory check, which then reads VRAM addresses from our special depth buffer. So the same thing, but with no manual work per game.
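To make the "virtual readback" idea concrete, here is a minimal sketch of the read redirection. It is not PPSSPP code: the class `HookedMemory`, its method names, the 512x272 16-bit depth buffer size and the 8 MB address window are all assumptions for illustration. Reads that fall inside the linearized depth range are served from a side buffer filled by the (possibly delayed) readback, and emulated VRAM is never touched.

```python
import struct

VRAM_BASE = 0x04000000
VRAM_WINDOW = 0x800000                          # assumption: window incl. mirrors
DEPTH_START = VRAM_BASE + 0x600000 + 0x110000   # addresses from the log above
DEPTH_SIZE = 512 * 272 * 2                      # assumption: 512 stride, 272 rows, 16-bit


class HookedMemory:
    """Hypothetical memory model with a depth-read hook."""

    def __init__(self):
        self.vram = bytearray(VRAM_WINDOW)       # emulated PSP VRAM
        self.depth_shadow = bytearray(DEPTH_SIZE)  # filled by GPU readbacks

    def on_readback_complete(self, data):
        # Called whenever a readback finishes; emulated VRAM is untouched.
        if len(data) != DEPTH_SIZE:
            raise ValueError("unexpected readback size")
        self.depth_shadow[:] = data

    def read_u16(self, addr):
        # The hook: depth-range reads come from the side buffer.
        if DEPTH_START <= addr < DEPTH_START + DEPTH_SIZE:
            return struct.unpack_from("<H", self.depth_shadow, addr - DEPTH_START)[0]
        return struct.unpack_from("<H", self.vram, addr - VRAM_BASE)[0]


mem = HookedMemory()
frame = bytearray(DEPTH_SIZE)
struct.pack_into("<H", frame, 0x0471bdec - DEPTH_START, 0x1234)
mem.on_readback_complete(bytes(frame))
print(hex(mem.read_u16(0x0471bdec)))
```

The manual and automatic variants above differ only in how the hook gets installed (a per-game function patch versus a recompiled block after a memory exception); the redirect itself would look the same.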

Anyway, I think the first step will be to create the correct-ish but slow solution of doing hard-synced readbacks to PSP VRAM. The question is when exactly in the frame we should do these. "When finished rendering the main depth buffer" is presumably the best option, but there's no clear way to detect that. Maybe just do it when the main framebuffer is displayed, or something.

Syphon Filter: Dark Mirror flares

I think this will be a good candidate to test a fast-and-loose Z rasterizer, along with Wipeout.

We basically just need to rasterize static meshes (walls), we can ignore skinned characters.

So I think the way to go will be to write a hyper-optimized Z-only rasterizer. It can be very loose, maybe even rendering at half resolution in one or both dimensions.
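For illustration only, here is a deliberately unoptimized sketch of what "Z-only at reduced resolution" means: triangles are scaled down, tested with edge functions at pixel centers, and only an interpolated depth value is kept. The function names are hypothetical, and the depth convention (buffer cleared to 0, larger values nearer) is an assumption picked to fit the "zero means sky" observation from Wipeout.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Signed area term of point p relative to the edge a -> b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def raster_depth(tris, width, height, scale=2, clear=0):
    """Rasterize depth only into a (height // scale) x (width // scale) grid."""
    w, h = width // scale, height // scale
    depth = [[clear] * w for _ in range(h)]
    for (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) in tris:
        # Scale screen-space x/y down to the low-resolution grid.
        x0, y0, x1, y1, x2, y2 = (v / scale for v in (x0, y0, x1, y1, x2, y2))
        area = edge(x0, y0, x1, y1, x2, y2)
        if area == 0:
            continue  # degenerate triangle
        min_x = max(math.floor(min(x0, x1, x2)), 0)
        max_x = min(math.ceil(max(x0, x1, x2)), w - 1)
        min_y = max(math.floor(min(y0, y1, y2)), 0)
        max_y = min(math.ceil(max(y0, y1, y2)), h - 1)
        for py in range(min_y, max_y + 1):
            for px in range(min_x, max_x + 1):
                cx, cy = px + 0.5, py + 0.5
                # Dividing by the signed area accepts both windings (no culling).
                b0 = edge(x1, y1, x2, y2, cx, cy) / area
                b1 = edge(x2, y2, x0, y0, cx, cy) / area
                b2 = edge(x0, y0, x1, y1, cx, cy) / area
                if b0 < 0 or b1 < 0 or b2 < 0:
                    continue  # pixel center is outside the triangle
                z = round(b0 * z0 + b1 * z1 + b2 * z2)
                if z > depth[py][px]:  # assumed: larger z is nearer
                    depth[py][px] = z
    return depth


wall = ((0, 0, 100), (16, 0, 100), (0, 16, 100))
near = ((0, 0, 200), (8, 0, 200), (0, 8, 200))
grid = raster_depth([wall, near], 16, 16, scale=2)
print(grid[0][0], grid[5][1], grid[7][7])
```

A real implementation would bin triangles into tiles and use SIMD, but the data flow (positions in, small depth grid out, no color work at all) would stay the same.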

Syphon Filter walls are drawn with this setup:

  • u16 tc, 565 color, s8 normal, s16 positions. Strips.

Wipeout uses similar vertices, also with s16 positions, drawn as strips and indexed lists.

So for a proof of concept we need to handle the above. "Just" write a custom vertex decoder that picks the positions out and collects them into a triangle list on the side, then bin and rasterize on each framebuffer switch (or stall?) using a custom rasterizer. Something like Intel's.
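A rough sketch of the "positions only" decoder for the Syphon Filter vertex setup listed above. The byte layout is an assumption: u16 u,v at offset 0, 565 color at 4, s8 normal at 6, one padding byte, and s16 x,y,z at offset 10, for a 16-byte stride. The function names are hypothetical as well. Strips are unrolled into a triangle list, flipping every other triangle so the winding stays consistent.

```python
import struct

VERTEX_STRIDE = 16     # assumption: u16 tc(4) + 565(2) + s8 n(3) + pad(1) + s16 pos(6)
POSITION_OFFSET = 10   # assumption: s16 position aligned to 2 bytes after the normal


def decode_positions(data, count):
    """Pick only the s16 x, y, z out of each vertex, ignoring everything else."""
    return [struct.unpack_from("<3h", data, i * VERTEX_STRIDE + POSITION_OFFSET)
            for i in range(count)]


def strip_triangles(positions):
    """Unroll a triangle strip into a list of (p0, p1, p2) triangles."""
    tris = []
    for i in range(len(positions) - 2):
        # Odd triangles in a strip have reversed winding; swap to compensate.
        a, b = (i, i + 1) if i % 2 == 0 else (i + 1, i)
        tris.append((positions[a], positions[b], positions[i + 2]))
    return tris


def indexed_triangles(positions, indices):
    """Gather an indexed triangle list; trailing incomplete triangles are dropped."""
    return [tuple(positions[j] for j in indices[i:i + 3])
            for i in range(0, len(indices) - 2, 3)]


# Four made-up vertices forming a quad as a strip.
quad = [(0, 0, 5), (10, 0, 5), (0, 10, 5), (10, 10, 5)]
raw = b"".join(struct.pack("<2HH3bx3h", 0, 0, 0xFFFF, 0, 0, 127, x, y, z)
               for x, y, z in quad)
print(strip_triangles(decode_positions(raw, len(quad))))
```

The resulting triangle list is what would then be transformed, binned and handed to the Z-only rasterizer.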

... To be continued

@ghost

ghost commented Aug 30, 2022

Arctic Edge
#11100 (comment)
NFS Shift
#11100 (comment)

@hrydgard
Owner Author

Thanks, added to the list.

@QrTa

QrTa commented Aug 30, 2022

Syphon Filter Logan's Shadow
#10229 (comment)

@71knight

71knight commented Sep 2, 2022

I don't know how the AetherSX2 PS2 emulator does it, but it gets accurate full-speed readback emulation using OpenGL on Android. I can play Need for Speed: Hot Pursuit 2 without underclocking the emulator, with accurate readbacks turned on, and maintain full-speed emulation with no slowdowns, and the sun hides when it's supposed to. And I think the PS2 has double the resolution of the PSP. I do cheat a little though... I keep all my core frequencies maxed out and the GPU set at 3/4 speed on my rooted phone (SD 855+). The phone doesn't get too hot... about 147° F on average. So I know PPSSPP wouldn't be too demanding with accurate readbacks. I think what helps them is a CPU affinity option that keeps the heaviest threads on the biggest cores of the phone.

@ghost

ghost commented Sep 2, 2022

AetherSX2 is the PCSX2 mobile port, btw.

@hrydgard
Owner Author

hrydgard commented Sep 9, 2022

An alternative to readbacks for the games that peek at the Z-buffer using the CPU, as commented elsewhere by @unknownbrackets, would be to run both the software and hardware renderers side by side. That way we'll always have accurate depth in CPU-accessible memory, at the right time.

This is expensive though. To make it less so, it would be possible to have the software renderer only render depth buffers and just ignore color. Depth is a lot less complex, so I think this would be way faster than running the full software renderer. This wouldn't work for cases where games reinterpret color and Z like Kurohyou, but I don't think that applies to any of these cases.

Also gonna have to look into what PCSX2 does. Maybe AetherSX2 does something special on top, hard to say given it's closed source.

hrydgard added this to the Future-Prio milestone Sep 9, 2022
hrydgard added the GE emulation Backend-independent GPU issues label Sep 9, 2022
@unknownbrackets
Collaborator

unknownbrackets commented Sep 10, 2022

I will say, the loop to interpolate triangle data is the slowest part of the software renderer now, I think. Texture sampling is still fairly slow as well.

We'd still need to texture (because of alpha tests/color tests), but we could skip alpha blend and logicops. Skipping blending would save time, but I don't think it'd make a huge difference overall.

Maybe we could have a "fast and loose" mode where it ignores color and alpha tests, though, or at least skip sampling/etc. when they're not enabled (which would be safe.) That would also allow us to skip lighting which is quite expensive.
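The trade-off described here can be written down as a small decision function. This is only a sketch with hypothetical names (`DrawState`, `depth_only_plan`), not the software renderer's actual state handling: in the safe mode, color work (texturing and lighting) is kept only when an alpha or color test could discard pixels, while the loose mode drops it unconditionally. Blending and logic ops are never needed for depth.

```python
from dataclasses import dataclass


@dataclass
class DrawState:
    """Hypothetical subset of GE state relevant to a depth-only pass."""
    depth_write: bool
    alpha_test: bool
    color_test: bool


def depth_only_plan(state, loose=False):
    """Decide which stages a depth-only draw still has to run."""
    if not state.depth_write:
        # Nothing reaches the depth buffer, so the draw can be skipped.
        return {"draw": False, "compute_color": False}
    # Safe mode: color is needed only to evaluate tests that discard pixels.
    # Loose mode: ignore those tests and accept "depth boxes" as a risk.
    needs_color = (state.alpha_test or state.color_test) and not loose
    return {"draw": True, "compute_color": needs_color}


opaque_wall = DrawState(depth_write=True, alpha_test=False, color_test=False)
foliage = DrawState(depth_write=True, alpha_test=True, color_test=False)
print(depth_only_plan(opaque_wall))
print(depth_only_plan(foliage))
print(depth_only_plan(foliage, loose=True))
```

Treating "depth writes disabled" as skippable is an extra assumption of this sketch; it holds only as long as nothing else from the draw is needed by the depth-only pass.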

-[Unknown]

@hrydgard
Owner Author

Yeah, I think we can go very fast and loose for Z-only. Texturing only needs to be done when we know there's alpha. And we could skip filtering and mipmapping, for example.

@unknownbrackets
Collaborator

unknownbrackets commented Sep 10, 2022

Right. My biggest concern would be "depth boxes" from alpha testing. For example, some far-away trees or clouds might be drawn in front of the sun, and without alpha testing they would cover the entire thing. If we can safely skip alpha testing, it probably helps the potential speed a lot, because it cuts out many, many things.

We might end up in a place where we're using heuristics to skip alpha testing, though. For example, it's probably mainly an issue with flat Z - models probably don't need alpha testing for depth to be correct.

-[Unknown]

@ghost

ghost commented Oct 12, 2022

SOCOM: U.S. Navy SEALs Tactical Strike is also affected.
#15071
Screenshot_2022-10-13-02-23-08-77

UCUS98649.ppdmp.zip

@hrydgard
Owner Author

Thanks, added to list.

@ghost

ghost commented Feb 4, 2023

Burnout Dominator's sun flares are glitchy in a recent PPSSPP build.
Screenshot_20230204_194908_2f85358b2198d26f8aca533d68bee793
ULUS10236.zip

@hrydgard
Owner Author

hrydgard commented Feb 4, 2023

Yeah, I'll have to take a look at those again.

@ghost

ghost commented Mar 17, 2023

Resistance Retribution

Software
Screenshot_20230318_065453_2f85358b2198d26f8aca533d68bee793

Vulkan/OpenGL
Screenshot_20230318_065718_2f85358b2198d26f8aca533d68bee793

GE Dump
UCES01184.ppdmp.zip

Edit: fixed by the [ReadbackDepth] compatibility setting, but it makes the game slower and makes my opponent invisible :(
