GPU readback ideas and plans #16900
Comments
I would definitely want this to be optional in some way. I don't think I'd want this, especially in some games, depending on how it's implemented. I know we used to use PBOs and pull the previous frame so I remember some of the problems it caused. Incorrect text, flickering, etc. will be the result in some games - kinda like the absolutely safe and perfect vertex caching. Of course it can work fine in some games, but I expect it to be most safe in the type of games I don't really like playing anyway, so I'm biased. In my mind the safest way to do this, which would probably not be "ideal", would be to use frame-late readbacks in these conditions:
But all of these would make it stutter initially so I anticipate that your planned implementation will do this based on a compat flag, and because it will help mobile a lot, probably be enabled everywhere except a few games someone has specifically realized the graphics bugs are caused by this behavior. -[Unknown] |
Yeah the primary use cases for this are Syphon Filter style lensflares, and Motorstorm's brightness adaptation. I'm not sure there are others that are safe enough. Compat flag or enabling by trapping CPU reads from the swizzled VRAM regions will be a must, indeed - can't enable globally. |
Well, there are games that literally do a readback every frame for the sole purpose of having a screenshot ready in case you decide to save. They never read it, or they read it only to memcpy it somewhere else. That's a use case that would be fine. Unfortunately, iirc, some of the same games also do other readbacks that wouldn't be as safe at other times. One thing async readbacks could be useful for (but this would only hurt performance) is to handle expiring framebuffers better. There are many cases where games render to an area once or for a while, and then stop. For example, the discolored arms in the NBA games, Me and My Katamari, etc. But in other cases, games will reuse an old framebuffer after a while (Danganronpa, #8359.) If we async a safe region every frame, but only populated VRAM on the next draw, it would solve a bunch of these issues. That said, it'd definitely be slower than what we do now (not downloading.) -[Unknown] |
Does Dangan Ronpa also read back memory from the GPU? |
Yes it does, the only hacky part is that we force a readback there where our normal checking doesn't detect one. But yes, Dangan Ronpa could probably fairly safely use these delayed readbacks that I'm implementing, for better performance. |
Alright, I've gotten delayed depth readbacks working in Syphon Filter. With a full 3 frames of buffering, this works a lot worse than I expected. Turns out the game does really tight depth comparisons of the computed depth of each light, compared to a few samples of the depth buffer. This works great when standing still, but when moving around, the depths are quite a bit off from what the game expects (since it's delayed by 3 frames) and the result is unstable, flickery lights. Not good. I think the fully-async method will have better results, but the only way to make the strictly frame-delayed variant look good is to reduce the number of buffered frames, or to add a rather big fudge factor to the readback values to compensate, which will have its own issues. |
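To illustrate why the delay hurts only while moving, here is a toy model (not the emulator's code). The class `DelayedReadback`, the helper `worst_error`, and the sample depth values are all hypothetical; the only thing taken from the comment above is that a readback returns data from a fixed number of frames earlier.

```python
from collections import deque


class DelayedReadback:
    """Toy model: a readback returns the data captured `delay` frames earlier."""

    def __init__(self, delay):
        self.delay = delay
        self.in_flight = deque()  # copies not yet visible to the CPU, oldest first

    def submit_frame(self, depth_value):
        """Queue this frame's copy; return the copy that just became visible, or None."""
        self.in_flight.append(depth_value)
        if len(self.in_flight) > self.delay:
            return self.in_flight.popleft()
        return None


def worst_error(delay, depths):
    """Largest gap between the depth the game expects and the delayed depth it sees."""
    readback = DelayedReadback(delay)
    worst = 0.0
    for current in depths:
        stale = readback.submit_frame(current)
        if stale is not None:
            worst = max(worst, abs(current - stale))
    return worst


# Hypothetical depth samples: constant when standing still, changing when moving.
standing_still = [100.0] * 10
moving = [100.0 + 5.0 * frame for frame in range(10)]

for delay in (0, 1, 3):
    print(delay, worst_error(delay, standing_still), worst_error(delay, moving))
```

In this model the error while standing still is always zero, and the error while moving grows in proportion to the delay, which matches the observation that fewer buffered frames (or a larger tolerance in the comparison) would be needed for a tight depth test.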
Unfortunately, I'm struggling to implement this in any sane way for OpenGL. The issue is that to map the buffer for reading, we have to be on the GL thread, not on the main thread. So this requires another level of staging data, which gets quite ugly. I have something that works, but it's unexpectedly slow. Currently leaning towards initially making this stuff Vulkan-only... |
In #16916 I'm making the Dangan Ronpa readback delayed like this, improving performance on Vulkan. With the decoupling of the operation and the actual readback, we can imagine a bunch of other timing modes, that might be suitable for various games.
Some of these will be mostly equivalent to reducing the number of frames of buffered graphics commands though. It might be that we should simply force it to 1 or 2 for problematic readback situations. |
Something that would be slightly unsafe but might work, which I've considered doing for software rendering as a speedhack (I've already tested, and there it does make it faster.) Mark any readback during rendering "pending" either as a flag or in an array:
Afterward, if the game:
Synchronously download the pending operations. On mobile this could be bad as it might group our stalls more, though. Note: texture cache might need to look at the pending readbacks when building textures. This sounds a bit similar to what you're saying, but I think in many games it wouldn't have an observable difference (as long as the game is written to "correctly" wait for sync before peeking with the CPU.) I did a bit of this as a test in softgpu (not the block transfer part, which still immediately stalled) and didn't see any issues in several games. But technically it could mean bugs for people with different CPU speeds/core counts etc... -[Unknown] |
Yeah, something like that could work, but I'm not sure how large the benefit would be. Likely worth trying. Unrelated note: in #11669, it's mentioned that Ys Seven does a readback for the minimap a lot. Likely a good candidate for async/delayed readbacks. |
I believe that game also does some readbacks in only certain areas for fire effects, etc. -[Unknown] |
Could we perform GPU readbacks for only a random proportion of frames (0.5, 0.3, etc.)? |
That's an interesting idea, I think it has come up before but I forgot about it. However, it feels like it could lead to quite uneven framerates - still, that might be better than every frame being slow, for games where it would work. |
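A rough sketch of what "readback on a random proportion of frames" could look like, and why frame times would become uneven. Everything here is an assumption for illustration: the function name `simulate_frame_times`, the per-frame cost `base_ms`, and the stall cost `stall_ms` are made up, not measured.

```python
import random


def simulate_frame_times(num_frames, proportion, base_ms, stall_ms, seed=0):
    """Return per-frame times when a stalling readback runs on a random share of frames."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    times = []
    for _ in range(num_frames):
        do_readback = rng.random() < proportion
        times.append(base_ms + (stall_ms if do_readback else 0.0))
    return times


# Hypothetical costs: 10 ms for a normal frame, 8 ms extra when the readback stalls.
times = simulate_frame_times(num_frames=20, proportion=0.3, base_ms=10.0, stall_ms=8.0)
print(times)
print("average:", sum(times) / len(times))
```

The average drops as the proportion drops, but individual frames still alternate between the fast and the slow cost, which is the uneven pacing mentioned above.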
A huge performance/accuracy problem happens when games try to read back memory from the GPU to make it accessible to the CPU.
Readbacks are needed for some games to render at all (Tactics Ogre, which also does some redundant readbacks though), or to simulate the CPU reading from the depth buffer (Syphon Filter and Wipeout, for lens flares; not yet actually implemented), or for automatic brightness adjustment (Motorstorm), etc.
On the PC and even more so on mobile devices, it's essential to have a frame or two "in progress", pipelined between the CPU and GPU at the same time. The CPU runs a frame or two ahead of the GPU. Stopping this pipeline to read back data basically tells the system to sleep, so not only do we lose time waiting, but CPUs and GPUs get clocked down and performance gets even worse.
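As a back-of-the-envelope illustration of the cost of breaking the pipeline, consider a deliberately simplified model. The function names and the millisecond figures are hypothetical, and the model assumes the worst case where a synchronous readback removes all CPU/GPU overlap; it ignores the clock-down effect entirely.

```python
def pipelined_frame_ms(cpu_ms, gpu_ms):
    """CPU and GPU overlap, so the slower of the two sets the frame time."""
    return max(cpu_ms, gpu_ms)


def stalled_frame_ms(cpu_ms, gpu_ms):
    """CPU waits for the GPU to finish before continuing, so the costs add up."""
    return cpu_ms + gpu_ms


# Hypothetical per-frame costs.
cpu_ms, gpu_ms = 9.0, 12.0
print("pipelined:", pipelined_frame_ms(cpu_ms, gpu_ms), "ms")
print("stalled:  ", stalled_frame_ms(cpu_ms, gpu_ms), "ms")
```

Even before any clock-down, the stalled case in this model costs the sum of both sides rather than the larger one.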
RPCS3 has a funny trick where when they stop to wait for the GPU to catch up, they give the CPU some useless work to do in the meantime, reducing performance drops. This we could also do, but I have some other ideas.
Many of the uses of readbacks (excluding Tactics Ogre's) are pretty fuzzy - lens flares in Wipeout and Syphon Filter would still look OK if the readback was a frame late or so. If we could simply add a frame or two of latency, readbacks would no longer require stopping the GPU and waiting, but things would still look generally fine.
I have two different ideas for enabling this:
Also there's a minor point that I haven't addressed - the correct time to stop the CPU and wait for the GPU is not exactly when the readback happens, as we do now, but the next time the game calls sceGeDrawSync because that's how the game knows that the GPU is done. Or I guess there's a signal mechanism too...
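A minimal sketch of that idea: record readbacks as pending and only pay for the stall when the game synchronizes. This is not the actual implementation. `DeferredReadbacks`, `request_readback` and `draw_sync` are hypothetical names (`draw_sync` stands in for the game calling sceGeDrawSync), and GPU/CPU memory are modeled as plain dictionaries.

```python
class DeferredReadbacks:
    """Toy model: readbacks are queued and resolved at the next sync point."""

    def __init__(self, gpu_memory, cpu_memory):
        self.gpu = gpu_memory   # address -> value as the GPU sees it
        self.cpu = cpu_memory   # address -> value as the emulated CPU sees it
        self.pending = []       # addresses with a readback requested but not done
        self.stalls = 0         # how many times we had to wait for the GPU

    def request_readback(self, address):
        """Called where the readback would normally happen; does not stall."""
        self.pending.append(address)

    def draw_sync(self):
        """Called at the sync point; one stall resolves every pending readback."""
        if not self.pending:
            return
        self.stalls += 1
        for address in self.pending:
            self.cpu[address] = self.gpu[address]
        self.pending.clear()


gpu = {0x100: 7, 0x200: 9}
cpu = {}
readbacks = DeferredReadbacks(gpu, cpu)
readbacks.request_readback(0x100)
readbacks.request_readback(0x200)
print(cpu)               # nothing copied yet
readbacks.draw_sync()
print(cpu, readbacks.stalls)
```

In this model two readbacks in the same frame cost a single stall, and a game that never inspects memory before syncing sees the same data it would have seen with immediate readbacks.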
Implementation plan:
Then we have the two options outlined above, which will be implemented for both color and depth readbacks:
We might want both, the first one when using Vulkan and the other one for the other backends that can support it.