
Subpixel precision (GTE accuracy) #28

Open
simias opened this issue Mar 5, 2016 · 22 comments

@simias
Owner

simias commented Mar 5, 2016

I'm developing a prototype in the subpixel branch. More details to come...

@Nucleoprotein

After removing garbage, some games are not affected by this hack at all, like the previously mentioned Tomb Raider II (SLUS-00437) and Silent Hill (SLUS-00707); that is, they look exactly the same as without it.
Also, here's a TR2 screenshot with broken coords: https://imgur.com/a/a0XZo (the ones that aren't broken are fixed point 😛)

@ADormant

ADormant commented Mar 6, 2016

I believe Crash Team Racing and Wipeout are problematic too.

@simias
Owner Author

simias commented Mar 7, 2016

Okay, finally got my debugger up and running.

So the first thing I looked into was how the BIOS displayed the "PlayStation" logo at the start. I used the SCPH7003 BIOS (NA, version 3.0).

The first GTE XY FIFO access is an SWC2 at PC 0x8004e7d0:

```
=> 0x8004e7d0:  swc2    $12,0(a3)       /* $a3: 0x80086eb8 */
   0x8004e7d4:  swc2    $13,0(t0)
   0x8004e7d8:  swc2    $14,0(t1)
   0x8004e7dc:  swc2    $8,0(t2)
   0x8004e7e0:  nop
   0x8004e7e4:  c2      0x158002d
   0x8004e7e8:  lw      t1,28(sp)
   0x8004e7ec:  mfc2    t0,$7
   0x8004e7f0:  mfc2    t0,$7
   0x8004e7f4:  nop
```
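For context on the listing above: GTE data registers $12, $13 and $14 are the SXY0..SXY2 screen-coordinate FIFO, and each entry packs X in the low 16 bits and Y in the high 16 bits as signed integers. The Rust sketch below (`pack_sxy`, `unpack_sxy` and `SxyFifo` are hypothetical names, not Rustation code) illustrates why SWC2 can only hand integer coordinates to RAM: the fractional part is already gone when the word is stored. The float-to-integer cast merely stands in for the GTE's real fixed-point arithmetic.

```rust
/// Pack a screen coordinate pair like a GTE SXY register:
/// X in bits 0..16 and Y in bits 16..32, both signed 16-bit.
fn pack_sxy(x: i16, y: i16) -> u32 {
    (x as u16 as u32) | ((y as u16 as u32) << 16)
}

/// Split an SXY word back into its signed X and Y halves.
fn unpack_sxy(word: u32) -> (i16, i16) {
    (word as u16 as i16, (word >> 16) as u16 as i16)
}

/// Minimal model of the 3-entry SXY FIFO: a push drops SXY0,
/// shifts SXY1 -> SXY0 and SXY2 -> SXY1, and stores the new word in SXY2.
struct SxyFifo {
    sxy: [u32; 3],
}

impl SxyFifo {
    fn push(&mut self, word: u32) {
        self.sxy[0] = self.sxy[1];
        self.sxy[1] = self.sxy[2];
        self.sxy[2] = word;
    }
}

fn main() {
    // Illustrative only: a projected vertex with a fractional position.
    let (fx, fy) = (12.75_f32, -3.4_f32);
    // `as i16` truncates toward zero, so the fraction is lost here.
    let word = pack_sxy(fx as i16, fy as i16);
    let mut fifo = SxyFifo { sxy: [0; 3] };
    fifo.push(word);
    // This integer word is all that `swc2 $14,0(t1)` can write to RAM.
    println!("SXY2 = {:#010x} -> {:?}", fifo.sxy[2], unpack_sxy(fifo.sxy[2]));
}
```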

I then expected the BIOS to use DMA to upload the completed commands to the GPU, but that's not the case. Instead, this code uploads the data to the GPU:

```
=> 0x80050b38:  lw      t6,0(a0)        /* $a0: 0x80086eb8 */
   0x80050b3c:  move    v0,a1
   0x80050b40:  addiu   a1,a1,-1
   0x80050b44:  addiu   a0,a0,4
   0x80050b48:  bnez    v0,0x80050b38
=> 0x80050b4c:  sw      t6,0(v1)        /* $v1: 0x1f801810 (GPU GP0) */
```

So instead of using DMA, the BIOS copies the commands from RAM to the GPU in software, using regular LW/SW instructions.

This is an interesting case for subpixel precision because, to handle it, we need to tie the enhanced-precision vertex data to one of the CPU's general purpose registers ($t6 here).
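One way to picture tying precise data to a register is a shadow state that follows each value through the copy loop above: SWC2 attaches the precise vertex to the RAM word, LW carries it into the destination register, and the SW to GP0 reads it back out. The following is a hypothetical Rust model of that idea (`Shadow`, `Precise` and their methods are made up for illustration, not Rustation's or any other emulator's code). A real implementation would also have to clear or propagate the shadow on every other instruction that writes a general purpose register. The `misses` counter shows one possible way to detect and log games that reach GP0 through a path the hack doesn't cover.

```rust
use std::collections::HashMap;

/// Hypothetical precise vertex kept alongside the truncated 16-bit GTE output.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Precise {
    x: f32,
    y: f32,
    z: f32,
}

/// Shadow state mirroring precise vertex data through RAM and CPU registers.
struct Shadow {
    /// Precise value attached to each of the 32 general purpose registers.
    regs: [Option<Precise>; 32],
    /// RAM address -> (word written by SWC2, precise value attached to it).
    ram: HashMap<u32, (u32, Precise)>,
    /// GP0 writes that arrived without precise data (candidates for logging).
    misses: u32,
}

impl Shadow {
    fn new() -> Self {
        Shadow { regs: [None; 32], ram: HashMap::new(), misses: 0 }
    }

    /// SWC2 of an SXY register: remember the stored word and its precise value.
    fn swc2_sxy(&mut self, addr: u32, word: u32, p: Precise) {
        self.ram.insert(addr, (word, p));
    }

    /// LW: the destination register inherits the precise value only if RAM
    /// still holds the exact word SWC2 stored (otherwise it was overwritten).
    fn lw(&mut self, rt: usize, addr: u32, word: u32) {
        self.regs[rt] = match self.ram.get(&addr) {
            Some(&(stored, p)) if stored == word => Some(p),
            _ => None,
        };
    }

    /// SW to GP0: return the precise value to hand to the renderer, if any.
    fn sw_gp0(&mut self, rt: usize) -> Option<Precise> {
        let p = self.regs[rt];
        if p.is_none() {
            self.misses += 1;
        }
        p
    }
}

fn main() {
    let mut s = Shadow::new();
    let p = Precise { x: 12.75, y: -3.4, z: 100.0 };
    // swc2 $12,0(a3) with a3 = 0x80086eb8 stores the packed SXY word.
    s.swc2_sxy(0x8008_6eb8, 0xFFFD_000C, p);
    // lw t6,0(a0) with a0 = 0x80086eb8; $t6 is general purpose register 14.
    s.lw(14, 0x8008_6eb8, 0xFFFD_000C);
    // sw t6,0(v1) with v1 = 0x1f801810 (GP0).
    println!("{:?}, misses = {}", s.sw_gp0(14), s.misses);
}
```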

Of course, the BIOS is not the most interesting test case for subpixel precision, and it's not really a big deal if it breaks for the PlayStation logo, but I wouldn't be surprised if some games did something similar.

@i30817

i30817 commented Mar 8, 2016

Ehhh, can that situation be detected and logged? If only a few games do that, no offense, but I'd rather have them broken, or at least have a fast and a slow path (for those games), than slow everything down significantly. I know it's a hack, but the feature itself is a hack.

@simias
Owner Author

simias commented Mar 8, 2016

Yeah, maybe. I'm going to test more games; I haven't really settled on a solution yet. I just wanted to test the BIOS because I noticed that my current implementation didn't work there and wanted to figure out why.

@simias
Owner Author

simias commented Mar 8, 2016

Also, maybe it could be made optional: the hack could have various levels of complexity that could be turned on and off depending on the game and the capabilities of the host computer.
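As a sketch of what these levels might look like as a setting, here is a hypothetical Rust enum. The level names, the choice of DMA-only tracking as the cheap level (the BIOS case above being the one that needs CPU register tracking), and the config strings are all assumptions made for illustration.

```rust
/// Hypothetical subpixel hack levels, ordered from cheapest to most expensive.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum SubpixelLevel {
    /// Native behaviour: integer vertex coordinates only.
    Off,
    /// Precise coordinates only for command lists sent to the GPU by DMA.
    DmaOnly,
    /// Also follow precise values through CPU registers (LW/SW copies).
    TrackCpu,
}

impl SubpixelLevel {
    /// Each level includes everything the cheaper levels do
    /// (derived ordering follows declaration order).
    fn tracks_dma(self) -> bool {
        self >= SubpixelLevel::DmaOnly
    }

    fn tracks_cpu(self) -> bool {
        self >= SubpixelLevel::TrackCpu
    }

    /// Parse a (hypothetical) per-game or global config value.
    fn parse(s: &str) -> Option<SubpixelLevel> {
        match s {
            "off" => Some(SubpixelLevel::Off),
            "dma" => Some(SubpixelLevel::DmaOnly),
            "cpu" => Some(SubpixelLevel::TrackCpu),
            _ => None,
        }
    }
}

fn main() {
    let level = SubpixelLevel::parse("dma").unwrap_or(SubpixelLevel::Off);
    println!("{:?}: dma = {}, cpu = {}", level, level.tracks_dma(), level.tracks_cpu());
}
```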

@simias
Owner Author

simias commented Mar 8, 2016

I managed to get it working with Crash Bandicoot but not Spyro for some reason.

@i30817 Do you also want me to try to get perspective-correct mapping working, or just subpixel precision? Since I'm using an OpenGL renderer, I thought I might try to get the Z coordinate along with the floating point coordinates, but that doesn't seem to work well so far.

@i30817

i30817 commented Mar 9, 2016

I don't know much about graphics programming, so I can't answer that question about Z-coordinate precision on the GTE.

In general, I guess that if you manage to surpass the other emulators at graphical enhancement of PS1 games, it would be a powerful draw for users, but making the feature optional, with as many fast vs. slow paths as possible, seems best for the final solution (simpler prototyping is OK).

If you manage to detect when the simpler technique fails and switch to the more complex one without false positives or missed events, that would be best. It would certainly be better than per-game configs, which sound troublesome given the size of the PS1 library, and too coarse a measure anyway: surely most games that need the more complex technique don't need it everywhere?

@simias
Copy link
Owner Author

simias commented Mar 9, 2016

I see. Currently I manage to run Crash at full speed with the expensive version of the hack but I'm almost maxing out my CPU.

I think I'm going to try to improve my emulator's compatibility before I continue with this hack; right now I can't really test all the games I want.

@ADormant

ADormant commented Mar 9, 2016

Is it possible to make this emulator multithreaded? For example, the CPU and GTE on one thread and the SPU and MDEC on another, or even the CPU and GTE on separate threads? Though it'd be better if the GTE could be emulated on a GPU. By the way, could you add an option to switch between these hacks?

@simias
Copy link
Owner Author

simias commented Mar 9, 2016

I implemented it so that it can be made optional (with no performance hit when the option is disabled), but I haven't actually implemented the option yet.
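One common way to get a zero-cost-when-disabled option in Rust is to make the hot code generic over a `const bool` and pick the instantiation once, outside the inner loop, so the compiler can drop the dead branch from the `false` version. This is a hypothetical sketch of that pattern (`store_sxy`, `run_inner` and `run_frame` are made-up names), not a description of how the subpixel branch actually does it.

```rust
/// Truncate a precise coordinate pair into a packed SXY-style word and, only in
/// the SUBPIXEL instantiation, also record the precise values.
fn store_sxy<const SUBPIXEL: bool>(x: f32, y: f32, precise: &mut Vec<(f32, f32)>) -> u32 {
    let word = (x as i16 as u16 as u32) | ((y as i16 as u16 as u32) << 16);
    if SUBPIXEL {
        // Compile-time constant: the optimizer removes this in store_sxy::<false>.
        precise.push((x, y));
    }
    word
}

/// Inner loop, monomorphised on SUBPIXEL so it carries no run-time flag check.
fn run_inner<const SUBPIXEL: bool>(verts: &[(f32, f32)]) -> (Vec<u32>, Vec<(f32, f32)>) {
    let mut words = Vec::with_capacity(verts.len());
    let mut precise = Vec::new();
    for &(x, y) in verts {
        words.push(store_sxy::<SUBPIXEL>(x, y, &mut precise));
    }
    (words, precise)
}

/// The user-facing option is checked once, here, rather than per vertex.
fn run_frame(subpixel: bool, verts: &[(f32, f32)]) -> (Vec<u32>, Vec<(f32, f32)>) {
    if subpixel {
        run_inner::<true>(verts)
    } else {
        run_inner::<false>(verts)
    }
}

fn main() {
    let verts: [(f32, f32); 2] = [(1.5, 2.5), (-0.75, 10.25)];
    let (words, precise) = run_frame(true, &verts);
    println!("{:x?} {:?}", words, precise);
}
```

The design choice is that the run-time setting costs one branch per frame instead of one per vertex, while both variants share a single source.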

The GPU could be multithreaded, but I'm not sure there's a point: it's already de facto offloaded to the host GPU through OpenGL, so it shouldn't take too much CPU time.

For the rest it's more difficult. The GTE is so tightly coupled with the CPU that it's going to be hard to run it in a separate thread. The MDEC is coupled with the CPU and the DMA, which is itself coupled with RAM (and the CPU), so it's going to be pretty difficult too, although probably less so than the GTE.

The MDEC has pretty specific use cases (FMV, pre-rendered backgrounds...), and it generally runs during loading times or while video is being displayed, when the rest of the system is pretty much idle (except for the SPU and CD-ROM, probably), so I'm not sure you'd see any significant improvement from threading the MDEC.

The SPU might be doable, I'm not sure at this point if it's worth it.

@ADormant

ADormant commented Mar 9, 2016

Audio is rather power hungry in many emulated consoles.

@simias
Owner Author

simias commented Mar 9, 2016

Yeah, but for threading to improve performance, the gain must offset the cost of resynchronization. If the average game tinkers with the SPU very frequently (reading registers, uploading audio, waiting for interrupts...), the thread might spend all of its time resyncing, which could well end up slower than optimized single-threaded code. There is no such thing as a free lunch.
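Put as a back-of-the-envelope condition (the symbols and the numbers below are illustrative assumptions, not measurements): moving the SPU to its own thread can only help if the time spent resynchronizing per frame stays below the SPU emulation time it takes off the main thread.

```math
n_{\text{sync}} \cdot c_{\text{sync}} < t_{\text{SPU}}
\qquad\text{e.g.}\quad 10^{4}\ \text{syncs/frame} \times 1\,\mu\text{s/sync} = 10\,\text{ms/frame}
```

With those made-up numbers, threading would only pay off if SPU emulation itself took more than 10 ms per frame, a large share of the roughly 16.7 ms available per frame at 60 Hz.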

@Nucleoprotein

I get about 25 FPS on the BIOS screen using the master branch, so Rustation is very slow for me, but I have an old CPU (Q6600).

@ADormant

@simias By the way, did you already implement perspective-correct texture mapping in the subpixel branch, since you mentioned it here:

> Do also try to get perspective correct mapping working or just subpixel precision?

@simias
Copy link
Owner Author

simias commented Mar 10, 2016

@Tapcio Ouch, that is pretty slow. I haven't really spent time optimizing yet; hopefully that will improve in the future. It still runs decently well on my Core i5-2450M @ 2.5GHz.

@ADormant I tried. In the subpixel branch I store the Z coordinate of the vertex alongside the precise X and Y values and feed them to OpenGL. I don't really know if it works, though: Crash Bandicoot doesn't have a lot of obvious texture warping going on. I'm going to get more games working and give it another try.
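For reference, the difference between the PS1's affine texture mapping and perspective-correct mapping comes down to how a texture coordinate is interpolated across a primitive. The sketch below (function names are hypothetical) interpolates along one edge: affine mapping lerps `u` directly in screen space, while perspective-correct mapping lerps `u/w` and `1/w` and then divides, where `w` is a per-vertex depth such as the stored Z. OpenGL applies this correction to ordinary (non-`noperspective`) varyings when the depth is supplied as the clip-space `w` component; whether the GTE's Z is a good enough stand-in for `w` is an assumption.

```rust
/// Affine interpolation, as the PS1 rasterizer does: lerp u in screen space.
fn affine_u(u0: f32, u1: f32, t: f32) -> f32 {
    u0 + (u1 - u0) * t
}

/// Perspective-correct interpolation: lerp u/w and 1/w in screen space, then
/// divide. `w0` and `w1` are the per-vertex depths (must be non-zero).
fn perspective_u(u0: f32, w0: f32, u1: f32, w1: f32, t: f32) -> f32 {
    let u_over_w = (u0 / w0) * (1.0 - t) + (u1 / w1) * t;
    let one_over_w = (1.0 / w0) * (1.0 - t) + (1.0 / w1) * t;
    u_over_w / one_over_w
}

fn main() {
    // Halfway across the screen-space edge of a polygon receding from w = 1 to w = 3.
    let t = 0.5;
    println!("affine: {}", affine_u(0.0, 1.0, t));
    println!("perspective: {}", perspective_u(0.0, 1.0, 1.0, 3.0, t));
}
```

At the midpoint the affine version samples u = 0.5 while the perspective-correct one samples about 0.25, because the far half of the edge covers more of the texture. That mismatch is the warping that shows up on large PS1 polygons, and it vanishes when both vertices have the same depth.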

@simias
Copy link
Owner Author

simias commented Apr 2, 2016

What is this exactly? @Tapcio's code or yet another implementation? From a quick glance at the code, it looks similar to what we were trying to do here.

@ADormant

ADormant commented Apr 27, 2016

@simias iCatButler implemented perspective-correct texture mapping. Trilinear and anisotropic filtering should be doable with it.
iCatButler/pcsxr@216c2ff
iCatButler/PeteOpenGL2Tweak@a32aba6
iCatButler/pcsxr@153c8eb
https://drive.google.com/file/d/0Bz8IYcLfu84zNVVBeVQ5VHk1R0E/view?pref=2&pli=1

@ADormant

ADormant commented May 10, 2016

@simias Regarding iCat's implementation, PGXP is still not perfect, and an even more advanced implementation may be required: "Getting the remaining vertex data will either mean much more widespread mirroring of CPU operations or some form of mesh reconstruction that will make a best guess at the exact 3D position from the low precision coordinates."
CPU: iCatButler/pcsxr@f700823

http://ngemu.com/threads/peteopengl2tweak-tweaker-for-peteopengl2-plugin-w-gte-accuracy-hack.160319/page-47

@ADormant

ADormant commented Aug 12, 2016

Dynarec for PGXP
iCatButler/pcsxr@36ef727

5 participants
@i30817 @simias @Nucleoprotein @ADormant and others