Tool to help process SYM files #5

Open
mewmew opened this issue Jul 16, 2018 · 13 comments

@mewmew
Contributor

mewmew commented Jul 16, 2018

The last few days I've been playing with the idea of processing SYM files to output scripts which will add all symbol information to IDA. These scripts may be either in IDC format (e.g. notes.idc) or Python format (e.g. make_diablo.py and set_funcs).

As a first step to do this, we need to be able to process SYM files to the same capabilities as DUMPSYM.EXE.

For this purpose, the https://github.com/sanctuary/sym repo was created. As of today it can parse DIABPSX.SYM from both the Japanese SLPS-01416 release (i.e. jap_05291998.out) and the 1997-12-12 Easy as Pie release (i.e. pal_12121997.out), producing output identical to that of the DUMPSYM.EXE tool of the Psy-Q SDK.

This is the initial step. Based on it, we now have data structures to further process the information and produce useful high-level output, e.g. scripts for importing this information into IDA, or C header files.

Just wanted to post about the process and start to get input on directions to take, or ideas on what to play with :)

cc: @7i @galaxyhaxz @seritools

@ghost

ghost commented Jul 16, 2018

That's great! While there's a plugin to work with the .MAP files, they unfortunately do not contain structs and certain data. It would've been really nice to have everything filled in a few months ago in IDA haha!

A potential challenge I see is that DIABPSX.BIN was split into several files, so the .SYM file will have to be interpreted accordingly and likely split into a file for each binary.

It would also be nice to find a way to use the .SYM files in an emulator as well, so we can debug real-time without the console!

@mewmew
Contributor Author

mewmew commented Jul 16, 2018

A potential challenge I see is that DIABPSX.BIN was split into several files, so the .SYM file will have to be interpreted accordingly and likely split into a file for each binary.

This is actually part of the SYM file format :)

The SYM file specifies overlays that will appear at the end of the executable. In the case of Diablo, there are four BIN files loaded in a similar fashion to dynamically linked libraries, and unloaded when no longer needed in order to save memory. These files are FMV.BIN, FRONTEND.BIN, PREGAME.BIN and GAME.BIN.

find . -type f -iname '*.bin' | xargs ls -l | sort | xin
-rw-r--r-- 1 u users 126064 Jun 11 17:38 ./lump/FMV.BIN
-rw-r--r-- 1 u users 143924 Jun 11 17:38 ./lump/FRONTEND.BIN
-rw-r--r-- 1 u users 171468 Jun 11 17:38 ./lump/PREGAME.BIN
-rw-r--r-- 1 u users 172584 Jun 11 17:38 ./lump/GAME.BIN

Each overlay has an overlay ID and an associated length specified in the SYM file.

000008: $800b031c overlay length $000009e4 id $4
000015: $800b031c overlay length $00000004 id $5
000022: $80139bf8 overlay length $00023234 id $b
00002f: $80139bf8 overlay length $00029dcc id $c
00003c: $80139bf8 overlay length $0002a228 id $d
000049: $80139bf8 overlay length $0001ec70 id $e

Overlay $4 is 2532 bytes in length.
Overlay $5 is 4 bytes in length.
Overlay $b is 143924 bytes in length.
Overlay $c is 171468 bytes in length.
Overlay $d is 172584 bytes in length.
Overlay $e is 126064 bytes in length.

From this we can determine the associated file of each overlay ID.

  • Overlay ID $b: FRONTEND.BIN
  • Overlay ID $c: PREGAME.BIN
  • Overlay ID $d: GAME.BIN
  • Overlay ID $e: FMV.BIN
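
The matching above can be reproduced mechanically. The following is a minimal sketch, not part of the sym repo: it parses the overlay lines from the dump and pairs each overlay length with a file of exactly the same size. The helper names (parse_overlays, match_files) are made up for illustration; the input lines and file sizes are copied from the listings above.

```python
import re

# Overlay lines copied from the SYM dump above.
DUMP = """\
000008: $800b031c overlay length $000009e4 id $4
000015: $800b031c overlay length $00000004 id $5
000022: $80139bf8 overlay length $00023234 id $b
00002f: $80139bf8 overlay length $00029dcc id $c
00003c: $80139bf8 overlay length $0002a228 id $d
000049: $80139bf8 overlay length $0001ec70 id $e
"""

# File sizes in bytes, copied from the ls listing above.
FILE_SIZES = {
    "FMV.BIN": 126064,
    "FRONTEND.BIN": 143924,
    "PREGAME.BIN": 171468,
    "GAME.BIN": 172584,
}

OVERLAY_RE = re.compile(
    r"^[0-9a-f]+: \$([0-9a-f]+) overlay length \$([0-9a-f]+) id \$([0-9a-f]+)$"
)

def parse_overlays(text):
    """Return a list of (overlay_id, address, length) tuples."""
    overlays = []
    for line in text.splitlines():
        m = OVERLAY_RE.match(line.strip())
        if m:
            addr, length, oid = (int(g, 16) for g in m.groups())
            overlays.append((oid, addr, length))
    return overlays

def match_files(overlays, file_sizes):
    """Map overlay ID to the file whose size equals the overlay length."""
    by_size = {size: name for name, size in file_sizes.items()}
    return {oid: by_size[length] for oid, _, length in overlays if length in by_size}

overlays = parse_overlays(DUMP)
mapping = match_files(overlays, FILE_SIZES)
for oid, name in sorted(mapping.items()):
    print(f"overlay ${oid:x}: {name}")
```

Overlays $4 and $5 have no file of matching size, so they are left out of the mapping.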

In later parts of the SYM file, the set overlay symbol specifies the start of symbol definitions related to a specific overlay, using the symbol header value to specify the overlay ID.

# Unknown overlay.
118b97: $00000004 set overlay
118b9c: $800b0320 94 Def class EXT type FCN VOID size 0 name VID_OpenModule__Fv
118bbc: $800b03e0 94 Def class STAT type FCN VOID size 0 name InitScreens__Fv

# FRONTEND.BIN
119981: $0000000b set overlay
119986: $80139bfc 94 Def class EXT type FCN VOID size 0 name PresOnlyTestRoutine__Fv
1199ab: $80139c24 94 Def class EXT type FCN VOID size 0 name FeInitBuffer__Fv

# PREGAME.BIN
12444f: $0000000c set overlay
124454: $80139bfc 94 Def class EXT type FCN VOID size 0 name PreGameOnlyTestRoutine__Fv
12447c: $8013bcb0 94 Def class STAT type FCN VOID size 0 name DRLG_PlaceDoor__Fii

# GAME.BIN
1442c1: $0000000d set overlay
1442c6: $80139bfc 94 Def class EXT type FCN VOID size 0 name GameOnlyTestRoutine__Fv
1442eb: $80139c04 94 Def class EXT type FCN VOID size 0 name GetDamageAmt__FiPiT1

# FMV.BIN
165c37: $0000000e set overlay
165c3c: $80155e1c 94 Def class EXT type FCN VOID size 0 name _cd_seek
165c52: $80155e54 94 Def class EXT type FCN VOID size 0 name init_cdstream
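
As a rough illustration of how the set overlay symbol partitions the symbol stream, here is a small sketch that groups Def lines by the most recent set overlay line. It works on the textual dump rather than the binary SYM format, and it assumes that symbols appearing before any set overlay line belong to the main executable (keyed as None here). The function name group_by_overlay is hypothetical.

```python
import re
from collections import defaultdict

# A few lines copied from the dump above.
DUMP = """\
118b97: $00000004 set overlay
118b9c: $800b0320 94 Def class EXT type FCN VOID size 0 name VID_OpenModule__Fv
119981: $0000000b set overlay
119986: $80139bfc 94 Def class EXT type FCN VOID size 0 name PresOnlyTestRoutine__Fv
1199ab: $80139c24 94 Def class EXT type FCN VOID size 0 name FeInitBuffer__Fv
"""

SET_OVERLAY_RE = re.compile(r"^[0-9a-f]+: \$([0-9a-f]+) set overlay$")
DEF_RE = re.compile(r"^[0-9a-f]+: \$([0-9a-f]+) \S+ Def class \S+ .* name (\S+)$")

def group_by_overlay(text):
    """Return {overlay_id: [(address, name), ...]} for all Def lines."""
    groups = defaultdict(list)
    current = None  # assumption: None stands for the main executable
    for line in text.splitlines():
        line = line.strip()
        m = SET_OVERLAY_RE.match(line)
        if m:
            # The symbol header value carries the overlay ID.
            current = int(m.group(1), 16)
            continue
        m = DEF_RE.match(line)
        if m:
            groups[current].append((int(m.group(1), 16), m.group(2)))
    return dict(groups)

groups = group_by_overlay(DUMP)
for oid, syms in groups.items():
    print(f"overlay ${oid:x}: {[name for _, name in syms]}")
```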

It would also be nice to find a way to use the .SYM files in an emulator as well, so we can debug real-time without the console!

Definitely! Which PS1 emulators have SYM file support?

@ghost

ghost commented Jul 19, 2018

Definitely! Which PS1 emulators have SYM file support?

Unfortunately, none that I know of. Currently the only option is to use a real debugging unit hooked up to a PC. There was talk from the author about adding SYM support in No$PSX, but I don't think it ever happened. No$PSX is still the best emulator out there, and the only one with decent debugger support.

@mewmew
Contributor Author

mewmew commented Jul 20, 2018

Ok, starting to reach something useful.

Install sym_dump

go get -u github.com/sanctuary/sym/cmd/sym_dump

Dump type definitions

sym_dump -types DIABPSX.SYM

Dump type definitions, variable and function declarations

sym_dump -c DIABPSX.SYM

Contents of header files stored at https://github.com/sanctuary/psx

One thing that is recovered which was not part of https://github.com/sanctuary/psx is block information. This may help us understand how many for loops etc. were in the original source. Also, the local variables are now associated with the correct block. E.g.

// address: 0x8015F504
int SyncPutItem__FiiiiUsiUciiiiiUl(int pnum, int x, int y, int idx, int icreateinfo, int iseed, int Id, int dur, int mdur, int ch, int mch, int ivalue, unsigned long ibuff) {
	int ii;
	int d;
	int dy;
	{
		{
			{
				{
					{
						{
							unsigned char done;
							{
								int l;
								{
									{
										int j;
										{
											int yy;
											{
												int i;
												{
													int xx;
												}
											}
										}
									}
								}
							}
						}
					}
				}
			}
		}
	}
}
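
To get a quick feel for the block structure, one could walk the dumped C output and record the brace depth of each local variable. The following is only a sketch over a shortened, hypothetical excerpt shaped like the output above. It assumes one declaration or brace per line, as in the dump, and would need more care for arrays or function pointers.

```python
# Shortened, hypothetical excerpt shaped like the sym_dump -c output above.
SRC = """\
int example(int pnum) {
    int ii;
    {
        {
            unsigned char done;
            {
                int l;
            }
        }
    }
}
"""

def locals_by_depth(src):
    """Return (name, depth) per declaration line; depth 1 is the function body."""
    depth = 0
    found = []
    for raw in src.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            depth += 1
        elif line == "}":
            depth -= 1
        elif line.endswith(";"):
            # Last token before the semicolon is taken as the variable name.
            name = line.rstrip(";").split()[-1].lstrip("*")
            found.append((name, depth))
    return found

decls = locals_by_depth(SRC)
max_depth = max(d for _, d in decls)
for name, d in decls:
    print(f"{name}: depth {d}")
```

The deepest nesting level gives an upper bound on how many nested blocks (loops, conditionals) surround a given local in the original source.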

There is further information not yet pretty printed (e.g. line numbers, file name), but that information is now part of the Go data structures, so it is easy enough to print.

In the next few weeks I hope to be able to clean this up a bit and provide a -ida flag which outputs scripts for importing the symbol information directly into IDA.

Good night :)
/u

@mewmew
Contributor Author

mewmew commented Jul 22, 2018

I had forgotten we put these online a few months back.

The scripts located at https://github.com/sanctuary/psx/tree/master/plugins can be used to import the symbols into IDA when analyzing the PSX version of Diablo.

Run the following Python scripts from IDA to import the function signatures.

base_types.py
name_diabpsx.py

Just uploaded the latest version of the SYM dump to https://github.com/sanctuary/psx where overlay symbols have been split into dedicated header files.

@mewmew
Contributor Author

mewmew commented Jul 23, 2018

And we are good to go!

IDA Python scripts uploaded to https://github.com/sanctuary/psx/tree/master/ida

These are generated by running sym_dump -ida DIABPSX.SYM.

To install sym_dump, run go get -u github.com/sanctuary/sym/cmd/sym_dump

Below are step by step instructions for loading DIABPSX.BIN into IDA and running the above scripts to import symbol information.

Load Diablo binary and overlay in IDA

  1. Load DIABPSX.BIN (from the Japanese SLPS-01416 release) in IDA.
  2. Processor type: MIPS R5900 (Sony Playstation 2) little endian
  3. ok
  4. ROM: ROM start address: 0x80010000
  5. Input file: Loading address: 0x80010000
  6. File -> Load file -> Another binary file: FMV.BIN, FRONTEND.BIN, GAME.BIN or PREGAME.BIN. (Repeat from step 1 to create one IDB database per overlay file)
  7. Loading segment: 0x8001000 (in paragraphs)
  8. Loading offset: 0x129BF8 (0x80139BF8-0x80010000)
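
The two values entered in steps 7 and 8 follow from the base address and the overlay address. A small sketch of the arithmetic, using only the addresses given above (the function name load_params is made up):

```python
BASE = 0x80010000          # ROM start / loading address from steps 4 and 5
OVERLAY_ADDR = 0x80139BF8  # overlay address from the SYM dump

def load_params(base, overlay_addr):
    """Return (segment in 16-byte paragraphs, offset of the overlay from base)."""
    segment_paragraphs = base >> 4  # one paragraph is 16 bytes
    offset = overlay_addr - base
    return segment_paragraphs, offset

seg, off = load_params(BASE, OVERLAY_ADDR)
print(hex(seg), hex(off))  # 0x8001000 0x129bf8
```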

Run IDA Python scripts

The scripts are named as follows:

  • ida/make_psx.py: set names of symbols
  • ida/set_funcs.py: set function signatures
  • ida/set_vars.py: set types of global variables
  • ida/overlay_c/make_psx.py: set names of symbols in the overlay with ID $c

The overlay IDs correspond to the following files:

  • Overlay ID $b: FRONTEND.BIN
  • Overlay ID $c: PREGAME.BIN
  • Overlay ID $d: GAME.BIN
  • Overlay ID $e: FMV.BIN

Assuming GAME.BIN was loaded as overlay in step 6 above, then use overlay ID $d for the Python scripts.

Step-by-step instructions for running the scripts.

  1. Options -> Compiler...
  2. Compiler -> Visual C++
  3. ok
  4. File -> Load file -> Parse C header file... (or [ctrl] + F9)
  5. Select types.h and press Open
  6. File -> Script file... (or [alt] + F7)
  7. Select set_vars.py and press Open
  8. Wait until the analysis finishes and it reads AU: idle in the status bar of IDA
  9. Repeat steps 6 through 8 for the following scripts (order matters, set_vars.py should be run before other scripts).
    • overlay_d/set_vars.py (note _d is used for ID $d)
    • make_psx.py
    • set_funcs.py
    • overlay_d/make_psx.py
    • overlay_d/set_funcs.py
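
Since the order matters and the overlay directory depends on which BIN file was loaded, it can help to generate the list of scripts. This is a sketch only: the helper script_order is hypothetical, and it merely reproduces the order listed above for a given overlay ID.

```python
# Overlay file to overlay ID, as determined earlier in this thread.
OVERLAY_IDS = {
    "FRONTEND.BIN": 0xB,
    "PREGAME.BIN": 0xC,
    "GAME.BIN": 0xD,
    "FMV.BIN": 0xE,
}

def script_order(overlay_id):
    """Return script paths (relative to ida/) in the order they should be run."""
    sub = f"overlay_{overlay_id:x}"
    return [
        "set_vars.py",          # global variable types first
        f"{sub}/set_vars.py",
        "make_psx.py",
        "set_funcs.py",
        f"{sub}/make_psx.py",
        f"{sub}/set_funcs.py",
    ]

for path in script_order(OVERLAY_IDS["GAME.BIN"]):
    print(path)
```

Note that types.h still has to be parsed in IDA (step 4) before any of these scripts are run.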

@ghost

ghost commented Jul 23, 2018

Would have been great to have that 6 months ago! Too bad IDA doesn't have a decompiler for the PSX though, and even then it doesn't seem to plug in the $gp variables properly. Still could help us though, to double-check errors against the PC release.

@mewmew
Contributor Author

mewmew commented Jul 23, 2018

Would have been great to have that 6 months ago!

Hehe, yeah. Sorry for being late ^^

Too bad IDA doesn't have a decompiler for the PSX

Indeed. I've been playing a bit with writing tools for PSX decompilation. Don't think they'll reach anything close to IDA in the near future, but it's been a lot of fun to play with.

it doesn't seem to plug in the $gp variables properly. Still could help us though, to double-check errors against the PC release.

Just for reference, how to calculate addresses from $gp.

; How to calculate global variable addresses in MIPS

$gp = 0x8011A780

; Example from drlg_l1.cpp___L5firstRoom(void)

seg003:8013D9C4                 sb      $zero, 0x215A($gp)

address: 0x8011A780 + 0x215A = 0x8011C8DA

sbss:8011C8DA HR3:            .space 1                 ; PSX ref: 0x8011C8DA
sbss:8011C8DA                                          ; PSX def: unsigned char HR3
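
The same calculation as a small sketch. MIPS load/store instructions sign-extend their 16-bit offset, so offsets of 0x8000 and above refer to addresses below $gp. The function names are made up for illustration; the $gp value is the one given above for the Japanese release.

```python
GP_JAP = 0x8011A780  # $gp for the Japanese release, from the example above

def sign_extend16(imm):
    """Interpret a 16-bit immediate as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def gp_address(gp, imm):
    """Return the 32-bit address referenced by imm($gp)."""
    return (gp + sign_extend16(imm)) & 0xFFFFFFFF

# sb $zero, 0x215A($gp)
print(hex(gp_address(GP_JAP, 0x215A)))  # 0x8011c8da
```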

@ghost

ghost commented Jul 23, 2018

For the PAL "easy as pie" release: $gp = 0x8012A3F0

@mewmew
Contributor Author

mewmew commented Jul 24, 2018

Oh, and to extract *.DIR files (with associated *.BIN files), e.g.

  • lump.dir and lump.bin
  • stream5.dir and stream5.bin

Install the dir_dump tool by running go get -u github.com/sanctuary/exp/cmd/dir_dump, and then extract the archives using dir_dump lump.dir

The executable file diabpsx.bin and the overlays (shared libraries) fmv.bin, frontend.bin, game.bin and pregame.bin are contained within these *.DIR archives.

Edit: the dir_dump tool is experimental, so if you find some issues (there will surely be some), please feel free to report them :)

@ghost

ghost commented Jul 24, 2018

The executable file diabpsx.bin and the overlays (shared libraries) fmv.bin, frontend.bin, game.bin and pregame.bin are contained within these *.DIR archives.

There are also some raw source code files hidden in the "Easy as Pie" *.BIN files. In particular some headers for strings, dialogs, and monster data. Based on the monster data, I'm curious if those massive data tables for items/objects/etc. were stored in a different format and then converted during compilation. Perhaps Excel even.

LUMP.BIN/MONST.INF
LUMP.BIN/MONST.DEF
LUMP.BIN/MAINTXT.H

@ambiennt

Just stumbling upon this thread, but I'm not expecting to get a response on this due to the thread age. Info on how to parse/analyze .SYM files with IDA seems to be really sparse and some of the info linked here could be potentially useful, but it seems it's all 404'd. Are there any tools still available in the public domain for .SYMs and IDA?

@mewmew
Contributor Author

mewmew commented May 22, 2022

Just stumbling upon this thread, but I'm not expecting to get a response on this due to the thread age. Info on how to parse/analyze .SYM files with IDA seems to be really sparse and some of the info linked here could be potentially useful, but it seems it's all 404'd. Are there any tools still available in the public domain for .SYMs and IDA?

Hi @ambiennt,

The github.com/sanctuary/sym repo has now been made public. The code is in the public domain.

Cheers,
Robin
