usage: Graphs PDF objects and references for malware analysis.
pdf-parser.py -h for help
...
positional arguments:
file pdf to graph
optional arguments:
-h, --help show this help message and exit
-f pass stream object through filter (FlateDecode,
ASCIIHexDecode, ASCII85Decode, LZWDecode
-o {dot,png,vrml} graph output file format (default: svg)
-n no obj log files or directories created
- -h: Displays help.
- -f: Passes PDF stream objects through the FlateDecode, ASCIIHexDecode, ASCII85Decode and LZWDecode filters. This slows the script down, but the decoded objects may yield more information. -f is the equivalent of pdf-parser -f.
- -o: Specifies the output format of the graph. If -o is not specified, the svg format is used.
DOT: Graph description language format, useful with tools that specialize in viewing or processing graphs. Some DOT viewers allow zoom, pan and restructuring of nodes. The dot format supports HTML linking of nodes, allowing pdf-grapher to display obj information on click.
PNG: A bitmap image format. Viewers only offer a static image that degrades when zoomed, with no way to restructure nodes or activate HTML links. PNG is widely supported and is useful if you need to insert a graph into a document or webpage.
SVG: An XML-based format that is compatible with a wide range of applications (e.g. web browsers) and can also offer interaction via links and restructuring of nodes (if supported by the viewer). SVG is the default file format used by pdf-grapher.
VRML: Virtual Reality Modeling Language. OK, I put this in here as a bit of a joke, but it kind of works. With low-cost VR equipment like the Oculus Rift coming out, I'll probably be experimenting to see if 3D visualisation could be useful in this case.
- -n: By default pdf-grapher.py creates a file_pdf_logs directory for every file it processes. These logs contain the objs dumped by pdf-parser.py and are displayed when you click on a graph node. If -n is given on the command line, no directories are created (i.e. nodes will not display obj content when clicked). This should also improve the processing speed of pdf-grapher.py.
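For readers who want to see how the options above fit together, here is a minimal argparse sketch that mirrors the help text. It is not taken from pdf-grapher.py itself: the function name and the dest names (use_filters, output_format, no_logs) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Graphs PDF objects and references for malware analysis.")
    parser.add_argument("file", help="pdf to graph")
    # -f: decode stream objects, slower but may reveal more content.
    parser.add_argument("-f", dest="use_filters", action="store_true",
                        help="pass stream object through filter")
    # -o: the help text lists dot, png and vrml; svg is used when -o is absent.
    parser.add_argument("-o", dest="output_format",
                        choices=["dot", "png", "vrml"], default="svg",
                        help="graph output file format (default: svg)")
    # -n: skip creating the per-file log directories.
    parser.add_argument("-n", dest="no_logs", action="store_true",
                        help="no obj log files or directories created")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because svg is only the fallback value and not one of the listed choices, this sketch accepts it only when -o is omitted, which matches the help output as shown.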
Python (Tested with 2.7.5)
Linux (Probably, haven't tried it on Windows)
pydot (https://code.google.com/p/pydot/)
pdf-parser (http://blog.didierstevens.com/programs/pdf-tools/)
If you're dealing with small to medium graphs, using your browser to view SVG files should be fine. For more complex graphs I'd suggest using a more specialized viewer like ZGRViewer (http://zvtm.sourceforge.net/zgrviewer.html).
HINT: Press spacebar when using ZGRViewer to view obj dump.
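The clickable nodes that these viewers expose can be expressed in DOT through the standard Graphviz URL node attribute. The sketch below shows the general idea in plain Python. It is not how pdf-grapher.py necessarily builds its graphs, and the function name, node labels and log file naming (obj_N.txt) are hypothetical.

```python
def dot_graph(edges, log_dir="file_pdf_logs"):
    """Return DOT source where each node links to a (hypothetical) obj log file.

    edges is a list of (source_obj, referenced_obj) id pairs.
    """
    nodes = sorted({obj for edge in edges for obj in edge})
    lines = ["digraph pdf {"]
    for obj in nodes:
        # URL is a real Graphviz attribute; the path layout is an assumption.
        lines.append('    "obj {0}" [URL="{1}/obj_{0}.txt"];'.format(obj, log_dir))
    for src, dst in edges:
        lines.append('    "obj {0}" -> "obj {1}";'.format(src, dst))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dot_graph([(1, 2), (1, 3)]))
```

Rendering such a file to SVG with Graphviz keeps the links, whereas a PNG rendering flattens them into a static image.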
If you're dealing with large volumes of PDFs, you may want to consider the following options:
- -n will give you FASTER performance on slower HDDs but only a negligible performance boost on SSDs.
- -f will SLOW performance by ~10x; avoid it if performance is a concern.
- -o png
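For bulk runs, a small wrapper can apply these choices to every PDF in a directory. This is a sketch under stated assumptions: the script is assumed to sit at pdf-grapher.py in the working directory, and the helper names find_pdfs and build_command are made up for this example.

```python
import os
import sys


def find_pdfs(directory):
    """Return sorted paths of the .pdf files directly inside directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".pdf")
    )


def build_command(pdf_path, script="pdf-grapher.py",
                  skip_logs=True, decode_streams=False):
    """Build the argument list for one pdf-grapher.py run.

    skip_logs adds -n (no log directories); decode_streams adds -f,
    which the notes above say slows processing by roughly 10x.
    """
    cmd = [sys.executable, script]
    if skip_logs:
        cmd.append("-n")
    if decode_streams:
        cmd.append("-f")
    cmd.append(pdf_path)
    return cmd


if __name__ == "__main__":
    import subprocess
    for pdf in find_pdfs(sys.argv[1]):
        subprocess.call(build_command(pdf))
```

The defaults favour speed (-n on, -f off); flip them for individual files that deserve a closer look.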