Is there a nicer way to see vtable's function calls directly in the decompiler? #516

Open
aldelaro5 opened this issue Apr 25, 2019 · 34 comments
Labels
Status: Internal This is being tracked internally by the Ghidra team


@aldelaro5
Contributor

aldelaro5 commented Apr 25, 2019

When I am dealing with a C++ binary, it's expected that I will have to deal with vtables and that the binary will call functions by accessing the instance's vtables. Defining the vtable doesn't seem too difficult as I can type the data to an array of func* and it shows perfectly in the listing. I can then retype the appropriate field in the class's structure to func** so the decompiler knows it is an array of func*.

The problem is this is the best I can get in the decompiler (this is an example I made up to illustrate my point):

(*this->vtable[6])(local_20);

Now, at least it does tell me that it is calling the 7th function in the vtable, but I would like the decompiler to be able to infer WHICH function is being called. It can't, because it doesn't know that this field is a vtable and therefore won't change once it has been assigned (which would have happened earlier). I haven't found an option to mark a structure's field as a constant that will never change, so I am forced to manually check the vtable in the listing to figure out what index 6 in the table is.

The only other solution I have found is to create a new structure containing all the entries. But not only will it JUST show the function name and not the actual reference, it will also be completely separate from the class structure, which is incredibly inconvenient to create for all vtables as I am dealing with hundreds of classes (my particular binary has debugging information).

My question: Is there a better way to deal with this and if there isn't, would it be possible to fix this? It seems to be a huge inconvenience to check the table every time I see an indirect function call.
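
To make the manual lookup described above concrete, here is a minimal sketch (plain Python, not a Ghidra script) of resolving `vtable[N]` to a function name. It assumes the vtable bytes and a symbol map have been exported from the listing somehow; `read_vtable_slots`, `resolve_slot`, the addresses and the names are all hypothetical.

```python
import struct

def read_vtable_slots(raw, pointer_size=4, little_endian=True):
    """Split raw vtable bytes into a list of pointer values."""
    fmt_char = {4: "I", 8: "Q"}[pointer_size]
    prefix = "<" if little_endian else ">"
    count = len(raw) // pointer_size
    return list(struct.unpack(prefix + fmt_char * count, raw[:count * pointer_size]))

def resolve_slot(raw, index, symbols, pointer_size=4):
    """Return the name of the function stored at vtable[index], if known."""
    target = read_vtable_slots(raw, pointer_size)[index]
    # Fall back to a FUN_xxxxxxxx style placeholder for unnamed targets.
    return symbols.get(target, "FUN_%08x" % target)

# Hypothetical data: a three-entry vtable and a symbol map taken from the listing.
raw_vtable = struct.pack("<3I", 0x401000, 0x401040, 0x401080)
symbols = {0x401000: "MyClass::~MyClass", 0x401040: "MyClass::update"}

print(resolve_slot(raw_vtable, 1, symbols))  # MyClass::update
print(resolve_slot(raw_vtable, 2, symbols))  # FUN_00401080
```

This only automates the table lookup; it does not make the decompiler show the resolved name, which is what the issue asks for.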

@aldelaro5 aldelaro5 added the Type: Question Further information is requested label Apr 25, 2019
@BhaaLseN

BhaaLseN commented Apr 26, 2019

I can get to
(*this->vtable.MyClass::MyMethod)(this, param1, param2)
by typing/naming the members of a vtable structure appropriately and then typing the vtable pointer of the actual structure with it.

So basically: MyClass with a vtable member of type vtable-MyClass* and vtable-MyClass with a MyClass::MyMethod member (name can be anything, I chose to put the class name so it shows up for derived classes) of type void __thiscall MyMethod(MyClass* this, int param1, int param2)
Then go to the actual memory location of the vtable (either check a ctor/dtor or find the VFTABLE some other way) and type it as vtable-MyClass (Note that sometimes the vtable isn't the right size and has extra function pointers at the end, you might have to clear code bytes before typing)

It works, but is still kinda clunky in syntax. Helps readability more than this->vtable[6] though.

But yeah, it would be nice if there was a way to help this vtable reconstruction; be it structure definition based on the vtables from metadata, naming from the referenced methods or just a vtable field that lets you edit this in the structure (or class!) somewhere.
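
As a rough illustration of the two-struct layout described above, the following sketch generates C declarations for a vtable struct and a class struct that points to it. It is plain Python that only builds text. The thread's names `vtable-MyClass` and `MyClass::MyMethod` are not valid C identifiers, so the sketch substitutes underscores; that substitution, the helper name `vtable_struct_decl`, and the omission of `__thiscall` are assumptions, and whether a given C parser accepts the output as-is has not been checked.

```python
def vtable_struct_decl(class_name, methods):
    """Build C declarations for a vtable struct and a class struct that points to it.

    methods: list of (method_name, return_type, extra_params) in vtable order.
    """
    vt_name = "vtable_" + class_name
    lines = ["struct %s;" % class_name, "", "struct %s {" % vt_name]
    for name, ret, params in methods:
        # Every virtual method takes the object pointer as its first argument.
        args = ", ".join(["struct %s *this" % class_name] + list(params))
        lines.append("    %s (*%s_%s)(%s);" % (ret, class_name, name, args))
    lines += ["};", "", "struct %s {" % class_name,
              "    struct %s *vtable;" % vt_name, "};"]
    return "\n".join(lines)

decl = vtable_struct_decl(
    "MyClass", [("MyMethod", "void", ["int param1", "int param2"])])
print(decl)
```

Prefixing each member with the class name follows the suggestion above, so the origin of a slot stays visible when a derived class reuses the struct.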

@rokups

rokups commented Jun 12, 2019

@BhaaLseN how is this done? I can see classes in symbol tree, but when i try to retype a variable - it expects type from "data type" tab. Classes are not there of course. Is there some extra step to create a data type from a class in symbol tree?

Edit: I can see we create new structures manually. This is a chore though, especially for vtables. Is there really no way to create a struct datatype from class symbol?

@BhaaLseN

Yeah, this needs to be done by hand at this point; which is why I also put my 👍 for better support.

But as mentioned, right now you need two structs; your actual object/class/structure (which was probably auto-created) and a second one for the vtable (which you have to do by hand); then type the vtable member of the first one with a pointer to the second one.

@mattypiper

ghidra vtable guide: http://hwreblog.com/projects/ghidra.html

@dvdkon

dvdkon commented Jul 14, 2019

I made a script for detecting vtables and creating datatypes for them. It does this by looking for "typeinfo name" mangled strings, finding typeinfos by looking at references and then finding vtables the same way. I haven't tested it on more than two binaries, so it's probably got some bugs, but it should work on any Itanium ABI binary (GCC, clang and some others).

Don't worry if you run it and get a lot of "ERR" messages, those might just be artifacts of typename-like strings in the binary or classes where the tool can't get all the necessary data.
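
For readers unfamiliar with the "typeinfo name" strings mentioned above: under the Itanium ABI they are the mangled type name, for example `7MyClass` for `MyClass` or `N3foo3BarE` for `foo::Bar`. The sketch below is not taken from the script; it is a simplified, hypothetical decoder (`parse_typeinfo_name`) that handles only plain and nested names, and it shows why strings that merely look similar get rejected.

```python
import re

def parse_typeinfo_name(s):
    """Decode a simple Itanium typeinfo-name string into a qualified name.

    Handles only plain and nested source names (no templates, no
    substitutions); returns None for anything else.
    """
    nested = s.startswith("N") and s.endswith("E")
    body = s[1:-1] if nested else s
    parts, pos = [], 0
    while pos < len(body):
        m = re.match(r"[1-9][0-9]*", body[pos:])
        if not m:
            return None                      # no length prefix: not a mangled name
        length = int(m.group())
        start = pos + m.end()
        ident = body[start:start + length]
        if len(ident) != length or not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", ident):
            return None                      # truncated or not an identifier
        parts.append(ident)
        pos = start + length
    if not parts or (len(parts) > 1 and not nested):
        return None
    return "::".join(parts)

print(parse_typeinfo_name("7MyClass"))     # MyClass
print(parse_typeinfo_name("N3foo3BarE"))   # foo::Bar
print(parse_typeinfo_name("9short"))       # None
```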

It doesn't deal with inheriting from a multiple inheritance class yet, but I'll hopefully fix that later.

Unfortunately, this script can't currently represent the type hierarchy faithfully (i.e. how gdb's print does it). This script creates multiple vptrs in a struct at particular offsets, but what's really happening is that entire objects (of parent classes) are embedded in the child object. This is all fine and could be implemented easily, but the first embedded object's vptr is also the whole object's vptr, so the pointed-to vtable has extra function pointers appended at the end. This would require something like the types being parametric over their vtable type (or just "hardcoded" support for C++ inheritance) in Ghidra.

@dvdkon

dvdkon commented Sep 25, 2019

Some people from the CMU have written a framework for static analysis of object-oriented code, including a tool for recovering classes and methods and a corresponding Ghidra plugin. It looks much more comprehensive than my script and should be the best way going forward.

@happydpc

Some people from the CMU have written a framework for static analysis of object-oriented code, including a tool for recovering classes and methods and a corresponding Ghidra plugin. It looks much more comprehensive than my script and should be the best way going forward.

But it cannot be compiled on Windows.

@pabx06

pabx06 commented Nov 19, 2020

I've been doing what @BhaaLseN said: typing function definitions (names & types), using them in the vtable struct definition, and applying that to the vtable data. This works nicely, and the decompiler is able to infer arguments and their types.

However, I have a binary of about 100MB in which I have reversed almost all function definitions, names and their types. Creating vtables manually for a binary with over 100k functions is not fun...

If only there was a plugin or script where you select the vtable pointer array, hit F2, and it auto-creates a vtable struct for you with the pointee types that you already reversed!
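
The core of such a script could look roughly like the sketch below. It is plain Python over exported data rather than a Ghidra script, and everything in it (`vtable_components`, the addresses, the prototypes) is hypothetical. It turns a selected pointer array into struct component rows, reusing prototypes that are already known, and stops at the first slot that is not a known function, since the data after a vtable can look like more pointers.

```python
def vtable_components(slot_targets, prototypes, pointer_size=8):
    """Turn resolved vtable slots into (offset, field_name, prototype) rows.

    slot_targets: function entry addresses in slot order.
    prototypes:   {address: (name, prototype_string)} for functions already typed.
    """
    rows = []
    for index, target in enumerate(slot_targets):
        if target not in prototypes:
            break  # assumed end of table; a real tool would need a better test
        name, proto = prototypes[target]
        rows.append((index * pointer_size, name, proto))
    return rows

# Hypothetical input: three selected slots, two of them already typed.
prototypes = {
    0x1000: ("Shape_area", "double Shape_area(Shape *this)"),
    0x1040: ("Shape_name", "char *Shape_name(Shape *this)"),
}
for row in vtable_components([0x1000, 0x1040, 0xdeadbeef], prototypes):
    print(row)
```

Stopping at the first unknown target is a deliberate simplification; slots such as pure-virtual stubs would need separate handling.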

@pabx06

pabx06 commented Nov 20, 2020

However, creating a vtable data structure from pointers to functions does create a structure like this:

image

What is bad is that the functions are already typed correctly. So much manual labor for very little reward.

Small example

Decided to compile a small example to see what ghidra's decompiler is able to recover:

image

1) inserting func def in the datatype manager

image
image
image
image

2) replacing pointer with function def in the vtable and vtable placeholder

image

image

image

Finished product:

image
image

a bit disappointed

  • The aes_p->mode enumeration makes the decompiler choke. On other samples the enumeration works fine most of the time. Is this a bug?

  • aes_init(): the returned data could have been the first field of the struct, but it is not!

  • main return type and stack variable recovered ok.

image

Here are the source, binary, makefile and Ghidra project file (gzf):
aes.tar.gz

@dragonmacher
Collaborator

@pabx06 That is one heck of a write-up. That will make it easier to follow your issue.

@pabx06

pabx06 commented Nov 20, 2020

That is one heck of a write-up

I can't agree more. Typing vtable types is so much labor. It would be nice to have a script that pulls the function definitions & names from the selected pointers and creates a vtable. Imagine a 100MB sample to reverse...

@brandonros

I made a script for detecting vtables and creating datatypes for them. It does this by looking for "typeinfo name" mangled strings, finding typeinfos by looking at references and then finding vtables the same way. I haven't tested it on more than two binaries, so it's probably got some bugs, but it should work on any Itanium ABI binary (GCC, clang and some others).

Don't worry if you run it and get a lot of "ERR" messages, those might just be artifacts of typename-like strings in the binary or classes where the tool can't get all the necessary data.

It doesn't deal with inheriting from a multiple inheritance class yet, but I'll hopefully fix that later.

Unfortunately, this script can't currently represent the type hierarchy faithfully (i.e. how gdb's print does it). This script creates multiple vptrs in a struct at particular offsets, but what's really happening is that entire objects (of parent classes) are embedded in the child object. This is all fine and could be implemented easily, but the first embedded object's vptr is also the whole object's vptr, so the pointed-to vtable has extra function pointers appended at the end. This would require something like the types being parametric over their vtable type (or just "hardcoded" support for C++ inheritance) in Ghidra.

@dvdkon

ProcessVTables.py> Running...
Traceback (most recent call last):
  File "/Users/brandonros/ghidra_scripts/ProcessVTables.py", line 614, in <module>
    create_typeinfo_vmi_class_type()
  File "/Users/brandonros/ghidra_scripts/ProcessVTables.py", line 403, in create_typeinfo_vmi_class_type
    dt.setFlexibleArrayComponent(
AttributeError: 'ghidra.program.database.data.StructureDB' object has no attribute 'setFlexibleArrayComponent'
ProcessVTables.py> Finished!

Incompatible for Ghidra v10?

@mvf

mvf commented May 29, 2022

Incompatible for Ghidra v10?

This seems to work for me. Also adds 64-bit support:

--- a/ProcessVTables.py
+++ b/ProcessVTables.py
@@ -20,7 +20,7 @@ from collections import namedtuple
 from ghidra.program.util import ProgramMemoryUtil
 from ghidra.program.model.data import (
     StructureDataType, BuiltInDataTypeManager, CategoryPath,
-    FunctionDefinitionDataType, StringDataInstance)
+    FunctionDefinitionDataType, StringDataInstance, ArrayDataType)
 from ghidra.program.model.symbol import SourceType
 from ghidra.program.model.address import Address
 from ghidra.app.util.demangler.gnu import GnuDemanglerNativeProcess
@@ -29,7 +29,7 @@ from ghidra.app.util.demangler.gnu import GnuDemanglerNativeProcess
 SAVE_LOCATION = None

 MAIN_TYPE_CATEGORY_ID = 0 # My guess is it's always 0
-SIZE_T = 4
+SIZE_T = currentProgram.getAddressFactory().getDefaultAddressSpace().getSize() >> 3

 dtm = currentProgram.getDataTypeManager()
 bdtm = BuiltInDataTypeManager.getDataTypeManager()
@@ -400,8 +400,8 @@ def create_typeinfo_vmi_class_type():
     dt.add(charptr, SIZE_T, "__type_name", "")
     dt.add(bdtm.getDataType("/int"), 4, "__flags", "")
     dt.add(bdtm.getDataType("/int"), 4, "__base_count", "")
-    dt.setFlexibleArrayComponent(
-        dtm.getDataType("/__base_class_type_info"), "__base_info", "")
+    dt.add(ArrayDataType(
+        dtm.getDataType("/__base_class_type_info"), 0, -1), "__base_info", "")

 def get_func_str_repr(fsig):
     out = fsig.getPrototypeString()
@@ -454,13 +454,13 @@ def add_type_field_at(dt, cdt, offset, name, desc):
     dtlen = 0 if dt.getNumComponents() == 0 else dt.getLength()
     dt.growStructure(max(0, offset - dtlen))
     j = offset
-    to_delete = []
+    to_delete = set()
     if dt.getNumComponents() > 0:
         while j < offset + clen:
             if j >= dtlen: break
             c = dt.getComponentAt(j)
             j = c.getOffset() + c.getLength()
-            to_delete.append(c.getOrdinal())
+            to_delete.add(c.getOrdinal())
     dt.delete(to_delete)
     dt.insertAtOffset(offset, cdt, clen, name, desc)
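
Two pieces of the patch above can be illustrated outside Ghidra. The sketch below is plain Python with hypothetical helper names (`pointer_size_bytes`, `overlapping_ordinals`). It shows the bits-to-bytes conversion behind the new `SIZE_T` line, and the idea behind `add_type_field_at`: collect the ordinals of every existing component that overlaps the range a new field will occupy. The thread does not say why the list became a set, so no reason is assumed here.

```python
def pointer_size_bytes(address_bits):
    """Convert an address-space width in bits to a pointer size in bytes."""
    return address_bits >> 3

def overlapping_ordinals(components, offset, length):
    """Ordinals of components that intersect [offset, offset + length).

    components: list of (ordinal, comp_offset, comp_length) tuples.
    """
    hit = set()
    for ordinal, comp_offset, comp_length in components:
        # Standard half-open interval overlap test.
        if comp_offset < offset + length and offset < comp_offset + comp_length:
            hit.add(ordinal)
    return hit

# Hypothetical structure: two 4-byte fields followed by one 8-byte field.
components = [(0, 0, 4), (1, 4, 4), (2, 8, 8)]
print(pointer_size_bytes(64))                           # 8
print(sorted(overlapping_ordinals(components, 4, 8)))   # [1, 2]
```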

@etra0

etra0 commented Jun 1, 2022

Sorry for the bump, but just to be precise, both methods (from @BhaaLseN and @mattypiper) only set the names of the vtable entries, right? They don't generate any sort of link between the decompiled code and the functions?

Because I tried both and of course now the decompilation is more readable, but I still have to manually navigate to the vtable and click the name of the function instead of doubleclicking the decompiled one.

@BhaaLseN

BhaaLseN commented Jun 1, 2022

Correct, it only sets the name. It does not magically let you jump to the decompiled method or whatever. It's mostly so you can understand the decompiled code better (when it says (*this->vtable.MyClass::MyMethod)(this, local_20); instead of (*this->vtable[6])(local_20); for example)

@etra0

etra0 commented Jun 1, 2022

I thought so. It's a bit weird there doesn't exist any way to link both since you can have the Class with all the methods listed, the struct with the same name and a vtable struct which you can fill with function names, yet Ghidra can't generate any way to navigate to the functions, but I guess that's the whole point of this issue. Thanks for the reply!

@brandonros

Does there currently exist a way to “export” enough from the decompilation to roughly be able to make a .cpp / .h file with the class and its methods defined, then LoadLibrary / GetProcAddress on the class constructor, and then call its vftable methods?

@NyanCatTW1

NyanCatTW1 commented Aug 6, 2022

https://github.com/NyanCatTW1/RedMetaClassAnalyzer/blob/main/RedMetaClassAnalyzer.py

While @ChefKissInc and I were reverse engineering a couple of AMD driver kexts, I created this script, which made our experience with vtables much less painful.
It comes with many other features, so you will have to edit the script before it will work.
It will only work on the master branch because it uses various private functions. It is known to work on Ghidra commit fa9f21c.

# Features (Newest at bottom)
# Rename metaClass/vtable pointers in __got.
# Find all references to safeMetaCast and retype variables according to the arguments fed.
# Add missing meta structs.
# Add many long fields to ATIController in order to ease the REing of its structure.
# Set up meta/vtable structs to display function name on vtable calls.
# Create vtable stubs in order to ease the REing of its structure.

@ghidra007
Contributor

There is a prototype/proof of concept script in Ghidra called RecoverClassesFromRTTIScript.java that will figure out the class information if there is RTTI in the program, for Windows programs and some gcc programs. It is very rough at this point but you are welcome to try it and see if it works on your binary. If it doesn't, please let us know what the output says and which type of binary you are trying it on.

The script figures out class hierarchy and creates class structures and applies them. It figures out which virtual functions belong to which classes and puts them in the correct class and more. Look for the class structures in the data type manager under ClassDataTypes. There is a folder for each class. There are subfolders if a class is in another namespace. If you look in the description of the class structure you will see parent information. Look in the SymbolTree under Classes for the class members. You can optionally have a class hierarchy graph pop up (edit script to turn on option) or run later using the GraphClassesScript. Eventually, once more class support is available in Ghidra, the script features will be polished and put into main Ghidra.

We are currently working on a more generic class recovery script for classes without RTTI. It finds vftables, figures out class hierarchy where it can and basically does the same as described above.

@Wall-AF

Wall-AF commented Aug 8, 2022

Is there any chance of making this understand classes built by old Borland C/C++ compilers?

@ghidra007
Contributor

Are they the ones with the old RTTI format? If so, we have a fix in progress. Once that fix is added to the RTTI analyzer then yes this will work with that.

@Wall-AF

Wall-AF commented Sep 15, 2022

Are they the ones with the old RTTI format? If so, we have a fix in progress. Once that fix is added to the RTTI analyzer then yes this will work with that.

That I don't know! But maybe this reference might help you clever folks.

@ghidra007
Contributor

Are they the ones with the old RTTI format? If so, we have a fix in progress. Once that fix is added to the RTTI analyzer then yes this will work with that.

That I don't know! But maybe this reference might help you clever folks.

Wish we could claim to be clever. Thanks for the nice bit of light reading. :-) Appreciate the pointer to the documentation.

@0xBEEEF

0xBEEEF commented Sep 21, 2022

@ghidra007
If you're already on the topic of Borland, it would be great if you also considered Delphi. There is supposed to be some overlap in the RTTI data between C++ and Delphi, according to some discussions like this one.

Here you should also find mature analysis methods in IDR that can handle this RTTI data. Maybe just as an idea.

@ghidra007
Contributor

@0xBEEEF Thanks for the request and links. There are a ton of things on the list for handling various forms of RTTI. Hopefully we can get to them all at some point.

@ghidra007 ghidra007 added the Status: Prioritize This is currently being prioritized label Sep 21, 2022
@ghidra007 ghidra007 self-assigned this Sep 21, 2022
@ghidra007 ghidra007 removed the Status: Prioritize This is currently being prioritized label Sep 21, 2022
@markasoftware

We are currently working on a more generic class recovery script for classes without RTTI. It finds vftables, figures out class hierarchy where it can and basically does the same as described above.

Is this on any public branch so we can hack on it?

@ghidra007
Contributor

We are currently working on a more generic class recovery script for classes without RTTI. It finds vftables, figures out class hierarchy where it can and basically does the same as described above.

Is this on any public branch so we can hack on it?

No sorry. It isn't ready for that yet.

@ghost

ghost commented Jan 21, 2023

Revisiting this, subscribed. I'm working on some code that is (fortunately) not stripped, so typeinfo and DWARF data exists in a plentiful fashion. The number of classes is humongous, so automating the process for vtable reconstruction is a very welcome quality of life improvement.

Working with current stable (10.2.2).

@ghidra007
Contributor

@vogelfreiheit The RTTI script does have vtable reconstruction in it for some gcc-compiled programs. If it doesn't currently work on yours, I'm guessing that some changes that are in progress will make the script work better with gcc-compiled programs.

@ghidra007 ghidra007 added Status: Internal This is being tracked internally by the Ghidra team and removed Type: Question Further information is requested labels Mar 21, 2023
@ScottJensen18

ScottJensen18 commented Aug 7, 2023

@ghidra007 If you're already on the topic of Borland, it would be great if you also considered Delphi. There is supposed to be some overlap in the RTTI data between C++ and Delphi, according to some discussions like this one.

Here you should also find mature analysis methods in IDR that can handle this RTTI data. Maybe just as an idea.

@ghidra007 Also interested in Delphi support!

@jasper310899

jasper310899 commented Feb 24, 2024

Hey, I want to add that currently it is not even possible to model vtables faithfully at all, or am I wrong? Imagine you have to model inheritance: child classes define their own virtual methods for the grandchildren, and these collide at the same offset from the vtable start, so we can't have names explaining the virtual functions. This is because with inheritance we always need to define the vptr in the base class.

For example, imagine this:

image

What if the base class "Parts" has no virtual methods at all? Or what if the first virtual function declared in "Obj" performs a tick, but the first virtual function declared in "StaticObj", which also inherits from "Model", actually deletes the object?

A solution without fully implementing inheritance would be to let you declare virtual methods on structs, and then they get inherited by all classes that contain that struct.
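
The collision can be shown with a toy model. The sketch below is plain Python and heavily simplified (single inheritance only, no destructor slots, no RTTI header); `build_vtable` and the method names `render`, `tick` and `destroy` are hypothetical. It builds vtables the way a typical compiler lays them out, with parent slots first and newly declared virtuals appended, so two siblings of "Model" end up using the same slot index for unrelated functions.

```python
def build_vtable(parent_slots, declared, class_name):
    """Single-inheritance vtable: parent slots first, new virtuals appended."""
    slots = list(parent_slots)
    names = [s.split("::")[-1] for s in slots]
    for method in declared:
        qualified = "%s::%s" % (class_name, method)
        if method in names:
            slots[names.index(method)] = qualified   # override keeps its slot
        else:
            slots.append(qualified)                   # new virtual gets next slot
            names.append(method)
    return slots

model = build_vtable([], ["render"], "Model")
obj = build_vtable(model, ["tick"], "Obj")
static_obj = build_vtable(model, ["destroy"], "StaticObj")

print(obj)         # ['Model::render', 'Obj::tick']
print(static_obj)  # ['Model::render', 'StaticObj::destroy']
```

A single vtable struct attached to "Model" can name slot 0 for everyone, but slot 1 means something different in each child, which is the naming problem described above.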

@ghidra007
Contributor

@jasper310899 The RecoverClassesFromRTTI, a very rough prototype script that needs a lot of cleanup, is an example of how to model vftables at least for Windows binaries. Not sure it is the best way but it works pretty well. The code is pretty hard to follow so I don't recommend trying to understand it, but if you run it on a windows binary with RTTI you should be able to get a good idea of how it handles things. Look at structures it creates in the "ClassDataTypes" folder and look at the actual vftables themselves to see the class namespaces the functions are assigned to. It figures out which classes the virtual functions belong to so you can see which are inherited and which are overridden.

I imagine if there were no virtual methods at all, there would be no vftable and the constructor wouldn't point to a virtual function table.

@jasper310899

@ghidra007 I found a perfect solution:
image
Using comments you have complete freedom.

@ghidra007
Contributor

Very true.
