new feature: function call arguments #771
Comments
Concerns (from last meeting):
|
when referring to an argument, we should be able to refer to its specific index. we should also try to associate the argument with its declared name. so like:

    api: CreateFileA
    arg[0]: "foo.exe"

and

    api: CreateFileA
    lpFileName: "foo.exe"

how do we maintain these mappings? we'd need a database of APIs and their canonical argument names (ideally these should match MSDN (windows) and man pages (posix)). for MSDN, we should consider extracting the info we need from the winmd files that Microsoft provides: https://github.com/microsoft/win32metadata

we should push to have vivisect/vivisect#213 updated and merged. |
we'll need to figure out how to handle a subset of types commonly used for arguments, like pointers to strings. does specifying a value as a string, like we should probably not go too far down this rabbit hole; handling structures is likely out of scope. do we support regex against strings? |
thought: if we migrate most of our rules to use this feature, then we could probably natively support decompiler backends, like ghidra and hex-rays. we should consider the fragmentation of our analysis backends though. how do we handle the scenario when some backends do/n't support various features? we already almost see this with SMDA versus viv wrt FLIRT support. |
we could add this as part of capa 4.0 (probably introduces insn scope) or defer for 5.0+ as this will be a breaking change to rule syntax. |
via #930 (comment) and above, we probably want to support at least the following "types":
- operand[{0,1,n}].number: ...
- operand[{0,1,n}].string: ...
- operand[{0,1,n}].substring: ...
- operand[{0,1,n}].bytes: ...
- operand[{0,1,n}].flag: ... |
master's thesis https://www.ru.nl/publish/pages/769526/joren_vrancken.pdf by @joren485 describes an IDA/Hex-Rays plugin that uses call-scope features to identify capabilities. they have good success, demonstrating that this is probably a useful addition to capa. notably they use Hex-Rays decompilation as the source of their features. |
one suggestion for this feature's syntax would be to use a format similar to the strace and ltrace utilities on Linux. Example:

    - api: CreateThread(lpThreadAttributes=0x0, dwStackSize=, lpStartAddress=, lpParameter=, dwCreationFlags=0x4, lpThreadID=)

or maybe:

    - api: CreateThread(lpThreadAttributes=0x0, dwCreationFlags=0x4) # match just these two arguments

we can also specify return values in this syntax, similar to strace/ltrace:

    - api: IsDebuggerPresent() == 0

the downsides to this approach are:
upsides of this approach:
|
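As a rough illustration of the strace/ltrace-style shorthand suggested above, here is a minimal Python sketch of how such a line could be parsed. Everything in it (`CallPattern`, `parse_call_pattern`, the regular expression) is hypothetical and not part of capa. It assumes that an empty value such as `dwStackSize=` means "don't care", and it does not handle commas or parentheses inside quoted strings.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

Value = Union[int, str]

# Hypothetical grammar: Name(arg=value, ...) with an optional "== return value".
CALL_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>[^)]*)\)(?:\s*==\s*(?P<ret>\S+))?$"
)


@dataclass
class CallPattern:
    api: str
    # Only the arguments the rule author constrained; name -> expected value.
    arguments: Dict[str, Value] = field(default_factory=dict)
    return_value: Optional[Value] = None


def parse_value(text: str) -> Value:
    """Parse 0x4 / 4 as integers; keep anything else as a string."""
    text = text.strip()
    try:
        return int(text, 0)
    except ValueError:
        return text.strip('"')


def parse_call_pattern(text: str) -> CallPattern:
    m = CALL_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a call pattern: {text!r}")

    arguments: Dict[str, Value] = {}
    for part in m.group("args").split(","):
        if not part.strip():
            continue  # empty argument list, e.g. IsDebuggerPresent()
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {part!r}")
        if not value.strip():
            continue  # "name=" is treated as a wildcard (assumption)
        arguments[name.strip()] = parse_value(value)

    ret = m.group("ret")
    return CallPattern(
        api=m.group("name"),
        arguments=arguments,
        return_value=None if ret is None else parse_value(ret),
    )
```

With this sketch, the long and the short `CreateThread` forms above parse to the same constraints, since unspecified and empty arguments are both ignored.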
i do like some aspects of this syntax, particularly that it's very human readable. human readability has always been a big goal for capa rule syntax. if we ultimately pick another solution, perhaps we can still support a shorthand like this, since it's probably sufficient for many rules. some additional considerations:
|
If you are interested and if this is still relevant, I can provide an SQLite database containing API call definitions for Windows, including their argument names. I scraped this information from the MSDN Offline Library 2009 back in 2019. So the data is not the newest, but it should include the most relevant API calls.

However, this is an important point and should not be underestimated. API traces differ greatly in how closely they conform to MSDN. Based on my experience so far, CAPE has its own naming for arguments and its conformance is not the best. VMRay does a better job, but I can fully understand that you chose CAPE, since it is open source and there is a large data set of API traces available.

The example shown below illustrates the differences in conformance. Please note that these traces do not originate from the same sample.

CAPE (Sample 17beca96e3a7474622f5b23ff015c8783c0868a070cc5331db622de9b78dd45e from the avast repo):
VMRay (Sample c0832b1008aa0fc828654f9762e37bda019080cbdd92bd2453a05cfb3b79abb3):
|
Ouh, that seems like a very important point. As a rule author I'd like to specify the name instead of a number (which name though? likely the one the sandbox uses which could be different as shown above OR the name from the MSDN documentation). To match features (using multiple sandboxes) we'd want to focus on the arguments by number (mapped from the name). So, for now it may be easiest to just use numbered arguments? And then add our own mapping later, potentially based on @0x534a's data. |
note that in the example above from @0x534a, the two sandboxes don't even recover the same number of arguments 🤦🏼

i guess each sandbox needs a database to map argument names back to argument indices. then capa can work with raw indices. capa can optionally also provide its own database of argument index <-> argument name to make rules more readable, such as the one that @0x534a offers.

maintaining these databases will be a bit tedious, but i'm not sure how we can get around it. i suppose once they're built and tested, updates shouldn't often be needed unless the sandboxes change.

we'll have to inspect the types of data emitted by the sandboxes for the arguments as well. i suspect there'll be some cases where one sandbox resolves a handle into some string (e.g., path) and another sandbox just gives the handle value. fun. |
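The per-sandbox name-to-index database described above could look roughly like the following Python sketch. The sandbox labels and their argument names are illustrative placeholders that have not been checked against real sandbox output, and the helper names (`normalize_arguments`, `display_name`) are hypothetical. Only the `RegOpenKeyExW` parameter order is taken from the documented Windows API.

```python
from typing import Any, Dict

# Canonical (documented) parameter names, by position.
CANONICAL_NAMES: Dict[str, list] = {
    "RegOpenKeyExW": ["hKey", "lpSubKey", "ulOptions", "samDesired", "phkResult"],
}

# Hypothetical per-sandbox tables: api -> sandbox argument name -> index.
# A sandbox may recover only some arguments, or use its own names.
SANDBOX_NAME_TO_INDEX: Dict[str, Dict[str, Dict[str, int]]] = {
    "sandbox_a": {"RegOpenKeyExW": {"Registry": 0, "SubKey": 1}},
    "sandbox_b": {"RegOpenKeyExW": {"hKey": 0, "lpSubKey": 1, "samDesired": 3}},
}


def normalize_arguments(sandbox: str, api: str, raw: Dict[str, Any]) -> Dict[int, Any]:
    """Convert a sandbox's named arguments into index -> value.

    Names that are not in the table (for example, enrichment fields that
    are not real API arguments) are dropped rather than guessed at.
    """
    table = SANDBOX_NAME_TO_INDEX.get(sandbox, {}).get(api, {})
    normalized: Dict[int, Any] = {}
    for name, value in raw.items():
        index = table.get(name)
        if index is not None:
            normalized[index] = value
    return normalized


def display_name(api: str, index: int) -> str:
    """Readable name for an index, falling back to arg[N] when unknown."""
    names = CANONICAL_NAMES.get(api, [])
    return names[index] if 0 <= index < len(names) else f"arg[{index}]"
```

Rules would then be matched against the index-keyed form, and the canonical names would only be used for display, so differences in sandbox naming stay confined to the tables.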
regarding the different number of arguments for

If we're going to create and maintain a mapping from CAPE argument names into MSDN naming, then I propose we reach out to the CAPE team and see if we could work on updating the CAPE argument names into the MSDN format there.

alternatively, perhaps we could add a modifier to the arguments feature to specify which naming convention the rule author has in mind? so something like this:

    - call:
      - api: RegOpenKeyExW
      - arguments/cape:
          Registry: HKEY_LOCAL_MACHINE
          SubKey: system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder

and maybe consequently this?

    - call:
      - api: RegOpenKeyExW
      - or:
        - arguments/cape:
            Registry: HKEY_LOCAL_MACHINE
            SubKey: system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder
        - arguments/msdn:
            hkey: 0x80000001
            lpSubKey: Software\\Microsoft\\Windows\\CurrentVersion\\Run |
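To make the `arguments/<flavor>` idea above concrete, here is a small Python sketch of how such a rule might be evaluated. It is an assumption about semantics, not capa behavior: only the block whose flavor matches the trace's origin is considered, every listed argument must match exactly, and extra fields in the trace are ignored. The rule is written as a plain dictionary with hypothetical helper names rather than in capa's rule format.

```python
from typing import Any, Dict, Mapping

RULE: Dict[str, Any] = {
    "api": "RegOpenKeyExW",
    "arguments": {
        "cape": {
            "Registry": "HKEY_LOCAL_MACHINE",
            "SubKey": r"system\CurrentControlSet\control\NetworkProvider\HwOrder",
        },
        "msdn": {
            "hkey": 0x80000001,
            "lpSubKey": r"Software\Microsoft\Windows\CurrentVersion\Run",
        },
    },
}


def arguments_match(expected: Mapping[str, Any], observed: Mapping[str, Any]) -> bool:
    """Every expected argument must be present with an equal value."""
    return all(
        name in observed and observed[name] == value
        for name, value in expected.items()
    )


def call_matches(rule: Mapping[str, Any], api: str,
                 observed: Mapping[str, Any], flavor: str) -> bool:
    """Match one traced call against a rule, using the block for `flavor`."""
    if rule["api"] != api:
        return False
    expected = rule["arguments"].get(flavor)
    if expected is None:
        return False  # the rule has no block for this naming convention
    return arguments_match(expected, observed)
```

Even in this tiny form, the rule author has to write one block per naming convention, which is the extra complexity that a shared index-based mapping would avoid.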
+1 on that idea.

I'm not a fan of the sandbox-specific arguments. I think it would make rule writing and our code more complex than desired. |
I am all for updating the argument names to MSDN format within CAPE 👍 |
It might be worth noting that CAPE sometimes enriches the output by adding fields that are technically not API arguments. For example, the output from the |
@0x534a, would you mind sharing your database? This could help to get the names updated in CAPE. |
Yeah, that's pretty awesome and very appreciated! 🎉
The SQLite database can be downloaded from my OneDrive using the link https://1drv.ms/u/s!AqNdbwsLZ9qwgw7Z5izJe0OZg9t_?e=badlPF. The structure of the database is not too complex and should mostly be self-explanatory. For example, to search for all arguments of a given API call (in this case RegOpenKeyEx):

    SELECT a.name AS api_function,
           p.name AS argument_name,
           t.name AS argument_type,
           p.is_in,
           p.is_out,
           p.description
    FROM api_calls a,
         api_call_params p,
         types t
    WHERE p.api_call_id = a.id
      AND p.type_id = t.id
      AND a.name = 'RegOpenKeyEx'
      AND a.target_os = 'windows'
    ORDER BY p.id ASC;

Some constraints:
If there are any questions, I'm happy to help. |
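For anyone who wants to try the query without downloading the database, here is a self-contained Python sketch using the standard `sqlite3` module. The schema is an assumption reconstructed from the columns the query references, and the rows are sample data for `RegOpenKeyEx` taken from its documented signature, so the real database may differ in both. The query is rewritten with explicit joins and bound parameters, which is equivalent to the comma-join form.

```python
import sqlite3

# Assumed schema: only the tables and columns referenced by the query.
SCHEMA = """
CREATE TABLE api_calls (id INTEGER PRIMARY KEY, name TEXT, target_os TEXT);
CREATE TABLE types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE api_call_params (
    id INTEGER PRIMARY KEY,
    api_call_id INTEGER,
    type_id INTEGER,
    name TEXT,
    is_in INTEGER,
    is_out INTEGER,
    description TEXT
);
"""

QUERY = """
SELECT a.name AS api_function,
       p.name AS argument_name,
       t.name AS argument_type,
       p.is_in,
       p.is_out,
       p.description
FROM api_calls a
JOIN api_call_params p ON p.api_call_id = a.id
JOIN types t ON p.type_id = t.id
WHERE a.name = ?
  AND a.target_os = ?
ORDER BY p.id ASC
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with sample rows (not the real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO api_calls VALUES (1, 'RegOpenKeyEx', 'windows')")
    params = [
        ("hKey", "HKEY", 1, 0),
        ("lpSubKey", "LPCTSTR", 1, 0),
        ("ulOptions", "DWORD", 1, 0),
        ("samDesired", "REGSAM", 1, 0),
        ("phkResult", "PHKEY", 0, 1),
    ]
    for i, (name, type_name, is_in, is_out) in enumerate(params, start=1):
        conn.execute("INSERT INTO types VALUES (?, ?)", (i, type_name))
        conn.execute(
            "INSERT INTO api_call_params VALUES (?, 1, ?, ?, ?, ?, '')",
            (i, i, name, is_in, is_out),
        )
    return conn


def arguments_for(conn: sqlite3.Connection, api: str, target_os: str = "windows"):
    """Return the argument rows for one API call, in declaration order."""
    return conn.execute(QUERY, (api, target_os)).fetchall()
```

Because rows come back ordered by `p.id`, the position of each row can serve as the argument index, assuming the parameters were inserted in declaration order.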
Great, thank you very much!! |
Summary
Can we create a way to associate function arguments (mostly for numbers and strings) with calls to known functions?
Possible syntax:
See discussion in #921 around syntax.
This is easier for humans to understand, and we can be a little smarter in the analysis phase.
We should restrict this feature to analysis engines/formats/runtimes for which we can reliably extract the arguments (like .NET). Then, when it's working well, we can try to backport it to other engines/formats/runtimes (like x86). TBD whether this sort of analysis is expected of all backends, e.g. SMDA.
Motivation
Looking for examples for #767 reminded me of the other most common use case for basic block subscopes... Grouping function calls and their arguments, like
or