new feature: function call arguments #771

mr-tz · 2021-09-10T19:15:27Z

Summary

Can we create a way to associate function arguments (mostly for numbers and strings) with calls to known functions?

Possible syntax:

- call:
  - number: 4
  - api: CreateProcess

See discussion in #921 around syntax.

This is easier for humans to understand, and we can be a little smarter in the analysis phase.

We should restrict this feature to analysis engines/formats/runtimes for which we can reliably extract the arguments (like .NET). Then, when it's working well, we can try to backport to other engines/formats/runtimes (like x86). TBD if this sort of analysis is expected by all backends, e.g. SMDA.
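To make the proposed `call` grouping concrete, here is a minimal sketch of how a matcher could tie a number to a specific call site instead of to the surrounding basic block. The `Call` type and `match_call` helper are hypothetical names used only for illustration; they are not part of capa.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Call:
    """One extracted call site: API name plus recovered constant arguments (hypothetical model)."""

    api: str
    args: Tuple[object, ...]


def match_call(call: Call, api: str, number: Optional[int] = None) -> bool:
    """Return True if the call targets `api` and, when `number` is given, passes it as an argument."""
    if call.api != api:
        return False
    if number is None:
        return True
    return any(isinstance(arg, int) and arg == number for arg in call.args)


calls = [
    Call("CreateProcess", (0, "cmd.exe", 4)),
    Call("CreateProcess", (0, "cmd.exe", 0)),
    Call("Sleep", (4,)),  # the number 4 appears here, but not as a CreateProcess argument
]
matches = [c for c in calls if match_call(c, "CreateProcess", number=4)]
```

With a basic block subscope, the `Sleep(4)` call next to an unrelated `CreateProcess` call could satisfy both features; in this sketch only the first call matches.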

Motivation

Looking for examples for #767 reminded me of the other most common use case for basic block subscopes...

Grouping function calls and their arguments, like

      - basic block:
        - and:
          - api: kernel32.QueryInformationJobObject
          - number: 0x3 = JobObjectBasicProcessIdList

or

        - basic block:
          - and:
            - api: SendMessage
            - number: 0x40a = WM_CAP_DRIVER_CONNECT


Ana06 · 2022-03-22T14:30:41Z

Concerns (from last meeting):

- parameters with OR logic
- bitfields, for example for CreateFile

williballenthin · 2022-03-22T14:34:13Z

when referring to an argument, we should be able to refer to its specific index. we should also try to associate the argument with its declared name. so like:

api: CreateFileA
    arg[0]: "foo.exe"

and

api: CreateFileA
    lpName: "foo.exe"

how do we maintain these mappings? we'd need a database of APIs and their canonical argument names (ideally should match MSDN (windows) and man pages (posix)).

for MSDN, we should consider extracting the info we need from M$ provided winmd files: https://github.com/microsoft/win32metadata
alternatives might include using viv's API database or extracting one from some sandbox, etc. but the winmd approach is "blessed" and supported.

we should push to have vivisect/vivisect#213 updated and merged.
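A rough sketch of the index/name mapping discussed above, assuming a small in-memory table. `API_PARAMS` and `resolve_arg_index` are hypothetical; a real database might be generated from the win32metadata files. The parameter names listed are the ones MSDN documents for CreateFileA, where the first parameter is `lpFileName`.

```python
from typing import Dict, List, Union

# Hypothetical excerpt of an API database: API name -> ordered parameter names.
API_PARAMS: Dict[str, List[str]] = {
    "CreateFileA": [
        "lpFileName",
        "dwDesiredAccess",
        "dwShareMode",
        "lpSecurityAttributes",
        "dwCreationDisposition",
        "dwFlagsAndAttributes",
        "hTemplateFile",
    ],
}


def resolve_arg_index(api: str, ref: Union[int, str]) -> int:
    """Translate an argument reference (index or declared name) into a zero-based index."""
    params = API_PARAMS[api]
    if isinstance(ref, int):
        if not 0 <= ref < len(params):
            raise IndexError(f"{api} has no argument at index {ref}")
        return ref
    try:
        return params.index(ref)
    except ValueError:
        raise KeyError(f"{api} has no parameter named {ref!r}") from None
```

With a table like this, a rule could use either form and the matching engine would only ever compare indices.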

williballenthin · 2022-03-22T14:34:46Z

we'll need to figure out how to handle a subset of types commonly used for arguments, like pointers to strings.

does specifying a value as a string, like lpName: "foo.exe" imply the argument is a string (either ASCII or utf-16) and instruct the matching engine to resolve the data? and/or does the engine use an API database to determine the types of arguments ahead of time?

we should probably not go too far down this rabbit hole; handling structures is likely out of scope.

do we support regex against strings?
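One possible answer to the string question, sketched under the assumption that the engine has no type information and simply tries both encodings on the bytes behind a pointer argument. `candidate_strings` and `arg_matches` are hypothetical helpers, not existing capa functions.

```python
import re
from typing import List, Union


def candidate_strings(data: bytes) -> List[str]:
    """Decode the bytes behind a pointer as NUL-terminated ASCII and as NUL-terminated UTF-16LE."""
    candidates = []

    # ASCII: everything up to the first NUL byte.
    try:
        text = data.split(b"\x00", 1)[0].decode("ascii")
        if text:
            candidates.append(text)
    except UnicodeDecodeError:
        pass

    # UTF-16LE: two-byte code units up to the first NUL code unit.
    units = []
    for offset in range(0, len(data) - 1, 2):
        unit = data[offset : offset + 2]
        if unit == b"\x00\x00":
            break
        units.append(unit)
    try:
        text = b"".join(units).decode("utf-16-le")
        if text:
            candidates.append(text)
    except UnicodeDecodeError:
        pass

    return candidates


def arg_matches(data: bytes, expected: Union[str, "re.Pattern[str]"]) -> bool:
    """Match an exact string or a compiled regex against either decoding."""
    for text in candidate_strings(data):
        if isinstance(expected, re.Pattern):
            if expected.search(text):
                return True
        elif text == expected:
            return True
    return False
```

Trying both decodings produces some junk candidates (ASCII bytes read as UTF-16 and vice versa), so an API database with declared types would likely give fewer false positives than this guess-both approach.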

williballenthin · 2022-03-22T15:11:00Z

thought: if we migrate most of our rules to use this feature, then we could probably natively support decompiler backends, like ghidra and hex-rays.

we should consider the fragmentation of our analysis backends though. how do we handle the scenario when some backends do/n't support various features? we already almost see this with SMDA versus viv wrt FLIRT support.

williballenthin · 2022-03-31T16:26:06Z

we could add this as part of capa 4.0 (probably introduces insn scope) or defer for 5.0+ as this will be a breaking change to rule syntax.

williballenthin · 2022-03-31T16:27:28Z

via #930 (comment) and above

probably want to support at least the following "types":

- operand[{0,1,n}].number: ...
- operand[{0,1,n}].string: ...
- operand[{0,1,n}].substring: ...
- operand[{0,1,n}].bytes: ...
- operand[{0,1,n}].flag: ...
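A sketch of how these operand "types" might be evaluated, including a `flag` type that addresses the bitfield concern raised earlier. `OperandFeature` is a hypothetical class; the prefix semantics for `bytes` and the all-bits-set semantics for `flag` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Union

Value = Union[int, str, bytes]


@dataclass(frozen=True)
class OperandFeature:
    index: int
    kind: str  # one of: number, string, substring, bytes, flag
    value: Value

    def matches(self, operands: Sequence[Value]) -> bool:
        """Compare this feature against the operand at `self.index`, if it was recovered."""
        if self.index >= len(operands):
            return False
        actual = operands[self.index]
        if self.kind == "number":
            return isinstance(actual, int) and actual == self.value
        if self.kind == "flag":
            # Assumed semantics: every bit of the rule value must be set in the operand.
            return (
                isinstance(actual, int)
                and isinstance(self.value, int)
                and (actual & self.value) == self.value
            )
        if self.kind == "string":
            return isinstance(actual, str) and actual == self.value
        if self.kind == "substring":
            return isinstance(actual, str) and isinstance(self.value, str) and self.value in actual
        if self.kind == "bytes":
            # Assumed semantics: the operand data starts with the rule bytes.
            return (
                isinstance(actual, bytes)
                and isinstance(self.value, bytes)
                and actual.startswith(self.value)
            )
        raise ValueError(f"unknown operand kind: {self.kind}")


GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
# CreateFile-style operands: file name, then dwDesiredAccess with both access bits set.
operands = ("foo.exe", GENERIC_READ | GENERIC_WRITE)
```

Here `OperandFeature(1, "flag", GENERIC_WRITE)` matches while `OperandFeature(1, "number", GENERIC_WRITE)` does not, which is the distinction a bitfield argument needs.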

williballenthin · 2023-02-01T10:12:07Z

master's thesis https://www.ru.nl/publish/pages/769526/joren_vrancken.pdf by @joren485 describes an IDA/Hex-Rays plugin that uses call-scope features to identify capabilities. they have good success, demonstrating that this is probably a useful addition to capa.

notably they use Hex-Rays decompilation as the source of their features.

yelhamer · 2023-06-15T10:54:23Z

one suggestion for this feature's syntax would be to use a format similar to the strace and ltrace utilities on Linux. Example:

- api: CreateThread(lpThreadAttributes=0x0, dwStackSize=, lpStartAddress=, lpParameter=, dwCreationFlags=0x4, lpThreadID=)

or maybe:

- api: CreateThread(lpThreadAttributes=0x0, dwCreationFlags=0x4) # match just these two arguments

we can also specify return values in this syntax similar to strace/ltrace:

- api: IsDebuggerPresent() == 0

the downsides to this approach are:

- it seems a bit more cluttered than the call scope, which I think looks pretty elegant by comparison.
- we would need to find an efficient way to extract the api names and arguments, since otherwise this could introduce performance issues given the large number of api calls that a sample usually makes.

upsides of this approach:

- it would make the feature easily sharable between dynamic and static flavors, and should make writing rules that work both statically and dynamically easier.
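For a sense of the parsing effort, here is a minimal sketch of a parser for the strace-like form. It handles numeric values only, treats an empty value (`dwStackSize=`) as "don't care", and does not handle quoted strings containing commas or trailing comments. `CallPattern` and `parse_call_pattern` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# api name, parenthesized argument list, optional "== <return value>"
CALL_RE = re.compile(r"^\s*(?P<api>\w+)\s*\((?P<args>.*)\)\s*(?:==\s*(?P<ret>\S+))?\s*$")


@dataclass
class CallPattern:
    api: str
    args: Dict[str, int] = field(default_factory=dict)
    ret: Optional[int] = None


def parse_call_pattern(text: str) -> CallPattern:
    m = CALL_RE.match(text)
    if m is None:
        raise ValueError(f"invalid call pattern: {text!r}")

    args: Dict[str, int] = {}
    for part in (p.strip() for p in m.group("args").split(",")):
        if not part:
            continue
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {part!r}")
        value = value.strip()
        if value:  # an empty value is a wildcard, so it is not recorded
            args[name.strip()] = int(value, 0)

    ret = m.group("ret")
    return CallPattern(m.group("api"), args, int(ret, 0) if ret is not None else None)


full = parse_call_pattern(
    "CreateThread(lpThreadAttributes=0x0, dwStackSize=, lpStartAddress=, "
    "lpParameter=, dwCreationFlags=0x4, lpThreadID=)"
)
short = parse_call_pattern("CreateThread(lpThreadAttributes=0x0, dwCreationFlags=0x4)")
debugger = parse_call_pattern("IsDebuggerPresent() == 0")
```

Under these assumptions the long and short `CreateThread` forms parse to the same pattern, since unspecified and empty arguments are both wildcards.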

williballenthin · 2023-06-15T11:04:21Z

> api: CreateThread(lpThreadAttributes=0x0, dwCreationFlags=0x4)

i do like some aspects of this syntax, particularly that it's very human readable. human readability has always been a big goal for capa rule syntax. if we ultimately pick another solution, perhaps we can still support a shorthand like this, since it's probably sufficient for many rules.

some additional considerations:

- cannot express logic for the arguments, such as this OR that. but i think it's on us to demonstrate if this would be used often. i think it might be for bitfield/enum arguments.
- have to develop a parser for this rule syntax, and also find a way to show the user what went wrong when a rule is invalid
- how to specify interpretation of the arguments, like 0x4 = CREATE_SUSPENDED? maybe like dwCreationFlags=0x4 (CREATE_SUSPENDED) or something?
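On the last point, a small sketch of parsing a value with an optional human-readable interpretation. It accepts both the parenthesized form floated here and the `0x3 = JobObjectBasicProcessIdList` form used in the number features at the top of this issue. `parse_annotated_number` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

VALUE_RE = re.compile(
    r"^\s*(?P<value>0[xX][0-9a-fA-F]+|\d+)\s*"
    r"(?:\(\s*(?P<paren>[^()]+?)\s*\)|=\s*(?P<eq>.+?))?\s*$"
)


def parse_annotated_number(text: str) -> Tuple[int, Optional[str]]:
    """Split '0x4 (CREATE_SUSPENDED)' or '0x4 = CREATE_SUSPENDED' into (4, 'CREATE_SUSPENDED')."""
    m = VALUE_RE.match(text)
    if m is None:
        raise ValueError(f"invalid number: {text!r}")
    raw = m.group("value")
    value = int(raw, 16) if raw.lower().startswith("0x") else int(raw, 10)
    # At most one of the two description groups is set; both are None for a bare number.
    return value, m.group("paren") or m.group("eq")
```

The description is carried along for display only; matching would still compare the numeric value.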

0x534a · 2023-06-20T15:04:39Z

> how do we maintain these mappings? we'd need a database of APIs and their canonical argument names (ideally should match MSDN (windows) and man pages (posix)).

If you are interested and this is still relevant, I can provide an SQLite database containing API call definitions for Windows, including their argument names. I scraped this information from the MSDN Offline Library 2009 back in 2019. So the data is not the newest, but it should include the most relevant API calls.

However, this is an important point and should not be underestimated. API traces differ greatly in how closely they conform to MSDN. In my experience so far, CAPE uses its own argument names and its conformance is not the best. VMRay does a better job, but I fully understand that you chose CAPE since it is open source and a large data set of API traces is available. The example below illustrates the differences in conformance. Please note that these two traces do not originate from the same sample.

CAPE (Sample 17beca96e3a7474622f5b23ff015c8783c0868a070cc5331db622de9b78dd45e from the avast repo):

{
    "timestamp": "2021-06-03 21:57:55,843",
    "thread_id": "1688",
    "caller": "0x743c1321",
    "parentcaller": "0x743c13c9",
    "category": "registry",
    "api": "RegOpenKeyExW",
    "status": true,
    "return": "0x00000000",
    "arguments": [
        {
            "name": "Registry",
            "value": "0x80000002",
            "pretty_value": "HKEY_LOCAL_MACHINE"
        },
        {
            "name": "SubKey",
            "value": "system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder"
        },
        {
            "name": "Handle",
            "value": "0x000000e8"
        },
        {
            "name": "FullName",
            "value": "HKEY_LOCAL_MACHINE\\system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder"
        }
    ],
    "repeated": 0,
    "id": 39
}

VMRay (Sample c0832b1008aa0fc828654f9762e37bda019080cbdd92bd2453a05cfb3b79abb3):

[0076.435] RegOpenKeyExW (in: hKey=0x80000001, lpSubKey="Software\\Microsoft\\Windows\\CurrentVersion\\Run", ulOptions=0x0, samDesired=0xf003f, phkResult=0x18ea40 | out: phkResult=0x18ea40*=0x4f0) returned 0x0

mr-tz · 2023-06-21T10:49:26Z

Ouh, that seems like a very important point.

As a rule author I'd like to specify the name instead of a number (which name though? likely the one the sandbox uses which could be different as shown above OR the name from the MSDN documentation).

To match features (using multiple sandboxes) we'd want to focus on the arguments by number (mapped from the name).

So, for now it may be easiest to just use numbered arguments? And then add our own mapping later, potentially based on @0x534a's data.

williballenthin · 2023-06-26T07:09:19Z

note that in the example above from @0x534a, the two sandboxes don't even recover the same number of arguments 🤦🏼

i guess each sandbox needs a database to map argument names back to argument indices. then capa can work with raw indices. capa can optionally also provide its own database of argument index <-> argument name to make rules more readable, such as the one that @0x534a offers.

maintaining these databases will be a bit tedious, but im not sure how we can get around it. i suppose once they're built and tested, updates shouldn't often be needed unless the sandboxes change.

we'll have to inspect the types of data emitted by the sandboxes for the arguments as well. i suspect there'll be some cases where one sandbox resolves a handle into some string (e.g., path) and another sandbox just gives the handle value. fun.
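A sketch of the per-sandbox mapping idea, using the CAPE record for RegOpenKeyExW quoted earlier in this thread. The table contents are an assumption made for illustration: `Handle` is taken to correspond to `phkResult` (index 4), and `FullName` is treated as an enrichment field with no argument index. `CAPE_ARG_INDEX` and `normalize_cape_call` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical per-sandbox table: CAPE argument name -> MSDN parameter index.
# None marks fields that are not real API arguments.
CAPE_ARG_INDEX: Dict[str, Dict[str, Optional[int]]] = {
    "RegOpenKeyExW": {"Registry": 0, "SubKey": 1, "Handle": 4, "FullName": None},
}


def normalize_cape_call(record: dict) -> Dict[int, str]:
    """Re-key a CAPE call record by argument index, dropping unmapped and enrichment fields."""
    table = CAPE_ARG_INDEX.get(record["api"], {})
    by_index: Dict[int, str] = {}
    for arg in record["arguments"]:
        index = table.get(arg["name"])
        if index is not None:
            by_index[index] = arg["value"]
    return by_index


record = {
    "api": "RegOpenKeyExW",
    "arguments": [
        {"name": "Registry", "value": "0x80000002", "pretty_value": "HKEY_LOCAL_MACHINE"},
        {"name": "SubKey", "value": "system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder"},
        {"name": "Handle", "value": "0x000000e8"},
        {
            "name": "FullName",
            "value": "HKEY_LOCAL_MACHINE\\system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder",
        },
    ],
}
normalized = normalize_cape_call(record)
```

Indices 2 and 3 (`ulOptions`, `samDesired`) are simply absent from the result, so rules that reference them could not match CAPE traces for this API.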

yelhamer · 2023-07-03T01:12:30Z

regarding the different number of arguments for RegOpenKeyExW, it seems like that's how CAPE was programmed to handle that:

If we're going to create and maintain a mapping from CAPE argument names into msdn naming, then I propose we reach out to the CAPE team and see if we could work on updating the CAPE argument names into the msdn format there.

alternatively, perhaps we could add a modifier to the arguments feature to specify which argument naming convention the rule author has in mind? so something like this:

- call:
  - api: RegOpenKeyExW
  - arguments/cape:
    Registry: HKEY_LOCAL_MACHINE
    SubKey: system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder

and maybe consequently this?

- call:
  - api: RegOpenKeyExW
  - or:
    - arguments/cape:
        Registry: HKEY_LOCAL_MACHINE
        SubKey: system\\CurrentControlSet\\control\\NetworkProvider\\HwOrder
    - arguments/msdn:
        hKey: 0x80000001
        lpSubKey: Software\\Microsoft\\Windows\\CurrentVersion\\Run

mr-tz · 2023-07-03T08:15:36Z

> we reach out to the CAPE team and see if we could work on updating the CAPE argument names into the msdn format there

+1 on that idea

I'm not a fan of the sandbox-specific arguments. I think it would make rule writing and our code more complex than desired.

kevoreilly · 2023-07-05T11:59:24Z

I am all for updating the argument names to MSDN format within CAPE 👍

kevoreilly · 2023-07-05T12:16:42Z

It might be worth noting that CAPE sometimes enriches the output by adding fields that are technically not API arguments.

For example, the output from the NtReadFile hook includes the file path, but the path is not one of the API's arguments; the hook derives it from the handle argument.

mr-tz · 2023-07-05T12:17:32Z

@0x534a, would you mind sharing your database? This could help to get the names updated in CAPE.

0x534a · 2023-07-05T20:06:37Z

> I am all for updating the argument names to MSDN format within CAPE 👍

Yeah, that's pretty awesome and very appreciated! 🎉

> @0x534a, would you mind sharing your database? This could help to get the names updated in CAPE.

The SQLite database can be downloaded from my OneDrive using the link https://1drv.ms/u/s!AqNdbwsLZ9qwgw7Z5izJe0OZg9t_?e=badlPF. The structure of the database is not too complex and should mostly be self-explanatory. For example, to search for all arguments of a given API call (in this case RegOpenKeyEx) you can use the following SQL statement:

SELECT a.name AS api_function, 
       p.name AS argument_name, 
       t.name AS argument_type, 
       p.is_in, 
       p.is_out, 
       p.description 
FROM   api_calls a, 
       api_call_params p, 
       types t 
WHERE  p.api_call_id = a.id 
       AND p.type_id = t.id 
       AND a.name = 'RegOpenKeyEx'
       AND a.target_os = 'windows'
ORDER  BY p.id ASC;

Some constraints:

- The database does not include structs or enums. So, no nested structures of arguments can be found.
- The position of an argument is not explicitly stated in the data as its own column. Nevertheless, it can be deduced from the ID of the argument (primary key of the table api_call_params).
- The database contains API calls for different platforms. To get the best results, simply filter by the OS windows or the calling convention WINAPI.
- Not all of the API calls are documented in the MSDN. For undocumented API calls (especially NTAPI), I scraped the website http://undocumented.ntinternals.net. The site seems to be offline right now. Based on the naming of parameters on the website, I cannot guarantee that the argument names always make sense. This is more like a best-effort approach. ;)

If there are any questions, I'm happy to help.
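A sketch of how the argument positions could be derived from such a database with Python's `sqlite3` module, ordering by the parameter ID as described in the constraints. The table and column names are taken from the query above; the in-memory rows are made-up stand-ins so the example runs without the real file, and `list_params` is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple

QUERY = """
SELECT p.name, t.name
FROM   api_calls a
       JOIN api_call_params p ON p.api_call_id = a.id
       JOIN types t ON p.type_id = t.id
WHERE  a.name = ? AND a.target_os = ?
ORDER  BY p.id ASC
"""


def list_params(conn: sqlite3.Connection, api: str, target_os: str = "windows") -> List[Tuple[int, str, str]]:
    """Return (position, argument name, type name), with position derived from the ID order."""
    rows = conn.execute(QUERY, (api, target_os)).fetchall()
    return [(position, name, type_name) for position, (name, type_name) in enumerate(rows)]


# Stand-in data with an assumed schema; the real database would be opened by file path instead.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE api_calls (id INTEGER PRIMARY KEY, name TEXT, target_os TEXT);
    CREATE TABLE types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE api_call_params (
        id INTEGER PRIMARY KEY, api_call_id INTEGER, type_id INTEGER, name TEXT
    );
    INSERT INTO api_calls VALUES (1, 'RegOpenKeyEx', 'windows');
    INSERT INTO types VALUES (1, 'HKEY'), (2, 'LPCTSTR'), (3, 'DWORD'), (4, 'REGSAM'), (5, 'PHKEY');
    INSERT INTO api_call_params VALUES
        (10, 1, 1, 'hKey'), (11, 1, 2, 'lpSubKey'), (12, 1, 3, 'ulOptions'),
        (13, 1, 4, 'samDesired'), (14, 1, 5, 'phkResult');
    """
)
params = list_params(conn, "RegOpenKeyEx")
```

Using bound parameters (`?`) also sidesteps the question of how string literals are quoted in the SQL text.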

mr-tz · 2023-07-06T07:36:14Z

Great, thank you very much!!

williballenthin changed the title ~~New scope: call~~ new feature: function call arguments Mar 22, 2022

williballenthin mentioned this issue Mar 24, 2022

viv: x86: extract function call arguments #926

Open

williballenthin added enhancement New feature or request breaking-change introduces a breaking change that should be released in a major version labels Mar 31, 2022

joren485 mentioned this issue Feb 5, 2023

address rule tweaks in CallSignaturesPlugin mandiant/capa-rules#679

Closed

williballenthin mentioned this issue Mar 29, 2023

Add an entropy file feature to detect packed code and encrypted sections #1401

Open

mr-tz mentioned this issue Apr 19, 2023

Add proximity scope to narrow feature occurences to specific range #1453

Open

williballenthin added this to @yelhamer GSoC 2023 May 30, 2023

williballenthin moved this to todo in @yelhamer GSoC 2023 May 30, 2023

williballenthin mentioned this issue Jun 15, 2023

add dynamic features #1530

Closed

3 tasks

williballenthin mentioned this issue Jun 20, 2023

add the CAPE feature extractor #1546

Merged

6 tasks

yelhamer moved this from todo to next up in @yelhamer GSoC 2023 Aug 2, 2023

yelhamer linked a pull request Aug 3, 2023 that will close this issue

Add a call scope #1678

Merged

3 tasks
