Add a call scope #1678

yelhamer · 2023-08-02T21:59:40Z

This PR adds the call scope.

For the time being, the call scope supports only the following features: API, Number, and String.
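To illustrate the restriction above, here is a minimal sketch of how a rule parser might reject a feature that the call scope does not support. The table `SUPPORTED_FEATURES`, the helper `ensure_feature_valid_for_scope`, and the local `InvalidRule` class are hypothetical stand-ins for illustration only; they are not the code in this PR.

```python
# Hypothetical sketch: names below are illustrative, not capa's actual API.


class InvalidRule(ValueError):
    """Local stand-in for the exception a rule parser might raise."""


# Assumption: a lookup table from scope name to the feature types it accepts.
# Only the call scope is listed, with the three features named in this PR.
SUPPORTED_FEATURES = {
    "call": {"api", "number", "string"},
}


def ensure_feature_valid_for_scope(scope: str, feature: str) -> None:
    """Raise InvalidRule if `feature` is not supported under `scope`."""
    supported = SUPPORTED_FEATURES.get(scope, set())
    if feature not in supported:
        raise InvalidRule(f"feature {feature!r} not supported for scope {scope!r}")


# Usage: an API feature is accepted, a mnemonic feature is rejected.
ensure_feature_valid_for_scope("call", "api")

try:
    ensure_feature_valid_for_scope("call", "mnemonic")
    rejected = False
except InvalidRule:
    rejected = True
```

Keeping the supported set in one table would make it easy to extend the call scope later without touching the validation logic.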

I have also modified the dynamic-analysis addresses so that each one includes a reference to its parent scope's address. This avoids mixing extracted features between, for example, two API calls with the same ID, two threads with the same TID, or two processes with the same PID.
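The idea of parent-aware addresses can be sketched as follows. This is a simplified illustration using frozen dataclasses, not the implementation in `capa/features/address.py`; the class and field names here are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical sketch: simplified stand-ins for the dynamic address classes.
# Each address embeds its parent, so equality, hashing, and ordering all
# take the full chain (process -> thread -> call) into account.


@dataclass(frozen=True, order=True)
class ProcessAddress:
    ppid: int  # parent process id
    pid: int


@dataclass(frozen=True, order=True)
class ThreadAddress:
    process: ProcessAddress  # parent scope
    tid: int


@dataclass(frozen=True, order=True)
class CallAddress:
    thread: ThreadAddress  # parent scope
    id: int


# Two threads share tid 7 but live in different processes.
p1 = ProcessAddress(ppid=1, pid=100)
p2 = ProcessAddress(ppid=1, pid=200)
t1 = ThreadAddress(process=p1, tid=7)
t2 = ThreadAddress(process=p2, tid=7)

# Two calls share id 0 but belong to different threads.
c1 = CallAddress(thread=t1, id=0)
c2 = CallAddress(thread=t2, id=0)

# Features keyed by address stay separate because the parents differ.
features = {}
features.setdefault(c1, []).append("api(CreateFileW)")
features.setdefault(c2, []).append("api(WriteFile)")
```

Because the parent is part of the key, a lookup for one call never returns features that were extracted from a call with the same ID in another thread or process.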

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

williballenthin

on the whole - very good! it's so cool to read a 500 line PR and find yourself nodding along the entire way.

there are a couple places to make small changes that i've referenced inline. once these are addressed, the PR can probably be merged.

i have some personal thoughts on the terminology, but i'm not sure they are compelling or important, and they don't block the logic you proposed, so i'll raise this 1:1.

williballenthin · 2023-08-03T13:03:15Z

capa/features/address.py

+class DynamicReturnAddress(Address):
    """an address from a dynamic analysis trace"""


i don't quite understand what this represents or how it's used.

is this used to represent the virtual address of some code in memory during the runtime trace? does it correspond to an unrelocated part of the input sample?

Yeah, it represents the instruction in memory that called the captured API (more precisely, it's the return address of that call). This was added by Moritz, and I believe it's necessary since an API call might be made from a dynamically-allocated executable memory region.

i may address this and the call/address comment below in a PR that i propose to you after we merge this PR. i'd like to demonstrate how i think about this and we can compare/contrast.

no need to resolve this comment now.

capa/features/extractors/base_extractor.py

capa/features/freeze/__init__.py

capa/rules/__init__.py

williballenthin · 2023-08-03T13:23:59Z

also, please add some tests that demonstrate the new rule syntax.

williballenthin · 2023-08-07T08:07:48Z

capa/features/address.py

        return (self.process, self.tid) < (other.process, other.tid)


-class DynamicAddress(Address):
+class CallAddress(Address):


i wonder if we should call this DynamicCallAddress like we use DynamicReturnAddress below?

Hmm, I agree. I had the general future case in mind, where the call scope would also be part of static analysis. But given that this address has the thread attribute right now, maybe we should use DynamicCallAddress. Perhaps we can introduce a StaticCallAddress when the call scope is added to static analysis.

capa/features/extractors/cape/call.py

williballenthin · 2023-08-07T08:14:33Z

capa/main.py

@@ -366,24 +367,24 @@ def pbar(s, *args, **kwargs):
    return matches, meta


-def find_thread_capabilities(


i wonder if we've reached the point where we should separate out this matching logic into its own namespace, and then we can have subnamespaces for static and dynamic flavors. there's hundreds of lines of logic here, which seems excessive for the main script. we can address this in another PR.

williballenthin · 2023-08-07T08:20:02Z

scripts/show-features.py

+                    if not apis:
+                        print(f"    arguments=[{', '.join(arguments)}]")
+
+                    for cid, api in apis:
+                        print(f"call {cid}: {api}({', '.join(arguments)})")


another way to render this might be to put the entire event data into the layout structure, and then access that here. we can brainstorm about this a bit further.

tests/test_rules.py

tests/unsupported_capa_rules.txt

tests/unsupported_capa_rules.yml

williballenthin

four trivial changes (two formatting, two empty files) and then rename the class to DynamicCallAddress. then ready to merge!

williballenthin

lgtm, thanks @yelhamer

yelhamer added 2 commits August 2, 2023 22:46

Initial commit

ca2760f

update changelog

4e1527d

yelhamer added the breaking-change, gsoc, and dynamic labels Aug 2, 2023

add call address to show-features.py script

3c3205a

yelhamer linked an issue Aug 3, 2023 that may be closed by this pull request

new feature: function call arguments #771

Open

yelhamer requested a review from williballenthin August 3, 2023 08:15

yelhamer added 3 commits August 3, 2023 11:21

include an address' parent in comparisons

4277b4b

bugfix

4f9d245

cape/call.py: update extract_call_features() comment

7c14c51

williballenthin requested changes Aug 3, 2023

yelhamer and others added 2 commits August 3, 2023 14:38

build_statements(): fix call-scope InvalidRule message typo

eafed0f

Co-authored-by: Willi Ballenthin <[email protected]>

base_extractor.py: fix ProcessHandle documentation comment

60e94ad

Co-authored-by: Willi Ballenthin <[email protected]>

williballenthin marked this pull request as draft August 3, 2023 13:52

yelhamer and others added 4 commits August 3, 2023 15:27

Merge branch 'dynamic-feature-extraction' into call-scope

cd700a1

add call-scope tests

8b36cd1

fix test_rules.py yaml identation bug

8dc4adb

move thread-scope features into the call-scope

f461f65

yelhamer marked this pull request as ready for review August 6, 2023 17:25

yelhamer requested a review from williballenthin August 7, 2023 08:04