Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a call scope #1678

Merged

Conversation

yelhamer
Copy link
Collaborator

@yelhamer yelhamer commented Aug 2, 2023

This PR is for adding the call scope.

For the time being, the call scope supports only the features: API, Number, and String.

Also, I have also modified the dynamic-related addresses to include a reference to each one's parent scope/address. This is in order to avoid the mixing of extracted features. For example, two api calls with the same id's, two threads with the same ids, and two processes with the same pid.

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@yelhamer yelhamer added breaking-change introduces a breaking change that should be released in a major version gsoc Work related to Google Summer of Code project. dynamic related to dynamic analysis flavor labels Aug 2, 2023
@yelhamer yelhamer linked an issue Aug 3, 2023 that may be closed by this pull request
Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the whole - very good! its so cool to read a 500 line PR and find yourself nodding along the entire way.

there are a couple places to make small changes that i've referenced inline. once these are addressed, the PR can probably be merged.

i have some personal thoughts on the terminology, but im not sure they are compelling nor important, and they don't block the logic you proposed, so i'll raise this 1:1.

Comment on lines +119 to 120
class DynamicReturnAddress(Address):
"""an address from a dynamic analysis trace"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont quite understand what this represents or how its used.

is this used to represent the virtual address of some code in memory during the runtime trace? does it correspond to an unrelocated part of the input sample?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it represents the instruction in memory that called the captured api (actually, it's rather the return address). This was added by Moritz, and I believe it's necessary since an api call might be made from a dynamically-allocated executable memory region.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i may address this and the call/address comment below in a PR that i propose to you after we merge this PR. i'd like to demonstrate how i think about this and we can compare/contrast.

no need to resolve this comment now.

@williballenthin
Copy link
Collaborator

also, please add some tests that demonstrate the new rule syntax.

@williballenthin williballenthin marked this pull request as draft August 3, 2023 13:52
@yelhamer yelhamer marked this pull request as ready for review August 6, 2023 17:25
return (self.process, self.tid) < (other.process, other.tid)


class DynamicAddress(Address):
class CallAddress(Address):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we should call this DynamicCallAddress like we use DynamicReturnAddress below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I agree. I had the general future case in mind where the call scope would be also part of static analysis. But given that this address has the thread attribute right now, maybe we should use DynamicCallAddress. Perhaps we can introduce a StaticCallAddress when the call scope is added to static analysis.

@@ -366,24 +367,24 @@ def pbar(s, *args, **kwargs):
return matches, meta


def find_thread_capabilities(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we've reached the point where we should separate out this matching logic into its own namespace, and then we can have subnamespaces for static and dynamic flavors. there's hundreds of lines of logic here, which seems excessive for the main script. we can address this in another PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +282 to +286
if not apis:
print(f" arguments=[{', '.join(arguments)}]")

for cid, api in apis:
print(f"call {cid}: {api}({', '.join(arguments)})")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another way to render this might be to put the entire event data into the layout structure, and then access that here. we can brainstorm about this a bit further.

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

four trivial changes (two formatting, two empty files) and then rename the DynamicCallAddress class. then ready to merge!

@williballenthin

This comment was marked as outdated.

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @yelhamer

@williballenthin williballenthin merged commit 681d4fb into mandiant:dynamic-feature-extraction Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change introduces a breaking change that should be released in a major version dynamic related to dynamic analysis flavor gsoc Work related to Google Summer of Code project.
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

new feature: function call arguments
2 participants