-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a call scope #1678
Add a call scope #1678
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the whole - very good! its so cool to read a 500 line PR and find yourself nodding along the entire way.
there are a couple places to make small changes that i've referenced inline. once these are addressed, the PR can probably be merged.
i have some personal thoughts on the terminology, but im not sure they are compelling nor important, and they don't block the logic you proposed, so i'll raise this 1:1.
class DynamicReturnAddress(Address): | ||
"""an address from a dynamic analysis trace""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont quite understand what this represents or how its used.
is this used to represent the virtual address of some code in memory during the runtime trace? does it correspond to an unrelocated part of the input sample?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it represents the instruction in memory that called the captured api (actually, it's rather the return address). This was added by Moritz, and I believe it's necessary since an api call might be made from a dynamically-allocated executable memory region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i may address this and the call/address comment below in a PR that i propose to you after we merge this PR. i'd like to demonstrate how i think about this and we can compare/contrast.
no need to resolve this comment now.
also, please add some tests that demonstrate the new rule syntax. |
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
capa/features/address.py
Outdated
return (self.process, self.tid) < (other.process, other.tid) | ||
|
||
|
||
class DynamicAddress(Address): | ||
class CallAddress(Address): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if we should call this DynamicCallAddress
like we use DynamicReturnAddress
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I agree. I had the general future case in mind where the call scope would be also part of static analysis. But given that this address has the thread attribute right now, maybe we should use DynamicCallAddress
. Perhaps we can introduce a StaticCallAddress
when the call scope is added to static analysis.
Co-authored-by: Willi Ballenthin <[email protected]>
@@ -366,24 +367,24 @@ def pbar(s, *args, **kwargs): | |||
return matches, meta | |||
|
|||
|
|||
def find_thread_capabilities( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if we've reached the point where we should separate out this matching logic into its own namespace, and then we can have subnamespaces for static and dynamic flavors. there's hundreds of lines of logic here, which seems excessive for the main script. we can address this in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not apis: | ||
print(f" arguments=[{', '.join(arguments)}]") | ||
|
||
for cid, api in apis: | ||
print(f"call {cid}: {api}({', '.join(arguments)})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another way to render this might be to put the entire event data into the layout structure, and then access that here. we can brainstorm about this a bit further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
four trivial changes (two formatting, two empty files) and then rename the DynamicCallAddress class. then ready to merge!
This comment was marked as outdated.
This comment was marked as outdated.
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @yelhamer
This PR is for adding the call scope.
For the time being, the call scope supports only the features:
API
,Number
, andString
.Also, I have also modified the dynamic-related addresses to include a reference to each one's parent scope/address. This is in order to avoid the mixing of extracted features. For example, two api calls with the same id's, two threads with the same ids, and two processes with the same pid.
Checklist