-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add dynamic features #1530
add dynamic features #1530
Conversation
Co-authored-by: Moritz <[email protected]>
Co-authored-by: Moritz <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
add a Dynamic extractor interface
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
* Optimize cache and extractor interface * Update changelog * Run linter formatters * Implement review feedback * Move rulegen extractor construction to tab change * Change rulegen cache construction behavior * Adjust return values for CR, format * Fix mypy errors * Format * Fix merge --------- Co-authored-by: Stephen Eckels <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Please always add a changelog entry so that lints and tests run (or we may have to enable that first for this branch). |
23dc3f2
to
6b95336
Compare
I've rebased the dynamic-feature-extraction branch to master so we only get the relevant diff. |
Pretty sure we have to include the branch here: capa/.github/workflows/tests.yml Lines 3 to 7 in 7ef78fd
|
updated CI configuration in 8119aa6 |
CHANGELOG updated or no update needed, thanks! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would like to get the designs for these feature fleshed out in issues and documented before we start on the implementations. see inline comments that suggest where to do this.
when we go to add any of these features, it should be done in a standalone PR, so that we can merge or reject without affecting other parallel work. keeping them together in the same PR makes it difficult to merge incremental updates.
furthermore, i would like to focus on getting a basic proof of concept dynamic analysis working first, end to end, before adding novel features. otherwise, we risk taking on too many tasks and not finishing the 1st priority goal.
i'd suggest that we close this PR until we have the underlying designs finished.
class Registry(String): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's discuss the requirements and sketch a design for how this feature should look and act in #1558
this should include how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used.
once we're happy with the design, then please open a standalone PR with the Registry feature.
class Filename(String): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's discuss the requirements and sketch a design for how this feature should look and act in #1559
this should include how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used.
once we're happy with the design, then please open a standalone PR with the filename feature.
class Mutex(String): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's discuss the requirements and sketch a design for how this feature should look and act in #1560
this should include how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used.
once we're happy with the design, then please open a standalone PR with the mutex feature.
def __init__(self, signature: str, description=None): | ||
if signature.isidentifier(): | ||
# api call is in the legacy format | ||
super().__init__(signature, description=description) | ||
self.args = {} | ||
self.ret = False | ||
else: | ||
# api call is in the strace format and therefore has to be parsed | ||
name, self.args, self.ret = self.parse_signature(signature) | ||
super().__init__(name, description=description) | ||
|
||
# store the original signature for hashing purposes | ||
self.signature = signature | ||
|
||
def __hash__(self): | ||
return hash(self.signature) | ||
|
||
def __eq__(self, other): | ||
if not isinstance(other, API): | ||
return False | ||
|
||
assert(isinstance(other, API)) | ||
if {} in (self.args, other.args) or False in (self.ret, other.ret): | ||
# Legacy API feature | ||
return super().__eq__(other) | ||
|
||
# API call with arguments | ||
return super().__eq__(other) and self.args == other.args and self.ret == other.ret | ||
|
||
def parse_signature(self, signature: str) -> Tuple[str, Optional[Dict[str, str]], Optional[str]]: | ||
# todo: optimize this method and improve the code quality | ||
import re | ||
|
||
args = ret = False | ||
|
||
match = re.findall(r"(.+\(.*\)) ?=? ?([^=]*)", signature) | ||
if not match: | ||
return "", None, None | ||
if len(match[0]) == 2: | ||
ret = match[0][1] | ||
|
||
match = re.findall(r"(.*)\((.*)\)", match[0][0]) | ||
if len(match[0]) == 2: | ||
args = (match[0][1]+", ").split(", ") | ||
map(lambda x: {f"arg{x[0]}": x[1]}, enumerate(args)) | ||
args = [{} | arg for arg in args][0] | ||
|
||
return match[0][0], args, ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please contribute to the existing discussion in #771 that enumerates the requirements and sketches a design for how this feature should look and act.
this issue needs to finalize decisions on how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used. we have some of this already and welcome your new perspective.
once we're happy with the design, then we can figure out if/when to implement. given that it may require a new scope, we should do this in a separate, standalone PR. we should consider doing this after the basic dynamic analysis support has been added to capa, so as not to get distracted from that higher priority work.
I agree that getting dynamic feature extraction working should indeed be our priority right now; so we can go ahead and close this PR while the corresponding designs are finished. |
closing as discussed |
PR for adding support for dynamic features to capa.
target features thus far: add arguments to the API feature, add a registry/file/mutex feature(s) to allow authors higher selectivity, add a feature for network parsing.
See discussion: #1517