Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dynamic features #1530

Closed

Conversation

yelhamer
Copy link
Collaborator

@yelhamer yelhamer commented Jun 11, 2023

PR for adding support for dynamic features to capa.

target features thus far: add arguments to the API feature, add a registry/file/mutex feature(s) to allow authors higher selectivity, add a feature for network parsing.

See discussion: #1517

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

yelhamer and others added 30 commits June 8, 2023 23:15
Co-authored-by: Moritz <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
* Optimize cache and extractor interface

* Update changelog

* Run linter formatters

* Implement review feedback

* Move rulegen extractor construction to tab change

* Change rulegen cache construction behavior

* Adjust return values for CR, format

* Fix mypy errors

* Format

* Fix merge

---------

Co-authored-by: Stephen Eckels <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@mr-tz
Copy link
Collaborator

mr-tz commented Jun 15, 2023

Please always add a changelog entry so that lints and tests run (or we may have to enable that first for this branch).

@mr-tz mr-tz force-pushed the dynamic-feature-extraction branch from 23dc3f2 to 6b95336 Compare June 15, 2023 09:41
@mr-tz
Copy link
Collaborator

mr-tz commented Jun 15, 2023

I've rebased the dynamic-feature-extraction branch to master so we only get the relevant diff.

@mr-tz
Copy link
Collaborator

mr-tz commented Jun 15, 2023

Pretty sure we have to include the branch here:

on:
push:
branches: [ master ]
pull_request:
branches: [ master ]

@williballenthin
Copy link
Collaborator

updated CI configuration in 8119aa6

@github-actions github-actions bot dismissed their stale review June 15, 2023 10:17

CHANGELOG updated or no update needed, thanks! 😄

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would like to get the designs for these feature fleshed out in issues and documented before we start on the implementations. see inline comments that suggest where to do this.

when we go to add any of these features, it should be done in a standalone PR, so that we can merge or reject without affecting other parallel work. keeping them together in the same PR makes it difficult to merge incremental updates.

furthermore, i would like to focus on getting a basic proof of concept dynamic analysis working first, end to end, before adding novel features. otherwise, we risk taking on too many tasks and not finishing the 1st priority goal.

i'd suggest that we close this PR until we have the underlying designs finished.

Comment on lines +275 to +276
class Registry(String):
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's discuss the requirements and sketch a design for how this feature should look and act in #1558

this should include how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used.

once we're happy with the design, then please open a standalone PR with the Registry feature.

Comment on lines +279 to +280
class Filename(String):
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's discuss the requirements and sketch a design for how this feature should look and act in #1559

this should include how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used.

once we're happy with the design, then please open a standalone PR with the filename feature.

Comment on lines +283 to +284
class Mutex(String):
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's discuss the requirements and sketch a design for how this feature should look and act in #1560

this should include how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used.

once we're happy with the design, then please open a standalone PR with the mutex feature.

Comment on lines +24 to +71
def __init__(self, signature: str, description=None):
if signature.isidentifier():
# api call is in the legacy format
super().__init__(signature, description=description)
self.args = {}
self.ret = False
else:
# api call is in the strace format and therefore has to be parsed
name, self.args, self.ret = self.parse_signature(signature)
super().__init__(name, description=description)

# store the original signature for hashing purposes
self.signature = signature

def __hash__(self):
return hash(self.signature)

def __eq__(self, other):
if not isinstance(other, API):
return False

assert(isinstance(other, API))
if {} in (self.args, other.args) or False in (self.ret, other.ret):
# Legacy API feature
return super().__eq__(other)

# API call with arguments
return super().__eq__(other) and self.args == other.args and self.ret == other.ret

def parse_signature(self, signature: str) -> Tuple[str, Optional[Dict[str, str]], Optional[str]]:
# todo: optimize this method and improve the code quality
import re

args = ret = False

match = re.findall(r"(.+\(.*\)) ?=? ?([^=]*)", signature)
if not match:
return "", None, None
if len(match[0]) == 2:
ret = match[0][1]

match = re.findall(r"(.*)\((.*)\)", match[0][0])
if len(match[0]) == 2:
args = (match[0][1]+", ").split(", ")
map(lambda x: {f"arg{x[0]}": x[1]}, enumerate(args))
args = [{} | arg for arg in args][0]

return match[0][0], args, ret
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please contribute to the existing discussion in #771 that enumerates the requirements and sketches a design for how this feature should look and act.

this issue needs to finalize decisions on how the rule syntax changes, the format of the feature and properties of the feature, and at least two example rules showing how the feature would be used. we have some of this already and welcome your new perspective.

once we're happy with the design, then we can figure out if/when to implement. given that it may require a new scope, we should do this in a separate, standalone PR. we should consider doing this after the basic dynamic analysis support has been added to capa, so as not to get distracted from that higher priority work.

@yelhamer
Copy link
Collaborator Author

I agree that getting dynamic feature extraction working should indeed be our priority right now; so we can go ahead and close this PR while the corresponding designs are finished.

@williballenthin
Copy link
Collaborator

closing as discussed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants