Optimize cache and extractor interface #1470
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
CHANGELOG updated or no update needed, thanks! 😄
Thank you @stevemk14ebr - good improvements here! I've left comments for your review.
By delaying construction of the rulegen extractor and cache, the one-time file analysis is performed every time the user clicks the analyze button in the rulegen dialog. This is moderately slow. I agree that this saves the user from the file analysis overhead if they don't use the rulegen feature, but it does slow down the rulegen feature for every new function it is used for.
This new commit fixes the issue mentioned above. I construct just the rulegen extractor and cache when the tab is changed to the rulegen feature. This makes analyze on each new feature fast for the rule generator.
Totally agree with the points you make here. My intent is for the initialization of the rulegen extractor and cache to only execute once at the beginning of Lines 962 to 968 in 332853e
So from a UX perspective the one-time final analysis kicks off the first time a user clicks the What are your thoughts @stevemk14ebr ?
Ah yes I see what you want. Done in the latest commit!
Great, thank you for the updates @stevemk14ebr ! Requested minor changes. Also, looks like formatting is failing. Once you're done w/ the requested changes please run the following:
$ python -m black -l 120 --extend-exclude ".*_pb2.py" --check .
Remove --check to format identified files.
@stevemk14ebr recent changes look good but
…/capa into cache_optimizations
done.
@stevemk14ebr our code style checks appear to be failing as well. Please run the following command locally to identify and address the issues:
great, thank you @stevemk14ebr !
* Optimize cache and extractor interface
* Update changelog
* Run linter formatters
* Implement review feedback
* Move rulegen extractor construction to tab change
* Change rulegen cache construction behavior
* Adjust return values for CR, format
* Fix mypy errors
* Format
* Fix merge

---------

Co-authored-by: Stephen Eckels <[email protected]>
These changes were discussed with @mike-hunhoff
Previously the `CapaRuleGenFeatureCache` accepted a list of function handles and then internally invoked `_find_function_and_below_features` on that list so that later calls such as `get_all_function_features` could read out cached features. Unfortunately, the list of functions you want features from is not always known beforehand (input comes from the UI, recursive exploration, etc.), and in those cases you have to construct a new `CapaRuleGenFeatureCache` for each function you'd like features from. This was a problem because `_find_global_features` and `_find_file_features()` were also called in the cache constructor, with `_find_file_features()` in particular having non-trivial overhead. When you must continuously extract features from many functions, calling `_find_file_features` in the constructor and repeatedly throwing away the function cache adds up to significant overhead in some applications.

The interface has therefore been changed so that functions such as `get_all_function_features` now use a helper, `_get_cached_func_node`, which returns cached function results if they exist or populates the cache for that function at call time. A user of this interface can now construct `CapaRuleGenFeatureCache` once at a higher global scope, where the expensive `_find_global_features` and `_find_file_features()` calls occur, and call `get_all_function_features` as needed later.

This also simplifies tracking of the extractor and cache state in `form.py`, since there is now only one cache instance and it is reused everywhere.
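
The populate-on-demand pattern described above can be illustrated with a minimal, self-contained sketch. This is not the actual capa implementation: `FakeExtractor`, `LazyFeatureCache`, and the feature strings are hypothetical stand-ins, and only the method names `_get_cached_func_node` and `get_all_function_features` are borrowed from the description. The bodies are assumptions made for illustration.

```python
from typing import Dict, Set


class FakeExtractor:
    """Hypothetical stand-in for a feature extractor; counts expensive calls."""

    def __init__(self) -> None:
        self.file_scans = 0
        self.function_scans = 0

    def extract_file_features(self) -> Set[str]:
        # Stands in for the expensive one-time file analysis.
        self.file_scans += 1
        return {"file-feature"}

    def extract_function_features(self, func: int) -> Set[str]:
        # Stands in for per-function feature extraction.
        self.function_scans += 1
        return {f"function-feature@{func:#x}"}


class LazyFeatureCache:
    """Illustrative cache: file-level work once, per-function work on demand."""

    def __init__(self, extractor: FakeExtractor) -> None:
        self.extractor = extractor
        # Expensive file-level features are computed once, in the constructor.
        self.file_features: Set[str] = extractor.extract_file_features()
        # Per-function results are filled in lazily, keyed by function address.
        self._func_nodes: Dict[int, Set[str]] = {}

    def _get_cached_func_node(self, func: int) -> Set[str]:
        # Return the cached result if present, otherwise populate it now.
        node = self._func_nodes.get(func)
        if node is None:
            node = self.extractor.extract_function_features(func)
            self._func_nodes[func] = node
        return node

    def get_all_function_features(self, func: int) -> Set[str]:
        # Callers never need to announce the function list up front.
        return set(self._get_cached_func_node(func))


# One cache instance is built once and reused for every function queried.
extractor = FakeExtractor()
cache = LazyFeatureCache(extractor)
for address in (0x401000, 0x401000, 0x402000):
    cache.get_all_function_features(address)
```

In this sketch the file-level scan runs only in the constructor, and querying the same function address twice triggers a single per-function extraction, which is the saving the description attributes to building the cache once at a higher scope.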