FEAT(capa2sarif) Add SARIF conversion script from json output #2093

ReversingWithMe · 2024-05-27T14:56:36Z

SARIF gets you navigation for binary beacons from capa in any tool that supports SARIF (e.g., Ghidra, radare2, IDA). I expect this to be a core format for binary analysis in the future.

The Static Analysis Results Interchange Format (SARIF) is a standardized format for the output of static analysis tools, which are used to evaluate source code or binaries for things like vulnerabilities or dataflow. SARIF enables different analysis tools to produce results in a common format that can be easily understood, integrated, and acted upon by software development tools and systems. For example, VS Code, Ghidra, radare2, and GitHub all adopt it as a common standard for representing this type of information.

SARIF describes the analysis that was run and the results of that analysis on an artifact. Results include a description of the artifacts related to a run of the tool, where an artifact is source code, a binary file, or an auxiliary data file. Results also include the invocation, i.e. how the tool was run, including version, command line, and any knobs/parameters. The idea is that you can reconstruct where output data came from, for things that depend on parameters or on a specific input. Results themselves are captured via "rules", where each rule is some type of analysis; one could imagine a single rule identifier for all of capa, but that wouldn't be very useful. For each rule/type of information, there is a single message for the finding as well as a property bag which you can shove anything into.
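To make the pieces above concrete, here is a minimal sketch of a SARIF 2.1.0 log built as a plain Python dict. The function name `minimal_sarif`, the shape of `findings`, and the `example_extra` property are assumptions made up for illustration; this is not the code from the script in this PR.

```python
import json


def minimal_sarif(tool_name, tool_version, command_line, artifact_uri, findings):
    """Build a minimal SARIF 2.1.0 log as a plain dict.

    findings: list of (rule_id, message, address) tuples (hypothetical shape).
    """
    # One rule entry per distinct kind of finding.
    rule_ids = sorted({rule_id for rule_id, _, _ in findings})
    results = [
        {
            "ruleId": rule_id,
            "message": {"text": message},
            "locations": [
                {
                    "physicalLocation": {
                        "artifactLocation": {"uri": artifact_uri},
                        "address": {"absoluteAddress": address},
                    }
                }
            ],
            # The property bag: free-form, tool-specific data.
            "properties": {"example_extra": "anything goes here"},
        }
        for rule_id, message, address in findings
    ]
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": tool_name,
                        "version": tool_version,
                        "rules": [{"id": rule_id} for rule_id in rule_ids],
                    }
                },
                # How the tool was run, so output can be traced to its inputs.
                "invocations": [
                    {"commandLine": command_line, "executionSuccessful": True}
                ],
                "artifacts": [{"location": {"uri": artifact_uri}}],
                "results": results,
            }
        ],
    }


if __name__ == "__main__":
    doc = minimal_sarif(
        "capa", "7.0.0", "capa --json sample.exe_", "sample.exe_",
        [("create process", "create process", 0x401000)],
    )
    print(json.dumps(doc, indent=2))
```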

This PR adds a new script that takes a capa JSON output file (~7.0) and converts it to SARIF (JSON with an additional schema). This is a clean start from a previous PR, to clean up the branch history left over from embedding this as argument flags in capa directly. If this feature gets enough usage and is stable enough, adding a dedicated renderer would be desirable, but that might be better done natively rather than with third-party dependencies.
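As a rough illustration of the conversion idea (not the actual logic in scripts/capa2sarif.py), the sketch below maps rule matches from a simplified capa-like JSON document to SARIF result objects. The input shape (a `rules` mapping whose entries hold `meta` and a list of `[address, match]` pairs), the function name, and the choice to skip non-absolute addresses are all assumptions for this example.

```python
def capa_rules_to_sarif_results(capa_doc, artifact_uri):
    """Turn rule matches in a simplified capa-like dict into SARIF results.

    Assumed input shape (hypothetical, simplified):
      {"rules": {name: {"meta": {...}, "matches": [[address, match], ...]}}}
    where address looks like {"type": "absolute", "value": 4198400}.
    """
    results = []
    for rule_name, rule in capa_doc.get("rules", {}).items():
        namespace = rule.get("meta", {}).get("namespace", "")
        for address, _match in rule.get("matches", []):
            if address.get("type") != "absolute":
                # This sketch only handles absolute virtual addresses.
                continue
            results.append(
                {
                    "ruleId": rule_name,
                    "message": {"text": rule_name},
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {"uri": artifact_uri},
                                "address": {"absoluteAddress": address["value"]},
                            }
                        }
                    ],
                    # Extra capa metadata goes into the property bag.
                    "properties": {"namespace": namespace},
                }
            )
    return results
```

Using one SARIF rule per capa rule, rather than a single identifier for all of capa, keeps each finding individually navigable in the consuming tool.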

This includes additional features for current Radare-specific and Ghidra-specific requirements. I expect both of these to get fixed over time.

Steps to test functionality

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -e .[dev]
git submodule init
git submodule update
capa --json tests/data/5d7c34b6854d48d3da4f96b71550a221.exe_ > capa_result.json
python3 -m json.tool capa_result.json  # test json compliance
python3 scripts/capa2sarif.py capa_result.json -r > capa_radare.sarif
python3 -m json.tool capa_radare.sarif  # test json compliance
r2 tests/data/5d7c34b6854d48d3da4f96b71550a221.exe_
> sarif -i capa_radare.sarif
> sarif -l

In Ghidra the steps are similar, but use -g instead of -r. Enable the SARIF extension from Install Extensions, then Sarif > Read File > capa_ghidra.sarif

An interactive table spawns.
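The `json.tool` checks in the steps above only confirm that the files parse as JSON. A slightly stricter check could look like the sketch below; the helper name `looks_like_sarif` is made up here, and it only checks for the top-level `version` and `runs` keys rather than doing full schema validation.

```python
import json
from pathlib import Path


def looks_like_sarif(path):
    """Return True if the file parses as JSON and has SARIF's top-level keys.

    Raises json.JSONDecodeError if the file is not valid JSON at all.
    This is a shallow sanity check, not schema validation.
    """
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    return (
        isinstance(doc, dict)
        and "version" in doc
        and isinstance(doc.get("runs"), list)
    )
```

For example, `looks_like_sarif("capa_radare.sarif")` could be run after the conversion step.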

Checklist

No CHANGELOG update needed
No new tests needed
No documentation update needed

ReversingWithMe · 2024-05-27T15:02:41Z

Clean up from this PR #2036

williballenthin

This is looking really good!

Thanks for taking the time to introduce us to SARIF and provide the script. The logic looks reasonable, and aside from some nits that I noticed, I don't see any reason not to merge this soon.

One idea: rather than interacting with the capa JSON, you might want to deserialize it into the ResultDocument format that capa provides. This has full type hints that mypy checks, whereas the JSON document doesn't have any codified schema. Therefore, if we ever change the JSON document, we'd only notice bugs when this script breaks. By using the type-checked ResultDocument, we can catch that with static analysis tools. That being said, I recognize this would take you a bit more work, so I understand if you can't make the changes now. We can do it at the first bug ;-)
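To illustrate the general idea of a typed boundary (this is not capa's ResultDocument, which is much richer), here is a small sketch using a dataclass as a stand-in. The class `RuleMatches`, the function `parse_rules`, and the simplified input shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class RuleMatches:
    """Hypothetical typed stand-in for one rule's matches."""

    name: str
    addresses: List[int]


def parse_rules(doc: Dict[str, Any]) -> List[RuleMatches]:
    """Parse the raw dict once, at the boundary, into typed objects.

    A format change then fails here, in one place, instead of somewhere
    deep in the conversion logic.
    """
    parsed = []
    for name, rule in doc["rules"].items():
        addresses = [address["value"] for address, _match in rule["matches"]]
        parsed.append(RuleMatches(name=name, addresses=addresses))
    return parsed


def count_matches(rules: List[RuleMatches]) -> int:
    # Downstream code uses attributes, so a type checker such as mypy can
    # flag a misspelled or removed field without running the script.
    return sum(len(rule.addresses) for rule in rules)
```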

pyproject.toml

scripts/capa2sarif.py

williballenthin · 2024-06-06T08:58:06Z

recommend also adding a trivial test to test_scripts.py to show that this script can generate output without hitting exceptions, which we can then verify in CI.

ReversingWithMe · 2024-06-07T12:10:45Z

These are reasonable; I will work on adding them today, thanks!

ReversingWithMe · 2024-06-07T13:56:26Z

This should address the above suggestions, thanks again! I am not sure about the test, but using an existing result document seems to be a good test case (granted, this would NOT catch breaking changes if the JSON changes over time).

I am not sure of a good way to take an input file, run capa, and then run the script afterwards; I think the current approach is wrong, though.

williballenthin · 2024-06-07T17:22:07Z

i think you could use any of the json files in capa-testfiles/rd/ as the input. We'll update those if the format ever changes. No need to invoke capa in the test to generate the json.
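A smoke test along these lines could be sketched as follows. The helpers `run_script` and `assert_emits_json` are hypothetical names for this example and are not the helpers in tests/test_scripts.py; the sketch only checks that a script exits cleanly and prints parseable JSON.

```python
import json
import subprocess
import sys


def run_script(script_path, args):
    """Run a Python script in a subprocess with the current interpreter."""
    return subprocess.run(
        [sys.executable, str(script_path), *[str(arg) for arg in args]],
        capture_output=True,
        text=True,
        check=False,
    )


def assert_emits_json(script_path, args):
    """Fail if the script exits non-zero or its stdout is not valid JSON."""
    proc = run_script(script_path, args)
    assert proc.returncode == 0, proc.stderr
    # json.loads raises if stdout is not valid JSON.
    return json.loads(proc.stdout)
```

In the repository this might be called with the path to the conversion script and one of the stored result-document JSON files as the argument; the exact paths and any required flags are left out here because they depend on the repo layout.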

williballenthin

awesome!

tests/test_scripts.py

williballenthin · 2024-06-07T17:25:19Z

please resolve merge conflicts and then i'll merge!

williballenthin · 2024-06-11T13:02:41Z

thank you @ReversingWithMe!

…nt#2093)

* feat(capa2sarif): add new sarif conversion script converting json output to sarif schema, update dependencies, and update changelog
* fix(capa2sarif): removing copy and paste transcription errors
* fix(capa2sarif): remove dependencies from pyproject toml to guarded import statements
* chore(capa2sarif): adding node in readme specifying dependency and applied auto formatter for styling
* style(capa2sarif): applied import sorting and fixed typo in invocations function
* test(capa2sarif): adding simple test for capa to sarif conversion script using existing result document
* style(capa2sarif): fixing typo in version string in usage
* style(capa2sarif): isort failing due to reordering of typehint imports
* style(capa2sarif): fixing import order as isort on local machine was not updating code

---------

Co-authored-by: ReversingWithMe <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>

ReversingWithMe and others added 2 commits May 27, 2024 08:52

feat(capa2sarif): add new sarif conversion script converting json out…

a864773

…put to sarif schema, update dependencies, and update changelog

fix(capa2sarif): removing copy and paste transcription errors

eb07aea

williballenthin self-requested a review June 6, 2024 08:43

williballenthin requested changes Jun 6, 2024

ReversingWithMe added 5 commits June 7, 2024 06:24

fix(capa2sarif): remove dependencies from pyproject toml to guarded i…

f663749

…mport statements

chore(capa2sarif): adding node in readme specifying dependency and ap…

2f2cdcd

…plied auto formatter for styling

style(capa2sarif): applied import sorting and fixed typo in invocatio…

85c5021

…ns function

test(capa2sarif): adding simple test for capa to sarif conversion scr…

8b00e85

…ipt using existing result document

style(capa2sarif): fixing typo in version string in usage

2202a98

williballenthin approved these changes Jun 7, 2024

tests/test_scripts.py

ReversingWithMe and others added 4 commits June 7, 2024 16:20

Merge branch 'master' into master

85aeb1e

style(capa2sarif): isort failing due to reordering of typehint imports

138cf74

style(capa2sarif): fixing import order as isort on local machine was …

a6f9069

…not updating code

Merge branch 'master' into master

093899e

williballenthin merged commit 52e24e5 into mandiant:master Jun 11, 2024
8 of 9 checks passed
