Feature/sarif output #2036

ReversingWithMe · 2024-03-24T18:50:32Z

Add SARIF rendering that adapts the existing JSON rendering logic, plus additional code to bring the output closer to compatibility with Ghidra's built-in SARIF module.

The output file passes Microsoft's compliance checks, but will fail in other parsers such as Trail of Bits' SARIF Explorer.

There are several things that could be done better in this code style-wise, but I'm testing the waters on whether this is even of interest, or whether the idea is worth keeping and re-implementing from scratch.

Checklist

Update changelog
new tests
update documentation

…mpatible sarif output

google-cla · 2024-03-24T18:50:36Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

williballenthin · 2024-03-25T10:27:52Z

Hey @ReversingWithMe, thanks!

Can you share a few sentences about SARIF and how you use it? I've seen it referenced a few times recently but haven't tried it myself.

williballenthin · 2024-03-25T10:29:24Z

I wonder if it's best to add SARIF directly to capa output, or to add a script (found in ./scripts) that can convert from the JSON output format to SARIF. The tradeoffs have to do with the prevalence of SARIF and how many users would use this option.

ReversingWithMe · 2024-03-25T13:06:34Z

Sure!

The Static Analysis Results Interchange Format (SARIF) is a standardized format for the output of static analysis tools, which are used to evaluate source code or binaries for things like vulnerabilities or dataflow. SARIF enables different analysis tools to produce results in a common format that can be easily understood, integrated, and acted upon by software development tools and systems. For example, VS Code, Ghidra, radare2, and GitHub all adopt a common standard for representing these types of information.

SARIF describes the analysis being run and the results of that analysis on an artifact. Results include a description of the artifacts related to a run of the tool, where an artifact can be source code, a binary file, or an auxiliary data file. Results also include the invocation, i.e. how the tool was run, including version, command line, and any knobs/parameters. The idea is that you can reconstruct where output data came from for things that depend on parameters or specific input. The results themselves are captured via "rules", where each rule is some type of analysis. One could imagine a single rule identifier for all of capa, but that wouldn't be very useful. For each rule/type of information, there is a single message for the finding as well as a property bag which you can shove anything into.

So from this, given a SARIF file, all you need to know how to handle is the property bag for each ruleId found in the output; the rest is reusable. You can see in the Python code of this PR the 3-4 major chunks and how they relate to capa's JSON.
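To make the pieces above concrete, here is a minimal sketch of a SARIF 2.1.0 log assembled in Python. It is not the code from this PR: the helper name `build_sarif_log`, the rule IDs, the version string, the command line, and the file name are all placeholders chosen for illustration. Only the SARIF property names (`runs`, `tool.driver`, `invocations`, `artifacts`, `results`, `ruleId`, `message`, `properties`) come from the standard.

```python
import json


def build_sarif_log(tool_name, tool_version, command_line, artifact_uri, findings):
    """Assemble a minimal SARIF 2.1.0 log.

    `findings` is a list of (rule_id, message, property_bag) tuples.
    This helper is hypothetical and only illustrates the layout.
    """
    # One rule descriptor per distinct rule id seen in the findings.
    rule_ids = sorted({rule_id for rule_id, _, _ in findings})
    run = {
        # The tool that ran, including its version and the rules it knows.
        "tool": {
            "driver": {
                "name": tool_name,
                "version": tool_version,
                "rules": [{"id": rule_id} for rule_id in rule_ids],
            }
        },
        # How the tool was run.
        "invocations": [
            {"commandLine": command_line, "executionSuccessful": True}
        ],
        # What was analyzed.
        "artifacts": [{"location": {"uri": artifact_uri}}],
        # One message per finding, plus a free-form property bag.
        "results": [
            {"ruleId": rule_id, "message": {"text": message}, "properties": bag}
            for rule_id, message, bag in findings
        ],
    }
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [run],
    }


# Placeholder values, not real capa output.
log = build_sarif_log(
    tool_name="capa",
    tool_version="0.0.0",
    command_line="capa --sarif sample.exe",
    artifact_uri="sample.exe",
    findings=[
        ("example/rule-b", "matched example rule b", {"address": "0x401000"}),
        ("example/rule-a", "matched example rule a", {"address": "0x402000"}),
    ],
)
print(json.dumps(log, indent=2))
```

Everything outside `properties` has the same shape for any tool; only the contents of the property bag are tool-specific.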

The primary reason someone would use SARIF is to facilitate the aggregation, comparison, and management of analysis results from multiple tools, improving the efficiency of identifying, understanding, and addressing potential software issues. In other words, capa adopting SARIF means that any tool that understands SARIF only needs special logic around types of results, and can skip parsing and trying to understand the capa schema.

The approach here was to get as close as possible to direct capa output, but pydantic serialization to JSON got in the way. The way I decode JSON a few times isn't great.
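As a rough illustration of the consumer side, the sketch below walks any SARIF log generically and collects the property bags per ruleId, which is the only part a consumer would need tool-specific logic for. The function name `group_property_bags` and the sample log are made up for this example.

```python
from collections import defaultdict


def group_property_bags(sarif_log):
    """Collect the property bags of all results, keyed by ruleId.

    Works on any SARIF log regardless of which tool produced it.
    """
    grouped = defaultdict(list)
    for run in sarif_log.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<no ruleId>")
            grouped[rule_id].append(result.get("properties", {}))
    return dict(grouped)


# Hypothetical log with two results for one rule and one for another.
sample_log = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "example-tool"}},
            "results": [
                {"ruleId": "rule-a", "message": {"text": "first"}, "properties": {"n": 1}},
                {"ruleId": "rule-b", "message": {"text": "second"}},
                {"ruleId": "rule-a", "message": {"text": "third"}, "properties": {"n": 3}},
            ],
        }
    ],
}

bags = group_property_bags(sample_log)
for rule_id, rule_bags in bags.items():
    print(rule_id, rule_bags)
```

A consumer would then dispatch on the ruleId keys and interpret each bag with whatever rule-specific logic it has.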

ReversingWithMe · 2024-03-25T13:24:09Z

trailofbits/vscode-sarif-explorer#12

The issue includes an example output file from this code. I can also upload it here. The invocation part of the JSON says which one, but I think it's just the --sarif flag.

mr-tz · 2024-03-26T12:26:13Z

add a script (found in ./scripts) that can convert from the JSON output format to SARIF

I'm also more in favor of this approach.

ReversingWithMe · 2024-05-26T18:22:57Z

cleaning up branch to open a new PR going the script route
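For readers wondering what the script route could look like, here is a rough sketch of a standalone JSON-to-SARIF conversion. It does not reproduce the script from the follow-up PR, and the input shape (a `meta.version` field plus a `rules` mapping whose entries carry `meta` and `matches`) is a simplified assumption for illustration, not capa's actual result schema.

```python
import json


def capa_like_to_sarif(doc):
    """Convert a simplified, capa-like result document into a SARIF log.

    Assumed (hypothetical) input shape:
      {"meta": {"version": ...},
       "rules": {rule_name: {"meta": {...}, "matches": [...]}}}
    """
    rules = []
    results = []
    for rule_name, rule in doc.get("rules", {}).items():
        rules.append({"id": rule_name})
        results.append(
            {
                "ruleId": rule_name,
                "message": {"text": f"matched rule: {rule_name}"},
                # Tool-specific details go into the property bag untouched.
                "properties": {
                    "meta": rule.get("meta", {}),
                    "matches": rule.get("matches", []),
                },
            }
        )
    driver = {
        "name": "capa",
        "version": doc.get("meta", {}).get("version", "unknown"),
        "rules": rules,
    }
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": driver}, "results": results}],
    }


# Placeholder input standing in for a JSON results file.
example_json = """
{
  "meta": {"version": "0.0.0"},
  "rules": {
    "example rule": {
      "meta": {"namespace": "example/namespace"},
      "matches": [{"address": "0x401000"}]
    }
  }
}
"""

sarif = capa_like_to_sarif(json.loads(example_json))
print(json.dumps(sarif, indent=2))
```

Keeping the conversion in a separate script means the JSON output stays the single source of truth and the SARIF mapping can evolve independently.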

ReversingWithMe added 4 commits March 24, 2024 11:54

feat(sarif-rendering): add support for generating sarif and ghidra-co…

819a340

…mpatible sarif output

chore(sarif-rendering): run black and fix dependencies for install

41ea0c4

chore(changelog): update changelog with sarif output changes

15225c8

style(sarif-rendering): clean up imports in sarif render code

bdbee30

ReversingWithMe and others added 2 commits March 24, 2024 12:57

style(sarif-rendering): fixing additional extra import of datetime

2519b3c

Merge branch 'master' into feature/sarif-output

222a856

ReversingWithMe closed this by deleting the head repository May 26, 2024

ReversingWithMe mentioned this pull request May 27, 2024

FEAT(capa2sarif) Add SARIF conversion script from json output #2093

Merged

3 tasks
