Feature/sarif output #2036

Closed
wants to merge 6 commits into from

ReversingWithMe
Contributor

Adds SARIF rendering, adapted from the existing JSON rendering logic. Additional code brings the output closer to compatibility with Ghidra's built-in SARIF module.

The output passes Microsoft's SARIF compliance checks, but fails other parsers such as Trail of Bits' SARIF Explorer.
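
As a rough illustration of what "compliance" means at the most basic level, the sketch below checks a handful of properties that SARIF 2.1.0 marks as required (the `version`, the `runs` array, each run's `tool.driver.name`, each invocation's `executionSuccessful`, and each result's `message` text or id). It is a hand-rolled spot check, not Microsoft's validator, and the helper name `missing_required_fields` is hypothetical.

```python
from typing import Any


def missing_required_fields(log: dict[str, Any]) -> list[str]:
    """Return paths of a few required SARIF 2.1.0 properties that are absent.

    This only spot-checks some required properties; it is not a schema validator.
    """
    problems: list[str] = []
    if log.get("version") != "2.1.0":
        problems.append("/version")
    runs = log.get("runs")
    if not isinstance(runs, list):
        problems.append("/runs")
        return problems
    for i, run in enumerate(runs):
        # Every run must identify the tool that produced it.
        driver = run.get("tool", {}).get("driver", {})
        if "name" not in driver:
            problems.append(f"/runs/{i}/tool/driver/name")
        # Every invocation must say whether the tool ran successfully.
        for j, inv in enumerate(run.get("invocations", [])):
            if "executionSuccessful" not in inv:
                problems.append(f"/runs/{i}/invocations/{j}/executionSuccessful")
        # Every result needs a message carrying either "text" or "id".
        for k, result in enumerate(run.get("results", [])):
            if not ({"text", "id"} & result.get("message", {}).keys()):
                problems.append(f"/runs/{i}/results/{k}/message")
    return problems
```

A check like this only catches obvious omissions early; passing it says nothing about whether a stricter consumer will accept the file.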

Several things in this code could be improved style-wise, but I'm testing the waters on whether this is of interest at all, or whether the idea is worth keeping and re-implementing from scratch.

Checklist

  • Update changelog
  • New tests
  • update documentation

google-cla bot commented Mar 24, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@williballenthin
Collaborator

Hey @ReversingWithMe, thanks!

Can you share a few sentences about SARIF and how you use it? I've seen it referenced a few times recently but haven't tried it myself.

@williballenthin
Collaborator

I wonder if it's best to add SARIF directly to capa's output, or to add a script (in ./scripts) that converts from the JSON output format to SARIF. The tradeoffs have to do with how prevalent SARIF is and how many users would use this option.
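
A conversion script along these lines might look like the sketch below. It assumes capa's JSON result document exposes `meta.version`, `meta.argv`, `meta.sample.path`, `meta.sample.sha256`, and a `rules` mapping from rule name to an object with `meta.namespace` and a `matches` list; those field names should be checked against the current capa schema. The function names `capa_json_to_sarif` and `convert_file` are hypothetical, and the mapping (one SARIF rule and one result per matched capa rule, with capa-specific details in the property bag) is just one possible design.

```python
import json
from typing import Any

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def capa_json_to_sarif(doc: dict[str, Any]) -> dict[str, Any]:
    """Map an (assumed) capa JSON result document onto a single-run SARIF 2.1.0 log."""
    meta = doc.get("meta", {})
    sample = meta.get("sample", {})
    uri = sample.get("path", "")
    rules: list[dict[str, Any]] = []
    results: list[dict[str, Any]] = []
    for name, match in doc.get("rules", {}).items():
        rule_meta = match.get("meta", {})
        # Each matched capa rule becomes a SARIF reportingDescriptor...
        rules.append({"id": name, "shortDescription": {"text": name}})
        # ...and one informational result; capa-specific data goes in the property bag.
        results.append({
            "ruleId": name,
            "level": "note",
            "message": {"text": f"capa matched rule: {name}"},
            "locations": [{"physicalLocation": {"artifactLocation": {"uri": uri}}}],
            "properties": {
                "namespace": rule_meta.get("namespace"),
                "match_count": len(match.get("matches", [])),
            },
        })
    return {
        "version": "2.1.0",
        "$schema": SARIF_SCHEMA,
        "runs": [{
            "tool": {"driver": {"name": "capa", "version": meta.get("version", "unknown"), "rules": rules}},
            # Record how capa was invoked so results can be traced to their parameters.
            "invocations": [{"commandLine": " ".join(meta.get("argv", [])), "executionSuccessful": True}],
            "artifacts": [{"location": {"uri": uri}, "hashes": {"sha-256": sample.get("sha256", "")}}],
            "results": results,
        }],
    }


def convert_file(src: str, dst: str) -> None:
    """Read a capa JSON file and write the SARIF equivalent."""
    with open(src, encoding="utf-8") as f:
        doc = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(capa_json_to_sarif(doc), f, indent=2)
```

A thin command-line wrapper calling `convert_file("capa.json", "capa.sarif")` would make this a standalone script, keeping SARIF-specific code out of capa's core rendering.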

@ReversingWithMe
Contributor Author

ReversingWithMe commented Mar 25, 2024

Sure!

The Static Analysis Results Interchange Format (SARIF) is a standardized format for the output of static analysis tools, which are used to evaluate source code or binaries for things like vulnerabilities or dataflow. SARIF enables different analysis tools to produce results in a common format that can be easily understood, integrated, and acted upon by software development tools and systems. For example, VS Code, Ghidra, radare2, and GitHub all adopt it as a common standard for representing this kind of information.

SARIF describes the analysis being run and the results of that analysis on an artifact. Results include descriptions of the artifacts related to a run of the tool, where an artifact is source code, a binary file, or an auxiliary data file. Results also include the invocation, i.e. how the tool was run, including its version, command line, and any knobs/parameters. The idea is that you can reconstruct where the output data came from, for results that depend on specific parameters or inputs. Results themselves are captured via "rules", where each rule is some type of analysis. One could imagine a single rule identifier for all of capa, but that wouldn't be very useful. For each rule (type of information), there is a single message for the finding, as well as a property bag into which you can put anything.
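
To make those pieces concrete, here is a hand-written minimal SARIF 2.1.0 log with one run, annotated with where each concept lives, plus a small hypothetical helper `provenance` that recovers the tool name, version, and command line from the invocation data. The values are made up for illustration and are not real capa output.

```python
from typing import Any

# A minimal SARIF 2.1.0 log with one run; all values are illustrative.
EXAMPLE_LOG: dict[str, Any] = {
    "version": "2.1.0",
    "runs": [
        {
            # The analysis being run, plus the "rules" (kinds of findings) it can report.
            "tool": {
                "driver": {
                    "name": "capa",
                    "version": "0.0.0-example",
                    "rules": [{"id": "example-rule", "shortDescription": {"text": "an example capability"}}],
                }
            },
            # How the tool was run, so output can be traced back to its parameters.
            "invocations": [{"commandLine": "capa sample.exe", "executionSuccessful": True}],
            # The artifacts the run looked at (source, binaries, auxiliary data files).
            "artifacts": [{"location": {"uri": "sample.exe"}}],
            # One result per finding: a rule id, a single message, and a free-form property bag.
            "results": [
                {
                    "ruleId": "example-rule",
                    "message": {"text": "matched example-rule"},
                    "properties": {"anything": ["tool-specific", "data"]},
                }
            ],
        }
    ],
}


def provenance(log: dict[str, Any]) -> list[tuple[str, str, str]]:
    """For each invocation of each run, return (tool name, tool version, command line)."""
    out: list[tuple[str, str, str]] = []
    for run in log["runs"]:
        driver = run["tool"]["driver"]
        for inv in run.get("invocations", [{}]):
            out.append((driver["name"], driver.get("version", ""), inv.get("commandLine", "")))
    return out
```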

So, given a SARIF file, the only thing a consumer needs tool-specific knowledge of is the property bag for each ruleId in the output; the rest is reusable. The Python code in this PR shows the 3-4 major chunks and how they relate to capa's JSON.
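
The consumer side of that argument can be sketched as follows: a generic walker treats runs, results, rule ids, and messages the same way for every tool, and only a per-`ruleId` handler table knows how to interpret property bags. `summarize` and the handler table are hypothetical names; rules without a handler fall back to showing the raw property bag.

```python
from collections.abc import Callable
from typing import Any

# A handler turns one result's property bag into something the consuming tool understands.
PropertyHandler = Callable[[dict[str, Any]], str]


def summarize(log: dict[str, Any], handlers: dict[str, PropertyHandler]) -> list[str]:
    """Walk every result generically; only the property bag needs rule-specific handling."""
    lines: list[str] = []
    for run in log.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<no ruleId>")
            message = result["message"].get("text", "")
            props = result.get("properties", {})
            handler = handlers.get(rule_id)
            # Unknown rules are still reported; their property bag is just shown raw.
            detail = handler(props) if handler else repr(props)
            lines.append(f"{tool}: {rule_id}: {message} [{detail}]")
    return lines


example = {"version": "2.1.0", "runs": [{"tool": {"driver": {"name": "capa"}}, "results": [
    {"ruleId": "r1", "message": {"text": "m1"}, "properties": {"namespace": "ns/a"}},
    {"ruleId": "r2", "message": {"text": "m2"}, "properties": {"x": 1}},
]}]}
summary = summarize(example, {"r1": lambda p: p["namespace"]})
# summary == ["capa: r1: m1 [ns/a]", "capa: r2: m2 [{'x': 1}]"]
```

The design point is that adding support for a new rule means adding one handler entry, not writing a new parser.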

The primary reason someone would use SARIF is to facilitate the aggregation, comparison, and management of analysis results from multiple tools, improving the efficiency of identifying, understanding, and addressing potential software issues. In other words, capa adopting SARIF means that any tool that understands SARIF only needs special logic for the types of results, and can skip parsing and trying to understand capa's schema.

The approach here tries to get as close as possible to capa's direct output, but pydantic's serialization to JSON got in the way. Decoding the JSON several times, as I currently do, isn't great.

@ReversingWithMe
Contributor Author

trailofbits/vscode-sarif-explorer#12

The issue includes an example output file from this code; I can also upload it here. The invocation section of the JSON records which command line was used, but I think it's just the --sarif flag.

@mr-tz
Collaborator

mr-tz commented Mar 26, 2024

> add a script (found in ./scripts) that can convert from the JSON output format to SARIF

I'm also more in favor of this approach.

@ReversingWithMe ReversingWithMe closed this by deleting the head repository May 26, 2024
@ReversingWithMe
Contributor Author

Cleaning up the branch to open a new PR that takes the script route.

3 participants