Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate SARIF File Of Project Vulnerability Findings #3561

Merged

Conversation

aravindparappil46
Copy link
Contributor

@aravindparappil46 aravindparappil46 commented Mar 18, 2024

Description

If request header Accept: application/sarif+json is provided to the GET Finding By Project UUID API, it will now return a SARIF file with the vulnerability findings in that project.

SARIF file is generated based on a pebble template.

Addressed Issue

Closes #909

Additional Details

In the GET /v1/finding/project/{uuid} API, added a new Media Type application/sarif+json to the @Produces, to indicate the new response Content-Type.

Added a sarif.peb template file to a new directory under templates/findings/sarif.peb

Sample bom.json and it's resulting .sarif file is attached below:
bom-and-sarif.zip

I have tried to include most of the fields from Finding.java into a meaningful field in the SARIF file based on the SARIF schema here. Those fields for which I could not find an adequate mapping I have kept under the properties object (fields such as epssScore, severityRank etc).

Have tested the generated SARIF files against the validator here with "GitHub Ingestion Rules" enabled as well. Looks like most of the violations are geared towards static analysis results (like, region, helpUri etc missing).

Feedback Needed

  1. Would appreciate if the sarif.peb template is scrutinized so that I am not making any blatant mistakes with the field mappings (from Finding.java to the SARIF schema). Perhaps there are more meaningful fields in the sarif schema that we could possibly assign a finding field to.

A picture showing the fields in Finding.java and its mapped field on the SARIF file is shown below. The green comments are the target fields in the SARIF schema.

results.properties mean that the finding field is added as a key/value pair into the properties object of the schema.

image

  1. I have added logic to calculate the results.level field of the SARIF schema like below:
if finding.vulnerability.severity in ('LOW', 'INFO')
  level = "note"
else if finding.vulnerability.severity == 'MEDIUM':
  level = "warning"
else if finding.vulnerability.severity in ('HIGH', 'CRITICAL'):
  level = "error"
else:
  level = "none"

Do let me know in case we need to modify it

  1. Even though we have set content-disposition header in the response, I was not able to confirm that a file named findings-{{uuid}}.sarif was indeed being downloaded locally via cURL or Postman testing. It just displays the sarif json on the terminal now. Please let me know in case there is a way to test the actual file download. (I hope we do not have to change the media type to octet stream instead?)

  2. For rules.name, I have concatenated finding.component.name and finding.vulnerability.cweName, hoping that it's more friendly to consume for an end-user. Please do let me know if this needs to be changed.

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

If request header `Accept: application/sarif+json` is provided to the GET
Finding By Project UUID API, it will now return a SARIF file with the vulnerability findings
in that project.

SARIF file is generated based on a pebble template

Signed-off-by: Aravind Parappil <[email protected]>
Copy link

codacy-production bot commented Mar 18, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.05% (target: -1.00%) 91.43% (target: 70.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (e55e950) 21527 15953 74.11%
Head commit (55300d3) 21606 (+79) 16001 (+48) 74.06% (-0.05%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3561) 35 32 91.43%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

@nscuro
Copy link
Member

nscuro commented Mar 19, 2024

Great work @aravindparappil46!

Looking at the generated SARIF, I noticed that you chose to key rules by vulnerability ID. This can lead to situation where duplicate rules are created, if a vulnerability affects more than one component. For example:

{
  "id": "CVE-2023-6378",
  "name": "logback-classic - Deserialization of Untrusted Data",
  "shortDescription": {
    "text": "A serialization vulnerability in logback receiver component part of  logback version 1.4.11 allows an attacker to mount a Denial-Of-Service  attack by sending poisoned data."
  }
},
{
  "id": "CVE-2023-6378",
  "name": "logback-core - Deserialization of Untrusted Data",
  "shortDescription": {
    "text": "A serialization vulnerability in logback receiver component part of  logback version 1.4.11 allows an attacker to mount a Denial-Of-Service  attack by sending poisoned data."
  }
},

Consider removing the component name from the name field, that way you can safely reference the same rule multiple times, for multiple components. You will need to add some logic to ensure that you're not creating duplicate rule as you iterate over the findings, which in turn might require more pre-computation before rendering the template.

The SARIF validator also appears to prefer rule.names in pascal case, e.g.

SARIF2012: runs[0].tool.driver.rules[16].name: 'spring-security-crypto - Use of Insufficiently Random Values' is not a Pascal-case identifier. For uniformity of experience across all tools that produce SARIF, the friendly name should be a single Pascal-case identifier, for example, ProvideRuleFriendlyName.

For the physicalLocation of results, it is necessary to provide a region:

GH1003: runs[0].results[0].locations[0].physicalLocation: The 'region' property is absent. GitHub Advanced Security code scanning can display the correct location only for results that provide a 'region' object with line and optional column information. At minimum, 'region.startLine' is required. 'region' can also provide 'startColumn', 'endLine', and 'endColumn', although all of those have reasonable defaults.

Looking at how Trivy does it, it just sets all fields to 1: https://github.com/aquasecurity/trivy-action/blob/062f2592684a31eb3aa050cc61e7ca1451cecd3d/test/data/image-sarif.test#L58-L63

OTOH, I'm wondering if we should use logical locations instead? https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#physical-and-logical-locations

@nscuro nscuro added the enhancement New feature or request label Mar 19, 2024
@aravindparappil46
Copy link
Contributor Author

Thanks a lot for the detailed feedback @nscuro !

Regarding name field in the SARIF - makes sense to remove the component name from there. Will do some data wrangling so that only unique vulnerabilities are populated under rules (plus can pascal-case the cweName)

Regarding physicalLocation - Yes, I feel going for logicalLocations.fullyQualifiedName is better than hacking around with region in physicalLocation

I'm done with the above changes in my local branch. Have to work on updating the test and will push soon.

The rules array in the generated SARIF should contain only unique values, so that the same rule
can be referenced by multiple results.

Rule name in the SARIF should be PascalCased. Using WordUtils from apache commons library to convert cweName to PascalCase.

Set default escape strategy to the Pebble Engine to json, to escape linebreaks and double quotes in vulnerability description.

Update test case to assert whole SARIF json.

Signed-off-by: Aravind Parappil <[email protected]>
Since fullName is required by Azure DevOps ingestion rules, added it as concatenation of app name and version.

shortDescription has been changed to be vulnerability ID and added fullDescription field as the actual vulnerability description (same as Trivy's SARIF)

Signed-off-by: Aravind Parappil <[email protected]>
@aravindparappil46
Copy link
Contributor Author

Have pushed the changes. More details in the commit messages :)

Here's an updated SARIF sample:
bom-and-sarif-v2.zip

The pascal-casing for some CWE names look weird because of characters like hyphens (e.g., Out-of-boundsWrite ) but the SARIF validator is not complaining anymore.

One thought I had was whether to conditionally add fields to the results.properties object.
For e.x., Right now, even if a vulnerability does not have a recommendation, it is still added as an empty string to its results.properties. Could add a null check on the sarif.peb for each field under properties but wanted to check if it makes sense

@nscuro
Copy link
Member

nscuro commented Mar 25, 2024

One thought I had was whether to conditionally add fields to the results.properties object.

Honestly I don't know 😅

It might make sense to omit fields with empty values, on the other hand it might make sense to have properties consistent across all findings. What's your opinion?

For the JSON assertion, removed matchers which can be replaced with string literals directly in the expected JSON string

Signed-off-by: Aravind Parappil <[email protected]>
@aravindparappil46
Copy link
Contributor Author

Thanks @nscuro ! Have removed the unwanted matchers.

For the empty properties -- I am of the opinion that we should keep the properties object consistent across all findings, so that the SARIF is more predictable and follows a consistent schema. Might even be useful for downstream tools to ingest and display all properties in a uniform manner.

Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-0.05% (target: -1.00%) 91.43% (target: 70.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (e55e950) 21527 15953 74.11%
Head commit (6b1b68c) 21606 (+79) 16001 (+48) 74.06% (-0.05%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3561) 35 32 91.43%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

@nscuro nscuro added this to the 4.11 milestone Mar 26, 2024
Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nscuro nscuro merged commit b764bc4 into DependencyTrack:master Mar 26, 2024
12 checks passed
@aravindparappil46 aravindparappil46 deleted the feature/909-export-sarif branch March 26, 2024 15:48
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SARIF support
2 participants