Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated SARIF is incompatible with codeql schema #961

Closed
pamelafox opened this issue Apr 4, 2023 · 10 comments
Closed

Generated SARIF is incompatible with codeql schema #961

pamelafox opened this issue Apr 4, 2023 · 10 comments

Comments

@pamelafox
Copy link
Member

The generated sarif file produces output like:

locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "src/flaskapp/__init__.py",
                  "index": 0
                },
                "region": {
                  "snippet": {
                    "text": "<html>"
                  }
                }
              },
              "logicalLocations": [
                {
                  "fullyQualifiedName": "html",
                  "kind": "element"
                }
              ]
            }
          ]
        },

However, the Github CodeQL SARIF upload requires additional fields in the region to determine a line number. See schema:
https://github.com/oasis-tcs/sarif-spec/blob/main/Documents/CommitteeSpecifications/2.1.0/sarif-schema-2.1.0.json#L1709

Here is my sophisticated post-processing step for adding a line number:

sed -i 's/"snippet"/"startLine":1,"startColumn":1,"endColumn":1,"snippet"/g' src/tests/axe_results.sarif

That works, and results in scan results that look like this:

Screenshot 2023-04-04 at 11 42 09 AM

Obviously that is not very accurate, but it at least makes CodeQL happy.
Perhaps I should also open a pull request on https://github.com/github/codeql-action/issues?q=is%3Aissue+is%3Aopen+location for making those location fields optional. I'm not sure what the right approach is here, just opening this issue for discussion and posterity.

@microsoft-github-policy-service microsoft-github-policy-service bot added the status: new This issue is new and requires triage by DRI. label Apr 4, 2023
@DaveTryon
Copy link
Contributor

@pamelafox, thanks for opening this. Given the choice between an optional field and a mandatory field with meaningless data, the optional field seems like a better option. Of course, that assumes that all downstream components can properly handle the optional field, which may take some time to achieve. Adding the meaningless data isn't necessarily difficult, but it feels like it's fixing the wrong problem.

@DaveTryon DaveTryon added status: ready for triage This issue is ready to be triaged by the Accessibility Insights team. and removed status: new This issue is new and requires triage by DRI. labels Apr 10, 2023
@microsoft-github-policy-service
Copy link
Contributor

This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!

@pamelafox
Copy link
Member Author

Yeah, that's totally fair. It might make more sense to encourage GitHub to support CodeQL results that aren't tied to a line number of a checked in file, since it seems that Azure DevOps supports that. I think in the meantime, I will experiment with surfacing Axe results in Pull Request comments instead.

@dbjorge
Copy link
Contributor

dbjorge commented Apr 10, 2023

I agree with Dave - these results don't have any way to meaningfully include a line number (they often correspond to dynamically injected DOM content that doesn't correspond to any particular line of an HTML file), so my preference would be to avoid populating with meaningless data.

I could maybe imagine us moving the snippet out of location.physicalLocation.region.snippet and into location.message to adhere better to the letter of the spec, but that would be a breaking change for the ADO sarif viewer. I think it would make more sense to ask CodeQL to handle the case where a result has a snippet but no line information.

@JGibson2019 JGibson2019 added status: ready for work This issue is ready to be worked on. and removed status: ready for triage This issue is ready to be triaged by the Accessibility Insights team. labels Apr 25, 2023
@dbjorge
Copy link
Contributor

dbjorge commented Apr 25, 2023

We discussed this internally and we think that populating with the placeholder info by default like Pamela's original suggestion is probably the least-bad engineering tradeoff between "not breaking backcompat with the ADO viewer" and "not forcing every user in Pamela's position to make their own workaround".

@DaveTryon DaveTryon assigned DaveTryon and unassigned DaveTryon and nang4ally May 2, 2023
@DaveTryon DaveTryon added status: active This issue is currently being worked on by someone. and removed status: ready for work This issue is ready to be worked on. labels May 4, 2023
@DaveTryon
Copy link
Contributor

@pamelafox, the SARIF spec only requires the startLine property. Do the CodeQL tools work as expected if you omit the startColumn and endColumn properties? We would expect them to but wanted to check with you first

@pamelafox
Copy link
Member Author

I believe GitHub CodeQL has a higher threshold for required properties, per:
https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#physicallocation-object

I'm not sure if I tried removing startColumn/endColumn. I've pinged a GitHub engineer on CodeQL team to see if they can confirm what's required. (I ended up removing this from my workflow, so I could try it out myself, but it would take a bit of time).

@aegilops
Copy link

Pamela is totally right - GitHub Code Scanning goes beyond the SARIF spec in what is required.

I'm not quite a CodeQL engineer - I'm in Field Services - but I asked some of our CodeQL Engineers about this to make sure I had things straight for you.

Looking at the SARIF spec, it's invalid to provide a region without any location information, regardless Code Scanning's additional validation/requirements.

When you are uploading to Code Scanning with the REST API you can set a flag to ask for the SARIF to be validated according to its requirements.

The SARIF Validator website does also have a "GitHub ingestion rules" checkbox that should apply the same validation.

Code Scanning is very much set up to display results of problems that are localisable to the code itself, vs. problems that appear in artefacts produced by the code; we have a similar issue with how to display results of black box dynamic analysis that don't map obviously to source locations.

DaveTryon added a commit that referenced this issue May 22, 2023
#### Details

As called out in #961, our SARIF doesn't have a fully formed location.
Since we don't have meaningful data for this, the "least bad" solution
was to insert dummy data for the `startLine` property. This PR includes
2 code changes, plus a bunch of refreshed SARIF files generated via the
CLI tool. I noticed that the refreshed version of
`basic-axe-3.2.2.sarif` is pretty-printed differently than what we had
before. I'm guessing that the tool shifted at some point and that this
is the first time the file has been regenerated with the current tool.

##### Motivation

Address #961 for better tool compatibility

##### Context

<!-- Are there any parts that you've intentionally left out-of-scope for
a later PR to handle? -->

<!-- Were there any alternative approaches you considered? What
tradeoffs did you consider? -->

#### Pull request checklist
<!-- If a checklist item is not applicable to this change, write "n/a"
in the checkbox -->
- [x] PR title respects [Conventional
Commits](https://www.conventionalcommits.org) (starts with `fix:`,
`feat:`, etc, and is suitable for user-facing release notes)
- [x] PR contains no breaking changes, **OR** description of both PR
**and final merge commit** starts with `BREAKING CHANGE:`
- [x] (if applicable) Addresses issue: #961
- [n/a] Added relevant unit tests for your changes
- [x] Ran `yarn precheckin`
- [x] Verified code coverage for the changes made
@DaveTryon DaveTryon added status: resolved This issue has been merged into main and deployed to canary. and removed status: active This issue is currently being worked on by someone. labels May 22, 2023
@DaveTryon
Copy link
Contributor

@pamelafox, this shipped in https://github.com/microsoft/axe-sarif-converter/releases/tag/v2.10.1. We're closing this on our end

@DaveTryon DaveTryon removed their assignment Jun 7, 2023
@DaveTryon DaveTryon removed the status: resolved This issue has been merged into main and deployed to canary. label Jun 7, 2023
@pamelafox
Copy link
Member Author

Thank you, @DaveTryon! I'll likely be mentioning axe-sarif-converter in an upcoming talk (as well as showing other ways to surface axe results, since code quality isn't always the right location).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants