Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: syft 3435 - add file components to cyclonedx bom output when file metadata is available #3539

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

spiffcs
Copy link
Contributor

@spiffcs spiffcs commented Dec 18, 2024

Description

This change updates syft to add cyclone-dx components with the type file when the underlying SBOM has coordinates that are mapped to their respective file metadata.

Example

export SYFT_FILE_METADATA_SELECTION="all"
go run cmd/syft/main.go -o cyclonedx-json dir:../stereoscope/ > sbom.json

Slice of new file types from output

    {
      "bom-ref": "7a9779339fe3868f",
      "type": "file",
      "name": "/Users/hal/development/stereoscope/.DS_Store",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "c918669c6ac5631bee00ceaef5f2e8a023945697"
        },
        {
          "alg": "SHA-256",
          "content": "3617159d3c3333bda8497d3c60fdda46f5ce0ab72bf3ba772dbb9128e0aca2d5"
        }
      ]
    },
    {
      "bom-ref": "ec6f846aad26adea",
      "type": "file",
      "name": "/Users/hal/development/stereoscope/.binny.yaml",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "41928b763c5478f3bc3ddde1135b0ece1536ff86"
        },
        {
          "alg": "SHA-256",
          "content": "fb6962a7f181dd5bb3e74ba637b83e91800197863b7d56cb53a6a6b96cb78668"
        }
      ]
    },

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

@spiffcs spiffcs requested a review from wagoodman December 18, 2024 20:52
@spiffcs spiffcs changed the title fix: 3435 syft - add file components to cyclonedx bom output when file metadata is available feat: syft 3435 - add file components to cyclonedx bom output when file metadata is available Dec 18, 2024
@spiffcs spiffcs requested review from a team and removed request for wagoodman December 18, 2024 20:57
fileMetadata, exists := artifacts.FileMetadata[coordinate]
// no file metadata then don't include in SBOM
if !exists {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 43 above, we make a slice with the same list as s.AllCoordinates(), but here we sometimes skip coordinates. I think this may leave unitialized file components in the output. When do we expect exists to be false?

Copy link
Contributor Author

@spiffcs spiffcs Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question - let me update this code and investigate the config state where metadata selection is none.

My understanding is there would still be coordinates, but no associated metadata, but that might not be the case and coordinates := s.AllCoordinates() would just be nothing.

Config where file.metadata.selection is none

file:
  metadata:
    # select which files should be captured by the file-metadata cataloger and included in the SBOM.
    # Options include:
    #  - "all": capture all files from the search space
    #  - "owned-by-package": capture only files owned by packages
    #  - "none", "": do not capture any files (env: SYFT_FILE_METADATA_SELECTION)
    selection: 'none'

for i, coordinate := range coordinates {
var metadata *file.Metadata
// File Info
fileMetadata, exists := artifacts.FileMetadata[coordinate]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want directories in the output? I don't know the answer. I can't think of why we would. In the current implementation the show up like this:

    {
      "bom-ref": "ab68ef0e1832e438",
      "type": "file",
      "name": "/"
    }

That doesn't tell anyone a lot; it just looks like a file with no digests. In contrast, a regular file will have content digests:

    {
      "bom-ref": "6185b4b8a7b64f56",
      "type": "file",
      "name": "/bin/[",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "912faeca732392cd21175ae53ae49624da034f1c"
        },
        {
          "alg": "SHA-256",
          "content": "25015cc97808781979490c4843c4a483019ec5efc0ecfae648c7fd4f36d18096"
        }
      ]
    },

Syft JSON has both files and directories, but the metadata says it's a directory, whereas here it just looks like an incomplete file. The spec has a type file but no type directory: https://cyclonedx.org/docs/1.6/json/#components_items_type

I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll air on the side of dropping these for now since it's a little ambiguous. I'll try to look for examples of others who might be including/dropping for comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dev note: we should probably be dropping directories from other SBOM formats if we go about not including them in this change as well. It should be one way or the other for all formats.

func digestsToHashes(digests []file.Digest) *[]cyclonedx.Hash {
hashes := make([]cyclonedx.Hash, len(digests))
for i, digest := range digests {
cdxAlgo := toCycloneDXAlgorithm(digest.Algorithm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle a map miss here? What should happen if there's a digest for an algorithm outside CDX spec? For example, what if someone crafts an SBOM with xx64 hashes of files? (I'm not sure any current spec allows this, but I think right now we would end up with a blank algorithm field, which is not allowed: https://cyclonedx.org/docs/1.6/json/#components_items_hashes_items_alg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! That's a good point. I was only thinking about syft generated SBOMs who's digest algorithm are a subset of the current cdx specification.

{
name: "sbom coordinates with file metadata are serialized to cdx",
sbom: sbom.SBOM{
Artifacts: sbom.Artifacts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there were a couple more unit tests, since there are a few paths through the code this doesn't exercise:

  1. Missing metadata (entering the if statement at syft/format/common/cyclonedxhelpers/to_format_model.go:49
  2. Mix of packages and files
  3. Algorithm that CDX doesn't allow in a file digest
  4. No files in SBOM, only packages
  5. Weird files, like symlinks or sockets

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

cyclone-dx presenter drops files, includes only packages
2 participants