feat: syft 3435 - add file components to cyclonedx bom output when file metadata is available #3539

Open: wants to merge 2 commits into base: main

58 changes: 58 additions & 0 deletions syft/format/common/cyclonedxhelpers/to_format_model.go
@@ -12,6 +12,7 @@ import (
	"github.com/anchore/syft/internal/log"
	"github.com/anchore/syft/syft/artifact"
	"github.com/anchore/syft/syft/cpe"
	"github.com/anchore/syft/syft/file"
	"github.com/anchore/syft/syft/format/internal/cyclonedxutil/helpers"
	"github.com/anchore/syft/syft/linux"
	"github.com/anchore/syft/syft/pkg"
@@ -28,12 +29,42 @@ func ToFormatModel(s sbom.SBOM) *cyclonedx.BOM {
	cdxBOM.SerialNumber = uuid.New().URN()
	cdxBOM.Metadata = toBomDescriptor(s.Descriptor.Name, s.Descriptor.Version, s.Source)

	// Packages
	packages := s.Artifacts.Packages.Sorted()
	components := make([]cyclonedx.Component, len(packages))
	for i, p := range packages {
		components[i] = helpers.EncodeComponent(p)
	}
	components = append(components, toOSComponent(s.Artifacts.LinuxDistribution)...)

	// Files
	artifacts := s.Artifacts
	coordinates := s.AllCoordinates()
	fileComponents := make([]cyclonedx.Component, len(coordinates))
	for i, coordinate := range coordinates {
		var metadata *file.Metadata
		// File Info
		fileMetadata, exists := artifacts.FileMetadata[coordinate]
Contributor:

Do we want directories in the output? I don't know the answer. I can't think of why we would. In the current implementation they show up like this:

    {
      "bom-ref": "ab68ef0e1832e438",
      "type": "file",
      "name": "/"
    }

That doesn't tell anyone a lot; it just looks like a file with no digests. In contrast, a regular file will have content digests:

    {
      "bom-ref": "6185b4b8a7b64f56",
      "type": "file",
      "name": "/bin/[",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "912faeca732392cd21175ae53ae49624da034f1c"
        },
        {
          "alg": "SHA-256",
          "content": "25015cc97808781979490c4843c4a483019ec5efc0ecfae648c7fd4f36d18096"
        }
      ]
    },

Syft JSON includes both files and directories, but there the metadata marks an entry as a directory, whereas here a directory just looks like an incomplete file. The spec has a file type but no directory type: https://cyclonedx.org/docs/1.6/json/#components_items_type

I'm not sure.

Contributor Author:

I think I'll err on the side of dropping these for now since it's a little ambiguous. I'll try to look for examples of others who might be including or dropping them for comparison.

Contributor Author:

Dev note: if we end up excluding directories in this change, we should probably drop them from the other SBOM formats as well. It should be one way or the other for all formats.
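
For reference, a minimal sketch of what dropping directories could look like. It is not the PR's implementation: `fileMeta`, its `IsDir` field, and `component` are hypothetical stand-ins for syft's file metadata and `cyclonedx.Component`, and the real types may expose the file type differently.

```go
package main

import "fmt"

// fileMeta is a hypothetical stand-in for the per-file metadata syft records;
// the real type and its fields may differ.
type fileMeta struct {
	Path  string
	IsDir bool
}

// component is a hypothetical stand-in for cyclonedx.Component.
type component struct {
	Type string
	Name string
}

// fileComponentsWithoutDirs builds one "file" component per regular entry and
// skips directories, since CycloneDX has a "file" type but no "directory" type.
func fileComponentsWithoutDirs(metas []fileMeta) []component {
	out := make([]component, 0, len(metas))
	for _, m := range metas {
		if m.IsDir {
			continue
		}
		out = append(out, component{Type: "file", Name: m.Path})
	}
	return out
}

func main() {
	metas := []fileMeta{
		{Path: "/", IsDir: true},
		{Path: "/bin/[", IsDir: false},
	}
	for _, c := range fileComponentsWithoutDirs(metas) {
		fmt.Printf("%s %s\n", c.Type, c.Name)
	}
}
```

With the two entries above, only `/bin/[` is emitted; the `/` directory never reaches the component list.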

		// no file metadata then don't include in SBOM
		if !exists {
			continue
Contributor:

On line 43 above, we make a slice with the same length as s.AllCoordinates(), but here we sometimes skip coordinates. I think this may leave uninitialized (zero-value) file components in the output. When do we expect exists to be false?

Contributor Author (@spiffcs, Dec 18, 2024):

That's a good question. Let me update this code and investigate the config state where metadata selection is none.

My understanding is that there would still be coordinates but no associated metadata. That might not be the case, though: coordinates := s.AllCoordinates() might simply return nothing.

Config where file.metadata.selection is none:

file:
  metadata:
    # select which files should be captured by the file-metadata cataloger and included in the SBOM.
    # Options include:
    #  - "all": capture all files from the search space
    #  - "owned-by-package": capture only files owned by packages
    #  - "none", "": do not capture any files (env: SYFT_FILE_METADATA_SELECTION)
    selection: 'none'
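
The pre-sized slice concern can be illustrated in isolation. This is only a sketch under simplified assumptions: `component` is a hypothetical stand-in for `cyclonedx.Component`, and plain strings stand in for coordinates and metadata. It contrasts the pattern under review with an append-based variant.

```go
package main

import "fmt"

type component struct{ Name string } // hypothetical stand-in for cyclonedx.Component

// presized mirrors the pattern under review: allocate len(coords) entries,
// then skip coordinates that have no metadata.
func presized(coords []string, meta map[string]string) []component {
	out := make([]component, len(coords))
	for i, c := range coords {
		path, exists := meta[c]
		if !exists {
			continue // out[i] stays a zero-value component
		}
		out[i] = component{Name: path}
	}
	return out
}

// appended reserves capacity only and appends, so skipped coordinates leave no gap.
func appended(coords []string, meta map[string]string) []component {
	out := make([]component, 0, len(coords))
	for _, c := range coords {
		path, exists := meta[c]
		if !exists {
			continue
		}
		out = append(out, component{Name: path})
	}
	return out
}

func main() {
	coords := []string{"/a", "/b", "/c"}
	meta := map[string]string{"/a": "/a", "/c": "/c"} // "/b" has no metadata
	fmt.Println(len(presized(coords, meta)), len(appended(coords, meta))) // prints "3 2"
}
```

The pre-sized version returns three entries, one of them empty, while the append version returns only the two coordinates that had metadata.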

		}
		metadata = &fileMetadata

		// Digests
		var digests []file.Digest
		if digestsForLocation, exists := artifacts.FileDigests[coordinate]; exists {
			digests = digestsForLocation
		}

		fileComponents[i] = cyclonedx.Component{
			BOMRef: string(coordinate.ID()),
			Type:   cyclonedx.ComponentTypeFile,
			Name:   metadata.Path,
			Hashes: digestsToHashes(digests),
		}
	}
	components = append(components, fileComponents...)
	cdxBOM.Components = &components

	dependencies := toDependencies(s.Relationships)
@@ -44,6 +75,33 @@ func ToFormatModel(s sbom.SBOM) *cyclonedx.BOM {
	return cdxBOM
}

func digestsToHashes(digests []file.Digest) *[]cyclonedx.Hash {
	hashes := make([]cyclonedx.Hash, len(digests))
	for i, digest := range digests {
		cdxAlgo := toCycloneDXAlgorithm(digest.Algorithm)
Contributor:

Do we need to handle a map miss here? What should happen if there's a digest for an algorithm outside the CDX spec? For example, what if someone crafts an SBOM with xx64 hashes of files? (I'm not sure any current spec allows this, but I think right now we would end up with a blank algorithm field, which is not allowed: https://cyclonedx.org/docs/1.6/json/#components_items_hashes_items_alg)

Contributor Author:

Yep! That's a good point. I was only thinking about syft-generated SBOMs, whose digest algorithms are a subset of the current CDX specification.
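
One possible way to handle the map miss is Go's comma-ok lookup, sketched below. The `digest` and `hash` types, `lookupAlgorithm`, and `toHashes` are hypothetical stand-ins rather than syft or cyclonedx-go APIs, and silently dropping an unsupported digest is an assumption; logging it or returning an error would be equally reasonable.

```go
package main

import (
	"fmt"
	"strings"
)

// digest and hash are hypothetical stand-ins for file.Digest and cyclonedx.Hash.
type digest struct {
	Algorithm string
	Value     string
}

type hash struct {
	Algorithm string
	Value     string
}

// cdxAlgorithms maps the digests syft produces to CycloneDX algorithm names.
var cdxAlgorithms = map[string]string{
	"md5":    "MD5",
	"sha1":   "SHA-1",
	"sha256": "SHA-256",
}

// lookupAlgorithm reports whether the algorithm has a CycloneDX equivalent,
// instead of returning an empty string on a map miss.
func lookupAlgorithm(algorithm string) (string, bool) {
	name, ok := cdxAlgorithms[strings.ToLower(algorithm)]
	return name, ok
}

// toHashes converts digests and skips any whose algorithm is not in the map,
// so no hash with a blank "alg" field is emitted.
func toHashes(digests []digest) []hash {
	out := make([]hash, 0, len(digests))
	for _, d := range digests {
		alg, ok := lookupAlgorithm(d.Algorithm)
		if !ok {
			continue // assumption: drop unsupported digests
		}
		out = append(out, hash{Algorithm: alg, Value: d.Value})
	}
	return out
}

func main() {
	hashes := toHashes([]digest{
		{Algorithm: "SHA256", Value: "abc"},
		{Algorithm: "xxh64", Value: "def"}, // not in the map
	})
	fmt.Println(hashes)
}
```

Here the unsupported entry is filtered out before serialization, so the output never carries an empty algorithm name.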

		hashes[i] = cyclonedx.Hash{
			Algorithm: cdxAlgo,
			Value:     digest.Value,
		}
	}
	return &hashes
}

// supported algorithm in cycloneDX as of 1.4
// "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512",
// "SHA3-256", "SHA3-384", "SHA3-512", "BLAKE2b-256", "BLAKE2b-384", "BLAKE2b-512", "BLAKE3"
// syft supported digests: cmd/syft/cli/eventloop/tasks.go
// MD5, SHA1, SHA256
func toCycloneDXAlgorithm(algorithm string) cyclonedx.HashAlgorithm {
	validMap := map[string]cyclonedx.HashAlgorithm{
		"sha1":   cyclonedx.HashAlgoSHA1,
		"md5":    cyclonedx.HashAlgoMD5,
		"sha256": cyclonedx.HashAlgoSHA256,
	}

	return validMap[strings.ToLower(algorithm)]
}

func toOSComponent(distro *linux.Release) []cyclonedx.Component {
	if distro == nil {
		return []cyclonedx.Component{}
47 changes: 47 additions & 0 deletions syft/format/common/cyclonedxhelpers/to_format_model_test.go
@@ -10,6 +10,7 @@ import (
	"github.com/stretchr/testify/require"

	"github.com/anchore/syft/syft/artifact"
	"github.com/anchore/syft/syft/file"
	"github.com/anchore/syft/syft/format/internal/cyclonedxutil/helpers"
	"github.com/anchore/syft/syft/linux"
	"github.com/anchore/syft/syft/pkg"
@@ -143,6 +144,52 @@ func Test_relationships(t *testing.T) {
	}
}

func Test_fileComponents(t *testing.T) {
	tests := []struct {
		name string
		sbom sbom.SBOM
		want []cyclonedx.Component
	}{
		{
			name: "sbom coordinates with file metadata are serialized to cdx",
			sbom: sbom.SBOM{
				Artifacts: sbom.Artifacts{
Contributor:

I wish there were a couple more unit tests, since there are a few paths through the code this doesn't exercise:

  1. Missing metadata (entering the if statement at syft/format/common/cyclonedxhelpers/to_format_model.go:49)
  2. Mix of packages and files
  3. Algorithm that CDX doesn't allow in a file digest
  4. No files in SBOM, only packages
  5. Weird files, like symlinks or sockets

					FileMetadata: map[file.Coordinates]file.Metadata{
						{RealPath: "/test"}: {Path: "/test"},
					},
					FileDigests: map[file.Coordinates][]file.Digest{
						{RealPath: "/test"}: {
							{
								Algorithm: "sha256",
								Value:     "xyz12345",
							},
						},
					},
				},
			},
			want: []cyclonedx.Component{
				{
					BOMRef: "3f31cb2d98be6c1e",
					Name:   "/test",
					Type:   cyclonedx.ComponentTypeFile,
					Hashes: &[]cyclonedx.Hash{
						{Algorithm: "SHA-256", Value: "xyz12345"},
					},
				},
			},
		},
	}
	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			cdx := ToFormatModel(test.sbom)
			got := *cdx.Components
			if diff := cmp.Diff(test.want, got); diff != "" {
				t.Errorf("cdx file components mismatch (-want +got):\n%s", diff)
			}
		})
	}
}

func Test_toBomDescriptor(t *testing.T) {
	type args struct {
		name string