
Fixes to SBOM libraries #2096

Merged
merged 12 commits into from
Jun 14, 2021

Conversation

puerco
Member

@puerco puerco commented May 29, 2021

What type of PR is this?

/kind feature
/kind design
/kind bug

What this PR does / why we need it:

This PR compiles several fixes and features for the sbom libraries that surfaced during the first SBOM test after alpha.3. Namely, this PR has the following changes:

  • Adds to bom the capability to configure the sbom components from a YAML file (more below)
  • Allows more options to be passed to the SPDX document builder
  • File analysis is now done in parallel, speeding up the kubernetes bom generation significantly
  • When generating a SPDX package from a directory, file paths will now be relative to the dir root
  • Local replacements of Golang packages are now honored, saving a considerable number of downloads
  • Fixed a bug where we would erase the local golang package install
  • Fixed a bug where license data would be saved in the download cache directory, resulting in the license classifier having a lower accuracy
  • Golang packages will now include all license text in the SBOM as well as the SPDX license identifier
  • New function license.ReadTopLicense() will scan and return only the most significant license in a directory, potentially avoiding thousands of operations in the classifier code.

SBOM artifacts YAML definition file

Defining a bill of materials that has multiple packages, from different sources (files, images, directories, etc), can be cumbersome to do from the command line. This PR adds initial support to define a BOM in a declarative way from a yaml file. In this iteration, bom has a new flag -c --config which points to the yaml file. In future versions and use cases (for example when run from CI or GitHub actions), we could default bom to look for a .sbom.yaml file in a repository.

A sample of a yaml file looks like this:

---
namespace: http://www.example.com/
license: Apache-2.0
name: bom-test
creator:
    person: Kubernetes Release Managers ([email protected])
artifacts:
    - type: directory
      source: .
      license: Apache-2.0
      gomodules: true
    - type: file
      source: ./SECURITY.md
    - type: image
      source: k8s.gcr.io/kube-apiserver:v1.22.0-alpha.2
    - type: docker-archive
      source: tmp/sample-images/kube-apiserver.tar

To run bom with that file:

bom -c sbom.yaml

This configuration would render an SBOM with one loose file (SECURITY.md) and three packages: a directory, an image and a docker archive tarball.
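
As a rough illustration of how the sample file above could map onto Go types, here is a minimal sketch. The struct names, the `yaml` tags, and the `Validate` method are assumptions made for this example and are not taken from the PR; decoding the file itself would need a YAML library, which is left out so the block stays standard-library only.

```go
package main

import (
	"errors"
	"fmt"
)

// Artifact mirrors one entry under "artifacts" in the sample file.
// The type and its field names are hypothetical, not the PR's actual code.
type Artifact struct {
	Type      string `yaml:"type"`
	Source    string `yaml:"source"`
	License   string `yaml:"license"`
	GoModules bool   `yaml:"gomodules"`
}

// Creator mirrors the "creator" mapping of the sample file.
type Creator struct {
	Person string `yaml:"person"`
}

// Config mirrors the top level of the sample file.
type Config struct {
	Namespace string     `yaml:"namespace"`
	License   string     `yaml:"license"`
	Name      string     `yaml:"name"`
	Creator   Creator    `yaml:"creator"`
	Artifacts []Artifact `yaml:"artifacts"`
}

// knownTypes lists only the artifact types that appear in the sample file.
var knownTypes = map[string]bool{
	"directory":      true,
	"file":           true,
	"image":          true,
	"docker-archive": true,
}

// Validate is a hypothetical sanity check: every artifact needs a known
// type and a non-empty source.
func (c *Config) Validate() error {
	if len(c.Artifacts) == 0 {
		return errors.New("config defines no artifacts")
	}
	for i, a := range c.Artifacts {
		if !knownTypes[a.Type] {
			return fmt.Errorf("artifact %d: unknown type %q", i, a.Type)
		}
		if a.Source == "" {
			return fmt.Errorf("artifact %d: empty source", i)
		}
	}
	return nil
}

func main() {
	cfg := Config{
		Namespace: "http://www.example.com/",
		License:   "Apache-2.0",
		Name:      "bom-test",
		Artifacts: []Artifact{
			{Type: "directory", Source: ".", License: "Apache-2.0", GoModules: true},
			{Type: "file", Source: "./SECURITY.md"},
		},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("artifacts defined:", len(cfg.Artifacts))
}
```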

Which issue(s) this PR fixes:

Part of #2085
Part of #1837

Special notes for your reviewer:

The format of the config file will evolve as we start using it and find new needs.

Requires #2085 to merge and a rebase (only the last three commits are relevant)

Does this PR introduce a user-facing change?

* Allows more options to be passed to the SPDX document builder
* File analysis is now done in parallel, speeding up the kubernetes bom generation significantly
* When generating a SPDX package from a directory, file paths will now be relative to the dir root
* Local replacements of Golang packages are now honored, saving a considerable number of downloads
* Fixed a bug where we would erase the local golang package install
* Fixed a bug where license data would be saved in the download cache directory, resulting in the license classifier having a lower accuracy
* Golang packages will now include all license text in the SBOM as well as the SPDX license identifier
* New function `license.ReadTopLicense()` will scan and return only the most significant license in a directory, potentially avoiding thousands of operations in the classifier code.

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. kind/design Categorizes issue or PR as related to design. labels May 29, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 29, 2021
@k8s-ci-robot k8s-ci-robot added needs-priority cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 29, 2021
@k8s-ci-robot k8s-ci-robot requested review from hasheddan and xmudrii May 29, 2021 22:22
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 29, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 31, 2021
puerco added 10 commits June 12, 2021 14:10
This commit adds support to define the main license of
the document, its name, creator data in the options.
These fields can be set from the YAML configuration file.
This is the first commit of the unit test file for the
spdx doc builder. There is only one test for now:
TestYAMLParse, which checks the yaml configuration file.
This commit adds a mutex to the spdx package and parallelizes
the read operations that scan the filesystem when processing
a directory.

The improvement generating the kubernetes SBOM:

real	2m36.691s
user	12m24.276s
sys	0m13.644s

real	4m51.323s
user	12m30.231s
sys	0m13.937s

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
When generating a package from a directory, we now relativize
all file paths to the initial directory. Previously, all paths
would reflect the command invoked.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
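
Relativizing paths to the package root, as the commit above describes, can be done with the standard library's `filepath.Rel`. A minimal sketch follows; the `relativize` helper and the sample paths are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relativize rewrites each path so it is relative to root. It fails if a
// path cannot be expressed relative to root, for example when root is
// absolute and the path is not.
func relativize(root string, paths []string) ([]string, error) {
	out := make([]string, 0, len(paths))
	for _, p := range paths {
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return nil, fmt.Errorf("relativizing %q to %q: %w", p, root, err)
		}
		// Forward slashes keep the output stable across platforms.
		out = append(out, filepath.ToSlash(rel))
	}
	return out, nil
}

func main() {
	rel, err := relativize("/src/kubernetes", []string{
		"/src/kubernetes/cmd/kubectl/main.go",
		"/src/kubernetes/LICENSE",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rel)
}
```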
When checking dependencies, use the local cached replacements
of packages if they are available. This will avoid downloading
all of the kubernetes packages when generating the k8s SBOM.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
When cleaning up Go modules, we now only erase temporary
directories created by our run to avoid deleting files from
the local GOPATH.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
There was a bug in the license catalog logic where license data
was written incorrectly in the downloads cache. This commit fixes
it so that license data is now stored separately from the downloads
directory.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
To satisfy CNCF requirements, we now return the full text of
scanned licenses to include them in the SBOM. As we are now
handling more data, the data structure of the license list is
now a slice of pointers.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
puerco added 2 commits June 13, 2021 01:31
This commit adds a new function to the license package: ReadTopLicense()

This function tries to determine the most significant license file
in a directory by first trying some common names at the top of the
tree and then working its way down looking for the topmost one.

The use of this function saves thousands of license scans and reduces
memory footprint when generating the kubernetes sbom.

Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
@puerco puerco changed the title from "Support defining complex SBOM configurations from a yaml file" to "Fixes to SBOM libraries" Jun 14, 2021
@puerco
Member Author

puerco commented Jun 14, 2021

Since this PR was on hold, I've pushed the next set of SBOM fixes and features that surfaced after v1.22.0-alpha.3 as individual commits. Each commit message explains what the commit does. This PR is now ready for review.

I have tested these changes in a stage run here:
https://console.cloud.google.com/cloud-build/builds;region=global/ae1c6300-95f7-4ee9-bbc8-72d5a063a71a?folder=&organizationId=&project=kubernetes-release-test

@puerco
Member Author

puerco commented Jun 14, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 14, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 14, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: puerco, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [puerco,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 6c67d4f into kubernetes:master Jun 14, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/release Categorizes an issue or PR as relevant to SIG Release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.