Allow output of Syft JSON format in multiple schema version #846

Open
dmikusa opened this issue Feb 24, 2022 · 11 comments
Labels
enhancement New feature or request I/O Describes bug or enhancement around application input or output json Regarding JSON output

Comments

@dmikusa

dmikusa commented Feb 24, 2022

What would you like to be added:

Right now the Syft JSON format schema version is hard-coded (seems to be to the latest version). When you bump to a newer version of syft, it will start outputting the new format. It would be helpful if you could control the Syft JSON schema version used for output, like syft package --output json-v2. I think it would be sufficient to control it at the major version level.

Why is this needed:

Well, it's nice to stay on the latest version of the syft tool for bug fixes and scanner improvements, but when the output format changes it can take time to adjust the tools that consume the output to read the new format.

I can understand if it's not possible to support all major versions for all of time, but supporting the most recent two or three (depending on how quickly they increment) would help provide time to plan and update tools consuming the output.
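
In the meantime, a consuming tool can at least fail fast when the schema moves under it. The following is a minimal sketch, assuming the Syft JSON document carries its schema version in a top-level `schema.version` field; the `supported` set and the sample document are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// schemaHeader captures only the schema block of a Syft JSON document.
// Assumption: the version lives at the top-level "schema.version" key.
type schemaHeader struct {
	Schema struct {
		Version string `json:"version"`
	} `json:"schema"`
}

// schemaMajor returns the major component of the schema version in doc.
func schemaMajor(doc []byte) (int, error) {
	var h schemaHeader
	if err := json.Unmarshal(doc, &h); err != nil {
		return 0, fmt.Errorf("decoding schema header: %w", err)
	}
	if h.Schema.Version == "" {
		return 0, fmt.Errorf("document has no schema.version field")
	}
	majorStr, _, _ := strings.Cut(h.Schema.Version, ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return 0, fmt.Errorf("parsing major version %q: %w", majorStr, err)
	}
	return major, nil
}

func main() {
	// Hypothetical sample document; real output has many more fields.
	doc := []byte(`{"schema": {"version": "3.1.0"}, "artifacts": []}`)

	major, err := schemaMajor(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// supported is a hypothetical set of majors this consumer can read.
	supported := map[int]bool{2: true, 3: true}
	if !supported[major] {
		fmt.Printf("unsupported schema major version %d\n", major)
		return
	}
	fmt.Printf("schema major version %d is supported\n", major)
}
```

This only detects a mismatch; it does not remove the need to pin the output version, which is what the request above is about.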

Additional context:

I'm not sure this would be something necessary when syft hits 1.0, as I'd assume that means the schema changes will be non-breaking, but in the meantime, it would help to have a feature like this so that it can ease the migrations between schemas.

I wouldn't be opposed to alternative solutions either, so this could perhaps be a question and not an enhancement. For example, it would also work if I could somehow recompile and have it use a different JSON schema version but still get other updates/fixes.

Thanks

@ryanmoran
Contributor

We'd like to hear the project's stance on this idea. We've experienced some pretty painful ramifications of our own usage of Syft due to the fact that the output schema formats for any of the SBOM types (Syft, CycloneDX, SPDX, etc.) will just change, sometimes in patch releases. This has left us in a state where we either stop keeping up-to-date with the latest versions of Syft, or we do some pretty crazy gymnastics to make the format output stable.

Since the addition of #864, there is an interface that would allow external formats to be defined. It now appears that it would be possible for third-parties to define their own formats, including legacy formats for SBOM types already defined in Syft.

It would be great to understand a couple of things before anyone embarked on this kind of endeavor:

  1. Is the project opposed to the idea of supporting anything more than the latest schema version of each SBOM format?
  2. If we want this, should we be building our own library of formats to support it?

@luhring luhring added the I/O Describes bug or enhancement around application input or output label Mar 24, 2022
@luhring
Contributor

luhring commented Mar 24, 2022

Hi @dmikusa-pivotal and @ryanmoran — this request makes sense. We've talked about supporting multiple major versions (latest of each) for several of the built-in formats, including Syft JSON. You've probably noticed that format versions have started appearing in format package names in the Syft library code.

Answering the two questions directly:

  1. Is the project opposed to the idea of supporting anything more than the latest schema version of each SBOM format?

In general, no, not opposed. But we'd want to really think about the right constraints to apply to the solution.

  2. If we want this, should we be building our own library of formats to support it?

You could. Part of the intent with exposing the format interface is to allow formats to be supplied by library consumers. Doing it yourself would obviously let you control the timeline of when the implementation was ready to use. And at the same time, I think it makes sense for the Syft library itself to consider implementing this support.

cc: @wagoodman a.k.a. "The Syft Formats Guru"

@ryanmoran
Contributor

We'd be happy to contribute this support. Before we spend the time to do that, it'd be good to have a deeper discussion of the constraints around what the project would and would not want to support.

@luhring
Contributor

luhring commented Mar 24, 2022

That sounds great! One option for discussion is that we have biweekly community meetings, where we chat through specific issues that need discussion. Are you available to join one of these? (info here: https://github.com/anchore/syft#join-our-community-meetings)

@fg-j
Contributor

fg-j commented Mar 31, 2022

After the conversation in the Working Group meeting on 3/31/22, @kzantow and @spiffcs mentioned that it'd be useful to know which syft internal packages I copied to build working implementations of the sbom.Format interface for some legacy SBOM schemas.

Note: My implementations only encode and can't decode, since that was expedient for the buildpacks use case. More internal packages may be needed for decoding.

I used the following internal files/packages:

It seems like some of these are more stable than others as the Syft JSON schema itself changes. It would be useful if some of these internals were exposed as stable(ish) APIs so that users like Paketo can support SBOM formats beyond what y'all currently offer.

@spiffcs spiffcs added this to OSS Apr 27, 2022
@spiffcs spiffcs moved this to Triage in OSS Apr 27, 2022
@sophiewigmore

sophiewigmore commented Jan 17, 2023

Hey y'all, it's been a while! 👋
Recently we updated from Syft v0.60.3 to v0.66.3. First of all, the move of the common internal packages to the syft/formats directory has been very helpful for us in reducing some duplication!

We still ran into some issues upgrading, due to the bump of SPDX 2.2 to 2.3, resulting in the need to copy over a few files in order to pin to the old upstream SPDX version.

At this point, we now support all of the "legacy" versions we need, so the issue of upgrading Syft in our code in the future should be much more straightforward. Nevertheless, I wanted to follow up and see if you've given any more thought to the idea of supporting multiple schema versions?

@luhring
Contributor

luhring commented Jan 18, 2023

cc @kzantow

@kzantow
Contributor

kzantow commented Jan 18, 2023

Hi @sophiewigmore. When you say:

due to the bump of SPDX 2.2 to 2.3, resulting in the need to copy over a few files in order to pin to the old upstream SPDX version

Are you only supporting SPDX 2.2 and you only want to output that version?

We recently contributed some changes to the spdx/tools-golang library, which supports multiple versions of SPDX and converting between them (we have yet to incorporate this into Syft but will soon!).

When you refer to multiple schema versions, do you mean that you want to specify which SPDX version to output, to be able to read any version, or something else?

@sophiewigmore

@kzantow Hey, we only support 2.2 at the moment, but ideally we will support 2.3 soon as well. I hadn't seen those contributions you mentioned, so I'll check them out; it seems like a good option for us! Thanks.

We want to be able to specify the version to output ideally. Essentially, we're after what was laid out in the original issue here for Syft JSON, SPDX, and CycloneDX:

It would be helpful if you could control the Syft JSON schema version used for output, like syft package --output json-v2. I think it would be sufficient to control it at the major version level.

We mostly wanted to check in and see if there's been any thought one way or another on this idea since our preliminary discussions on this topic.

@kzantow
Contributor

kzantow commented Jan 18, 2023

@sophiewigmore yes, we definitely have thoughts on the topic! We're currently working towards Syft 1.0, at which time we plan to make the syft format more stable, including support for input/output of major versions (with some limited timeframe, presumably). Additionally, as noted for SPDX, we plan on supporting output of specific versions once we are able to incorporate the aforementioned changes from the SPDX library we're using.

@wagoodman wagoodman moved this from Awaiting Response to Backlog in OSS Jan 24, 2024
@wagoodman wagoodman added the json Regarding JSON output label Jan 24, 2024
@wagoodman
Contributor

We've since added support for specifying a version for spdx and cyclonedx:

syft --output [email protected]

I think if we were to do this (and I think we probably should) we should lean into that same approach (e.g. syft --output [email protected]).
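
For tools that pass such options through, the value splits cleanly on the `@`. Below is a small sketch, assuming the option has the form `name@version` with the version part optional; the `FormatSpec` type, the `parseFormatSpec` helper, and the sample values are illustrative and not part of Syft.

```go
package main

import (
	"fmt"
	"strings"
)

// FormatSpec is a parsed "name@version" output option (hypothetical type).
type FormatSpec struct {
	Name    string
	Version string // empty means "use the tool's default version"
}

// parseFormatSpec splits s at the first '@' and rejects empty parts.
func parseFormatSpec(s string) (FormatSpec, error) {
	name, version, hasVersion := strings.Cut(s, "@")
	if name == "" {
		return FormatSpec{}, fmt.Errorf("missing format name in %q", s)
	}
	if hasVersion && version == "" {
		return FormatSpec{}, fmt.Errorf("missing version after '@' in %q", s)
	}
	return FormatSpec{Name: name, Version: version}, nil
}

func main() {
	// Sample inputs chosen for illustration only.
	for _, s := range []string{"spdx-json@2.2", "syft-json", "syft-json@"} {
		spec, err := parseFormatSpec(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("name=%s version=%q\n", spec.Name, spec.Version)
	}
}
```

Treating a missing version as "default" keeps existing invocations working while letting callers pin a version when they need to.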

arjun024 added a commit to paketo-buildpacks/packit that referenced this issue Sep 14, 2024
Packit currently supports SBOM generation with syft tooling by utilizing
syft's go library. This has caused packit maintainers significant
maintenance burden. This commit adds a mechanism for buildpack authors
to utilize the syft CLI instead to generate SBOM. The intention here is
that with widespread adoption of this, we can phase out the codebase
that uses the syft go library and thereby relieve the maintainers of this
pain.

Until recently, syft did not allow consumers to specify the exact schema
version of an SBOM mediatype they want generated (the tooling currently
supports passing a version for CycloneDX and SPDX -
github.com/anchore/syft/issues/846#issuecomment-1908676454). So packit
was forced to vendor-in (copy) large chunks of upstream syft go code
into packit in order to pin SBOM mediatype versions to versions that
most consumers wanted to use. Every time a new version of Syft came out,
maintainers had to painfully update the vendored-in code to work with
upstream syft components (e.g.
github.com//pull/491).

Furthermore, it is advantageous to use the syft CLI instead of the syft
go library for multiple reasons. With the CLI, we can delegate the entire
SBOM generation mechanism to syft. It should also help buildpacks avoid
CVEs that they are exposed to via the syft go libraries. The CLI tool is
well documented and widely used in the community, and it seems like the
syft project is developed with a CLI-first approach. The caveat here is
that buildpack authors who use this method should include the Paketo
Syft buildpack in their buildplan to have access to the CLI during the
build phase.

Example usage:

// detect
// unless BP_DISABLE_BOM is true
requirements = append(requirements, packit.BuildPlanRequirement{
	Name: "syft",
	Metadata: map[string]interface{}{
		"build": true,
	},
})

// build
syftCLIScanner := sbomgen.NewSyftCLIScanner(
	pexec.NewExecutable("syft"),
	scribe.NewEmitter(os.Stdout),
)

// To scan a layer after installing a dependency
_ = syftCLIScanner.GenerateSBOM(myLayer.Path,
	context.Layers.Path,
	myLayer.Name,
	context.BuildpackInfo.SBOMFormats...,
)

// OR to scan the workspace dir after running a process
_ = syftCLIScanner.GenerateSBOM(context.WorkingDir,
	context.Layers.Path,
	myLayer.Name,
	context.BuildpackInfo.SBOMFormats...,
)

- A new package sbomgen is created instead of adding the functionality
  to the existing sbom package because it helps buildpacks remove the
  pinned "anchore/syft" lib from their go.mod, which was flagged by CVE
  scanners.
- I have not implemented the prettification of SBOMs that the codepath
  using the syft go lib implements. It seems to add bloat to the app
  image and is not supported via the CLI. Consumers of SBOMs can easily
  prettify the SBOM JSON themselves.
- In the codepath that uses the syft go lib, license information is
  manually injected from buildpack.toml data into the SBOM. This is not
  available with the SyftCLIScanner. I couldn't find any reasoning for
  why this was done in the first place.
- I have intentionally not reused some code in methods that are mixed up
  with the syft go library, with the intention of easily phasing out that
  codebase in the near future.
Projects
Status: Backlog