Allow output of Syft JSON format in multiple schema version #846
Comments
We'd like to hear the project's stance on this idea. We've experienced some pretty painful ramifications of our own usage of Syft because the output schema formats for any of the SBOM types (Syft, CycloneDX, SPDX, etc.) can change without warning, sometimes in patch releases. This has left us in a state where we either stop keeping up to date with the latest versions of Syft, or we do some pretty crazy gymnastics to make the format output stable. Since the addition of #864, there is an interface that allows external formats to be defined. It now appears possible for third parties to define their own formats, including legacy formats for SBOM types already defined in Syft. It would be great to understand a couple of things before anyone embarks on this kind of endeavor:
Hi @dmikusa-pivotal and @ryanmoran — this request makes sense. We've talked about supporting multiple major versions (latest of each) for several of the built-in formats, including Syft JSON. You've probably noticed that format versions have started appearing in format package names in the Syft library code. Answering the two questions directly:
In general, no, not opposed. But we'd want to really think about the right constraints to apply to the solution.
You could. Part of the intent with exposing the format interface is to allow formats to be supplied by library consumers. Doing it yourself would obviously let you control the timeline of when the implementation was ready to use. And at the same time, I think it makes sense for the Syft library itself to consider implementing this support. cc: @wagoodman a.k.a. "The Syft Formats Guru"
We'd be happy to contribute this support. Before we spend the time to do that, it'd be good to have a deeper discussion of the constraints around what the project would and would not want to support.
That sounds great! One option for discussion is that we have biweekly community meetings, where we chat through specific issues that need discussion. Are you available to join one of these? (info here: https://github.com/anchore/syft#join-our-community-meetings)
After the conversation in the Working Group meeting on 3/31/22, @kzantow and @spiffcs mentioned that it'd be useful to know what Syft internal packages I copied to build working implementations of the
Note: My implementations only encode and can't decode, since that was expedient for the buildpacks use case. More internal packages may be needed for decoding. I used the following internal files/packages:
It seems like some of these are more stable than others as the Syft JSON schema itself changes. It would be useful if some of these internals were exposed as stable(ish) APIs so that users like Paketo can support SBOM formats beyond what y'all currently offer.
Hey y'all, it's been a while! 👋 We still ran into some issues upgrading, due to the bump of SPDX 2.2 to 2.3, resulting in the need to copy over a few files in order to pin to the old upstream SPDX version. At this point, we now support all of the "legacy" versions we need, so the issue of upgrading Syft in our code in the future should be much more straightforward. Nevertheless, I wanted to follow up and see if you've given any more thought to the idea of supporting multiple schema versions?
cc @kzantow
Hi @sophiewigmore. When you say:
Are you only supporting SPDX 2.2 and you only want to output that version? We recently contributed some changes to the spdx/tools-golang library, which supports multiple versions of SPDX and converting between them (we have yet to incorporate this into Syft but will soon!). When you refer to multiple schema versions, are you saying that you would want to specify the SPDX version to output? Or to be able to read any version? Or something else?
@kzantow Hey, we only support 2.2 at the moment, but ideally we will support 2.3 soon as well. I hadn't seen those contributions you mentioned, so I can check them out, it seems like a good option for us! Thanks. We want to be able to specify the version to output ideally. Essentially, we're after what was laid out in the original issue here for Syft JSON, SPDX, and CycloneDX:
We mostly wanted to check in and see if there's been any thought one way or another on this idea since our preliminary discussions on this topic.
@sophiewigmore yes, we definitely have thoughts on the topic! We're currently working towards Syft 1.0, at which time we plan to make the Syft format more stable, including support for input/output of major versions (with some limited timeframe, presumably). Additionally, as noted for SPDX, we plan on supporting output of specific versions once we are able to incorporate the aforementioned changes from the SPDX library we're using.
We've since supported being able to specify version for spdx and cyclonedx:
I think if we were to do this (and I think we probably should) we should lean into that same approach (e.g.
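The example in the comment above did not survive, so here is a rough sketch of what a consumer-side helper for versioned format names might look like. It assumes the `name@version` form (for instance `spdx-json@2.2` or `cyclonedx-json@1.5`) that I understand Syft's `--output` flag to accept for SPDX and CycloneDX. `FormatSpec` and `ParseFormatSpec` are hypothetical names for illustration and are not part of Syft's API.

```go
package main

import (
	"fmt"
	"strings"
)

// FormatSpec is a hypothetical helper type (not part of Syft) describing
// an output format and an optional pinned schema version.
type FormatSpec struct {
	Name    string // e.g. "spdx-json"
	Version string // e.g. "2.2"; empty means "use the tool's default"
}

// ParseFormatSpec splits a "name@version" string into its parts.
// A bare "name" is accepted and yields an empty Version.
func ParseFormatSpec(s string) (FormatSpec, error) {
	name, version, found := strings.Cut(s, "@")
	if name == "" {
		return FormatSpec{}, fmt.Errorf("format spec %q has no name", s)
	}
	if found && version == "" {
		return FormatSpec{}, fmt.Errorf("format spec %q has an empty version", s)
	}
	return FormatSpec{Name: name, Version: version}, nil
}

// String renders the spec back into the "name@version" form.
func (f FormatSpec) String() string {
	if f.Version == "" {
		return f.Name
	}
	return f.Name + "@" + f.Version
}

func main() {
	for _, s := range []string{"spdx-json@2.2", "cyclonedx-json@1.5", "syft-json"} {
		spec, err := ParseFormatSpec(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("name=%s version=%q spec=%s\n", spec.Name, spec.Version, spec)
	}
}
```

If Syft JSON adopted the same convention, a pinned major version could presumably be expressed the same way, but that is speculation on my part.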
Packit currently supports SBOM generation with Syft by using Syft's Go library. This has placed a significant maintenance burden on Packit maintainers. This commit adds a mechanism for buildpack authors to use the Syft CLI to generate SBOMs instead. The intention is that, with widespread adoption, we can phase out the code that uses the Syft Go library and relieve the maintainers of this pain.
Until recently, Syft did not allow consumers to specify the exact schema version of the SBOM media type they want generated (the tooling currently supports passing a version for CycloneDX and SPDX - github.com/anchore/syft/issues/846#issuecomment-1908676454). So Packit was forced to vendor (copy) large chunks of upstream Syft Go code in order to pin SBOM media type versions to the versions most consumers wanted. Every time a new version of Syft came out, maintainers had to painfully update the vendored code to work with upstream Syft components (e.g. github.com//pull/491).
Furthermore, using the Syft CLI instead of the Syft Go library has several advantages. With the CLI, we can delegate the entire SBOM generation mechanism to Syft. It should help buildpacks avoid CVEs they are exposed to through the Syft Go libraries. The CLI tool is well documented and widely used in the community, and the Syft project appears to be developed with a CLI-first approach. The caveat is that buildpack authors who use this method should include the Paketo Syft buildpack in their build plan so that the CLI is available during the build phase.
Example usage:

    // detect
    // unless BP_DISABLE_BOM is true
    requirements = append(requirements, packit.BuildPlanRequirement{
        Name: "syft",
        Metadata: map[string]interface{}{
            "build": true,
        },
    })

    // build
    syftCLIScanner := sbomgen.NewSyftCLIScanner(
        pexec.NewExecutable("syft"),
        scribe.NewEmitter(os.Stdout),
    )

    // To scan a layer after installing a dependency
    _ = syftCLIScanner.GenerateSBOM(myLayer.Path,
        context.Layers.Path,
        myLayer.Name,
        context.BuildpackInfo.SBOMFormats...,
    )

    // OR to scan the workspace dir after running a process
    _ = syftCLIScanner.GenerateSBOM(context.WorkingDir,
        context.Layers.Path,
        myLayer.Name,
        context.BuildpackInfo.SBOMFormats...,
    )

- A new package, sbomgen, is created instead of adding the functionality to the existing sbom package because this lets buildpacks remove the pinned "anchore/syft" library from their go.mod, which was being flagged by CVE scanners.
- I have not implemented the prettification of SBOMs that the code path using the Syft Go library implements. It seems to add bloat to the app image and is not supported via the CLI. Consumers can easily prettify the SBOM JSON themselves.
- In the code path that uses the Syft Go library, license information is manually injected from buildpack.toml data into the SBOM. This is not available with the SyftCLIScanner. I couldn't find any reasoning for why this was done in the first place.
- I have intentionally not reused some code in methods that are mixed up with the Syft Go library, so that that codebase can be phased out easily in the near future.
What would you like to be added:
Right now the Syft JSON format schema version is hard-coded (it seems to be the latest version). When you bump to a newer version of Syft, it starts outputting the new format. It would be helpful if you could control the Syft JSON schema version used for output, like
syft package --output json-v2
I think it would be sufficient to control it at the major version level.
Why is this needed:
Well, it's nice to keep on the latest version of the syft tool for bug fixes and scanner improvements but when the output format changes it can take time to adjust the tools that are consuming the output to read the new format.
I can understand if it's not possible to support all major versions for all time, but supporting the most recent two or three (depending on how quickly they increment) would help provide time to plan and update tools consuming the output.
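To illustrate the consumer side of this problem, here is a rough sketch of the kind of guard a downstream tool could put in front of its parser so that a schema bump fails loudly instead of being misread. It assumes the Syft JSON document carries its schema version under a top-level `schema.version` field as a dotted version string. The `schemaMajor` helper and the sample document are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// sbomHeader decodes only the schema block of a Syft JSON document.
// Assumption: the version lives at schema.version, e.g. "3.0.1".
type sbomHeader struct {
	Schema struct {
		Version string `json:"version"`
	} `json:"schema"`
}

// schemaMajor returns the major component of the document's schema version.
func schemaMajor(doc []byte) (int, error) {
	var h sbomHeader
	if err := json.Unmarshal(doc, &h); err != nil {
		return 0, fmt.Errorf("decode SBOM header: %w", err)
	}
	if h.Schema.Version == "" {
		return 0, fmt.Errorf("no schema.version field found")
	}
	majorStr, _, _ := strings.Cut(h.Schema.Version, ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return 0, fmt.Errorf("parse schema version %q: %w", h.Schema.Version, err)
	}
	return major, nil
}

func main() {
	// Made-up sample document; a real SBOM has many more fields.
	doc := []byte(`{"artifacts": [], "schema": {"version": "3.0.1"}}`)
	const supportedMajor = 3 // the major version this consumer was written against

	major, err := schemaMajor(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if major != supportedMajor {
		fmt.Printf("unsupported schema major version %d (want %d)\n", major, supportedMajor)
		return
	}
	fmt.Println("schema major version is supported:", major)
}
```

A check like this only detects the mismatch. Being able to ask Syft for a specific major version is what would remove the need to upgrade the tool and the consumers in lockstep.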
Additional context:
I'm not sure this would be something necessary when syft hits 1.0, as I'd assume that means the schema changes will be non-breaking, but in the meantime, it would help to have a feature like this so that it can ease the migrations between schemas.
I wouldn't be opposed to alternative solutions either, so this could perhaps be a question and not an enhancement. For example, if I could somehow recompile and have it use a different JSON schema version but still get other updates and fixes.
Thanks