Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spdx: initial sbom framwork + spdx encoder #1468

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

BradLugo
Copy link
Contributor

@BradLugo BradLugo commented Jan 14, 2025

Description

These changes introduce the initial SBOM framework into claircore, generating an SPDX 2.3 JSON document from a claircore.IndexReport.

Design

Design doc: https://docs.google.com/document/d/1dRKMWjmkpYO5oN_Y-S_0R2vwyvcdXHCNejM3C-F2DIc/edit?tab=t.0

@BradLugo BradLugo changed the title spdx: implement initial sbom framwork + spdx encoder spdx: initial sbom framwork + spdx encoder Jan 14, 2025
@hdonnay
Copy link
Member

hdonnay commented Jan 14, 2025

Should be moved out of the pkg directory.

Rambling on why

The pkg in a package name is just line noise: we all know it's a go package. Marking something for "external" use is done in the opposite direction: internal is actually enforced by the compiler. If the API for use is squishier than that, it's on humans to observe it anyway.

@BradLugo BradLugo force-pushed the add-spdx-converter branch 3 times, most recently from 11f31ac to be73a74 Compare January 14, 2025 19:28
@BradLugo
Copy link
Contributor Author

If anyone has a suggestion on testing something that bakes in time.Now() the output, please let me know. Example of the problem in my current unit test:

            - 		"created":  string("2025-01-14T21:48:35Z"),
            + 		"created":  string("2025-01-14T09:51:40Z"),

@hdonnay
Copy link
Member

hdonnay commented Jan 15, 2025

If anyone has a suggestion on testing something that bakes in time.Now() the output, please let me know.

You can put together a cmp.Option for cmp that turns these back into time.Time and allows some delta, or just ignores them.

sbom/sbom.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
@BradLugo
Copy link
Contributor Author

I'll start using fixup commits from this point on to help reviewers until I get approvals, then I'll rebase all the fixup commits.

@BradLugo BradLugo marked this pull request as ready for review January 16, 2025 22:49
@BradLugo BradLugo requested a review from a team as a code owner January 16, 2025 22:49
@BradLugo BradLugo requested review from hdonnay, dcaravel and crozzy and removed request for a team January 16, 2025 22:49
dcaravel
dcaravel previously approved these changes Jan 16, 2025
Copy link

@dcaravel dcaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 78 to 81
// TODO(DO NO MERGE): This feels terrible
ignoreCreatedTimestamp := cmp.FilterPath(func(p cmp.Path) bool {
sf, ok := p.Index(3).(cmp.MapIndex)
return ok && sf.String() == `["created"]`
}, cmp.Ignore())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hdonnay is this what you were thinking? Is there a cleaner way to do it?

Copy link
Member

@hdonnay hdonnay Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're leaving them as map[string]any, you could use cmpopts.IgnoreMapEntries. If you turn them back into the strong types, you could do:

cmpopts.AcyclicTransformer("IgnoreTimestamps", func(info *v2_3.CreationInfo) *v2_3.CreationInfo{
    info.Created = "2006-01-02T15:04:05Z"
    return info
})

@BradLugo BradLugo marked this pull request as draft January 21, 2025 19:09
@BradLugo BradLugo force-pushed the add-spdx-converter branch 3 times, most recently from 29c8071 to f8bb863 Compare January 23, 2025 21:27
@BradLugo BradLugo marked this pull request as ready for review January 23, 2025 21:27
Adding a function to be able to convert index reports
into SPDX documents and SPDX documents into index reports.

Signed-off-by: crozzy <[email protected]>
Signed-off-by: Brad Lugo <[email protected]>
Comment on lines 23 to 37
type Version string

const (
V2_3 Version = "v2.3"
)

type Format string

const JSON Format = "json"

type Creator struct {
Creator string
// In accordance to the SPDX v2 spec, CreatorType should be one of "Person", "Organization", or "Tool"
CreatorType string
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types need documentation if they're exported.

Comment on lines 41 to 48
type Encoder struct {
Version Version
Format Format
Creators []Creator
DocumentName string
DocumentNamespace string
DocumentComment string
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user is expected to construct this themselves? Is the zero value "ready" to use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth on this. Many of the zero values aren't to spec, but disallowing empty strings felt weird. One idea I toyed around with was having a NewDefaultEncoder() that created an Encoder with some claircore values, e.g., Creator[0].CreatorType = "Tool", Creator[0].Creator = "claircore". What do you think? (cc @RTann)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcaravel, can I get your opinion on this as well?

Copy link

@dcaravel dcaravel Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   
Version I'd be OK with a default, would need to be configurable by the client in the future as multiple formats/version are supported.
Format same ^^
Creators be OK adding claircore as a default Tool but must have the option to add other tools, people, organizations, etc. that claircore will not know about. Adding a claircore version by default would likely require adding and maintaining a file in the source repo that is updated with each commit (if not already existing), which could be a burden.
DocumentName current state I see the client having to provide this - not sure how a default value could be derived - an index report does not contain a public friendly identifier that would be meaningful to a client/users (at least in the ACS case), but it's also contextual: for an image this could be the image digest, or the full image ref - for a node this could be the node name or IP, etc.
DocumentComment same ^, this may be the only field available for 'extra context' from a client, i'd be OK with a generic default value, however the client should be able to override the value - as an example, for ACS we're planning on adding a note to indicate this is tech preview.
DocumentNamespace same ^, leaning toward making this clients choice as well, the spec says this should be an "unambiguous mechanism for other SPDX documents to reference SPDX elements within this SPDX document", which the index report does not have that context, and that unique addressability may not apply to type 'Analyzed' SBOMs either. The example from the spec: https://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID] hints this may be associated with some external storage/path/etc. which also is not something claircore knows about or is in index report. Am therefore leaning toward putting this on the client to specify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed an update for this. @dcaravel, what'd you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, left a review below.

super nit and possibly a personal preference: Setting the doc name, namespace, comment on the Encoder object seemed easier to read vs. strings passed to NewDefaultEncoder(string, string, string) (I had put them in the wrong order initially). Perhaps a good candidate for functional options? - but that may be over complicating it.

sbom/sbom.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
Comment on lines 78 to 81
// TODO(DO NO MERGE): This feels terrible
ignoreCreatedTimestamp := cmp.FilterPath(func(p cmp.Path) bool {
sf, ok := p.Index(3).(cmp.MapIndex)
return ok && sf.String() == `["created"]`
}, cmp.Ignore())
Copy link
Member

@hdonnay hdonnay Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're leaving them as map[string]any, you could use cmpopts.IgnoreMapEntries. If you turn them back into the strong types, you could do:

cmpopts.AcyclicTransformer("IgnoreTimestamps", func(info *v2_3.CreationInfo) *v2_3.CreationInfo{
    info.Created = "2006-01-02T15:04:05Z"
    return info
})

go.mod Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/sbom.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Show resolved Hide resolved
sbom/spdx/encoder.go Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
Apply suggested changes locally
Accept a Writer instead of returning a Reader
@BradLugo BradLugo requested review from RTann and hdonnay January 28, 2025 21:40
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
Fix Creator fields ordering in NewDefaultEncoder()
sbom/sbom.go Outdated Show resolved Hide resolved
sbom/sbom.go Outdated
"io"
)

// Encoder is an interface to convert a claircore.IndexReport into an io.Reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Encoder is an interface to convert a claircore.IndexReport into an io.Reader
// Encoder is an interface to write a Software Bill of Materials (SBOM) representation of a [claircore.IndexReport] to an output stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget about this one ^ :)

sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder.go Outdated Show resolved Hide resolved
sbom/spdx/encoder_test.go Show resolved Hide resolved
Change the NewDefaultEncoder() to use an option pattern
dcaravel
dcaravel previously approved these changes Feb 4, 2025
Copy link

@dcaravel dcaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tested on a few images (namely nginx:1.24.0 and quay.io/rhacs-eng/main:4.6.2) comparing the v4.IndexReport from stackrox to the produced SBOMs, and the output was as expected / explainable.

@BradLugo BradLugo requested a review from RTann February 4, 2025 23:49
RTann
RTann previously approved these changes Feb 5, 2025
Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to ensure this can move along. No true blockers from me, though perhaps the import ordering may be one? I also ask for consideration of the strconv.Atoi comments (do we need these calls?) and the comment about breaking out of the loop

sbom/sbom.go Outdated
"io"
)

// Encoder is an interface to convert a claircore.IndexReport into an io.Reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget about this one ^ :)

sbom/spdx/encoder.go Outdated Show resolved Hide resolved
"github.com/spdx/tools-golang/spdx/v2/v2_3"
)

type Version string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't block the PR, but I think each public field usually has a godoc in this repo

sbom/spdx/encoder.go Show resolved Hide resolved
return nil, ctx.Err()
}

rPkgId, err := strconv.Atoi(r.Package.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't block the PR over this, but nit: pkgID or rPkgID (what does the r stand for?)

Copy link
Contributor Author

@BradLugo BradLugo Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the r stand for?

"record", as in "the current claircore record we're working with." I think I was creating the spdx package ID and saving that to pkgId in some other commit.

pkgIds = append(pkgIds, rPkgId)

if r.Package.Source != nil && r.Package.Source.Name != "" {
rSrcPkgId, err := strconv.Atoi(r.Package.Source.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit here, but optional


// Record Distributions for this package.
if r.Distribution != nil {
rDistId, err := strconv.Atoi(r.Distribution.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just realized, do we use the IDs for anything other than converting them into ints? If not, then can we just keep them as strings and make the map key string instead of int? Same idea for the slices. Not sure if we have to keep doing this conversion

Copy link
Contributor Author

@BradLugo BradLugo Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We convert them to ints to sort them. The return from IndexReport.IndexRecords() is non-deterministic because it iterates over IndexReport.Packages, which is a map. In order to have a deterministic output, both to test against and for users to be able to run a diff between reports, we need to have some sort of ordering structuring. This is what I came up with, but I'm happy to revisit how we do it (perhaps in a follow up PR 🙂)

Some of this context is captured in a comment here: Some context here: https://github.com/quay/claircore/pull/1468/files#diff-9730b2b5baa2c38c549f0b3c1077fddf654896f316b79c2c8f3f77d5fb7ccd1cR273-R276

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if we just cared about sorting for determinism, then I think we can keep it as strings. If we also want to have the IDs in increasing order, then converting to int makes sense. Feel free to change in a followup if you decide to change it

sbom/spdx/encoder.go Show resolved Hide resolved
sbom/spdx/encoder.go Show resolved Hide resolved
return nil, err
}

srcPkg, ok := pkgs[rSrcPkgId]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't block the PR over this, but wondering why we add the source here instead of just waiting until we see it? Do we not always see it? In the case we add it here and then see it again later, do we risk potentially duplicating data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crozzy will likely be able to explain this better, but my understanding is that source packages aren't always recorded as root-level packages. In fact, I think rhcc source packages are the only ones that are also recorded as root-level packages.
For clarity, when I say "root-level packages" here, I'm referring to IndexReport.Packages as opposed to IndexReport.Packages.Source.

@RTann
Copy link
Contributor

RTann commented Feb 5, 2025

don't forget to sign your commits :) Also looks like there are still a bunch. Seems like you are on top of that, though, with the fixup! prefixes

@BradLugo
Copy link
Contributor Author

BradLugo commented Feb 5, 2025

don't forget to sign your commits :) Also looks like there are still a bunch. Seems like you are on top of that, though, with the fixup! prefixes

Yup, thank you! It's a new thing I'm trying. The fixup commits have a message body detailing more about the commit and giving reviewers an easier history to work with. Then, when I get approvals from all participating reviewers, I rebase --autosquash at the end to clean up the history and pass CI. Not sure if this is helpful or not. Would love yall's feedback on it!

Address Ross's latest review
@BradLugo BradLugo dismissed stale reviews from RTann and dcaravel via 9fcd250 February 5, 2025 21:16
sbom/sbom.go Outdated
Comment on lines 10 to 12
// Encoder is an interface to convert a claircore.IndexReport and writes it to
// w.
// that contains a Software Bill of Materials representation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this is a mix of the old version and the new version of this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 I didn't mean to add this to that commit. Should be fixed in the latest one + all the other godoc changes.

Add godoc comments to exported values
@BradLugo BradLugo requested a review from RTann February 6, 2025 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants