-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(spdx): use the hasExtractedLicensingInfos
field for licenses that are not listed in the SPDX
#8077
base: main
Are you sure you want to change the base?
fix(spdx): use the hasExtractedLicensingInfos
field for licenses that are not listed in the SPDX
#8077
Conversation
return fmt.Sprintf("(%s)", license) | ||
}), " AND ") | ||
|
||
normalizedLicense, err := expression.Normalize(license, licensing.NormalizeLicense, expression.NormalizeForSPDX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to override the expression.NormalizeForSPDX
function, but we use With
as a delimiter to parse licenses:
%token<token> IDENT OR AND WITH |
So there are problems with overwriting licenses with the wrong exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this code will fix the issue and provide compliance SPDX license expressions.
I had a few suggestions to produce a bit more understandable results and a few coding suggestions.
} | ||
if text { | ||
otherLicense.ExtractedText = license | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else { |
I would always include the license as the license name - not just when the text is present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second thought, this may be appropriate if the license text is the actual text of the license. In most cases, the metadata for packages includes text that are supposed to be a license name or identifier in which case it should also be in the name. If we know the license text is really the text, then the existing code is OK.
If, however, the license text is what is found in the package metadata files and they are not the actual text, I would add the same field as the name PLUS add a LicenseComment to explain - something like `otherLicense.LicenseComment = "The license text represents text found in package metadata and may not represent the full text of the license"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases (e.g. license
field from python METADATA files) when we can't understand that license is name/id/text.
We tried to detect text
of license:
trivy/pkg/licensing/normalize.go
Lines 596 to 617 in f6acdf7
var licenseTextKeywords = []string{ | |
"http://", | |
"https://", | |
"(c)", | |
"as-is", | |
";", | |
"hereby", | |
"permission to use", | |
"permission is", | |
"use in source", | |
"use, copy, modify", | |
"using", | |
} | |
func isLicenseText(str string) bool { | |
for _, keyword := range licenseTextKeywords { | |
if strings.Contains(str, keyword) { | |
return true | |
} | |
} | |
return false | |
} |
So Trivy split license name/id and license text(text://
prefix).
That is why i used both LicenseName
and ExtractedText
fields:
https://github.com/DmitriyLewen/trivy/blob/f851f9bb18411db838fc65e0c6c4351b04953f8f/pkg/sbom/spdx/marshal_private_test.go#L92-L112
Another question - i used NOASSERTION
for LicenseName
and ExtractedText
fields (see link above), because these fields are mandatory (https://github.com/spdx/tools-golang/blob/f6e45fdb9e4e0c993105f798bee5f8aa8ea70f84/spdx/v2/v2_3/other_license.go#L12-L20).
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a LicenseComment to explain - something like `otherLicense.LicenseComment = "The license text represents text found in package metadata and may not represent the full text of the license"
I liked this idea 👍
Added in 041ab21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Trivy split license name/id and license text(text:// prefix).
That is why i used both LicenseName and ExtractedText fields
Makes sense - I wasn't sure how the text was captured. Since you are capturing the text and name distinctly, your approach should work. BTW, it's a bit tricky to find the start and end of the license text and even trickier to match it to know licenses - something we've been working on in the SPDX java libraries for about 10 years and still don't have it perfected ;).
i used NOASSERTION for LicenseName and ExtractedText fields (see link above), because these fields are mandatory (https://github.com/spdx/tools-golang/blob/f6e45fdb9e4e0c993105f798bee5f8aa8ea70f84/spdx/v2/v2_3/other_license.go#L12-L20).
Is this correct?
For the license name, this is OK. The spec isn't very clear on how these should be treated, so many people make up a name based on the text. Unlike other parts of the spec, NOASSERTION is not required if you don't know - but in this case I think it would be fine to use NOASSERTION - for the name.
For the text, I would put in whatever string was actually found - even if it is the name. The definition of the field is the license text found - so if someone puts in "This software is licensed under mylicense" - you can put that exact string in the text field even though it is not technically the text of "mylicense". I wouldn't use "NOASSERTION" for the text field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks for your opinion!
Updated text field in 659f992
pkg/licensing/expression/category.go
Outdated
"VSFTPD-OPENSSL-EXCEPTION", | ||
"WXWINDOWS-EXCEPTION-3.1", | ||
"X11VNC-OPENSSL-EXCEPTION", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment may deserve to be a separate suggestion, but in reading the code I would recommend building the license and exception IDs from the JSON files maintained by the SPDX legal team. The license list is updated every 3 months with new IDs and maintaining these in code can be a challenge to keep up and maintain. What I do in the code I maintain is attempt to access the current JSON files on the website https://spdx.org/licenses/licenses.json and https://spdx.org/licenses/exceptions.json. If I can not access the website or if the user specified not to use the online version, I'll use a cached version of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases when users run multiple times.
Downloading these files for each run is not good.
But we can save licenses.json
and exceptions.json
files in the cache dir and use them.
The files contain releaseDate
field, so we can update this file only when releaseDate + 3 months
has expired.
The license list is updated every 3 months
How strictly is this rule followed?
Anyway let's move this discussion into another issue/pr.
What I do in the code I maintain is attempt to access the current JSON files on the website https://spdx.org/licenses/licenses.json and https://spdx.org/licenses/exceptions.json
I found that https://spdx.org/licenses/exceptions.json and https://github.com/spdx/license-list-data/blob/592c2dcb8497c6fe829eea604045f77d3bce770b/json/exceptions.json are different (see harbour-exception
).
Which file would be more correct to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How strictly is this rule followed?
Not very strictly. There is, however, a license list version field which is reliably incremented on release.
I found that https://spdx.org/licenses/exceptions.json and https://github.com/spdx/license-list-data/blob/592c2dcb8497c6fe829eea604045f77d3bce770b/json/exceptions.json are different (see harbour-exception).
Which file would be more correct to use?
The lists at https://spdx.org/licenses - these will always be the latest released version. The github repo master will have the latest in development version which may not be stable. The github repo is tagged with release versions, so if you go to the tag for the latest release in github, it will match what is on the website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, update exception list in 659f992
} | ||
return NormalizeLicense(c.Licenses) | ||
otherLicense.LicenseIdentifier = LicenseRefPrefix + "-" + licenseID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion - you could simplify this by including the "-" in the LicenseRefPrefix definition and just use LicenseRefPrefix + licenseID
The string "LicenseRef" will likely never be used without the trailing "-".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is our internal decision - we don't use -
suffixes in constants :)
pkg/licensing/expression/category.go
Outdated
// SpdxLicenseExceptions contains all supported SPDX Exceptions | ||
// cf. https://spdx.org/licenses/exceptions.json | ||
// used `awk -F'"' '/"licenseExceptionId":/ {print toupper("\"" $4 "\"," )}' exceptions.json ` command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we need to keep it up-to-date, it should be done by mage spdx
or something like that. I think we should create a separate file for the list and add
// Code generated by "mage spdx", DO NOT EDIT.
// source: https://spdx.org/licenses/exceptions.json
to the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to check exceptions.json
file in tests?
I mean the same as for mage docs:generate
This will help keep the file up-to-date, but can be noisy for PRs when a new version of file is released.
On the other hand, we can add a separate action to check the file's relevance once a week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using curl
and awk
through go generate
is the easiest way, but some environments don't have curl
, and CLI flags might be different. Ideally, we should do that in Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to check exceptions.json file in tests?
No, we don't need it for now. We can update the file when we notice that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's what i planned
this command (using awk
and other commands) is just a quick way to get all exceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add a separate action to check the file's relevance once a week.
This sounds better. Also, we don't need to fail the test. We can notify it on Microsoft Teams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated PR.
Take a look, when you have time, please.
Description
For cases where the SPDX license list does not contain the detected license, we should use the
hasExtractedLicensingInfos
field.See #7721 for more details.
Before:
After:
test runs for cron action:
Related issues
hasExtractedLicensingInfos
for licenses not in the SPDX license list #7721Checklist