Licenses missing in most report format #933
Comments
The licenses are most likely missing because their names are not listed in […]. The warnings mentioned in this issue's description do not affect license processing.
Tagging @cpendery
@WhyJee I'm having trouble replicating your license counts. When trying to recreate your values using the command below:
docker run \
  --rm \
  -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $PWD:/tmp/workdir \
  anchore/syft:v0.42.4 \
  -v \
  packages \
  -s Squashed \
  -o json \
  --file /tmp/workdir/bom.json \
  docker:almalinux:8.5-20220306
I'm able to replicate this filtering out of licenses in […]
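To make license counts easier to compare between runs, here is a minimal Python sketch that tallies the license strings in a Syft JSON document. It assumes the output has a top-level `artifacts` list whose items carry a `licenses` list, and that each entry is either a plain string or an object with a `value` key. Both shapes are assumptions about the JSON layout, and the helper name `count_licenses` is made up for illustration.

```python
from collections import Counter


def count_licenses(sbom: dict) -> Counter:
    """Tally license strings across all artifacts of a parsed Syft JSON document."""
    counts = Counter()
    for artifact in sbom.get("artifacts", []):
        # "licenses" may be missing or null for packages without license data.
        for lic in artifact.get("licenses") or []:
            # Assumption: entries are plain strings or objects with a "value" key.
            value = lic if isinstance(lic, str) else lic.get("value", "")
            if value:
                counts[value] += 1
    return counts


# Tiny hand-written sample, not real scanner output.
sample = {
    "artifacts": [
        {"name": "a", "licenses": ["MIT"]},
        {"name": "b", "licenses": ["MIT", "ASL 2.0"]},
        {"name": "c", "licenses": None},
    ]
}
```

Loading the `bom.json` written by the command above with `json.load` and passing the result to `count_licenses` would give per-license totals that can be compared across Syft versions.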
Based on your comments, @mj / @spiffcs, I have replayed the analysis with the latest Syft and the latest AlmaLinux image. There are 153 license entries in the JSON output, which are identified as:
From this we can split the problem into several categories and eventually solve some of them.
Multiple licenses
This is the tricky case, as the scanner would need a robust split algorithm (see the table above). Note: a commercial tool our company is also investigating transforms:
into:
This is not a split, but it seems to be parsed fairly correctly.
Single license
Name mismatch
This is the most common issue for a single name: "ASL 2.0" does not match any of "apache-2", "apache-2.0", or "apache-2.0.0", so it is not resolved to the license Apache-2.0. I don't know what the other packagers (Debian, ...) put in the license field, but it seems the solution could be to update the […]
Name not recognized
This one occurs only if the single license is "MIT" (18 occurrences) or "BSD" (10 occurrences).
Solving these two issues would be a first step.
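For the single-license name mismatch, one possible direction is an alias table consulted after normalizing case and whitespace. The Python sketch below is only an illustration: the `ALIASES` table and the `to_spdx_id` helper are hypothetical and are not the list Syft actually ships. "BSD" is deliberately left unmapped, since the bare name does not say which BSD variant is meant.

```python
import re

# Hypothetical alias table: packager-style names -> SPDX identifiers.
# Illustrative only; this is not Syft's real mapping.
ALIASES = {
    "asl 2.0": "Apache-2.0",
    "apache-2": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache-2.0.0": "Apache-2.0",
    "mit": "MIT",
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", name.strip()).lower()


def to_spdx_id(name: str):
    """Return the SPDX identifier for a known alias, or None if unmapped."""
    return ALIASES.get(normalize(name))
```

With this table, `to_spdx_id("ASL 2.0")` resolves to `Apache-2.0`, while `to_spdx_id("BSD")` returns `None` and would need a separate decision.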
For the purposes of CycloneDX, note that the format allows […]. It would be great if, at least in the CycloneDX case, the available information could be returned. A free-text license "name" is better than nothing.
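As a rough illustration of that fallback: a CycloneDX license object can carry either an SPDX `id` or a free-text `name`, so an unmapped string does not have to be dropped. The Python sketch below assumes a tiny stand-in set `KNOWN_SPDX_IDS` instead of the full SPDX license list, and the function name is made up for this example.

```python
# Small illustrative subset; a real tool would use the full SPDX license list.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "LGPL-2.1-only"}


def cyclonedx_license_entry(raw: str) -> dict:
    """Build a CycloneDX license entry, keeping unmapped names as free text."""
    raw = raw.strip()
    if raw in KNOWN_SPDX_IDS:
        # Recognized SPDX identifier: emit it in the "id" field.
        return {"license": {"id": raw}}
    # Unrecognized name: keep it in the "name" field instead of dropping it.
    return {"license": {"name": raw}}
```

Here "MIT" would be emitted with an `id`, and "ASL 2.0" would still show up in the report as a `name`.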
Hi @WhyJee, thanks for bringing up the different categories! One could argue that in a component that is licensed as "MIT AND LGPL-2.1-only" there actually are "sub-components" that are licensed differently. So from the perspective of CycloneDX, this should somehow be two components (that are smooshed together) and not one component with license "MIT AND LGPL-2.1-only". But I'm pretty sure that this is just a rough estimation.
I've seen people release software under "GPL AND MIT" to tell you that you can choose one, and sometimes people release software under "GPL OR MIT" to give you this very choice. On the other hand, SPDX and Fedora seem to agree that only "OR" should be used for this, and that "AND" should be used if different parts of the component have different licenses. Fedora has a guideline for this: https://docs.fedoraproject.org/en-US/legal/license-field/#_license_expressions
But while we are waiting for the "real" solution, would it not be better to report unknown (unmapped) licenses as "whatever" than not to report them at all?
Hi there, I did some cross-checking and found that other CycloneDX tools seem to struggle with license expressions such as "(LGPLv3+ or GPLv2+) and GPLv3+" as well: CycloneDX/cyclonedx-python#377. The cyclonedx-python people went one step further than just struggling here: CycloneDX/cyclonedx-python-lib#304. CycloneDX seems to have a precise way of doing this by embracing SPDX license expressions. One more thing: Dependency-Track plans to support these SPDX license expressions, as stated here: DependencyTrack/dependency-track#170 (comment)
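To show what handling such an expression could look like, here is a small recursive-descent parser sketch in Python for strings like "(LGPLv3+ or GPLv2+) and GPLv3+". It is not taken from Syft or from the cyclonedx-python projects. It assumes that "and" binds tighter than "or" (as in SPDX expressions), that operators are surrounded by whitespace, and that everything else, including names with spaces such as "ASL 2.0", is a license name.

```python
import re

# Split on parentheses and on whitespace-delimited "and"/"or" (any case).
_TOKEN_SPLIT = re.compile(r"(\(|\)|\s+and\s+|\s+or\s+)", re.IGNORECASE)


def tokenize(expr):
    tokens = []
    for part in _TOKEN_SPLIT.split(expr):
        part = part.strip()
        if part:
            is_op = part.lower() in ("and", "or")
            tokens.append(part.lower() if is_op else part)
    return tokens


def parse(expr):
    """Parse into nested tuples such as ("and", left, right); names stay strings."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "or":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_atom()
        while peek() == "and":
            take()
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "and", "or"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def license_names(node):
    """Flatten a parsed tree into the list of license names it mentions."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return license_names(left) + license_names(right)
```

The flattened names could then be mapped to SPDX identifiers one by one, while the tree keeps the AND/OR structure needed to emit a single expression instead of separate licenses.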
Hi there, do we have any progress on this?
We need the bugfix for the issue anchore/syft#933
What happened:
Scanning the same image leads to different results depending on the output format.
Scanning the same image using
tern
Thus the presence or absence of the license is not a format problem: for the common SPDX and CycloneDX formats, tern is able to fill in this field correctly. Since Syft has all the information in its JSON output, the loss probably occurs in the converter (which I think is reflected by the WARN logs).
What you expected to happen:
The expectation is that content is independent of the format (except, of course, for table and text) and that everything the format can accept is present in the output.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
syft version: 0.42.4
OS (cat /etc/os-release or similar):