
Prevent symlinks causing duplicate package-file relationships #1168

Merged on Aug 22, 2022 (1 commit)


@jedevc (Contributor) commented Aug 19, 2022

Because symlinks are traversed during file resolution, a package that owns both a file and symlinks to that file ends up with multiple identical relationships to the file (the symlinks themselves do not appear in the list of files in the output). This seems to have been introduced in #782.
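To make the mechanism concrete, here is a minimal Go sketch. It assumes a toy filesystem and simplified types; `Coordinates`, `Relationship`, `resolve`, and `ownershipRelationships` are hypothetical names for illustration, not syft's actual API. Each owned path is resolved to its real path before a relationship is created, so the symlink and its target collapse onto the same coordinates and produce two identical relationships:

```go
package main

import "fmt"

// Coordinates is a hypothetical, simplified file location: it records only
// the resolved (real) path, not the path that was used to reach it.
type Coordinates struct {
	RealPath string
}

// symlinks is a toy filesystem: each key is a symlink, each value its target.
var symlinks = map[string]string{
	"/lib/libz.so.1": "/lib/libz.so.1.2.12",
}

// resolve follows symlinks until it reaches a non-symlink path.
// Cycle detection is omitted for brevity.
func resolve(path string) Coordinates {
	for {
		target, ok := symlinks[path]
		if !ok {
			return Coordinates{RealPath: path}
		}
		path = target
	}
}

// Relationship is a hypothetical package-to-file CONTAINS edge.
type Relationship struct {
	From string
	To   Coordinates
}

// ownershipRelationships creates one relationship per owned path and does
// NOT de-duplicate, reproducing the behaviour described above.
func ownershipRelationships(pkg string, ownedPaths []string) []Relationship {
	var rels []Relationship
	for _, p := range ownedPaths {
		rels = append(rels, Relationship{From: pkg, To: resolve(p)})
	}
	return rels
}

func main() {
	// zlib owns both the symlink and the file it points to.
	rels := ownershipRelationships("zlib", []string{"/lib/libz.so.1", "/lib/libz.so.1.2.12"})
	for _, r := range rels {
		fmt.Printf("%s CONTAINS %s\n", r.From, r.To.RealPath)
	}
}
```

Running it prints `zlib CONTAINS /lib/libz.so.1.2.12` twice, mirroring the duplicated relationship in the scan output.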

For example, see the following excerpt from a scan of an Alpine image that contains the zlib package (which provides libz):

    ...
    {
      "SPDXID": "SPDXRef-5baca92b12b31c62",
      "name": "zlib",
      "licenseConcluded": "Zlib",
      "description": "A compression/decompression Library",
      "downloadLocation": "https://zlib.net/",
      "externalRefs": [
        {
          "referenceCategory": "SECURITY",
          "referenceLocator": "cpe:2.3:a:zlib:zlib:1.2.12-r3:*:*:*:*:*:*:*",
          "referenceType": "cpe23Type"
        },
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:alpine/zlib@1.2.12-r3?arch=x86_64&upstream=zlib&distro=alpine-3.16.2",
          "referenceType": "purl"
        }
      ],
      "filesAnalyzed": false,
      "hasFiles": [
        "SPDXRef-6761db52f4cfb40f",
        "SPDXRef-6761db52f4cfb40f"
      ],
      "licenseDeclared": "Zlib",
      "originator": "Person: Natanael Copa <[email protected]>",
      "sourceInfo": "acquired package info from APK DB: lib/apk/db/installed",
      "versionInfo": "1.2.12-r3"
    }
    ...
    {
      "SPDXID": "SPDXRef-6761db52f4cfb40f",
      "licenseConcluded": "NOASSERTION",
      "fileName": "lib/libz.so.1.2.12"
    },
    ...
    {
      "spdxElementId": "SPDXRef-5baca92b12b31c62",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-6761db52f4cfb40f"
    },
    {
      "spdxElementId": "SPDXRef-5baca92b12b31c62",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-6761db52f4cfb40f"
    },
    ...

As you can see, there are two copies of exactly the same relationship: one for the symlink /lib/libz.so.1 and one for the regular file /lib/libz.so.1.2.12, both of which are owned by the zlib package. With #1156 merged, this also produces incorrect (duplicated) entries in the hasFiles field. As of v0.3.0, the Go SPDX parser also produces an incorrect result, leaving a nil value in the Package.Files field: see here.
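A quick way to spot this in generated output is to count repeated relationships. The sketch below is a hypothetical check, not part of syft: it decodes only the `relationships` array of an SPDX JSON document and reports any `(spdxElementId, relationshipType, relatedSpdxElement)` triple that occurs more than once. The helper name `duplicateRelationships` is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spdxRelationship mirrors the relationship fields shown in the SPDX JSON excerpt.
type spdxRelationship struct {
	SpdxElementID      string `json:"spdxElementId"`
	RelationshipType   string `json:"relationshipType"`
	RelatedSpdxElement string `json:"relatedSpdxElement"`
}

// spdxDocument decodes only the top-level "relationships" array.
type spdxDocument struct {
	Relationships []spdxRelationship `json:"relationships"`
}

// duplicateRelationships returns each relationship that appears more than
// once, reported a single time, in the order its first repeat is seen.
func duplicateRelationships(doc []byte) ([]spdxRelationship, error) {
	var d spdxDocument
	if err := json.Unmarshal(doc, &d); err != nil {
		return nil, err
	}
	counts := make(map[spdxRelationship]int)
	var dups []spdxRelationship
	for _, r := range d.Relationships {
		counts[r]++
		if counts[r] == 2 {
			dups = append(dups, r)
		}
	}
	return dups, nil
}

func main() {
	// The two relationships from the scan excerpt.
	doc := []byte(`{"relationships": [
	  {"spdxElementId": "SPDXRef-5baca92b12b31c62", "relationshipType": "CONTAINS", "relatedSpdxElement": "SPDXRef-6761db52f4cfb40f"},
	  {"spdxElementId": "SPDXRef-5baca92b12b31c62", "relationshipType": "CONTAINS", "relatedSpdxElement": "SPDXRef-6761db52f4cfb40f"}
	]}`)
	dups, err := duplicateRelationships(doc)
	if err != nil {
		panic(err)
	}
	for _, r := range dups {
		fmt.Printf("duplicate: %s %s %s\n", r.SpdxElementID, r.RelationshipType, r.RelatedSpdxElement)
	}
}
```

Against the two relationships from the excerpt it reports a single duplicate.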

We prevent these files from being confused with each other by de-duplicating the files at the point of creating ownership relationships, removing duplicate coordinates. This ensures we only get a single copy of each relationship.
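A minimal sketch of the de-duplication step, assuming a simplified, hypothetical `Coordinates` type that is comparable (a real path plus a filesystem identifier); syft's actual types and function names may differ, and `dedupeCoordinates` and the `"layer-1"` identifier are illustrative. Using the struct itself as a map key drops repeats while keeping first-seen order:

```go
package main

import "fmt"

// Coordinates is a hypothetical, simplified resolved file location.
// All fields are comparable, so the struct can be used as a map key.
type Coordinates struct {
	RealPath     string
	FileSystemID string
}

// dedupeCoordinates drops repeated coordinates while preserving the order in
// which they were first seen, so each owned file yields one relationship.
func dedupeCoordinates(in []Coordinates) []Coordinates {
	seen := make(map[Coordinates]struct{}, len(in))
	out := make([]Coordinates, 0, len(in))
	for _, c := range in {
		if _, ok := seen[c]; ok {
			continue // already emitted, e.g. reached again via a symlink
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	// /lib/libz.so.1 (symlink) and /lib/libz.so.1.2.12 resolve to the same file.
	resolved := []Coordinates{
		{RealPath: "/lib/libz.so.1.2.12", FileSystemID: "layer-1"},
		{RealPath: "/lib/libz.so.1.2.12", FileSystemID: "layer-1"},
	}
	for _, c := range dedupeCoordinates(resolved) {
		fmt.Println("zlib CONTAINS", c.RealPath)
	}
}
```

Because both entries resolve to the same coordinates, only one `zlib CONTAINS /lib/libz.so.1.2.12` line is printed.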

@spiffcs (Contributor) left a comment


Changes look good - I can add a test for this new behavior so we explicitly show future developers that duplicate IDs are supposed to be filtered out.

syft/pkg/cataloger/catalog.go (resolved)

Signed-off-by: Justin Chadwell <[email protected]>
@jedevc force-pushed the fix-symlinks-duplicate-relationships branch from 6dab642 to 6481206 on August 19, 2022 15:12
@jedevc (Contributor, Author) commented Aug 19, 2022

> Changes look good - I can add a test for this new behavior so we explicitly show future developers that duplicate IDs are supposed to be filtered out.

That would be awesome :) I'm definitely not as familiar with how syft does testing, so any help is massively appreciated 🎉

@spiffcs merged commit f3c3d3d into anchore:main on Aug 22, 2022
@jedevc deleted the fix-symlinks-duplicate-relationships branch on August 24, 2022 08:33
spiffcs added a commit to scothis/syft that referenced this pull request Aug 24, 2022
* main:
  Update syft bootstrap tools to latest versions. (anchore#1171)
  Fix update-bootstrap-tools workflow (anchore#1170)
  workflow to create automated PRs to update bootstrap tools (anchore#1167)
  feat: add support for licenses in package-lock json v2 (anchore#1164)
  External sources configuration (anchore#1158)
  feat: add support for pnpm (anchore#1166)
  Prevent symlinks causing duplicate package-file relationships (anchore#1168)
  Associate node package licenses from node_modules (anchore#1152)
  Give the contributing guide a substantial rework (anchore#1155)

Signed-off-by: Christopher Phillips <[email protected]>
spiffcs added a commit that referenced this pull request Aug 25, 2022
* main:
  Update syft bootstrap tools to latest versions. (#1176)
  enhance development support on macOS ARM (#1163)
  Capture if a node module is private (#1161)
  Find version numbers from jars with different naming conventions (#1174)
  Update syft bootstrap tools to latest versions. (#1171)
  Fix update-bootstrap-tools workflow (#1170)
  workflow to create automated PRs to update bootstrap tools (#1167)
  feat: add support for licenses in package-lock json v2 (#1164)
  External sources configuration (#1158)
  feat: add support for pnpm (#1166)
  Prevent symlinks causing duplicate package-file relationships (#1168)
  Associate node package licenses from node_modules (#1152)
aiwantaozi pushed a commit to aiwantaozi/syft that referenced this pull request Oct 20, 2022
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
Labels
None yet
Projects
Archived in project

2 participants