Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce package UIDs for improved vulnerability mapping #6583

Merged
merged 11 commits into from
May 3, 2024

Conversation

knqyf263
Copy link
Collaborator

@knqyf263 knqyf263 commented Apr 30, 2024

Description

Currently, Trivy outputs Packages and Vulnerabilities separately in JSON format, with detected vulnerabilities including a subset of package metadata such as name and version. However, identifying the original package from this limited metadata can be challenging in certain scenarios:

  • When multiple lock files use the exact same package name and version (but have different dependencies or line numbers)
  • When the same package is installed in multiple paths

To accurately associate detected vulnerabilities with their corresponding packages in these cases, this PR introduces the concept of package UIDs. These UIDs are calculated from the Package struct and serve as unique identifiers for each package. It's important to note that these values (hashes) are not the same as digests used for ensuring integrity; they are solely intended for uniquely identifying packages within Trivy.
One advantage of using hashes instead of UUIDs is that the same hash value will be generated across multiple runs for the same package.

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

[skip ci]

Signed-off-by: knqyf263 <[email protected]>
@knqyf263
Copy link
Collaborator Author

knqyf263 commented May 2, 2024

@DmitriyLewen Please let me know if you have a better idea.

@DmitriyLewen
Copy link
Contributor

My first thought about this was "Why can't we just use the file path" (another reason to get rid of the two file paths).
But after thinking a little, I realized that this was a bad decision.
Even if we can update our logic - it will not help the users because the lock file may contain many dependencies and the user has to check all the packages of the lock file to find the required package.

So your solution is good. This will make it easier to detect the vulnerable package in Packages.
Also after these changes we will be able to remove some duplicate fields because searching for packages will be easier.

But I was worried about the hash. It is inconvenient to use manually - this value means nothing to the user.
I thought about purl + filePath qualifier (we did this for CycloneDX BomRef).
But I realized that this doesn't make sense - you can see the name/version/etc. in the vulnerability fields, but for searching it makes no difference what value to paste in Ctrl+F.

So I think this is a good idea and should be implemented as you suggested.
Tell me if I can help you with this PR.

@knqyf263
Copy link
Collaborator Author

knqyf263 commented May 3, 2024

Thanks for your input.

I thought about purl + filePath qualifier (we did this for CycloneDX BomRef).

PURLs are not IDs that uniquely identify installed packages, so we used to use PURLs + file paths, but that was not suitable for the PURL specification. Then, we have now changed the logic of the CycloneDX output and use UUIDs when PURLs conflict.

@knqyf263 knqyf263 marked this pull request as ready for review May 3, 2024 09:18
Signed-off-by: knqyf263 <[email protected]>
@knqyf263 knqyf263 force-pushed the feat/add_pkg_hashes branch from 218bbb8 to a188c94 Compare May 3, 2024 09:24
@knqyf263 knqyf263 removed request for simar7 and nikpivkin May 3, 2024 09:24
@knqyf263
Copy link
Collaborator Author

knqyf263 commented May 3, 2024

@DmitriyLewen I didn't come up with a good idea of where we should calculate the UID. Please let me know if you have a better one.

@knqyf263 knqyf263 changed the title feat: introduce package hashes for improved vulnerability mapping feat: introduce package UIDs for improved vulnerability mapping May 3, 2024
@DmitriyLewen
Copy link
Contributor

I think you chose a good place. we create identifiers (purl, uid) in one place.

@knqyf263
Copy link
Collaborator Author

knqyf263 commented May 3, 2024

@DmitriyLewen It's finally ready for review! The tests passed in my env, but failed on GitHub Actions because some tests run only on Linux.

Copy link
Contributor

@DmitriyLewen DmitriyLewen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@knqyf263 knqyf263 added this pull request to the merge queue May 3, 2024
Merged via the queue into aquasecurity:main with commit 998f750 May 3, 2024
12 checks passed
@knqyf263 knqyf263 deleted the feat/add_pkg_hashes branch May 3, 2024 11:34
fl0pp5 pushed a commit to altlinux/trivy that referenced this pull request May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants