erDiagram
VULNERABILITY_V2 {
BIGINT ID PK
TEXT VULN_ID "Vulnerability ID, e.g. CVE-123"
TEXT NAMESPACE "Namespace of the vulnerability ID, e.g. CVE"
}
VULNERABILITY_V2_METADATA {
BIGINT VULNERABILITY_V2_ID FK
TEXT SOURCE "Source of the metadata, e.g. NVD"
TEXT DESCRIPTION "Description of the vulnerability"
TEXT[] REFERENCES
TIMESTAMPTZ CREATED "When the vulnerability was created"
TIMESTAMPTZ PUBLISHED "When the vulnerability was published"
TIMESTAMPTZ UPDATED "When the vulnerability was updated"
TIMESTAMPTZ REJECTED "When the vulnerability was rejected"
}
VULNERABILITY_V2_RATING {
BIGINT VULNERABILITY_V2_ID FK
TEXT SOURCE "Source of the rating, e.g. NVD"
TEXT METHOD "Rating method, e.g. CVSS_V40"
TEXT VECTOR
NUMERIC SCORE
TEXT JUSTIFICATION
TIMESTAMPTZ CREATED
TIMESTAMPTZ UPDATED
}
VULNERABILITY_V2_MATCHING_CRITERIA {
BIGINT ID PK
BIGINT VULNERABILITY_V2_ID FK
TEXT SOURCE "Source of the criteria, e.g. NVD"
TEXT CPE "Complete CPE"
TEXT CPE_PART_LOWER "Part segment of the CPE"
TEXT CPE_VENDOR_LOWER "Vendor segment of the CPE"
TEXT CPE_PRODUCT_LOWER "Product segment of the CPE"
TEXT PURL_TYPE "Type segment of the PURL"
TEXT PURL_NAMESPACE "Namespace segment of the PURL"
TEXT PURL_NAME "Name segment of the PURL"
TEXT AFFECTED_VERSIONS "Affected versions in vers notation"
INT ADDITIONAL_CRITERIA_TYPE "Additional criteria discriminator"
BYTEA ADDITIONAL_CRITERIA "Additional criteria (e.g. affected imports, symbols)"
TIMESTAMPTZ CREATED
TIMESTAMPTZ UPDATED
}
VULNERABILITY_V2 1--1+ VULNERABILITY_V2_METADATA: describes
VULNERABILITY_V2 1--1+ VULNERABILITY_V2_MATCHING_CRITERIA: matches
VULNERABILITY_V2 1--0+ VULNERABILITY_V2_RATING: rates
TODO
We have to assume that multiple data sources will report metadata for the same vulnerability. This metadata can be conflicting
TODO
TODO
Some upstream databases provide additional data, for example to evaluate whether vulnerable code is being invoked. There is no standard notation for data like that, and the structure differs across databases and component ecosystem.
Thus, the schema treats additional criteria as opaque byte array. A discriminator column is added to inform readers about the data format. Discriminators can further be used for versioning, should the data format change over time.
Additional criteria should be compressed to reduce storage requirements, for example using zstd.
CPE matching is supposed to be case-insensitive.
To facilitate performant querying for matching criteria, we'll use PostgreSQL's covering indexes in order to enable it to use index-only scans.
CREATE
INDEX "VULNERABILITY_V2_MATCHING_CRITERIA_CPE_IDX"
ON "VULNERABILITY_V2_MATCHING_CRITERIA"(
"CPE_PART_LOWER"
, "CPE_VENDOR_LOWER"
, "CPE_PRODUCT_LOWER")
INCLUDE ("ID", "VULNERABILITY_V2_ID", "CPE", "AFFECTED_VERSIONS")
WHERE "CPE_PART_LOWER" IS NOT NULL
AND "CPE_VENDOR_LOWER" IS NOT NULL
AND "CPE_PRODUCT_LOWER" IS NOT NULL
CREATE
INDEX "VULNERABILITY_V2_MATCHING_CRITERIA_PURL_IDX"
ON "VULNERABILITY_V2_MATCHING_CRITERIA"(
"PURL_TYPE"
, "PURL_NAMESPACE"
, "PURL_NAME")
INCLUDE ("ID", "VULNERABILITY_V2_ID", "AFFECTED_VERSIONS")
WHERE "PURL_TYPE" IS NOT NULL
AND "PURL_NAME" IS NOT NULL
Ensure that CPE segments are indeed stored in lower case:
ALTER TABLE "VULNERABILITY_V2_MATCHING_CRITERIA"
ADD CONSTRAINT "VULNERABILITY_V2_MATCHING_CRITERIA_CPE_LOWERCASE_CHECK"
CHECK ("CPE_PART_LOWER" = LOWER("CPE_PART_LOWER")
AND "CPE_VENDOR_LOWER" = LOWER("CPE_VENDOR_LOWER")
AND "CPE_PRODUCT_LOWER" = LOWER("CPE_PRODUCT_LOWER"));
Ensure that PURL segments are indeed stored in URL-decoded form:
ALTER TABLE "VULNERABILITY_V2_MATCHING_CRITERIA"
ADD CONSTRAINT "VULNERABILITY_V2_MATCHING_CRITERIA_PURL_URL_DECODED_CHECK"
CHECK ("PURL_NAMESPACE" NOT LIKE '%\%%' ESCAPE '\'
AND "PURL_NAME" NOT LIKE '%\%%' ESCAPE '\');
SELECT "ID"
, "VULNERABILITY_V2_ID"
, "AFFECTED_VERSIONS"
FROM "VULNERABILITY_V2_MATCHING_CRITERIA"
WHERE "CPE_PART_LOWER" = LOWER(:cpePart)
AND "CPE_VENDOR_LOWER" = LOWER(:cpeVendor)
AND "CPE_PRODUCT_LOWER" = LOWER(:cpeProduct)
!!! note
The query above is simplified for brevity. Usually, WHERE
conditions are dynamic,
and depend on the CPE of the component at hand. Refer to the current query construction logic
for details.
SELECT "ID"
, "VULNERABILITY_V2_ID"
, "AFFECTED_VERSIONS"
FROM "VULNERABILITY_V2_MATCHING_CRITERIA"
WHERE "PURL_TYPE" = :purlType
AND "PURL_NAMESPACE" = :purlNamespace
AND "PURL_NAME" = :purlName
!!! note
PURL_NAMESPACE
and PURL_NAME
must be in their URL-decoded form (i.e. @foo
, not %40foo
).
For batch processing purposes, matching criteria can be queried for multiple components at once.
Simply UNION ALL
multiple queries, and include a COMPONENT
discriminator in each query to track
which result row was returned for which component.
SELECT 'foo' AS "COMPONENT"
, "ID"
, NULL AS "CPE"
, "VULNERABILITY_V2_ID"
, "AFFECTED_VERSIONS"
FROM "VULNERABILITY_V2_MATCHING_CRITERIA"
WHERE "PURL_TYPE" = :fooPurlType
AND "PURL_NAMESPACE" = :fooPurlNamespace
AND "PURL_NAME" = :fooPurlName
UNION ALL
SELECT 'bar' AS "COMPONENT"
, "ID"
, "CPE"
, "VULNERABILITY_V2_ID"
, "AFFECTED_VERSIONS"
FROM "VULNERABILITY_V2_MATCHING_CRITERIA"
WHERE "CPE_PART_LOWER" = LOWER(:barCpePart)
AND "CPE_VENDOR_LOWER" = LOWER(:barCpeVendor)
AND "CPE_PRODUCT_LOWER" = LOWER(:barCpeProduct)
!!! note The same approach also works if a component has multiple identifiers, i.e. both CPE and PURL.
For criteria for which a version match was identified, additional matching information may be retrieved:
SELECT "ID"
, "ADDITIONAL_CRITERIA_TYPE"
, "ADDITIONAL_CRITERIA"
FROM "VULNERABILITY_V2_MATCHING_CRITERIA"
WHERE "ID" = ANY (:matchedIds)
AND "ADDITIONAL_CRITERIA" IS NOT NULL
Additional criteria is fetched separately from version ranges, because:
- We expect it to be arbitrarily large, so it's best to defer retrieval until absolutely necessary
- We expect it to be arbitrarily large, so we can't reasonably include it in any of the indexes
- In the majority of cases, versions will not match, making retrieval of this data pointless
- This step is entirely optional, and in the majority of cases won't yield any results anyway
The proposed schema considers only CPE and PURL. However, new identifiers can always be added later by:
- Adding new column(s) to the
VULNERABILITY_V2_MATCHING_CRITERIA
table - Adding corresponding indexes for index-only scans on the new identifier