docs: improve databases documentation #7732

Merged: 44 commits, Nov 27, 2024
Changes from 31 commits
Commits
4f50099
docs: reorg db and airgapped docs
itaysk Oct 29, 2024
05e5951
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
78f9645
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
9c7ba2e
Update docs/docs/advanced/air-gap.md
itaysk Oct 30, 2024
9bf0c0d
Update docs/docs/advanced/air-gap.md
itaysk Oct 30, 2024
59471ad
Update docs/docs/advanced/air-gap.md
itaysk Oct 30, 2024
3d72942
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
503ad2a
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
d11129a
add checks manifest
itaysk Oct 30, 2024
8c38626
fix checks db name
itaysk Oct 30, 2024
9248f14
remove pull through cache section
itaysk Oct 30, 2024
3217ad2
fix image addresses
itaysk Oct 31, 2024
e8ea4f8
clarify fallback condition
itaysk Oct 31, 2024
5d5bb52
docs: fix broken tabs
knqyf263 Oct 31, 2024
09aae4b
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
1c4779f
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
9273551
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
0b9c6cd
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
a666194
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
8ad1e8b
small typo
itaysk Oct 31, 2024
3b4755a
move media types
itaysk Oct 31, 2024
9ce7149
separate self-hosting
itaysk Nov 5, 2024
09a0833
Update docs/docs/advanced/self-hosting.md
itaysk Nov 7, 2024
11e81ec
add crane
itaysk Nov 7, 2024
36256ad
update navigation
itaysk Nov 7, 2024
f4de57f
rename airgap doc back
itaysk Nov 7, 2024
acb7a3f
fix note
itaysk Nov 7, 2024
661c42f
fix title
itaysk Nov 7, 2024
afb7ca7
fix title
itaysk Nov 7, 2024
b649120
Update docs/docs/advanced/air-gap.md
itaysk Nov 7, 2024
3e59b11
don't duplicate db locations
itaysk Nov 8, 2024
9127eb6
add gcr
itaysk Nov 19, 2024
0cea3a5
update connectivity doc
itaysk Nov 19, 2024
1c67568
Merge branch 'main' into fallback
itaysk Nov 20, 2024
5cb9edb
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
8ba8e8d
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
04d769e
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
e23ddcf
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
9334707
Update docs/docs/configuration/db.md
itaysk Nov 24, 2024
4659664
clarify only update
itaysk Nov 24, 2024
efea814
note about only checks db
itaysk Nov 24, 2024
dba78b2
fix registry links
itaysk Nov 25, 2024
50faccb
improve self-host intro
itaysk Nov 25, 2024
b00e823
rename checks db to bundle
itaysk Nov 26, 2024
188 changes: 85 additions & 103 deletions docs/docs/advanced/air-gap.md
@@ -1,130 +1,75 @@
# Advanced Network Scenarios

Trivy needs to connect to the internet occasionally in order to download relevant content. This document explains the network connectivity requirements of Trivy and setting up Trivy in particular scenarios.
Trivy requires internet connectivity in order to function normally. If your organization blocks or restricts network traffic, that could prevent Trivy from working correctly.
This document explains Trivy's network connectivity requirements, and how to configure Trivy to work in restricted network environments, including completely air-gapped environments.

## Network requirements
The following external resources are required by Trivy for the respective features:

Trivy's databases are distributed as OCI images via GitHub Container registry (GHCR):
External Resource | Feature | Details
--- | --- | ---
Vulnerability Database | Vulnerability scanning | [Trivy DB](../scanner/vulnerability.md)
Java Vulnerability Database | Java vulnerability scanning | [Trivy Java DB](../coverage/language/java.md)
Misconfigurations Database | Misconfigurations scanning | [Trivy Checks](../scanner/misconfiguration/check/builtin.md)
VEX Hub | VEX Hub | [VEX Hub](../supply-chain/vex/repo/#vex-hub)
Maven Central / Remote Repositories | Java vulnerability scanning | [Java Scanner/Remote Repositories](../coverage/language/java.md#remote-repositories)

- <https://ghcr.io/aquasecurity/trivy-db>
- <https://ghcr.io/aquasecurity/trivy-java-db>
- <https://ghcr.io/aquasecurity/trivy-checks>
!!! note
Trivy is an open source project that relies on public free infrastructure. In case of extreme load, you may encounter rate limiting when Trivy attempts to connect to external resources.

The following hosts are required in order to fetch them:
The rest of this document details each resource's connectivity requirements and relevant configuration options.

- `ghcr.io`
- `pkg-containers.githubusercontent.com`
## Vulnerability & Java databases

The databases are pulled by Trivy using the [OCI Distribution](https://github.com/opencontainers/distribution-spec) specification, which is a simple HTTPS-based protocol.
### Connectivity requirements

[VEX Hub](https://github.com/aquasecurity/vexhub) is distributed from GitHub over HTTPS.
The following hosts are required in order to fetch it:
Trivy's Vulnerability and Java databases are packaged as OCI images and stored in public container registries. The specific registries and locations are detailed in the [databases document](../configuration/db.md).

- `api.github.com`
- `codeload.github.com`

## Running Trivy in air-gapped environment

An air-gapped environment refers to situations where the network connectivity from the machine Trivy runs on is blocked or restricted.
Communication with OCI Registries follows the [OCI Distribution](https://github.com/opencontainers/distribution-spec) spec.

In an air-gapped environment it is your responsibility to update the Trivy databases on a regular basis.
The following hosts are known to be used by the default container registries:

## Offline Mode
Registry | Hosts | Additional info
--- | --- | ---
GitHub Container Registry | <ul><li>`ghcr.io`</li><li>`pkg-containers.githubusercontent.com`</li></ul> | [GitHub's IP addresses](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-githubs-ip-addresses)
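
One way to confirm that the hosts listed above are reachable from the machine that runs Trivy is a quick probe with `curl`. This is a minimal sketch and not part of Trivy: `check_hosts` is a hypothetical helper name, and a successful HTTPS response only shows that the host answers, not that a registry pull will succeed (for example, behind an authenticating proxy).

```shell
# Probe each host over HTTPS and report whether any response came back.
# check_hosts is a hypothetical helper; it is not a Trivy command.
check_hosts() {
  status=0
  for host in "$@"; do
    # Without --fail, curl exits 0 on any HTTP response, including 4xx.
    if curl --silent --head --max-time 10 --output /dev/null "https://${host}"; then
      echo "reachable: ${host}"
    else
      echo "unreachable: ${host}"
      status=1
    fi
  done
  return "${status}"
}

# Example:
#   check_hosts ghcr.io pkg-containers.githubusercontent.com
```

The function returns a non-zero status if any host could not be reached, so it can be used as a pre-flight step in a CI job.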

By default, Trivy will attempt to download latest databases. If it fails, the scan might fail. To avoid this behavior, you can tell Trivy to not attempt to download database files:

- `--skip-db-update` to skip updating the main vulnerability database.
- `--skip-java-db-update` to skip updating the Java vulnerability database.
- `--skip-check-update` to skip updating the misconfiguration database.

```shell
trivy image --skip-db-update --skip-java-db-update --offline-scan --skip-check-update myimage
```
### Self-hosting

## Self-Hosting
You can host Trivy's databases in your own container registry. Please refer to the [Self-hosting document](./self-hosting.md) for a detailed guide.

### OCI Databases

You can host the databases on your own local OCI registry.

First, make a copy of the databases in a container registry that is accessible to Trivy. The databases are in:

- `ghcr.io/aquasecurity/trivy-db:2`
- `ghcr.io/aquasecurity/trivy-java-db:1`
- `ghcr.io/aquasecurity/trivy-checks:0`

Then, tell Trivy to use the local registry:

```shell
trivy image \
--db-repository myregistry.local/trivy-db \
--java-db-repository myregistry.local/trivy-java-db \
--checks-bundle-repository myregistry.local/trivy-checks \
myimage
```
### Manual cache population

#### Authentication
You can download the database files manually and surgically populate the Trivy cache directory with them.

If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).

### VEX Hub

You can host a copy of VEX Hub on your own internal server.

First, make a copy of VEX Hub in a location that is accessible to Trivy.

1. Download the [VEX Hub](https://github.com/aquasecurity/vexhub) archive from: <https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip>.
1. Download the [VEX Hub Repository Manifest](https://github.com/aquasecurity/vex-repo-spec#2-repository-manifest) file from: <https://github.com/aquasecurity/vexhub/blob/main/vex-repository.json>.
1. Create or identify an internal HTTP server that can serve the VEX Hub repository in your environment (e.g `https://server.local`).
1. Make the downloaded archive file available for serving from your server (e.g `https://server.local/main.zip`).
1. Modify the downloaded manifest file's [Location URL](https://github.com/aquasecurity/vex-repo-spec?tab=readme-ov-file#locations-subfields) field to the URL of the archive file on your server (e.g `url: https://server.local/main.zip`).
1. Make the manifest file available for serving from your server under the `/.well-known` path (e.g `https://server.local/.well-known/vex-repository.json`).

Then, tell Trivy to use the local VEX Repository:

1. Locate your [Trivy VEX configuration file](../supply-chain/vex/repo/#configuration-file) by running `trivy vex repo init`. Make the following changes to the file.
1. Disable the default VEX Hub repo (`enabled: false`)
1. Add your internal VEX Hub repository as a [custom repository](../supply-chain/vex/repo/#custom-repositories) with the URL pointing to your local server (e.g `url: https://server.local`).

#### Authentication

If your server requires authentication, you can configure it as described in the [VEX Repository Authentication document](../supply-chain/vex/repo/#authentication).

## Manual cache population

You can also download the databases files manually and surgically populate the Trivy cache directory with them.

### Downloading the DB files
#### Downloading the DB files

On a machine with internet access, pull the database container archive from the public registry into your local workspace:

Note that these examples operate in the current working directory.

=== "Using ORAS"
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files.
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```


=== "Using Trivy"
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.

```shell
trivy image --cache-dir . --download-db-only
```
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.
```shell
trivy image --cache-dir . --download-db-only
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files; copy them over to the air-gapped environment.

### Populating the Trivy Cache
#### Populating the Trivy Cache

In order to populate the cache, you need to identify the location of the cache directory. If it is under the default location, you can run the following command to find it:

@@ -147,16 +92,53 @@ mkdir -p ${TRIVY_CACHE_DIR}/db
cp /path/to/trivy.db /path/to/metadata.json ${TRIVY_CACHE_DIR}/db/
```

### Java DB
#### Java DB adaptations

For Java DB the process is the same, except for the following:

1. Image location is `ghcr.io/aquasecurity/trivy-java-db:1`
2. Archive file name is `javadb.tar.gz`
3. DB file name is `trivy-java.db`
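
Putting the three differences together, the Java DB variant of the procedure might look like the sketch below. It assumes ORAS is installed, that `TRIVY_CACHE_DIR` already holds the cache directory located earlier, and that the Java DB lives in a `java-db` subdirectory of the cache; that subdirectory name is an assumption not stated in this document. The function names are hypothetical.

```shell
# Run on a machine with internet access: pull and unpack the Java DB
# into the current working directory.
fetch_java_db() {
  oras pull ghcr.io/aquasecurity/trivy-java-db:1 &&
    tar -xzf javadb.tar.gz
}

# Run in the air-gapped environment, from the directory that holds the
# copied files. The "java-db" subdirectory name is an assumption.
install_java_db() {
  mkdir -p "${TRIVY_CACHE_DIR}/java-db" &&
    cp trivy-java.db metadata.json "${TRIVY_CACHE_DIR}/java-db/"
}
```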

## Misconfigurations scanning
## Misconfiguration Checks Database

### Connectivity requirements

Trivy's misconfiguration database is packaged as an OCI image and follows the same connectivity requirements as the Vulnerability and Java databases, as can be seen [here](#vulnerability-java-databases).

### Self-hosting

You can host Trivy's databases in your own container registry. Please refer to the [Self-hosting document](./self-hosting.md) for a detailed guide.

### Embedded misconfiguration database

The misconfiguration database is embedded in the Trivy binary (at build time), and is used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment, using the database as it was when the Trivy release you are running was built.

## VEX Hub

### Connectivity Requirements

VEX Hub is fetched directly from the VEX Hub GitHub repository, <https://github.com/aquasecurity/vexhub>, using simple HTTPS requests.

The following hosts are known to be used by GitHub's services:

- `api.github.com`
- `codeload.github.com`

For more information about GitHub connectivity (including specific IP addresses), please refer to [GitHub's connectivity troubleshooting guide](https://docs.github.com/en/get-started/using-github/troubleshooting-connectivity-problems).

### Self-hosting

You can host a copy of VEX Hub on your own internal server. Please refer to the [self-hosting document](./self-hosting.md) for a detailed guide.

## Maven Central / Remote Repositories

### Connectivity requirements

Trivy might attempt to connect to the following URLs:

- `https://repo.maven.apache.org/maven2`

Note that the misconfigurations checks bundle is also embedded in the Trivy binary (at build time), and will be used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment using the Checks from the time of the Trivy release you are using.
### Offline mode

The misconfiguration scanner can be configured to load checks from a local directory, using the `--config-check` flag. In an air-gapped scenario you can copy the checks library from [Trivy checks repository](https://github.com/aquasecurity/trivy-checks) into a local directory, and load it with this flag. See more in the [Misconfiguration scanner documentation](../scanner/misconfiguration/index.md).
There's no way to leverage Maven Central in a network-restricted environment, but you can prevent Trivy from trying to connect to it by using the `--offline-scan` flag.
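
As an illustration, a fully offline scan can combine `--offline-scan` with the skip-update flags that the earlier revision of this page listed. This is a sketch under those assumptions: `scan_offline` is a hypothetical wrapper name, `myimage` is a placeholder, and the databases are assumed to be already present in the Trivy cache.

```shell
# Scan without contacting Maven Central or any database registry.
# Assumes the Trivy cache has already been populated manually.
scan_offline() {
  trivy image \
    --skip-db-update \
    --skip-java-db-update \
    --skip-check-update \
    --offline-scan \
    "$1"
}

# Example:
#   scan_offline myimage
```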
72 changes: 72 additions & 0 deletions docs/docs/advanced/self-hosting.md
@@ -0,0 +1,72 @@
# Self-Hosting Trivy's Databases

When you install Trivy, the installed artifact contains the scanner engine but lacks the security information needed to make security detections and recommendations. These so-called "databases" are fetched and maintained by Trivy automatically as needed.

If you prefer, you can host Trivy's databases in your own infrastructure. This document explains how to do that.

!!! note
Please familiarize yourself with the [Databases document](../configuration/db.md), which explains the different databases used by Trivy and the configuration options that control them. This guide assumes you are already familiar with the concepts explained there.

## OCI databases

The following [Trivy Databases](../configuration/db.md) are packaged as OCI images:

- `trivy-db`
- `trivy-java-db`
- `trivy-checks`

To host these databases in your own infrastructure:

### Make a local copy

Use any container registry manipulation tool (e.g. [crane](https://github.com/google/go-containerregistry/blob/main/cmd/crane/doc/crane.md), [ORAS](https://oras.land), [regclient](https://github.com/regclient/regclient/tree/main)) to copy the images to your destination registry.

!!! note
You will need to keep the databases updated in order to maintain relevant scanning results over time.
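
For example, the copy step could be scripted with `crane copy`. This is a sketch only: `myregistry.local` is a placeholder destination, `mirror_trivy_dbs` is a hypothetical helper name, and the tags (`:2`, `:1`, `:0`) are the ones given in the earlier revision of the air-gap page, so confirm the current tags in the databases document before relying on them.

```shell
# Copy Trivy's OCI databases from GHCR into a private registry.
# The tags below are assumptions taken from an earlier doc revision.
mirror_trivy_dbs() {
  dest_registry="$1"
  for image in trivy-db:2 trivy-java-db:1 trivy-checks:0; do
    crane copy "ghcr.io/aquasecurity/${image}" "${dest_registry}/${image}" || return 1
  done
}

# Example (needs network access and push rights on the destination):
#   mirror_trivy_dbs myregistry.local
```

Running this on a schedule (for instance from a cron job or CI pipeline) is one way to keep the mirrored databases current.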

### Configure Trivy

Use the appropriate [database location flags](../configuration/db.md#database-locations) to change the db-repository location:

- `--db-repository`
- `--java-db-repository`
- `--checks-bundle-repository`
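
A scan that uses all three flags might look like the following sketch. The registry host and image name are placeholders, and `scan_with_mirrored_dbs` is a hypothetical wrapper name; the repository paths assume the databases were copied under the same names they have upstream.

```shell
# Scan an image using databases hosted in a private registry.
# $1: registry host (placeholder), $2: image to scan (placeholder).
scan_with_mirrored_dbs() {
  registry="$1"
  target_image="$2"
  trivy image \
    --db-repository "${registry}/trivy-db" \
    --java-db-repository "${registry}/trivy-java-db" \
    --checks-bundle-repository "${registry}/trivy-checks" \
    "${target_image}"
}

# Example:
#   scan_with_mirrored_dbs myregistry.local myimage
```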

### Authentication

If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).

### OCI Media Types

When serving, proxying, or manipulating Trivy's databases, note that the media type of the OCI layer is not a standard container image type:

DB | Media Type | Reference
--- | --- | ---
`trivy-db` | `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip` | <https://github.com/aquasecurity/trivy-db/pkgs/container/trivy-db>
`trivy-java-db` | `application/vnd.aquasec.trivy.javadb.layer.v1.tar+gzip` | <https://github.com/aquasecurity/trivy-java-db/pkgs/container/trivy-java-db>
`trivy-checks` | `application/vnd.oci.image.manifest.v1+json` | <https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks>
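
To check that a mirrored copy kept the media types from the table above, you could inspect its manifest. The sketch below uses `crane manifest` and a plain `grep`, so it needs no JSON tooling; `list_media_types` is a hypothetical helper name and the image reference is a placeholder.

```shell
# Print every mediaType entry found in an image manifest.
# Output includes the manifest, config, and layer media types.
list_media_types() {
  crane manifest "$1" | grep -o '"mediaType": *"[^"]*"'
}

# Example:
#   list_media_types myregistry.local/trivy-db:2
```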

## VEX Hub

### Make a local copy

To make a copy of VEX Hub in a location that is accessible to Trivy:

1. Download the [VEX Hub](https://github.com/aquasecurity/vexhub) archive from: <https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip>.
1. Download the [VEX Hub Repository Manifest](https://github.com/aquasecurity/vex-repo-spec#2-repository-manifest) file from: <https://github.com/aquasecurity/vexhub/blob/main/vex-repository.json>.
1. Create or identify an internal HTTP server that can serve the VEX Hub repository in your environment (e.g. `https://server.local`).
1. Make the downloaded archive file available for serving from your server (e.g. `https://server.local/main.zip`).
1. Modify the downloaded manifest file's [Location URL](https://github.com/aquasecurity/vex-repo-spec?tab=readme-ov-file#locations-subfields) field to the URL of the archive file on your server (e.g. `url: https://server.local/main.zip`).
1. Make the manifest file available for serving from your server under the `/.well-known` path (e.g. `https://server.local/.well-known/vex-repository.json`).
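
The download and manifest-edit steps above can be sketched as follows. Several things here are assumptions: the `raw.githubusercontent.com` address is assumed to serve the raw manifest JSON (the link in step 2 points at the GitHub page for the file), the manifest is assumed to keep the archive location in a `"url"` field, and both function names are hypothetical. The rewrite replaces every `"url"` value in the file, so review the result before publishing it.

```shell
# Download the VEX Hub archive and repository manifest into the
# current directory. The raw manifest URL is an assumption.
download_vexhub() {
  curl -fsSL -o main.zip \
    https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip &&
    curl -fsSL -o vex-repository.json \
      https://raw.githubusercontent.com/aquasecurity/vexhub/main/vex-repository.json
}

# Point every "url" field in the manifest at the internal archive.
# $1: manifest file, $2: new archive URL (must not contain '#' or '&').
rewrite_manifest_url() {
  manifest="$1"
  archive_url="$2"
  sed -E "s#(\"url\"[[:space:]]*:[[:space:]]*)\"[^\"]*\"#\1\"${archive_url}\"#" \
    "${manifest}" > "${manifest}.tmp" &&
    mv "${manifest}.tmp" "${manifest}"
}

# Example:
#   download_vexhub
#   rewrite_manifest_url vex-repository.json https://server.local/main.zip
```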

### Configure Trivy

To configure Trivy to use the local VEX Repository:

1. Locate your [Trivy VEX configuration file](../supply-chain/vex/repo/#configuration-file) by running `trivy vex repo init`. Make the following changes to the file.
1. Disable the default VEX Hub repo (`enabled: false`).
1. Add your internal VEX Hub repository as a [custom repository](../supply-chain/vex/repo/#custom-repositories) with the URL pointing to your local server (e.g. `url: https://server.local`).
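
After these changes, the configuration file might resemble the one written by the sketch below. The field names (`repositories`, `name`, `url`, `enabled`) and the repository name `internal` are assumptions based on the configuration-file documentation linked above, so compare them against the file that `trivy vex repo init` generates. The helper writes to a path you choose rather than overwriting the real configuration file.

```shell
# Write an example VEX repository configuration to the given path.
# Field names are assumptions; verify against the generated file.
write_vex_repo_config() {
  cat > "$1" <<'EOF'
repositories:
  - name: default
    url: https://github.com/aquasecurity/vexhub
    enabled: false
  - name: internal
    url: https://server.local
    enabled: true
EOF
}

# Example:
#   write_vex_repo_config ./repository.example.yaml
```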

### Authentication

If your server requires authentication, you can configure it as described in the [VEX Repository Authentication document](../supply-chain/vex/repo/#authentication).