docs: improve databases documentation #7732

Merged
merged 44 commits into from
Nov 27, 2024
Commits
4f50099
docs: reorg db and airgapped docs
itaysk Oct 29, 2024
05e5951
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
78f9645
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
9c7ba2e
Update docs/docs/advanced/air-gap.md
itaysk Oct 30, 2024
9bf0c0d
Update docs/docs/advanced/air-gap.md
itaysk Oct 30, 2024
59471ad
Update docs/docs/advanced/air-gap.md
itaysk Oct 30, 2024
3d72942
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
503ad2a
Update docs/docs/configuration/db.md
itaysk Oct 30, 2024
d11129a
add checks manifest
itaysk Oct 30, 2024
8c38626
fix checks db name
itaysk Oct 30, 2024
9248f14
remove pull through cache section
itaysk Oct 30, 2024
3217ad2
fix image addresses
itaysk Oct 31, 2024
e8ea4f8
clarify fallback condition
itaysk Oct 31, 2024
5d5bb52
docs: fix broken tabs
knqyf263 Oct 31, 2024
09aae4b
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
1c4779f
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
9273551
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
0b9c6cd
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
a666194
Update docs/docs/configuration/db.md
itaysk Oct 31, 2024
8ad1e8b
small typo
itaysk Oct 31, 2024
3b4755a
move media types
itaysk Oct 31, 2024
9ce7149
separate self-hosting
itaysk Nov 5, 2024
09a0833
Update docs/docs/advanced/self-hosting.md
itaysk Nov 7, 2024
11e81ec
add crane
itaysk Nov 7, 2024
36256ad
update navigation
itaysk Nov 7, 2024
f4de57f
rename airgap doc back
itaysk Nov 7, 2024
acb7a3f
fix note
itaysk Nov 7, 2024
661c42f
fix title
itaysk Nov 7, 2024
afb7ca7
fix title
itaysk Nov 7, 2024
b649120
Update docs/docs/advanced/air-gap.md
itaysk Nov 7, 2024
3e59b11
don't duplicate db locations
itaysk Nov 8, 2024
9127eb6
add gcr
itaysk Nov 19, 2024
0cea3a5
update connectivity doc
itaysk Nov 19, 2024
1c67568
Merge branch 'main' into fallback
itaysk Nov 20, 2024
5cb9edb
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
8ba8e8d
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
04d769e
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
e23ddcf
Update docs/docs/configuration/db.md
itaysk Nov 20, 2024
9334707
Update docs/docs/configuration/db.md
itaysk Nov 24, 2024
4659664
clarify only update
itaysk Nov 24, 2024
efea814
note about only checks db
itaysk Nov 24, 2024
dba78b2
fix registry links
itaysk Nov 25, 2024
50faccb
improve self-host intro
itaysk Nov 25, 2024
b00e823
rename checks db to bundle
itaysk Nov 26, 2024
175 changes: 45 additions & 130 deletions docs/docs/advanced/air-gap.md
@@ -1,162 +1,77 @@
# Advanced Network Scenarios
# Connectivity and Network considerations

Trivy needs to connect to the internet occasionally in order to download relevant content. This document explains the network connectivity requirements of Trivy and setting up Trivy in particular scenarios.
Trivy requires internet connectivity in order to function normally. If your organization blocks or restricts network traffic, that could prevent Trivy from working correctly.
This document explains Trivy's network connectivity requirements, and how to configure Trivy to work in restricted network environments, including completely air-gapped environments.

## Network requirements
The following table lists all external resources that are required by Trivy:

Trivy's databases are distributed as OCI images via GitHub Container registry (GHCR):
External Resource | Feature | Details
--- | --- | ---
Vulnerability Database | Vulnerability scanning | [Trivy DB](../scanner/vulnerability.md)
Java Vulnerability Database | Java vulnerability scanning | [Trivy Java DB](../coverage/language/java.md)
Checks Bundle | Misconfigurations scanning | [Trivy Checks](../scanner/misconfiguration/check/builtin.md)
VEX Hub | VEX Hub | [VEX Hub](../supply-chain/vex/repo/#vex-hub)
Maven Central / Remote Repositories | Java vulnerability scanning | [Java Scanner/Remote Repositories](../coverage/language/java.md#remote-repositories)

- <https://ghcr.io/aquasecurity/trivy-db>
- <https://ghcr.io/aquasecurity/trivy-java-db>
- <https://ghcr.io/aquasecurity/trivy-checks>
!!! note
Trivy is an open source project that relies on public free infrastructure. In case of extreme load, you may encounter rate limiting when Trivy attempts to connect to external resources.

The following hosts are required in order to fetch them:
The rest of this document details each resource's connectivity requirements and network related considerations.

- `ghcr.io`
- `pkg-containers.githubusercontent.com`
## OCI Databases

The databases are pulled by Trivy using the [OCI Distribution](https://github.com/opencontainers/distribution-spec) specification, which is a simple HTTPS-based protocol.
Trivy's Vulnerability database, Java database, and Checks Bundle are packaged as OCI images and stored in public container registries.

[VEX Hub](https://github.com/aquasecurity/vexhub) is distributed from GitHub over HTTPS.
The following hosts are required in order to fetch it:
### Connectivity requirements

- `api.github.com`
- `codeload.github.com`

## Running Trivy in air-gapped environment

An air-gapped environment refers to situations where the network connectivity from the machine Trivy runs on is blocked or restricted.

In an air-gapped environment it is your responsibility to update the Trivy databases on a regular basis.

## Offline Mode

By default, Trivy will attempt to download latest databases. If it fails, the scan might fail. To avoid this behavior, you can tell Trivy to not attempt to download database files:

- `--skip-db-update` to skip updating the main vulnerability database.
- `--skip-java-db-update` to skip updating the Java vulnerability database.
- `--skip-check-update` to skip updating the misconfiguration database.

```shell
trivy image --skip-db-update --skip-java-db-update --offline-scan --skip-check-update myimage
```

## Self-Hosting

### OCI Databases

You can host the databases on your own local OCI registry.

First, make a copy of the databases in a container registry that is accessible to Trivy. The databases are in:
The specific registries and locations are detailed in the [databases document](../configuration/db.md).

- `ghcr.io/aquasecurity/trivy-db:2`
- `ghcr.io/aquasecurity/trivy-java-db:1`
- `ghcr.io/aquasecurity/trivy-checks:0`
Communication with OCI Registries follows the [OCI Distribution](https://github.com/opencontainers/distribution-spec) spec.

Then, tell Trivy to use the local registry:
The following hosts are known to be used by the default container registries:

```shell
trivy image \
--db-repository myregistry.local/trivy-db \
--java-db-repository myregistry.local/trivy-java-db \
--checks-bundle-repository myregistry.local/trivy-checks \
myimage
```
Registry | Hosts | Additional info
--- | --- | ---
Google Artifact Registry | <ul><li>`mirror.gcr.io`</li><li>`googlecode.l.googleusercontent.com`</li></ul> | [Google's IP addresses](https://support.google.com/a/answer/10026322?hl=en)
GitHub Container Registry | <ul><li>`ghcr.io`</li><li>`pkg-containers.githubusercontent.com`</li></ul> | [GitHub's IP addresses](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-githubs-ip-addresses)
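
A quick reachability probe for the hosts in the table above could look like the following sketch. It assumes `curl` is installed. The function name `check_hosts` is hypothetical and not part of Trivy. Any HTTP response (including 401 or 404) counts as reachable, since the goal is only to confirm that HTTPS connections are not blocked.

```shell
# Hypothetical helper, not part of Trivy: probe each host over HTTPS.
# Any HTTP response counts as "reachable"; only connection-level
# failures (DNS, TCP, TLS, timeout) are reported as blocked.
check_hosts() {
  status=0
  for host in "$@"; do
    if curl --silent --output /dev/null --max-time 10 "https://${host}/"; then
      echo "reachable: ${host}"
    else
      echo "BLOCKED:   ${host}"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_hosts ghcr.io pkg-containers.githubusercontent.com mirror.gcr.io` exits non-zero if any of the listed hosts cannot be reached from the machine that runs Trivy.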

#### Authentication
### Self-hosting

If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).
You can host Trivy's databases in your own container registry. Please refer to the [self-hosting document](./self-hosting.md#oci-databases) for a detailed guide.

### VEX Hub
## Embedded Checks

You can host a copy of VEX Hub on your own internal server.
The Checks Bundle is embedded in the Trivy binary (at build time), and will be used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment, using the checks as they were at the time of the Trivy release you are using.

First, make a copy of VEX Hub in a location that is accessible to Trivy.
## VEX Hub

1. Download the [VEX Hub](https://github.com/aquasecurity/vexhub) archive from: <https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip>.
1. Download the [VEX Hub Repository Manifest](https://github.com/aquasecurity/vex-repo-spec#2-repository-manifest) file from: <https://github.com/aquasecurity/vexhub/blob/main/vex-repository.json>.
1. Create or identify an internal HTTP server that can serve the VEX Hub repository in your environment (e.g `https://server.local`).
1. Make the downloaded archive file available for serving from your server (e.g `https://server.local/main.zip`).
1. Modify the downloaded manifest file's [Location URL](https://github.com/aquasecurity/vex-repo-spec?tab=readme-ov-file#locations-subfields) field to the URL of the archive file on your server (e.g `url: https://server.local/main.zip`).
1. Make the manifest file available for serving from your server under the `/.well-known` path (e.g `https://server.local/.well-known/vex-repository.json`).
### Connectivity Requirements

Then, tell Trivy to use the local VEX Repository:
VEX Hub is hosted at <https://github.com/aquasecurity/vexhub>.

1. Locate your [Trivy VEX configuration file](../supply-chain/vex/repo/#configuration-file) by running `trivy vex repo init`. Make the following changes to the file.
1. Disable the default VEX Hub repo (`enabled: false`)
1. Add your internal VEX Hub repository as a [custom repository](../supply-chain/vex/repo/#custom-repositories) with the URL pointing to your local server (e.g `url: https://server.local`).
Trivy fetches the VEX Hub GitHub repository directly using simple HTTPS requests.

#### Authentication
The following hosts are known to be used by GitHub's services:

If your server requires authentication, you can configure it as described in the [VEX Repository Authentication document](../supply-chain/vex/repo/#authentication).

## Manual cache population

You can also download the databases files manually and surgically populate the Trivy cache directory with them.

### Downloading the DB files

On a machine with internet access, pull the database container archive from the public registry into your local workspace:

Note that these examples operate in the current working directory.

=== "Using ORAS"
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files.

=== "Using Trivy"
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.

```shell
trivy image --cache-dir . --download-db-only
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files, copy them over to the air-gapped environment.

### Populating the Trivy Cache

In order to populate the cache, you need to identify the location of the cache directory. If it is under the default location, you can run the following command to find it:

```shell
trivy -h | grep cache
```
- `api.github.com`
- `codeload.github.com`

For the example, we will assume the `TRIVY_CACHE_DIR` variable holds the cache location:
For more information about GitHub connectivity (including specific IP addresses), please refer to [GitHub's connectivity troubleshooting guide](https://docs.github.com/en/get-started/using-github/troubleshooting-connectivity-problems).

```shell
TRIVY_CACHE_DIR=/home/user/.cache/trivy
```
### Self-hosting

Put the Trivy DB files in the Trivy cache directory under a `db` subdirectory:
You can host a copy of VEX Hub on your own internal server. Please refer to the [self-hosting document](./self-hosting.md#vex-hub) for a detailed guide.

```shell
# ensure cache db directory exists
mkdir -p ${TRIVY_CACHE_DIR}/db
# copy the db files
cp /path/to/trivy.db /path/to/metadata.json ${TRIVY_CACHE_DIR}/db/
```
## Maven Central / Remote Repositories

### Java DB
Trivy might call out to Maven Central or other remote repositories in order to correctly identify Java packages during a vulnerability scan.

For Java DB the process is the same, except for the following:
### Connectivity requirements

1. Image location is `ghcr.io/aquasecurity/trivy-java-db:1`
2. Archive file name is `javadb.tar.gz`
3. DB file name is `trivy-java.db`
Trivy might attempt to connect (over HTTPS) to the following URLs:

## Misconfigurations scanning
- `https://repo.maven.apache.org/maven2`

Note that the misconfigurations checks bundle is also embedded in the Trivy binary (at build time), and will be used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment using the Checks from the time of the Trivy release you are using.
### Offline mode

The misconfiguration scanner can be configured to load checks from a local directory, using the `--config-check` flag. In an air-gapped scenario you can copy the checks library from [Trivy checks repository](https://github.com/aquasecurity/trivy-checks) into a local directory, and load it with this flag. See more in the [Misconfiguration scanner documentation](../scanner/misconfiguration/index.md).
There's no way to leverage Maven Central in a network-restricted environment, but you can prevent Trivy from trying to connect to it by using the `--offline-scan` flag.
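
For a fully offline run, the flag above is typically combined with the database skip flags listed in the earlier version of this page. The sketch below only prints the resulting command unless `DRY_RUN=0` is set. The `run` and `offline_scan` helpers and the image name `myimage` are placeholders, not Trivy features.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Placeholder wrapper: scan an image without any outbound requests.
# The skip flags assume the databases are already present in the cache.
offline_scan() {
  run trivy image \
    --skip-db-update \
    --skip-java-db-update \
    --skip-check-update \
    --offline-scan \
    "$1"
}

offline_scan myimage
```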
132 changes: 132 additions & 0 deletions docs/docs/advanced/self-hosting.md
@@ -0,0 +1,132 @@
# Self-Hosting Trivy's Databases

This document explains how to host Trivy's [external dependencies](./air-gap.md) in your own infrastructure to prevent external network access. If you haven't already, please read the [Databases document](../configuration/db.md), which explains the different databases used by Trivy and the configuration options that control them. This guide assumes you are familiar with the concepts explained there.

## OCI databases

The following [Trivy Databases](../configuration/db.md) are packaged as OCI images:

- `trivy-db`
- `trivy-java-db`
- `trivy-checks`

To host these databases in your own infrastructure:

### Make a local copy

Use any container registry manipulation tool (e.g., [crane](https://github.com/google/go-containerregistry/blob/main/cmd/crane/doc/crane.md), [ORAS](https://oras.land), [regclient](https://github.com/regclient/regclient/tree/main)) to copy the images to your destination registry.

!!! note
You will need to keep the databases updated in order to maintain relevant scanning results over time.
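
As a rough sketch of the copy step using crane: the destination `myregistry.local` is a placeholder, and the tags (`:2`, `:1`, `:0`) are the ones the earlier version of the air-gap page listed, so they may not match your Trivy version. Check the databases document for the current locations. The `run` and `mirror_databases` helpers are illustrative only. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Illustrative helper: copy each database image to the registry given as $1.
# Image tags are assumptions taken from the pre-change docs.
mirror_databases() {
  dest="$1"
  for image in trivy-db:2 trivy-java-db:1 trivy-checks:0; do
    run crane copy "ghcr.io/aquasecurity/${image}" "${dest}/${image}"
  done
}

mirror_databases myregistry.local
```

Running this on a schedule (for example from a CI job) is one way to keep the mirrored databases up to date.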

### Configure Trivy

Use the appropriate [database location flags](../configuration/db.md#database-locations) to change the db-repository location:

- `--db-repository`
- `--java-db-repository`
- `--checks-bundle-repository`
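
Put together, a scan against a private mirror might look like the sketch below. The registry `myregistry.local` and the image `myimage` are placeholders, and the `run` and `scan_with_mirror` helpers are illustrative. The command is printed rather than executed unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Illustrative helper: scan image $2 using databases mirrored in registry $1.
scan_with_mirror() {
  registry="$1"
  image="$2"
  run trivy image \
    --db-repository "${registry}/trivy-db" \
    --java-db-repository "${registry}/trivy-java-db" \
    --checks-bundle-repository "${registry}/trivy-checks" \
    "$image"
}

scan_with_mirror myregistry.local myimage
```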

### Authentication

If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).

### OCI Media Types

When serving, proxying, or manipulating Trivy's databases, note that the media type of the OCI layer is not a standard container image type:

DB | Media Type | Reference
--- | --- | ---
`trivy-db` | `application/vnd.aquasec.trivy.db.layer.v1.tar+gzip` | <https://github.com/aquasecurity/trivy-db/pkgs/container/trivy-db>
`trivy-java-db` | `application/vnd.aquasec.trivy.javadb.layer.v1.tar+gzip` | <https://github.com/aquasecurity/trivy-java-db/pkgs/container/trivy-java-db>
`trivy-checks` | `application/vnd.oci.image.manifest.v1+json` | <https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks>
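
To confirm that a mirror or proxy preserved the media type, one option is to inspect the manifest it serves. The helper below is a hypothetical sketch: it reads a manifest as JSON on stdin and succeeds only if it contains the given `mediaType` value. It does a plain text match rather than full JSON parsing.

```shell
# Hypothetical check: succeed if the JSON manifest on stdin contains
# a "mediaType" field whose value is exactly $1.
# Whitespace is stripped first so formatting differences do not matter.
has_media_type() {
  tr -d '[:space:]' | grep -qF "\"mediaType\":\"$1\""
}
```

For example, assuming ORAS is installed and `myregistry.local` is your mirror: `oras manifest fetch myregistry.local/trivy-db:2 | has_media_type application/vnd.aquasec.trivy.db.layer.v1.tar+gzip`.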

## Manual cache population

Trivy uses a local cache directory to store the database files, as described in the [cache](../configuration/cache.md) document.
You can download the database files and surgically populate the Trivy cache directory with them.

### Downloading the DB files

On a machine with internet access, pull the database container archive from the public registry into your local workspace:

Note that these examples operate in the current working directory.

=== "Using ORAS"
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```


=== "Using Trivy"
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.

```shell
trivy image --cache-dir . --download-db-only
```

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files. Copy them over to the air-gapped environment.

### Populating the Trivy Cache

In order to populate the cache, you need to identify the location of the cache directory. If it is under the default location, you can run the following command to find it:

```shell
trivy -h | grep cache
```

For the example, we will assume the `TRIVY_CACHE_DIR` variable holds the cache location:

```shell
TRIVY_CACHE_DIR=/home/user/.cache/trivy
```

Put the Trivy DB files in the Trivy cache directory under a `db` subdirectory:

```shell
# ensure cache db directory exists
mkdir -p ${TRIVY_CACHE_DIR}/db
# copy the db files
cp /path/to/trivy.db /path/to/metadata.json ${TRIVY_CACHE_DIR}/db/
```

### Java DB adaptations

For Java DB the process is the same, except for the following:

1. Image location is `ghcr.io/aquasecurity/trivy-java-db:1`
2. Archive file name is `javadb.tar.gz`
3. DB file name is `trivy-java.db`
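
The cache population steps above, including the Java DB variant, can be sketched as one small helper. The function name `populate_cache` is hypothetical. It assumes the archive has already been extracted, so that the DB file and `metadata.json` sit in the same source directory.

```shell
# Hypothetical helper: copy extracted DB files into a Trivy cache subdirectory.
# Arguments: <source dir> <cache dir> <cache subdirectory> <db file name>
populate_cache() {
  src="$1"
  cache="$2"
  subdir="$3"
  dbfile="$4"
  # ensure the target subdirectory exists
  mkdir -p "${cache}/${subdir}"
  # copy the DB file together with its metadata
  cp "${src}/${dbfile}" "${src}/metadata.json" "${cache}/${subdir}/"
}
```

For the main database this would be called as `populate_cache . "$TRIVY_CACHE_DIR" db trivy.db`. For the Java DB, a call such as `populate_cache . "$TRIVY_CACHE_DIR" java-db trivy-java.db` assumes the cache subdirectory is named `java-db`, which this page does not state. Verify it against your Trivy version.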

## VEX Hub

### Make a local copy

To make a copy of VEX Hub in a location that is accessible to Trivy:

1. Download the [VEX Hub](https://github.com/aquasecurity/vexhub) archive from: <https://github.com/aquasecurity/vexhub/archive/refs/heads/main.zip>.
1. Download the [VEX Hub Repository Manifest](https://github.com/aquasecurity/vex-repo-spec#2-repository-manifest) file from: <https://github.com/aquasecurity/vexhub/blob/main/vex-repository.json>.
1. Create or identify an internal HTTP server that can serve the VEX Hub repository in your environment (e.g `https://server.local`).
1. Make the downloaded archive file available for serving from your server (e.g `https://server.local/main.zip`).
1. Modify the downloaded manifest file's [Location URL](https://github.com/aquasecurity/vex-repo-spec?tab=readme-ov-file#locations-subfields) field to the URL of the archive file on your server (e.g `url: https://server.local/main.zip`).
1. Make the manifest file available for serving from your server under the `/.well-known` path (e.g `https://server.local/.well-known/vex-repository.json`).
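
The manifest edit in the steps above can be scripted. The sketch below assumes the downloaded manifest contains a single `"url"` field (the location URL) and that the new URL contains no `|` or `&` characters. The function name `rewrite_location_url` is hypothetical. It writes the modified manifest to stdout and leaves the input file untouched.

```shell
# Hypothetical helper: print manifest $1 with its "url" value replaced by $2.
# Assumes one "url" field per line; only the first match on a line is changed.
rewrite_location_url() {
  manifest="$1"
  new_url="$2"
  sed "s|\"url\"[[:space:]]*:[[:space:]]*\"[^\"]*\"|\"url\": \"${new_url}\"|" "$manifest"
}
```

For example, `rewrite_location_url vex-repository.json https://server.local/main.zip > .well-known/vex-repository.json` produces the file to serve under the `/.well-known` path.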

### Configure Trivy

To configure Trivy to use the local VEX Repository:

1. Locate your [Trivy VEX configuration file](../supply-chain/vex/repo/#configuration-file) by running `trivy vex repo init`. Make the following changes to the file.
1. Disable the default VEX Hub repo (`enabled: false`)
1. Add your internal VEX Hub repository as a [custom repository](../supply-chain/vex/repo/#custom-repositories) with the URL pointing to your local server (e.g `url: https://server.local`).
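
As an illustration of the two changes above, the sketch below writes a configuration with the default repository disabled and an internal one enabled. Only `enabled` and `url` are named on this page. The top-level `repositories` key, the `name` field, and the repository name `internal` are assumptions about the file layout, so compare the result with the file that `trivy vex repo init` generates before using it. The function name `write_vex_repo_config` is hypothetical.

```shell
# Hypothetical helper: write a VEX repository config to path $1 that
# disables the default VEX Hub and enables an internal copy at URL $2.
# The YAML layout is an assumption; verify it against your generated file.
write_vex_repo_config() {
  out="$1"
  internal_url="$2"
  cat > "$out" <<EOF
repositories:
  - name: default
    url: https://github.com/aquasecurity/vexhub
    enabled: false
  - name: internal
    url: ${internal_url}
    enabled: true
EOF
}
```

For example, `write_vex_repo_config ./repository.yaml https://server.local` writes the file to the current directory, from where it can be reviewed and moved to the location reported by `trivy vex repo init`.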

### Authentication

If your server requires authentication, you can configure it as described in the [VEX Repository Authentication document](../supply-chain/vex/repo/#authentication).