docs: update air-gapped docs #7160

Merged
merged 10 commits into from
Aug 9, 2024
194 changes: 90 additions & 104 deletions docs/docs/advanced/air-gap.md
I have an idea about re-structuring this page, but we can do that in another PR.

@@ -1,142 +1,128 @@
# Air-Gapped Environment
# Advanced Network Scenarios

Trivy can be used in air-gapped environments. Note that an allowlist is [here][allowlist].
Trivy occasionally needs to connect to the internet to download relevant content. This document explains Trivy's network connectivity requirements and how to set up Trivy in particular scenarios.

## Air-Gapped Environment for vulnerabilities
## Network requirements

### Download the vulnerability database
At first, you need to download the vulnerability database for use in air-gapped environments.
Trivy's databases are distributed as OCI images via GitHub Container Registry (GHCR):

=== "Trivy"
- <https://ghcr.io/aquasecurity/trivy-db>
- <https://ghcr.io/aquasecurity/trivy-java-db>
- <https://ghcr.io/aquasecurity/trivy-checks>

```
TRIVY_TEMP_DIR=$(mktemp -d)
trivy --cache-dir $TRIVY_TEMP_DIR image --download-db-only
tar -cf ./db.tar.gz -C $TRIVY_TEMP_DIR/db metadata.json trivy.db
rm -rf $TRIVY_TEMP_DIR
```
If Trivy is running behind a firewall, you'll need to add the following hosts to your allowlist:

=== "oras >= v0.13.0"
Please follow [oras installation instruction][oras].
- `ghcr.io`
- `pkg-containers.githubusercontent.com`

Download `db.tar.gz`:
Trivy pulls the databases using the [OCI Distribution](https://github.com/opencontainers/distribution-spec) specification, which runs over plain HTTPS.
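
To confirm that the allowlist is in effect, it can help to probe both hosts from the machine that runs Trivy. The following is a minimal sketch, assuming `curl` is installed. The `check_host` and `check_trivy_hosts` helpers are hypothetical names, and the probe only shows that an HTTPS connection can be established, not that a database pull will succeed.

```shell
# Hypothetical helper: succeeds if an HTTPS connection to the host can be made.
# Any HTTP status (including 401 or 404) counts as reachable.
check_host() {
  curl --silent --show-error --output /dev/null --connect-timeout 5 "https://$1/"
}

# Print one line per host from the allowlist.
check_trivy_hosts() {
  for host in ghcr.io pkg-containers.githubusercontent.com; do
    if check_host "$host"; then
      echo "reachable: $host"
    else
      echo "blocked: $host"
    fi
  done
}
```

Running `check_trivy_hosts` prints one `reachable:` or `blocked:` line per host.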

```
$ oras pull ghcr.io/aquasecurity/trivy-db:2
```
## Running Trivy in an air-gapped environment

=== "oras < v0.13.0"
Please follow [oras installation instruction][oras].
An air-gapped environment refers to situations where the network connectivity from the machine Trivy runs on is blocked or restricted.

Download `db.tar.gz`:
In an air-gapped environment it is your responsibility to update the Trivy databases on a regular basis.
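
Since refreshing the databases is left to you, a scheduled job might want to flag a stale copy. The sketch below is one possible approach and not part of Trivy: the `db_is_stale` helper is a hypothetical name, and it uses the modification time of `trivy.db` in the cache as a rough proxy for the age of the database.

```shell
# Hypothetical helper: succeeds (exit 0) when the vulnerability DB is missing
# or its file is older than the given number of days.
# $1 = Trivy cache directory, $2 = maximum age in days
db_is_stale() {
  db_file="$1/db/trivy.db"
  [ -f "$db_file" ] || return 0
  # find prints the path only if the file was last modified more than
  # roughly $2 days ago (find counts whole 24-hour periods)
  [ -n "$(find "$db_file" -mtime +"$2")" ]
}
```

For example, `db_is_stale /home/user/.cache/trivy 7 && echo "refresh the Trivy DB"` could run from a cron job. Note that copying the file resets its modification time, so this measures the time since the last copy rather than the time since the database was built.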

```
$ oras pull -a ghcr.io/aquasecurity/trivy-db:2
```
## Offline Mode

### Download the Java index database[^1]
Java users also need to download the Java index database for use in air-gapped environments.
By default, Trivy attempts to download the latest databases, and if the download fails, the scan might fail. To avoid this behavior, you can tell Trivy not to attempt to download database files:

!!! note
You container image may contain JAR files even though you don't use Java directly.
In that case, you also need to download the Java index database.
- `--skip-db-update` to skip updating the main vulnerability database.
- `--skip-java-db-update` to skip updating the Java vulnerability database.
- `--skip-check-update` to skip updating the misconfiguration database.

=== "Trivy"
```shell
trivy image --skip-db-update --skip-java-db-update --offline-scan --skip-check-update myimage
```
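
To avoid retyping these flags on every scan, they can be wrapped in a small shell function. This is a minimal sketch: `trivy_offline` is a hypothetical name, and it assumes the subcommand you pass accepts all four flags, as `image` does in the example above. Other subcommands may not accept every flag.

```shell
# Hypothetical wrapper: run a Trivy scan with all update attempts disabled.
# $1 = scan subcommand (e.g. image); remaining arguments are passed through.
trivy_offline() {
  subcommand="$1"
  shift
  trivy "$subcommand" \
    --skip-db-update \
    --skip-java-db-update \
    --skip-check-update \
    --offline-scan \
    "$@"
}
```

With this in place, `trivy_offline image myimage` runs the same command as the example above.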

```
TRIVY_TEMP_DIR=$(mktemp -d)
trivy --cache-dir $TRIVY_TEMP_DIR image --download-java-db-only
tar -cf ./javadb.tar.gz -C $TRIVY_TEMP_DIR/java-db metadata.json trivy-java.db
rm -rf $TRIVY_TEMP_DIR
```
=== "oras >= v0.13.0"
Please follow [oras installation instruction][oras].
## Self-Hosting

Download `javadb.tar.gz`:
You can host the databases on your own local OCI registry to prevent Trivy from reaching outside your network.

```
$ oras pull ghcr.io/aquasecurity/trivy-java-db:1
```
First, make a copy of the databases in a container registry that is accessible to Trivy. The databases are in:

=== "oras < v0.13.0"
Please follow [oras installation instruction][oras].
- `ghcr.io/aquasecurity/trivy-db:2`
- `ghcr.io/aquasecurity/trivy-java-db:1`
- `ghcr.io/aquasecurity/trivy-checks:0`

Download `javadb.tar.gz`:
Then, tell Trivy to use the local registry:

```
$ oras pull -a ghcr.io/aquasecurity/trivy-java-db:1
```
```shell
trivy image \
--db-repository myregistry.local/trivy-db \
--java-db-repository myregistry.local/trivy-java-db \
--checks-bundle-repository myregistry.local/trivy-checks \
myimage
```
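
The copy step itself can be done with any registry tool. The following is a minimal sketch using ORAS, assuming your ORAS version provides the `oras cp` subcommand and that you are already logged in to the target registry. `mirror_trivy_dbs` is a hypothetical helper name, and `myregistry.local` is the placeholder registry from the example above.

```shell
# Hypothetical helper: copy the three Trivy artifacts into a local registry,
# keeping the same repository names and tags.
# $1 = target registry host
mirror_trivy_dbs() {
  target="$1"
  for ref in trivy-db:2 trivy-java-db:1 trivy-checks:0; do
    oras cp "ghcr.io/aquasecurity/$ref" "$target/$ref" || return 1
  done
}
```

Calling `mirror_trivy_dbs myregistry.local` from a machine that can reach both registries would populate the repositories referenced by the flags above.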

### Authentication

### Transfer the DB files into the air-gapped environment
The way of transfer depends on the environment.
If the registry requires authentication, you can configure it as described in the [private registry authentication document](../advanced/private-registries/index.md).

=== "Vulnerability db"
```
$ rsync -av -e ssh /path/to/db.tar.gz [user]@[host]:dst
```
## Manual cache population

=== "Java index db[^1]"
```
$ rsync -av -e ssh /path/to/javadb.tar.gz [user]@[host]:dst
```
You can also download the database files manually and place them in the Trivy cache directory yourself.

### Put the DB files in Trivy's cache directory
You have to know where to put the DB files. The following command shows the default cache directory.
### Downloading the DB files

On a machine with internet access, pull the database container archive from the public registry into your local workspace:

Note that these examples operate in the current working directory.

=== "Using ORAS"
This example uses [ORAS](https://oras.land), but you can use any other container registry manipulation tool.

```shell
oras pull ghcr.io/aquasecurity/trivy-db:2
```
$ ssh user@host
$ trivy -h | grep cache
--cache-dir value cache directory (default: "/home/myuser/.cache/trivy") [$TRIVY_CACHE_DIR]
```
=== "Vulnerability db"
Put the DB file in the cache directory + `/db`.

```
$ mkdir -p /home/myuser/.cache/trivy/db
$ cd /home/myuser/.cache/trivy/db
$ tar xvf /path/to/db.tar.gz -C /home/myuser/.cache/trivy/db
x trivy.db
x metadata.json
$ rm /path/to/db.tar.gz
```

=== "Java index db[^1]"
Put the DB file in the cache directory + `/java-db`.

```
$ mkdir -p /home/myuser/.cache/trivy/java-db
$ cd /home/myuser/.cache/trivy/java-db
$ tar xvf /path/to/javadb.tar.gz -C /home/myuser/.cache/trivy/java-db
x trivy-java.db
x metadata.json
$ rm /path/to/javadb.tar.gz
```



In an air-gapped environment it is your responsibility to update the Trivy databases on a regular basis, so that the scanner can detect recently-identified vulnerabilities.

### Run Trivy with the specific flags.
In an air-gapped environment, you have to specify `--skip-db-update` and `--skip-java-db-update`[^1] so that Trivy doesn't attempt to download the latest database files.
In addition, if you want to scan `pom.xml` dependencies, you need to specify `--offline-scan` since Trivy tries to issue API requests for scanning Java applications by default.

You should now have a file called `db.tar.gz`. Next, extract it to reveal the db files:

```shell
tar -xzf db.tar.gz
```
$ trivy image --skip-db-update --skip-java-db-update --offline-scan alpine:3.12

You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files.

=== "Using Trivy"
This example uses Trivy to pull the database container archive. The `--cache-dir` flag makes Trivy download the database files into our current working directory. The `--download-db-only` flag tells Trivy to only download the database files, not to scan any images.

```shell
trivy image --cache-dir . --download-db-only
```

## Air-Gapped Environment for misconfigurations
You should now have 2 new files, `metadata.json` and `trivy.db`. These are the Trivy DB files. Copy them over to the air-gapped environment.

### Populating the Trivy Cache

No special measures are required to detect misconfigurations in an air-gapped environment.
In order to populate the cache, you need to identify the location of the cache directory. If it is under the default location, you can run the following command to find it:

```shell
trivy -h | grep cache
```

### Run Trivy with `--skip-check-update` option
In an air-gapped environment, specify `--skip-check-update` so that Trivy doesn't attempt to download the latest misconfiguration checks.
For the example, we will assume the `TRIVY_CACHE_DIR` variable holds the cache location:

```shell
TRIVY_CACHE_DIR=/home/user/.cache/trivy
```
$ trivy conf --skip-policy-update /path/to/conf

Put the Trivy DB files in the Trivy cache directory under a `db` subdirectory:

```shell
# ensure cache db directory exists
mkdir -p ${TRIVY_CACHE_DIR}/db
# copy the db files
cp /path/to/trivy.db /path/to/metadata.json ${TRIVY_CACHE_DIR}/db/
```

[allowlist]: ../references/troubleshooting.md
[oras]: https://oras.land/docs/installation
### Java DB

For Java DB the process is the same, except for the following:

1. Image location is `ghcr.io/aquasecurity/trivy-java-db:1`
2. Archive file name is `javadb.tar.gz`
3. DB file name is `trivy-java.db`
4. Cache subdirectory is `java-db`
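
Put together, the Java DB variant of the steps can be sketched as follows. The function names `fetch_java_db` and `install_java_db` are hypothetical, ORAS is assumed to be installed on the connected machine, and the `java-db` cache subdirectory follows the layout described in the earlier version of this page.

```shell
# On a machine with internet access: pull and unpack the Java DB archive
# into the current working directory.
fetch_java_db() {
  oras pull ghcr.io/aquasecurity/trivy-java-db:1 &&
    tar -xzf javadb.tar.gz
}

# In the air-gapped environment: place the files under <cache>/java-db.
# $1 = directory holding trivy-java.db and metadata.json
# $2 = Trivy cache directory
install_java_db() {
  mkdir -p "$2/java-db" &&
    cp "$1/trivy-java.db" "$1/metadata.json" "$2/java-db/"
}
```

After transferring the two files, `install_java_db /path/to/files /home/user/.cache/trivy` would populate the cache for the default location used in the example above.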

## Misconfigurations scanning

[^1]: This is only required to scan `jar` files. More information about `Java index db` [here](../coverage/language/java.md)
Note that the misconfigurations database is also embedded in the Trivy binary (at build time), and will be used as a fallback if the external database is not available. This means that you can still scan for misconfigurations in an air-gapped environment using the Checks from the time of the Trivy release you are using.
6 changes: 2 additions & 4 deletions docs/docs/references/troubleshooting.md
@@ -203,10 +203,7 @@ Trivy v0.23.0 or later requires Trivy DB v2. Please update your local database o
!!! error
FATAL failed to download vulnerability DB

If trivy is running behind corporate firewall, you have to add the following urls to your allowlist.

- ghcr.io
- pkg-containers.githubusercontent.com
If Trivy is running behind a corporate firewall, refer to the connectivity requirements described [here][network].

### Denied

@@ -271,4 +268,5 @@ $ trivy clean --all
```

[air-gapped]: ../advanced/air-gap.md
[network]: ../advanced/air-gap.md#network-requirements
[redis-cache]: ../../vulnerability/examples/cache/#cache-backend
25 changes: 13 additions & 12 deletions docs/docs/scanner/misconfiguration/check/builtin.md
@@ -1,21 +1,22 @@
# Built-in Checks

## Check Sources
Built-in checks are mainly written in [Rego][rego] and Go.
Those checks are managed under [trivy-checks repository][trivy-checks].
## Checks Sources
Trivy has an extensive library of misconfiguration checks that is maintained at <https://github.com/aquasecurity/trivy-checks>.
Trivy checks are mainly written in [Rego][rego], while some checks are written in Go.
See [here](../../../coverage/iac/index.md) for the list of supported config types.

For suggestions or issues regarding policy content, please open an issue under the [trivy-checks][trivy-checks] repository.
## Checks Bundle
When performing a misconfiguration scan, Trivy automatically downloads the relevant Checks bundle. The bundle is cached locally, and Trivy reuses it for subsequent scans on the same machine. Trivy takes care of updating the cache automatically, so normally you can be oblivious to it.

## Check Distribution
Trivy checks are distributed as an OPA bundle on [GitHub Container Registry][ghcr] (GHCR).
When misconfiguration detection is enabled, Trivy pulls the OPA bundle from GHCR as an OCI artifact and stores it in the cache.
Those checks are then loaded into Trivy OPA engine and used for detecting misconfigurations.
If Trivy is unable to pull down newer checks, it will use the embedded set of checks as a fallback. This is also the case in air-gap environments where `--skip-policy-update` might be passed.
For CLI flags related to the database, please refer to [this page](../configuration/db.md).

## Update Interval
## Checks Distribution
Trivy checks are distributed as an [OPA bundle][opa-bundle] hosted on GitHub Container Registry (GHCR): <https://ghcr.io/aquasecurity/trivy-checks>.
Trivy checks for updates to the OPA bundle on GHCR every 24 hours and pulls it if there are any updates.

### External connectivity
Trivy needs to connect to the internet to download the bundle. If you are running Trivy in an air-gapped environment, or a tightly controlled network, please refer to the [Advanced Network Scenarios document](../advanced/air-gap.md).
The Checks bundle is also embedded in the Trivy binary (at build time), and will be used as a fallback if Trivy is unable to download the bundle. This means that you can still scan for misconfigurations in an air-gapped environment using the Checks from the time of the Trivy release you are using.

[rego]: https://www.openpolicyagent.org/docs/latest/policy-language/
[trivy-checks]: https://github.com/aquasecurity/trivy-checks
[ghcr]: https://github.com/aquasecurity/trivy-checks/pkgs/container/trivy-checks
[opa-bundle]: https://www.openpolicyagent.org/docs/latest/management-bundles/
43 changes: 10 additions & 33 deletions docs/docs/scanner/vulnerability.md
@@ -158,45 +158,22 @@ Trivy can detect vulnerabilities in Kubernetes clusters and components by scanni

[^1]: Some manual triage and correction has been made.

## Database
Trivy downloads [the vulnerability database](https://github.com/aquasecurity/trivy-db) every 6 hours.
Trivy uses two types of databases for vulnerability detection:

- Vulnerability Database
- Java Index Database

This page provides detailed information about these databases.

### Vulnerability Database
Trivy utilizes a database containing vulnerability information.
This database is built every six hours on [GitHub](https://github.com/aquasecurity/trivy-db) and is distributed via [GitHub Container registry (GHCR)](https://ghcr.io/aquasecurity/trivy-db).
The database is cached and updated as needed.
As Trivy updates the database automatically during execution, users don't need to be concerned about it.
## Databases
Trivy utilizes several databases containing information relevant for vulnerability scanning.
When performing a vulnerability scan, Trivy automatically downloads the relevant databases. The databases are cached locally, and Trivy reuses them for subsequent scans on the same machine. Trivy takes care of updating the database cache automatically, so normally you can be oblivious to it.

For CLI flags related to the database, please refer to [this page](../configuration/db.md).

#### Private Hosting
If you host the database on your own OCI registry, you can specify a different repository with the `--db-repository` flag.
The default is `ghcr.io/aquasecurity/trivy-db`.

```shell
$ trivy image --db-repository YOUR_REPO YOUR_IMAGE
```

If authentication is required, it can be configured in the same way as for private images.
Please refer to [the documentation](../advanced/private-registries/index.md) for more details.
### Vulnerability Database
This is Trivy's main database which contains vulnerability information, as collected from the datasources mentioned above.
It is built every six hours on [GitHub](https://github.com/aquasecurity/trivy-db).

### Java Index Database
This database is only downloaded when scanning JAR files so that Trivy can identify the groupId, artifactId, and version of JAR files.
It is built once a day on [GitHub](https://github.com/aquasecurity/trivy-java-db) and distributed via [GitHub Container registry (GHCR)](https://ghcr.io/aquasecurity/trivy-java-db).
Like the vulnerability database, it is automatically downloaded and updated when needed, so users don't need to worry about it.

#### Private Hosting
If you host the database on your own OCI registry, you can specify a different repository with the `--java-db-repository` flag.
The default is `ghcr.io/aquasecurity/trivy-java-db`.
When scanning JAR files, Trivy relies on a dedicated database for identifying the groupId, artifactId, and version of the scanned JAR files. This database is only used when scanning JAR files; however, your scanned artifacts might contain JAR files that you're not aware of.
This database is built once a day on [GitHub](https://github.com/aquasecurity/trivy-java-db).

If authentication is required, you need to run `docker login YOUR_REGISTRY`.
Currently, specifying a username and password is not supported.
### External connectivity
Trivy needs to connect to the internet to download the databases. If you are running Trivy in an air-gapped environment, or a tightly controlled network, please refer to the [Advanced Network Scenarios document](../advanced/air-gap.md).

## Configuration
This section describes vulnerability-specific configuration.
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -137,7 +137,7 @@ nav:
- Developer guide: docs/plugin/developer-guide.md
- Advanced:
- Modules: docs/advanced/modules.md
- Air-Gapped Environment: docs/advanced/air-gap.md
- Advanced Network Scenarios: docs/advanced/air-gap.md
- Container Image:
- Embed in Dockerfile: docs/advanced/container/embed-in-dockerfile.md
- Unpacked container image filesystem: docs/advanced/container/unpacked-filesystem.md