docs: add v1.2.x documentation (#819)
sozercan authored Aug 7, 2023
1 parent 31ecfb6 commit a95732c
Showing 17 changed files with 918 additions and 0 deletions.
21 changes: 21 additions & 0 deletions docs/versioned_docs/version-v1.2.x/architecture.md
@@ -0,0 +1,21 @@
---
title: Architecture
---
At a high level, Eraser has two main modes of operation: manual and automated.

Manual image removal involves supplying a list of images to remove; Eraser then
deploys pods to clean up the images you supplied.
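A manual removal is triggered by applying an `ImageList` resource listing the images to remove. A minimal sketch (the image names are placeholders; see the Manual Removal page for the authoritative schema):

```yaml
apiVersion: eraser.sh/v1
kind: ImageList
metadata:
  name: imagelist
spec:
  images:
    - docker.io/library/alpine:3.7.3   # example image; replace with your own
    - ghcr.io/myorg/old-app:v0.1       # hypothetical image name
```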

Automated image removal runs on a timer. By default, the automated process
removes images based on the results of a vulnerability scan. The default
vulnerability scanner is Trivy, but others can be provided in its place. Or,
the scanner can be disabled altogether, in which case Eraser acts as a garbage
collector -- it will remove all non-running images in your cluster.

## Manual image cleanup

<img title="manual cleanup" src="/eraser/docs/img/eraser_manual.png" />

## Automated analysis, scanning, and cleanup

<img title="automated cleanup" src="/eraser/docs/img/eraser_timer.png" />
10 changes: 10 additions & 0 deletions docs/versioned_docs/version-v1.2.x/code-of-conduct.md
@@ -0,0 +1,10 @@
---
title: Code of Conduct
---

This project has adopted the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md).

Resources:

- [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md)
- [Code of Conduct Reporting](https://github.com/cncf/foundation/blob/main/code-of-conduct.md)
14 changes: 14 additions & 0 deletions docs/versioned_docs/version-v1.2.x/contributing.md
@@ -0,0 +1,14 @@
---
title: Contributing
---

There are several ways to get involved with Eraser:

- Join the [mailing list](https://groups.google.com/u/1/g/eraser-dev) to get notifications for releases, security announcements, etc.
- Participate in the [biweekly community meetings](https://docs.google.com/document/d/1Sj5u47K3WUGYNPmQHGFpb52auqZb1FxSlWAQnPADhWI/edit) to discuss development, issues, use cases, etc.
- Join the `#eraser` channel on the [Kubernetes Slack](https://slack.k8s.io/)
- View the [development setup instructions](https://eraser-dev.github.io/eraser/docs/development)

This project welcomes contributions and suggestions.

This project has adopted the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md).
12 changes: 12 additions & 0 deletions docs/versioned_docs/version-v1.2.x/custom-scanner.md
@@ -0,0 +1,12 @@
---
title: Custom Scanner
---

## Creating a Custom Scanner
To create a custom scanner for non-compliant images, use the following [template](https://github.com/eraser-dev/eraser-scanner-template/).

In order to customize your scanner, start by creating a `NewImageProvider()`. The ImageProvider interface can be found [here](../../pkg/scanners/template/scanner_template.go).

The ImageProvider will allow you to retrieve the list of all non-running and non-excluded images from the collector container through the `ReceiveImages()` function. Process these images with your customized scanner and threshold, and use `SendImages()` to pass the images found non-compliant to the eraser container for removal. Finally, complete the scanning process by calling `Finish()`.

When complete, provide your custom scanner image to Eraser in deployment.
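The flow described above can be sketched in Go. This is an illustrative, non-authoritative sketch of the template's call sequence: the import path is inferred from the link above, and the constructor options and exact signatures (including the use of plain strings for images) are assumptions — consult the template repository for the real API.

```go
package main

import (
	"log"

	"github.com/eraser-dev/eraser/pkg/scanners/template" // path assumed from the link above
)

// scanImages stands in for your custom scanning logic and threshold.
// It splits the input into non-compliant images (to be removed) and
// images that could not be scanned.
func scanImages(images []string) (nonCompliant, failed []string) {
	// ... your scanner integration goes here ...
	return nil, nil
}

func main() {
	// Create the ImageProvider that communicates with the collector and eraser containers.
	provider := template.NewImageProvider( /* options such as a logger or context; names assumed */ )

	// 1. Receive all non-running, non-excluded images from the collector.
	images, err := provider.ReceiveImages()
	if err != nil {
		log.Fatal(err)
	}

	// 2. Scan with your custom logic.
	nonCompliant, failed := scanImages(images)

	// 3. Send the non-compliant images to the eraser container for removal.
	if err := provider.SendImages(nonCompliant, failed); err != nil {
		log.Fatal(err)
	}

	// 4. Signal that scanning is complete.
	if err := provider.Finish(); err != nil {
		log.Fatal(err)
	}
}
```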
226 changes: 226 additions & 0 deletions docs/versioned_docs/version-v1.2.x/customization.md
@@ -0,0 +1,226 @@
---
title: Customization
---

## Overview

Eraser uses a configmap to configure its behavior. The configmap is part of the
deployment, so there is no need to deploy it manually. Once deployed, the configmap
can be edited at any time:

```bash
kubectl edit configmap --namespace eraser-system eraser-manager-config
```

If an Eraser job is already running, the changes will not take effect until the job completes.
The configuration is written in YAML.

## Key Concepts

### Basic architecture

The _manager_ runs as a pod in your cluster and manages _ImageJobs_. Think of
an _ImageJob_ as a unit of work, performed on every node in your cluster. Each
node runs a sub-job. The goal of the _ImageJob_ is to assess the images on your
cluster's nodes, and to remove the images you don't want. There are two stages:
1. Assessment
1. Removal


### Scheduling

An _ImageJob_ can either be created on demand (see [Manual Removal](https://eraser-dev.github.io/eraser/docs/manual-removal)),
or spawned on a timer like a cron job. On-demand jobs skip the
assessment stage and get right down to the business of removing the images you
specified. The behavior of an on-demand job is therefore quite different from that
of timed jobs.
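For timed jobs, the interval is controlled by the `manager.scheduling` options. A sketch with the defaults documented in the table below:

```yaml
manager:
  scheduling:
    repeatInterval: 24h      # spawn a new ImageJob every 24 hours
    beginImmediately: true   # run the first job right away instead of waiting one interval
```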

### Fault Tolerance

Because an _ImageJob_ runs on every node in your cluster, and the conditions on
each node may vary widely, some of the sub-jobs may fail. If you cannot
tolerate any failure, set the `manager.imageJob.successRatio` property to
`1.0`. If 75% success sounds good to you, set it to `0.75`. In that case, if
fewer than 75% of the pods spawned by the _ImageJob_ report success, the job as
a whole will be marked as a failure.

This is mainly to help diagnose error conditions. As such, you can set
`manager.imageJob.cleanup.delayOnFailure` to a long value so that logs can be
captured before the spawned pods are cleaned up.
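For example, to tolerate up to 25% failed sub-jobs while keeping failed pods around long enough to collect logs, the relevant section of the configmap might look like this (a sketch using the defaults for the cleanup delays):

```yaml
manager:
  imageJob:
    successRatio: 0.75       # job succeeds if at least 75% of node sub-jobs succeed
    cleanup:
      delayOnSuccess: 0s
      delayOnFailure: 24h    # keep failed pods for a day so logs can be captured
```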

### Excluding Nodes

For various reasons, you may want to prevent Eraser from scheduling pods on
certain nodes. To do so, the nodes can be given a special label. By default,
this label is `eraser.sh/cleanup.filter`, but you can configure the behavior with
the options under `manager.nodeFilter`. The [table](#detailed-options) provides more detail.
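Assuming the default filter label, a node can be excluded with a standard `kubectl label` command. The node name is a placeholder, and the `=true` value is an assumption — only the label key is listed in the default selectors:

```bash
kubectl label node <node-name> eraser.sh/cleanup.filter=true
```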

### Configuring Components

An _ImageJob_ is made up of various sub-jobs, with one sub-job for each node.
These sub-jobs can be broken down further into three stages.
1. Collection (What is on the node?)
1. Scanning (What images conform to the policy I've provided?)
1. Removal (Remove images based on the results of the above)

Of the above stages, only Removal is mandatory. The others can be disabled.
Furthermore, manually triggered _ImageJobs_ will skip right to removal, even if
Eraser is configured to collect and scan. Collection and Scanning will only
take place when:
1. The collector and/or scanner `components` are enabled, AND
1. The job was *not* triggered manually by creating an _ImageList_.

### Swapping out components

The collector, scanner, and remover components can all be swapped out. This
enables you to build and host the images yourself. In addition, the scanner's
behavior can be completely tailored to your needs by swapping out the default
image with one of your own. To specify the images, use the
`components.<component>.image.repo` and `components.<component>.image.tag` options,
where `<component>` is one of `collector`, `scanner`, or `remover`.
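For instance, pointing Eraser at a self-hosted scanner image would look like the following (the repository and tag shown are hypothetical):

```yaml
components:
  scanner:
    image:
      repo: registry.example.com/my-team/custom-scanner  # hypothetical repository
      tag: v0.1.0                                        # hypothetical tag
```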

## Universal Options

The following portions of the configmap apply no matter how you spawn your
_ImageJob_. The values provided below are the defaults. For more detail on
these options, see the [table](#detailed-options).

```yaml
manager:
  runtime: containerd
  otlpEndpoint: "" # empty string disables OpenTelemetry
  logLevel: info
  profile:
    enabled: false
    port: 6060
  imageJob:
    successRatio: 1.0
    cleanup:
      delayOnSuccess: 0s
      delayOnFailure: 24h
  pullSecrets: [] # image pull secrets for collector/scanner/remover
  priorityClassName: "" # priority class name for collector/scanner/remover
  nodeFilter:
    type: exclude # must be either exclude|include
    selectors:
      - eraser.sh/cleanup.filter
      - kubernetes.io/os=windows
components:
  remover:
    image:
      repo: ghcr.io/eraser-dev/remover
      tag: v1.0.0
    request:
      mem: 25Mi
      cpu: 0
    limit:
      mem: 30Mi
      cpu: 1000m
```
## Component Options
```yaml
components:
  collector:
    enabled: true
    image:
      repo: ghcr.io/eraser-dev/collector
      tag: v1.0.0
    request:
      mem: 25Mi
      cpu: 7m
    limit:
      mem: 500Mi
      cpu: 0
  scanner:
    enabled: true
    image:
      repo: ghcr.io/eraser-dev/eraser-trivy-scanner
      tag: v1.0.0
    request:
      mem: 500Mi
      cpu: 1000m
    limit:
      mem: 2Gi
      cpu: 0
    config: |
      # this is the schema for the provided 'trivy-scanner'. custom scanners
      # will define their own configuration. see below.
  remover:
    image:
      repo: ghcr.io/eraser-dev/remover
      tag: v1.0.0
    request:
      mem: 25Mi
      cpu: 0
    limit:
      mem: 30Mi
      cpu: 1000m
```
## Scanner Options
These options can be provided to `components.scanner.config`. They will be
passed through as a string to the scanner container and parsed there. If you
want to configure your own scanner, you must provide some way to parse this.

Below are the values recognized by the provided `eraser-trivy-scanner` image.
Values provided below are the defaults.

```yaml
cacheDir: /var/lib/trivy # the file path inside the container to store the cache
dbRepo: ghcr.io/aquasecurity/trivy-db # the container registry from which to fetch the trivy database
deleteFailedImages: true # if true, remove images for which scanning fails, regardless of why it failed
deleteEOLImages: true # if true, remove images that have reached their end-of-life date
vulnerabilities:
  ignoreUnfixed: true # consider the image compliant if there are no known fixes for the vulnerabilities found
  types: # a list of vulnerability types. for more info, see trivy's documentation.
    - os
    - library
  securityChecks: # see trivy's documentation for more information
    - vuln
  severities: # in this case, only flag images with CRITICAL vulnerabilities for removal
    - CRITICAL
timeout:
  total: 23h # if scanning isn't completed before this much time elapses, abort the whole scan
  perImage: 1h # if scanning a single image exceeds this time, scanning will be aborted
```

## Detailed Options

| Option | Description | Default |
| --- | --- | --- |
| manager.runtime | The runtime to use for the manager's containers. Must be one of containerd, crio, or dockershim. It is assumed that your nodes are all using the same runtime, and there is currently no way to configure multiple runtimes. | containerd |
| manager.otlpEndpoint | The endpoint to send OpenTelemetry data to. If empty, data will not be sent. | "" |
| manager.logLevel | The log level for the manager's containers. Must be one of debug, info, warn, error, dpanic, panic, or fatal. | info |
| manager.scheduling.repeatInterval | Use only when the collector and/or scanner are enabled. This is like a cron job, and will spawn an _ImageJob_ at the interval provided. | 24h |
| manager.scheduling.beginImmediately | If set to true, the first _ImageJob_ will run immediately. If false, the job will not be spawned until after the interval (above) has elapsed. | true |
| manager.profile.enabled | Whether to enable profiling for the manager's containers. This is for debugging with `go tool pprof`. | false |
| manager.profile.port | The port on which to expose the profiling endpoint. | 6060 |
| manager.imageJob.successRatio | The ratio of successful image jobs required before a cleanup is performed. | 1.0 |
| manager.imageJob.cleanup.delayOnSuccess | The amount of time to wait after a successful image job before performing cleanup. | 0s |
| manager.imageJob.cleanup.delayOnFailure | The amount of time to wait after a failed image job before performing cleanup. | 24h |
| manager.pullSecrets | The image pull secrets to use for collector, scanner, and remover containers. | [] |
| manager.priorityClassName | The priority class to use for collector, scanner, and remover containers. | "" |
| manager.nodeFilter.type | The type of node filter to use. Must be either "exclude" or "include". | exclude |
| manager.nodeFilter.selectors | A list of selectors used to filter nodes. | [] |
| components.collector.enabled | Whether to enable the collector component. | true |
| components.collector.image.repo | The repository containing the collector image. | ghcr.io/eraser-dev/collector |
| components.collector.image.tag | The tag of the collector image. | v1.0.0 |
| components.collector.request.mem | The amount of memory to request for the collector container. | 25Mi |
| components.collector.request.cpu | The amount of CPU to request for the collector container. | 7m |
| components.collector.limit.mem | The maximum amount of memory the collector container is allowed to use. | 500Mi |
| components.collector.limit.cpu | The maximum amount of CPU the collector container is allowed to use. | 0 |
| components.scanner.enabled | Whether to enable the scanner component. | true |
| components.scanner.image.repo | The repository containing the scanner image. | ghcr.io/eraser-dev/eraser-trivy-scanner |
| components.scanner.image.tag | The tag of the scanner image. | v1.0.0 |
| components.scanner.request.mem | The amount of memory to request for the scanner container. | 500Mi |
| components.scanner.request.cpu | The amount of CPU to request for the scanner container. | 1000m |
| components.scanner.limit.mem | The maximum amount of memory the scanner container is allowed to use. | 2Gi |
| components.scanner.limit.cpu | The maximum amount of CPU the scanner container is allowed to use. | 0 |
| components.scanner.config | The configuration to pass to the scanner container, as a YAML string. | See YAML below |
| components.remover.image.repo | The repository containing the remover image. | ghcr.io/eraser-dev/remover |
| components.remover.image.tag | The tag of the remover image. | v1.0.0 |
| components.remover.request.mem | The amount of memory to request for the remover container. | 25Mi |
| components.remover.request.cpu | The amount of CPU to request for the remover container. | 0 |
| components.remover.limit.mem | The maximum amount of memory the remover container is allowed to use. | 30Mi |
| components.remover.limit.cpu | The maximum amount of CPU the remover container is allowed to use. | 1000m |
25 changes: 25 additions & 0 deletions docs/versioned_docs/version-v1.2.x/exclusion.md
@@ -0,0 +1,25 @@
---
title: Exclusion
---

## Excluding registries, repositories, and images
Eraser can exclude registries (for example, `docker.io/library/*`) as well as specific images by tag (for example, `docker.io/library/ubuntu:18.04`) or digest (for example, `sha256:80f31da1ac7b312ba29d65080fd...`) from its removal process.

To exclude any images or registries from removal, create one or more configmaps with the label `eraser.sh/exclude.list=true` in the eraser-system namespace, each containing a JSON file that lists the excluded images.

```bash
$ cat > sample.json <<"EOF"
{
  "excluded": [
    "docker.io/library/*",
    "ghcr.io/eraser-dev/test:latest"
  ]
}
EOF

$ kubectl create configmap excluded --from-file=sample.json --namespace=eraser-system
$ kubectl label configmap excluded eraser.sh/exclude.list=true -n eraser-system
```

## Exempting Nodes from the Eraser Pipeline
Exempting nodes from cleanup was added in v1.0.0. When deploying Eraser, you can specify whether there is a list of nodes you would like to `include` or `exclude` from the cleanup process using the configmap. For more information, see the section on [customization](https://eraser-dev.github.io/eraser/docs/customization).
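As a sketch, an `include`-type filter that schedules Eraser pods only on labeled nodes would be configured like this in the configmap (the selector shown is the default filter label):

```yaml
manager:
  nodeFilter:
    type: include            # only run on nodes matching the selectors
    selectors:
      - eraser.sh/cleanup.filter
```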
12 changes: 12 additions & 0 deletions docs/versioned_docs/version-v1.2.x/faq.md
@@ -0,0 +1,12 @@
---
title: FAQ
---
## Why am I still seeing vulnerable images?
Eraser currently targets **non-running** images, so any vulnerable images that are currently running will not be removed. In addition, the default vulnerability scanning with Trivy only removes images with `CRITICAL` vulnerabilities; images with lower-severity vulnerabilities will not be removed. This can be configured using the [configmap](https://eraser-dev.github.io/eraser/docs/customization#scanner-options).
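For example, to also remove images with HIGH-severity vulnerabilities, the `severities` list in `components.scanner.config` could be extended. A sketch based on the scanner options documented on the Customization page:

```yaml
vulnerabilities:
  severities:
    - CRITICAL
    - HIGH
```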

## How is Eraser different from Kubernetes garbage collection?
The native garbage collection in Kubernetes works a bit differently than Eraser. By default, garbage collection begins when disk usage reaches 85%, and stops when it gets down to 80%. More details about Kubernetes garbage collection can be found in the [Kubernetes documentation](https://kubernetes.io/docs/concepts/architecture/garbage-collection/), and configuration options can be found in the [Kubelet documentation](https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/).

There are a couple of core benefits to using Eraser for image cleanup:
* Eraser can be configured to use image vulnerability data when making determinations on image removal
* By interfacing directly with the container runtime, Eraser can clean up images that are not managed by Kubelet and Kubernetes
15 changes: 15 additions & 0 deletions docs/versioned_docs/version-v1.2.x/installation.md
@@ -0,0 +1,15 @@
---
title: Installation
---

## Manifest

To install Eraser with the manifest file, run the following command:

```bash
kubectl apply -f https://raw.githubusercontent.com/eraser-dev/eraser/v1.2.0/deploy/eraser.yaml
```

## Helm

If you'd like to install and manage Eraser with Helm, follow the install instructions [here](https://github.com/eraser-dev/eraser/blob/main/charts/eraser/README.md).
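The chart README linked above is authoritative; a typical Helm install might look like the following (the chart repository URL and release name here are assumptions):

```bash
helm repo add eraser https://eraser-dev.github.io/eraser/charts   # assumed chart repo URL
helm repo update
helm install eraser eraser/eraser --namespace eraser-system --create-namespace
```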
10 changes: 10 additions & 0 deletions docs/versioned_docs/version-v1.2.x/introduction.md
@@ -0,0 +1,10 @@
---
title: Introduction
slug: /
---

# Introduction

When deploying to Kubernetes, it's common for pipelines to build and push images to a cluster, but it's much less common for these images to be cleaned up. This can lead to accumulating bloat on the disk, and a host of non-compliant images lingering on the nodes.

The current garbage collection process deletes images based on a percentage of load, but this process does not consider the vulnerability state of the images. **Eraser** aims to provide a simple way to determine the state of an image, and delete it if it meets the specified criteria.