removed custom-log locations and provided alternative approach by symlinking log directory
copejon committed Feb 28, 2024
1 parent 2e652b5 commit 9c836b9
Showing 1 changed file with 17 additions and 36 deletions.
53 changes: 17 additions & 36 deletions enhancements/microshift/audit-log-configuration-options.md
@@ -21,7 +21,7 @@ api-approvers:

## Summary

Add ability for MicroShift users to configure API server audit logging policies, storage location, log rotation and retention, and actions when disk capacity reached.
Add ability for MicroShift users to configure API server audit logging policies, log rotation and retention.

## Motivation

@@ -31,34 +31,28 @@ MicroShift currently uses a hardcoded audit logging policy. It should be configu

* As a MicroShift administrator, I want to configure audit logging policies so that I can control what events are logged.

* As a MicroShift administrator, I want to specify a custom log file location so that I can write audit logs to a dedicated volume or device.

* As a MicroShift administrator, I want to configure the max file size and retention policy for audit logs so that I can better manage their disk usage.

* As a MicroShift administrator managing devices with limited storage capacity, I want to configure fall-back behaviors for logs should the device reach a storage limit I specify or the disk reach capacity.

### Goals

- Enable MicroShift administrators to manage logging policies as a tiered set of profiles.
- Provide flexibility to specify storage location, log sizes and retention policies.
- Enable MicroShift administrators to manage logging policies as a set of hierarchical profiles.
- Provide flexibility to specify log file sizes and retention policies.

### Non-Goals

- Custom Rules: MicroShift is a single-user system without user groups, so custom audit log rules similar to OpenShift are not required and should be explicitly marked as out of scope.
- Custom Rules: MicroShift is a single-user system without user groups, so custom audit log rules similar to OpenShift are not
required and should be explicitly marked as out of scope.
- Support OVN-K audit log configuration. OVN-K audit log policies are managed via a configMap and do not have the same flexibility as the OpenShift and Kubernetes API servers.

## Proposal

This proposes exposing kube-apiserver audit-log settings and adding logic to give users greater control over MicroShift audit logs. The kube-apiserver settings will enable user control over log file rotation and retention. However, there is no support for specifying a total storage capacity, nor what should be done if a capacity is reached. This is a critical feature for far edge devices with limited storage capacities. On such devices, logging data accumulation risks starving the host system or cluster workloads, potentially bricking the device until human intervention can be applied. Thus, it is necessary to provide users a means of setting such a capacity, and defining actions that MicroShift will take when capacity is reached. These behaviors will be exposed as "profiles,"
with predefined behaviours and will be mutually exclusive. This should give users holistic control of their audit log rotation, retention, and max allowable storage allocation.
This proposes exposing a subset of the kube-apiserver audit-log flags. These settings will enable user control over log file rotation and retention. Users may set fields in combination to define a maximum storage limit (e.g. max num files * max single file size = total storage limit). This is a critical feature for far edge devices with limited storage capacities. On such devices, logging data accumulation risks starving the host system or cluster workloads, potentially bricking the device until human intervention can be applied. Thus, it is necessary to provide users a means of enforcing such a limit and of defining what actions to take at that limit. Users must also be able to select which events are logged. Event selection will be exposed as a set of mutually exclusive "profiles" with predefined behaviours. This should give users holistic control of their audit log rotation, retention, and max allowable storage allocation.
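
The storage-limit arithmetic mentioned above can be illustrated with a small sketch. The function name and the sample values are hypothetical, and whether the actively written log file counts toward the file total is an assumption made here for simplicity.

```python
def max_audit_log_mb(max_files: int, max_file_size_mb: int) -> int:
    """Return the upper bound on audit log storage, in megabytes.

    Assumes the limit is simply (number of files) x (size of one file),
    as described in the proposal text.
    """
    if max_files < 0 or max_file_size_mb < 0:
        raise ValueError("max_files and max_file_size_mb must be non-negative")
    return max_files * max_file_size_mb


# Example: 10 files of at most 200 MB each bound the logs to 2000 MB.
print(max_audit_log_mb(10, 200))
```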

### Workflow Description

1. Administrator edits MicroShift config file to specify desired audit logging policy profile
2. Administrator edits MicroShift config file to specify audit log location if required
3. Administrator edits MicroShift config file to specify max file size and retention policy for logs
4. Administrator selects action to take when storage limit or capacity reached
5. MicroShift service is restarted to apply changes
2. Administrator edits MicroShift config file to specify max file size, number of files total, and max age of files for logs
3. MicroShift service is restarted to apply changes

### API Extensions

@@ -79,16 +73,6 @@ apiServer:
policy: STRING
```

**Audit Log File Storage Location:**

- `path` field to specify non-default audit log file paths. Accepted values must be absolute paths.

```yaml
apiServer:
auditLog:
path: STRING
```

**Audit Log File Rotation:**

- `maxFileSize` config variable to set maximum audit log file size. Accepted values must be a string with a number and unit suffix with no separator character, e.g. `10GB`. Valid units are "KB", "MB", "GB", "TB", "KiB", "MiB", "GiB", and "TiB". Since the kube-apiserver only accepts the max file size as a whole number of megabytes (`--audit-log-maxsize`), MicroShift will need to translate this field value internally.
@@ -103,11 +87,6 @@ apiServer:
maxFileAge: STRING
```
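
The translation of a `maxFileSize` string into a plain number could look roughly like the sketch below. It is only an illustration: the function name is hypothetical, and treating "KB"/"MB"/"GB"/"TB" as powers of 1000 and "KiB"/"MiB"/"GiB"/"TiB" as powers of 1024 is an assumption, since the proposal does not define the units. Converting the byte count into the unit the apiserver flag expects is left out.

```python
import re

# Assumed multipliers: decimal for KB..TB, binary for KiB..TiB.
_UNITS = {
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}
# A number followed directly by a unit suffix, with no separator.
_SIZE_RE = re.compile(r"(\d+)(KB|MB|GB|TB|KiB|MiB|GiB|TiB)")


def parse_size_bytes(value: str) -> int:
    """Parse a size string such as '10GB' into a number of bytes."""
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid size string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```

Rejecting anything that does not match the whole pattern keeps inputs such as `10 GB` or `10gb` from being silently misread.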

**Logging:**

- Log audit log policy additions, removals, changes
- Log audit log rotations and locations

### Implementation Details

**Passing User Provided Options to the API Server**
@@ -121,12 +100,6 @@ The kube-apiserver provides three CLI flags to dictate log rotation and retentio
* `--audit-log-maxbackup` defines the maximum number of audit log files to retain. MicroShift defaults to 10 files.
* `--audit-log-maxsize` defines the maximum size in megabytes of the audit log file before it gets rotated
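
As a rough sketch of how user-provided values might be turned into these kube-apiserver flags, consider the helper below. The function and parameter names are hypothetical, and the inputs are assumed to be already validated and normalized (age in days, size in megabytes); MicroShift's actual wiring may differ.

```python
def audit_log_flags(max_age_days: int, max_files: int, max_size_mb: int) -> list[str]:
    """Build kube-apiserver rotation and retention flags from normalized values."""
    return [
        f"--audit-log-maxage={max_age_days}",   # days to retain old log files
        f"--audit-log-maxbackup={max_files}",   # number of old log files to retain
        f"--audit-log-maxsize={max_size_mb}",   # megabytes before rotation
    ]


print(audit_log_flags(max_age_days=7, max_files=10, max_size_mb=200))
```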

**Non-standard Logging Path**

The kube-apiserver provides a CLI flag to specify an alternative audit log file path. This field may be exposed to the user via the MicroShift config API. This field is hardcoded in MicroShift and will need to be updated to allow dynamic values. This flag is:

* `--audit-log-path` defines the path to the audit log file. MicroShift defaults to `/var/log/microshift/audit.log`

**Policy Profiles**

OpenShift's policy profiles are defined as part of the `openshift-cluster-config-operator` API and are not recognized by the kube-apiserver. The kube-apiserver provides less defined profiles, called "levels":
@@ -185,6 +158,8 @@ Note that "omitStages" is determines the point at which a message is logged. For
- Exceeding disk capacity: MicroShift targets small form-factor devices with limited on-board storage. If the product of `maxFileSize` and `maxFiles` is larger than the available storage, the apiserver risks destabilizing the system by filling the disk. This can be mitigated via documentation which recommends that users understand their storage limitations when setting these values, and by logging a warning during startup when the configured values exceed the available storage at the logging path. It is impossible to predict which component of MicroShift or the host system will fail when the disk is out of space, thus logging a calculated overage can provide an important clue.
- Lost Log Data: The apiserver culls log files given a certain size or age. If users do not take care to back up logs at a rate greater than the rate at which the kube-apiserver culls the files, they risk losing log data. This can be mitigated with examples of log-forwarding provided in documentation, example manifests, or both. Alternatively, it may be necessary to consider deploying the openshift-cluster-logging-operator to provide supportable log forwarding features.

- Exposing the apiserver's audit-log-path, which allows users to set a custom log location, would hinder sos report gathering. Instead, users should replace `/var/log/kube-apiserver` with a symlink to the desired path. Sos will follow the symlink and collect logs by default.
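
The startup warning described under the disk-capacity risk could be sketched as follows. This is a hypothetical illustration, not MicroShift's implementation: the function names are invented, sizes are assumed to be in bytes, and space already used by existing audit logs is ignored.

```python
import shutil
from typing import Optional


def storage_overage_bytes(max_files: int, max_file_size_bytes: int, free_bytes: int) -> int:
    """Return how many bytes the configured log limit exceeds free space by (0 if it fits)."""
    return max(0, max_files * max_file_size_bytes - free_bytes)


def check_log_path(path: str, max_files: int, max_file_size_bytes: int) -> Optional[str]:
    """Return a warning message if the configured limit exceeds free space at path."""
    free = shutil.disk_usage(path).free
    overage = storage_overage_bytes(max_files, max_file_size_bytes, free)
    if overage > 0:
        return f"warning: audit log limit exceeds free space at {path} by {overage} bytes"
    return None
```

Such a check only reflects the state at startup; other processes writing to the same device can still exhaust the disk later.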

### Drawbacks

- N/A
@@ -194,8 +169,13 @@ Note that "omitStages" is determines the point at which a message is logged. For
### Open Questions

- Custom log paths: Can sos collect reports from dynamic paths? If not, can we symlink the log directory as a means of faking a static path while writing logs to a custom location? If so, can sos follow these symlinks?

- SOS can traverse symlinks to collect log files from a target directory. The link itself and the target path are included in the tarball, meaning support can easily locate audit logs via the default location (`/var/log/kube-apiserver`).

- Should MicroShift support a `minFreeStorage` field that would be used to determine whether an acceptable amount of space exists at the log file path? This is not a value recognized by the apiserver, but it could be used by MicroShift to check if the system is in an acceptable state to boot into. If the free space is lower than the value, MicroShift would not start and would log a fatal error to document the reason. If unset, MicroShift should deploy regardless of the available storage.

- Providing additional fields and internal handling logic is outside the scope of this EP. If this is a useful feature, it should be documented in a bespoke EP.
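
The symlink approach to custom log locations can be demonstrated with a small sketch. The helper name and the temporary paths are hypothetical stand-ins for the default log directory and a dedicated volume; on a real host an administrator would perform the equivalent steps on the actual directories before starting MicroShift.

```python
import os
import tempfile


def link_log_dir(default_dir: str, target_dir: str) -> None:
    """Replace an absent or empty default_dir with a symlink to target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    if os.path.islink(default_dir):
        os.unlink(default_dir)
    elif os.path.isdir(default_dir):
        os.rmdir(default_dir)  # deliberately fails if the directory is not empty
    os.symlink(target_dir, default_dir)


with tempfile.TemporaryDirectory() as root:
    default_dir = os.path.join(root, "default-log-dir")
    target_dir = os.path.join(root, "dedicated-volume", "audit")
    link_log_dir(default_dir, target_dir)
    # A file written via the default path lands in the target directory.
    with open(os.path.join(default_dir, "audit.log"), "w") as f:
        f.write("event\n")
    print(os.path.exists(os.path.join(target_dir, "audit.log")))
```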

### Test Plan

* Unit tests to validate new config APIs
@@ -236,7 +216,7 @@ Note that "omitStages" is determines the point at which a message is logged. For

#### Failure Modes

- None. Simply calculating `maxFiles` * `maxFileSize` - Total Log Size on Disk >= Free Space is not a sufficient safeguard against "disk out of space" errors because it does not consider other processes that may be writing data to the same device. Thus, it would _only_ determine if it was safe to _start_ MicroShift and not whether MicroShift could run safely for any amount of time after starting.
- **Disk capacity exceeded**: MicroShift and the apiserver do not assess existing storage capacity for any logs, including audit logs. As audit logs can grow quite quickly, this creates the potential for maxing out a storage device and hindering system performance. Users must therefore consider their total storage needs, in addition to how and how often to transfer logs off-device.

#### Support Procedures

@@ -254,6 +234,7 @@ Note that "omitStages" is determines the point at which a message is logged. For

* Continue using hardcoded audit logging policy
* Using the `logrotate` system utility to manage logs. The apiserver provides the same basic rotation and retention functionality as logrotate. Thus using the system tool would duplicate existing logic and increase technical debt.
* Custom log paths would be passed to the apiserver. The MicroShift sos plugin would have to be made capable of finding logs at a user-defined path, which would create an undesirable coupling between the support tool and the MicroShift config (in which users would specify the log path).

## Infrastructure Needed
