docs: ✏️ update kibana to opensearch refs #6460

Merged
merged 1 commit into from
Nov 15, 2024
6 changes: 3 additions & 3 deletions runbooks/source/grafana-dashboards.html.md.erb
@@ -1,7 +1,7 @@
---
title: Grafana Dashboards
weight: 9106
-last_reviewed_on: 2024-10-09
+last_reviewed_on: 2024-11-15
review_in: 3 months
---

@@ -36,7 +36,7 @@ kubectl describe node <node_name>

### Fixing "failed to load dashboard" errors

-The kibana alert has reported an error similar to:
+The OpenSearch alert has reported an error similar to:

> Grafana failed to load one or more dashboards - This could prevent new dashboards from being created ⚠️

@@ -68,7 +68,7 @@ Contact the user in the given slack-channel and ask them to fix it. Provide the

### Fixing "duplicate dashboard uid" errors

-The kibana alert has reported an error similar to:
+The OpenSearch alert has reported an error similar to:

> Duplicate Grafana dashboard UIDs found

39 changes: 0 additions & 39 deletions runbooks/source/kibana-podsecurity-violations-alert.html.md.erb

This file was deleted.

@@ -0,0 +1,39 @@
---
title: OpenSearch PodSecurity Violations Alert
weight: 191
last_reviewed_on: 2024-11-15
review_in: 3 months
---

# OpenSearch PodSecurity Violations Alert
This runbook documents the OpenSearch PodSecurity (PSA) violations monitor and explains how to identify the offending namespace and resources.

## OpenSearch Alert/Monitor

[This OpenSearch monitor](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/alerting#/monitors/t4z3XI8BxtKHqtnhcXO2) alerts if any PSA violations are detected.

You can see when previous alerts have been triggered under the `Alerts` section on the monitor.

## Checking logs for PSA violations in OpenSearch

To find which namespace(s) are in violation and see the reason in the logs, either go to the [Discover section in OpenSearch](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/data-explorer/discover/) and run the following query:

```
"violates PodSecurity" AND NOT "smoketest-restricted" AND NOT "smoketest-privileged"
```

Or follow [this link](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/data-explorer/discover#?_q=(filters:!(),query:(language:kuery,query:'%22violates%20PodSecurity%22%20AND%20NOT%20%22smoketest-restricted%22%20AND%20NOT%20%22smoketest-privileged%22'))&_a=(discover:(columns:!(_source),isDirty:!f,sort:!()),metadata:(indexPattern:bb90f230-0d2e-11ef-bf63-53113938c53a,view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-5h,to:now))) to get the same search.
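
The `query` value inside the link is the same search string, percent-encoded. As a minimal sketch using only Python's standard library (the variable names are illustrative, not part of the runbook), the encoding can be reproduced like this:

```python
from urllib.parse import quote

# The search string from the query above.
QUERY = '"violates PodSecurity" AND NOT "smoketest-restricted" AND NOT "smoketest-privileged"'

# Percent-encode spaces and double quotes, as they appear inside the link.
encoded_query = quote(QUERY, safe="")
print(encoded_query)
```

This is handy when adapting the link for a different search string; the rest of the URL (index pattern, time range) is left unchanged.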

The search shows any logs of PSA violations (excluding smoketests). If no logs appear, widen the time range to cover when the alert was triggered. You can check this on the monitor under the `Alerts` heading.
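
The query can be read as a simple include/exclude rule. The Python sketch below applies that rule to plain strings, purely to illustrate what the search keeps and what it drops. The sample messages are made up, and OpenSearch's own text matching may behave differently from exact substring checks.

```python
REQUIRED = "violates PodSecurity"
EXCLUDED = ("smoketest-restricted", "smoketest-privileged")


def is_psa_violation(message: str) -> bool:
    """Return True if the message contains the phrase and none of the exclusions."""
    return REQUIRED in message and not any(term in message for term in EXCLUDED)


# Hypothetical log messages, for illustration only.
samples = [
    'pods "web" is forbidden: violates PodSecurity "restricted:latest"',
    'namespace smoketest-restricted: violates PodSecurity "restricted:latest"',
    "unrelated log line",
]
matches = [m for m in samples if is_psa_violation(m)]
```

Only the first sample passes the filter: the second is excluded as a smoketest and the third does not contain the phrase.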

The logs include information such as the offending namespace and the reason the violation was triggered.

## Fixing PSA Violations

To fix a PSA violation and stop the monitor from triggering, gather the namespace and violation reason from the logs, then contact a member of the team that owns the violating namespace with details of what is causing the issue. That team should then resolve it.

## Slack Alert

OpenSearch posts a message in the `#low-priority-alarms` Slack channel whenever the [PodSecurity Violations monitor](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/alerting#/monitors/t4z3XI8BxtKHqtnhcXO2) first enters the `Triggered` status.

The monitor is throttled to send only one message every 24 hours per trigger. This means that if a namespace is already triggering the monitor, a further violation will not send another message. The best way to check what is triggering the monitor is to follow the steps above under [Checking logs for PSA violations in OpenSearch](#checking-logs-for-psa-violations-in-opensearch).
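
To make the throttling behaviour concrete, here is a minimal Python sketch of a 24-hour throttle for a single trigger. It is not how OpenSearch implements throttling; the `Throttle` class and `should_send` method are hypothetical and only model the behaviour described in this section.

```python
from datetime import datetime, timedelta
from typing import Optional


class Throttle:
    """Allow at most one notification per window (hypothetical model)."""

    def __init__(self, window: timedelta = timedelta(hours=24)) -> None:
        self.window = window
        self.last_sent: Optional[datetime] = None

    def should_send(self, now: datetime) -> bool:
        # Send if nothing has been sent yet, or the window has fully elapsed.
        if self.last_sent is None or now - self.last_sent >= self.window:
            self.last_sent = now
            return True
        return False


throttle = Throttle()
start = datetime(2024, 11, 15, 9, 0)
first = throttle.should_send(start)                        # first violation: message sent
second = throttle.should_send(start + timedelta(hours=2))  # second violation: suppressed
later = throttle.should_send(start + timedelta(hours=24))  # window elapsed: message sent
```

The suppressed second call is why a quiet Slack channel does not mean there are no new violations, and why checking the logs directly is the more reliable approach.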