[doc] Logstash Kubernetes - Persistent Storage Docs (#14714)
Co-authored-by: Rob Bavey <[email protected]>
Co-authored-by: Karen Metts <[email protected]>

docsk8s/setting-up/ls-k8s-persistent-storage.asciidoc

[[ls-k8s-persistent-storage]]
=== Stateful {ls} for persistent storage

WARNING: This documentation is still in development and may be changed or removed in a future release.

You need {ls} to persist data to disk for certain use cases.
{ls} offers some persistent storage options to help:

* <<persistent-storage-pq,Persistent queue (PQ)>> to absorb bursts of events
* <<persistent-storage-dlq,Dead letter queue (DLQ)>> to accept corrupted events that cannot be processed
* <<persistent-storage-plugins,Persistent storage options in some {ls} plugins>>

There are a few factors to help you decide whether to configure your {ls} core instance to use a persistent queue (PQ), a dead letter queue (DLQ), or both.
In all of these cases, you need to be able to preserve state.
Remember that the {k8s} scheduler can shut down pods at any time and respawn them on another node. To preserve state, define your {ls} deployment using a `StatefulSet` rather than a `Deployment`.

[[persistent-storage-statefulset]]
==== Set up StatefulSet

[source,yaml]
--
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logstash
  labels:
    app: logstash-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash-demo
  serviceName: logstash
  template:
    metadata:
      labels:
        app: logstash-demo
    spec:
      containers:
      - name: logstash
        image: "docker.elastic.co/logstash/logstash:{version}"
        env:
        - name: LS_JAVA_OPTS
          value: "-Xmx1g -Xms1g"
        resources:
          limits:
            cpu: 2000m
            memory: 2Gi
          requests:
            cpu: 1000m
            memory: 2Gi
        ports:
        - containerPort: 9600
          name: stats
        livenessProbe:
          httpGet:
            path: /
            port: 9600
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /
            port: 9600
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        volumeMounts:
        - name: logstash-data <2>
          mountPath: /usr/share/logstash/data
        - name: logstash-pipeline
          mountPath: /usr/share/logstash/pipeline
        - name: logstash-config
          mountPath: /usr/share/logstash/config/logstash.yml
          subPath: logstash.yml
        - name: logstash-config
          mountPath: /usr/share/logstash/config/pipelines.yml
          subPath: pipelines.yml
      volumes:
      - name: logstash-pipeline
        configMap:
          name: logstash-pipeline
      - name: logstash-config
        configMap:
          name: logstash-config
  volumeClaimTemplates: <1>
  - metadata:
      name: logstash-data
      labels:
        app: logstash-demo
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 2Gi
--

The definition is similar to a `Deployment`, except for the use of `volumeClaimTemplates`.

<1> Request 2Gi of persistent storage from `PersistentVolumes`.

<2> Mount the storage to `/usr/share/logstash/data`. This is the default directory that {ls} and its plugins use for any persistence needs.

NOTE: Support for persistent link:https://kubernetes.io/blog/2018/07/12/resizing-persistent-volumes-using-kubernetes/[volume expansion] depends on the storage class. Check with your cloud provider.
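
If your storage class does support expansion, you can grow an existing volume by patching the generated `PersistentVolumeClaim`. A minimal sketch, assuming the storage class sets `allowVolumeExpansion: true`; the claim name follows the `<template>-<statefulset>-<ordinal>` convention, so the `StatefulSet` above yields `logstash-data-logstash-0`:

[source,bash]
--
# volumeClaimTemplates are immutable, so patch the PVC that the
# StatefulSet created rather than the template itself.
kubectl patch pvc logstash-data-logstash-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"4Gi"}}}}'
--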

[[persistent-storage-pq]]
==== Persistent queue (PQ)
You can configure persistent queues globally across all pipelines in `logstash.yml`, and configure individual pipelines in `pipelines.yml`. Settings in `pipelines.yml` override those in `logstash.yml`. Queue data is stored in `/usr/share/logstash/data/queue` by default.

To enable {logstash-ref}/persistent-queues.html[PQ] for every pipeline, specify options in `logstash.yml`.

[source,yaml]
--
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    queue.type: persisted
    queue.max_bytes: 1024mb
    ...
--
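
To confirm that queue data is landing on the persistent volume, you can list the queue directory inside the pod. A quick check, assuming the pod is named `logstash-0` as in the `StatefulSet` above:

[source,bash]
--
# Each pipeline writes its page and checkpoint files to a
# subdirectory named after the pipeline ID, such as main.
kubectl exec logstash-0 -- ls /usr/share/logstash/data/queue
--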

To set options for individual pipelines, specify them in `pipelines.yml`.

[source,yaml]
--
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
  pipelines.yml: |
    - pipeline.id: fast_ingestion
      path.config: "/usr/share/logstash/pipeline/fast.conf"
      queue.type: persisted
      queue.max_bytes: 1024mb
    - pipeline.id: slow_ingestion
      path.config: "/usr/share/logstash/pipeline/slow.conf"
      queue.type: persisted
      queue.max_bytes: 2048mb
--
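
You can also inspect queue activity per pipeline through the {logstash-ref}/node-stats-api.html[node stats API] on port `9600`, which the `StatefulSet` above already exposes. A sketch, assuming `curl` is available in the image:

[source,bash]
--
# The queue section of each pipeline reports its type,
# events_count, and queue_size_in_bytes.
kubectl exec logstash-0 -- curl -s 'localhost:9600/_node/stats/pipelines?pretty'
--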

[[persistent-storage-dlq]]
==== Dead letter queue (DLQ)

To enable the {logstash-ref}/dead-letter-queues.html[dead letter queue], specify options in `logstash.yml`. The DLQ is stored in `/usr/share/logstash/data/dead_letter_queue` by default.

[source,yaml]
--
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.yml: |
    api.http.host: "0.0.0.0"
    dead_letter_queue.enable: true <1>
  pipelines.yml: |
    - pipeline.id: main <2>
      path.config: "/usr/share/logstash/pipeline/main.conf"
    - pipeline.id: dlq <3>
      path.config: "/usr/share/logstash/pipeline/dlq.conf"
--

<1> Enable the DLQ for all pipelines that use the {logstash-ref}/plugins-outputs-elasticsearch.html[elasticsearch output plugin].

<2> The `main` pipeline sends failed events to the DLQ. See the pipeline definition below.

<3> The `dlq` pipeline consumes events from the DLQ, fixes errors, and resends events to {es}. See the pipeline definition below.

[source,yaml]
--
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipeline
data:
  main.conf: | <1>
    input {
      exec {
        command => "uptime"
        interval => 5
      }
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "uptime-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
  dlq.conf: | <2>
    input {
      dead_letter_queue {
        path => "/usr/share/logstash/data/dead_letter_queue"
        commit_offsets => true
        pipeline_id => "main"
      }
    }
    filter {
      # Do your fix here
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        index => "dlq-%{+YYYY.MM.dd}"
        user => 'elastic'
        password => 'changeme'
      }
    }
--

<1> An example pipeline that tries to send events to a closed index in {es}. To test this behavior manually, use the {ref}/indices-close.html[_close] API to close the index.

<2> This pipeline uses the {logstash-ref}/plugins-inputs-dead_letter_queue.html[dead_letter_queue input plugin] to consume DLQ events. This example sends the events to a different index, but you can add filter plugins to fix other types of errors that cause insertion failures, such as mapping errors.
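
To confirm that failed events are reaching the DLQ, you can list its data directory inside the pod. A quick check, assuming the pod name from the `StatefulSet` above and the `main` pipeline:

[source,bash]
--
# Each pipeline with DLQ enabled writes segment files to a
# subdirectory named after the pipeline ID.
kubectl exec logstash-0 -- ls /usr/share/logstash/data/dead_letter_queue/main
--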

[[persistent-storage-plugins]]
==== Plugins that require local storage to track work done
Many {ls} plugins are stateful and use persistent storage to track the current state of the work that they are doing.

Stateful {ls} plugins typically have some kind of `path` setting, such as `sincedb_path` or `last_run_metadata_path`, that controls where this state is written.

Here is a list of popular plugins that require persistent storage, and therefore a `StatefulSet` with `volumeClaimTemplates` as described in <<persistent-storage-statefulset>>. A configuration sketch follows the table.

[cols="<,<",options="header",]
|=======================================================================
|Plugin |Settings
|logstash-codec-netflow| {logstash-ref}/plugins-codecs-netflow.html#plugins-codecs-netflow-cache_save_path[cache_save_path]
|logstash-input-couchdb_changes| {logstash-ref}/plugins-inputs-couchdb_changes.html#plugins-inputs-couchdb_changes-sequence_path[sequence_path]
|logstash-input-dead_letter_queue| {logstash-ref}/plugins-inputs-dead_letter_queue.html#plugins-inputs-dead_letter_queue-sincedb_path[sincedb_path]
|logstash-input-file| {logstash-ref}/plugins-inputs-file.html#plugins-inputs-file-file_completed_log_path[file_completed_log_path], {logstash-ref}/plugins-inputs-file.html#plugins-inputs-file-sincedb_path[sincedb_path]
|logstash-input-google_cloud_storage| {logstash-ref}/plugins-inputs-google_cloud_storage.html#plugins-inputs-google_cloud_storage-processed_db_path[processed_db_path]
|logstash-input-imap| {logstash-ref}/plugins-inputs-imap.html#plugins-inputs-imap-sincedb_path[sincedb_path]
|logstash-input-jdbc| {logstash-ref}/plugins-inputs-jdbc.html#plugins-inputs-jdbc-last_run_metadata_path[last_run_metadata_path]
|logstash-input-s3| {logstash-ref}/plugins-inputs-s3.html#plugins-inputs-s3-sincedb_path[sincedb_path]
|logstash-filter-aggregate| {logstash-ref}/plugins-filters-aggregate.html#plugins-filters-aggregate-aggregate_maps_path[aggregate_maps_path]
|=======================================================================
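
As an illustration, here is a hypothetical pipeline for the file input that pins `sincedb_path` onto the persistent mount, so that read positions survive pod rescheduling. The log path and file names are assumptions for this sketch:

[source,yaml]
--
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-pipeline
data:
  files.conf: |
    input {
      file {
        # Hypothetical application logs mounted into the pod.
        path => "/var/log/app/*.log"
        # Keep the read-position database on the persistent volume.
        sincedb_path => "/usr/share/logstash/data/plugins/inputs/file/sincedb_app"
      }
    }
    output {
      elasticsearch {
        hosts => ["https://hostname.cloud.es.io:9200"]
        user => 'elastic'
        password => 'changeme'
      }
    }
--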
docsk8s/troubleshooting/ls-k8s-troubleshooting-methods.asciidoc

Here are some approaches that you can use to diagnose the state of your {ls} and {k8s} setup:
* <<ls-k8s-viewing-logs>>
* <<ls-k8s-connecting-to-a-container>>
* <<ls-k8s-diagnostics>>
* <<ls-k8s-pq-util>>
* <<ls-k8s-pq-drain>>

[float]
[[ls-k8s-checking-resources]]

...

Copy the heap dump from the pod to your local machine:
[source,bash]
--
kubectl cp logstash-7477d46bb7-4lcnv:/tmp/heap_dump.hprof ./heap.hprof
--

[[ls-k8s-pq-util]]
=== Running PQ utilities

In the event of persistent queue corruption, the `pqcheck` and `pqrepair` tools are available for troubleshooting.

Run {logstash-ref}/persistent-queues.html#pqcheck[pqcheck] to identify corrupted files:

[source,bash]
--
kubectl exec logstash-0 -it -- /usr/share/logstash/bin/pqcheck /usr/share/logstash/data/queue/pipeline_id
--

Run {logstash-ref}/persistent-queues.html#pqrepair[pqrepair] to repair the queue:

[source,bash]
--
kubectl exec logstash-0 -it -- /usr/share/logstash/bin/pqrepair /usr/share/logstash/data/queue/pipeline_id
--

[[ls-k8s-pq-drain]]
=== Draining the PQ

{ls} provides a `queue.drain: true` setting that, when {ls} is stopped gracefully, pauses shutdown until all messages in the persistent queue have been handled.
Take special care with the `queue.drain: true` setting on {k8s}. By default, a {k8s} pod has a grace period of 30 seconds to shut down before it is terminated forcefully with a `SIGKILL`, which can cause {ls} to exit before the queue is fully drained.

To avoid {ls} shutting down before the queue is completely drained, set the pod's `terminationGracePeriodSeconds` to an artificially long period, such as one year, to give {ls} sufficient time to drain the queue when this functionality is required.
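
A minimal sketch of the pod template change, assuming the `StatefulSet` defined in the setup docs; `queue.drain: true` itself belongs in `logstash.yml`:

[source,yaml]
--
spec:
  template:
    spec:
      # ~1 year: effectively wait until the queue is drained.
      terminationGracePeriodSeconds: 31536000
--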
