Skip to content

Commit

Permalink
Add How to guide for migrating to filestream input (elastic#30972)
Browse files Browse the repository at this point in the history
  • Loading branch information
kvch authored May 12, 2022
1 parent 2619445 commit a03f883
Show file tree
Hide file tree
Showing 2 changed files with 245 additions and 0 deletions.
2 changes: 2 additions & 0 deletions filebeat/docs/howto/howto.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ Learn how to perform common {beatname_uc} configuration tasks.
* <<configuring-ingest-node>>
* <<using-environ-vars>>
* <<yaml-tips>>
* <<migrate-to-filestream>>


--
Expand Down Expand Up @@ -43,5 +44,6 @@ include::{libbeat-dir}/shared-env-vars.asciidoc[]
include::{libbeat-dir}/yaml.asciidoc[]
:standalone!:

include::migrate-to-filestream.asciidoc[]


243 changes: 243 additions & 0 deletions filebeat/docs/howto/migrate-to-filestream.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
[[migrate-to-filestream]]
== Migrate `log` input configurations to `filestream`

The `filestream` input has been generally available since 7.14. So it is high time you
migrate your existing `log` input configurations. The `filestream` input comes with many
improvements over the old input, such as configurable order for parsers and more.

While we do not plan to remove the `log` input from Filebeat, we are not fixing
new issues or adding any enhancements to the input. Our focus is on `filestream`.

In this guide, you learn how to migrate an existing `log` input configuration.
The following example shows three `log` inputs:

[source,yaml]
----
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/java-exceptions*.log
multiline:
pattern: '^\['
negate: true
match: after
close_removed: true
close_renamed: true
- type: log
enabled: true
paths:
- /var/log/my-application*.json
scan_frequency: 1m
json.keys_under_root: true
- type: log
enabled: true
paths:
- /var/log/my-old-files*.log
tail_files: true
----

For this example, let's assume that the `log` input is used to collect logs from the following files. The progress of data collection is shown for each file.
["source","sh",subs="attributes"]
----
/var/log/java-exceptions1.log (100%)
/var/log/java-exceptions2.log (100%)
/var/log/java-exceptions3.log (75%)
/var/log/java-exceptions4.log (0%)
/var/log/java-exceptions5.log (0%)
/var/log/my-application1.json (100%)
/var/log/my-application2.json (5%)
/var/log/my-application3.json (0%)
/var/log/my-old-files1.json (0%)
----

=== Step 1: Set an identifier for each `filestream` input

All `filestream` inputs require an ID. So make sure you set a unique identifier for every input.

IMPORTANT: Never change the ID of an input, or you will end up with duplicate events.

[source,yaml]
----
filebeat.inputs:
- type: filestream
enabled: true
id: my-java-collector
paths:
- /var/log/java-exceptions*.log
- type: filestream
enabled: true
id: my-application-input
paths:
- /var/log/my-application*.json
- type: filestream
enabled: true
id: my-old-files
paths:
- /var/log/my-old-files*.log
----

=== Step 2: Exclude all processed files

Filebeat does not provide access to the state information of different inputs.
Hence, the `filestream` input cannot access the state information of a `log` input in the
Filebeat registry. You must exclude the files the `log` input has processed
or is processing. If you do not exclude those files, you will end up with
duplicate events in the output.

Given the file list and ingestion progress shown earlier,
you should run the `log` and `filestream` inputs simultaneously until everything
collected by the `log` input has made it to the output.
After the files collected by the `log` input are shipped and the files are deleted,
you can delete the `log` inputs and the `exlude_files` settings from `filestream` input.

[source,yaml]
----
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/java-exceptions*.log
multiline:
pattern: '^\['
negate: true
match: after
close_removed: true
close_renamed: true
exclude_files: java-exceptions[4-5]{1}.log
- type: log
enabled: true
paths:
- /var/log/my-application*.json
scan_frequency: 1m
json.keys_under_root: true
exclude_files: my-application3.log
- type: filestream
enabled: true
id: my-java-collector
paths:
- /var/log/java-exceptions*.log
prospector.scanner.exclude_files: java-exceptions[1-3]{1}.log
- type: filestream
enabled: true
id: my-application-input
paths:
- /var/log/my-application*.json
prospector.scanner.exclude_files: my-application[1-2]{1}.log
- type: filestream
enabled: true
id: my-old-files
paths:
- /var/log/my-old-files*.log
----


=== Step 3: Use new option names

Several options are renamed in `filestream`. You can find a table with all of the
changed configuration names at the end of this guide.

The most significant change you have to know about is in parsers. The configuration of
`multiline`, `json`, and other parsers has changed. Now the ordering is
configurable, so `filestream` expects a list of parsers. Furthermore, the `json`
parser was renamed to `ndjson`.

The example configuration shown earlier needs to be adjusted as well:

[source,yaml]
----
- type: filestream
enabled: true
id: my-java-collector
paths:
- /var/log/java-exceptions*.log
parsers:
- multiline:
pattern: '^\['
negate: true
match: after
close.on_state_change.removed: true
close.on_state_change.renamed: true
- type: filestream
enabled: true
id: my-application-input
paths:
- /var/log/my-application*.json
prospector.scanner.check_interval: 1m
parsers:
- ndjson:
keys_under_root: true
- type: filestream
enabled: true
id: my-old-files
paths:
- /var/log/my-old-files*.log
ignore_inactive: since_last_start
----

[cols="1,1"]
|===
|Option name in log input
|Option name in filestream input

|recursive_glob.enabled
|prospector.scanner.recursive_glob

|harvester_buffer_size
|buffer_size

|max_bytes
|message_max_bytes

|json
|parsers.n.ndjson

|multiline
|parsers.n.mutiline

|exclude_files
|prospector.scanner.exclude_files

|close_inactive
|close.on_state_change.inactive

|close_removed
|close.on_state_change.removed

|close_eof
|close.reader.on_eof

|close_timeout
|close.reader.after_interval

|close_inactive
|close.on_state_change.inactive

|scan_frequency
|prospector.scanner.check_interval

|tail_files
|ignore_inactive.since_last_start

|symlinks
|prospector.scanner.symlinks

|backoff
|backoff.init

|backoff_max
|backoff.max
|===



0 comments on commit a03f883

Please sign in to comment.