Update documentation of filestream input with the new improvements (#25303)

(cherry picked from commit d29e006)
kvch authored and mergify-bot committed May 4, 2021
1 parent ee2647e commit 0006089
Showing 3 changed files with 176 additions and 8 deletions.
62 changes: 60 additions & 2 deletions filebeat/docs/inputs/input-filestream-file-options.asciidoc
@@ -37,6 +37,30 @@ a `gz` extension:

See <<regexp-support>> for a list of supported regexp patterns.

===== `prospector.scanner.include_files`

A list of regular expressions to match the files that you want {beatname_uc} to
include. If a list of regexes is provided, only the files that are allowed by
the patterns are harvested.

By default no files are excluded. This option is the counterpart of
`prospector.scanner.exclude_files`.

The following example configures {beatname_uc} to exclude files that
are not under `/var/log`:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  prospector.scanner.include_files: ['^/var/log/.*']
----

NOTE: Patterns should start with `^` in case of absolute paths.

See <<regexp-support>> for a list of supported regexp patterns.
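
Because `include_files` is the counterpart of `prospector.scanner.exclude_files`,
the two options can presumably be combined. The following is a minimal sketch,
not a tested configuration: the path under `paths` is a hypothetical value, and
the assumption is that a file is harvested only if it matches an include pattern
and no exclude pattern.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  paths:
    - /var/log/**/*        # hypothetical glob, adjust to your environment
  # Keep only files below /var/log ...
  prospector.scanner.include_files: ['^/var/log/.*']
  # ... and drop compressed files among them.
  prospector.scanner.exclude_files: ['\.gz$']
----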

===== `prospector.scanner.symlinks`

The `symlinks` option allows {beatname_uc} to harvest symlinks in addition to
@@ -57,6 +81,12 @@ This is, for example, the case for Kubernetes log files.

Because this option may lead to data loss, it is disabled by default.

===== `prospector.scanner.resend_on_touch`

If this option is enabled, a file is resent if its size has not changed
but its modification time has changed to a later time than before.
It is disabled by default to avoid accidentally resending files.
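
A minimal sketch of enabling the option, assuming it takes a boolean value like
the other scanner switches. The path is a hypothetical example.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  paths:
    - /var/log/app/*.log   # hypothetical path
  # Resend a file when only its modification time moves forward.
  prospector.scanner.resend_on_touch: true
----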


[float]
[id="{beatname_lc}-input-{type}-scan-frequency"]
@@ -117,6 +147,35 @@ If a file that's currently being harvested falls under `ignore_older`, the
harvester will first finish reading the file and close it after
`close.on_state_change.inactive` is reached. Then, after that, the file will be ignored.

[float]
[id="{beatname_lc}-input-{type}-ignore-inactive"]
===== `ignore_inactive`

If this option is enabled, {beatname_uc} ignores every file that has not been
updated since the selected time. Possible options are `since_first_start` and
`since_last_start`. The first option, `since_first_start`, ignores every file
that has not been updated since the first start of {beatname_uc}. It is useful
when the Beat might be restarted due to configuration changes or a failure.
The second option, `since_last_start`, tells the Beat to read only from files
that have been updated since its most recent start.

The files affected by this setting fall into two categories:

* Files that were never harvested
* Files that were harvested but weren't updated since `ignore_inactive`.

For files that were never seen before, the offset state is set to the end of
the file. If a state already exists, the offset is not changed. In case a file is
updated again later, reading continues at the set offset position.

The setting relies on the modification time of the file to
determine if a file is ignored. If the modification time of the file is not
updated when lines are written to a file (which can happen on Windows), the
setting may cause {beatname_uc} to ignore files even though content was added
at a later time.

To remove the state of previously harvested files from the registry file, use
the `clean_inactive` configuration option.
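
As an illustration, the sketch below selects `since_last_start`, one of the two
values named above. The path is a hypothetical example.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  paths:
    - /var/log/app/*.log   # hypothetical path
  # Ignore files that were not updated since the most recent start.
  ignore_inactive: since_last_start
----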

[float]
[id="{beatname_lc}-input-{type}-close-options"]
===== `close.*`
@@ -218,7 +277,7 @@ single log event to a new file. This option is disabled by default.

[float]
[id="{beatname_lc}-input-{type}-close-timeout"]
===== `close.reader.timeout`
===== `close.reader.after_interval`

WARNING: Only use this option if you understand that data loss is a potential
side effect. Another side effect is that multiline events might not be
@@ -393,4 +452,3 @@ Set the location of the marker file the following way:
----
file_identity.inode_marker.path: /logs/.filebeat-marker
----

90 changes: 90 additions & 0 deletions filebeat/docs/inputs/input-filestream-reader-options.asciidoc
@@ -141,3 +141,93 @@ The default is 16384.

The maximum number of bytes that a single log message can have. All bytes after
`message_max_bytes` are discarded and not sent. The default is 10MB (10485760).

[float]
===== `parsers`

This option expects a list of parsers the log line has to go through.

Available parsers:
- `multiline`
- `ndjson`

In this example, {beatname_uc} is reading multiline messages that consist of 3 lines
and are encapsulated in single-line JSON objects.
The multiline message is stored under the key `msg`.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  parsers:
    - ndjson:
        keys_under_root: true
        message_key: msg
    - multiline:
        type: count
        count_lines: 3
----

See the available parser settings in detail below.

[float]
===== `multiline`

Options that control how {beatname_uc} deals with log messages that span
multiple lines. See <<multiline-examples>> for more information about
configuring multiline options.
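
A minimal sketch of a pattern-based `multiline` parser. The `type`, `pattern`,
`negate`, and `match` settings are taken from the general multiline settings
rather than from this section, and the pattern itself is only illustrative: it
assumes that every new event starts with `[`.

[source,yaml]
----
parsers:
  - multiline:
      type: pattern
      pattern: '^\['   # illustrative: a new event starts with "["
      negate: true     # lines that do NOT match the pattern ...
      match: after     # ... are appended to the preceding matching line
----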

[float]
===== `ndjson`

These options make it possible for {beatname_uc} to decode logs structured as
JSON messages. {beatname_uc} processes the logs line by line, so the JSON
decoding only works if there is one JSON object per message.

The decoding happens before line filtering. You can combine JSON
decoding with filtering if you set the `message_key` option. This
can be helpful in situations where the application logs are wrapped in JSON
objects, as happens, for example, with Docker.

Example configuration:

[source,yaml]
----
- ndjson:
    keys_under_root: true
    add_error_key: true
    message_key: log
----

*`keys_under_root`*:: By default, the decoded JSON is placed under a "json" key
in the output document. If you enable this setting, the keys are copied to the
top level of the output document. The default is false.

*`overwrite_keys`*:: If `keys_under_root` and this setting are enabled, then the
values from the decoded JSON object overwrite the fields that {beatname_uc}
normally adds (type, source, offset, etc.) in case of conflicts.

*`expand_keys`*:: If this setting is enabled, {beatname_uc} will recursively
de-dot keys in the decoded JSON, and expand them into a hierarchical object
structure. For example, `{"a.b.c": 123}` would be expanded into `{"a":{"b":{"c":123}}}`.
This setting should be enabled when the input is produced by an
https://github.com/elastic/ecs-logging[ECS logger].

*`add_error_key`*:: If this setting is enabled, {beatname_uc} adds the
"error.message" and "error.type: json" keys in case of JSON unmarshalling errors
or when a `message_key` is defined in the configuration but cannot be used.

*`message_key`*:: An optional configuration setting that specifies a JSON key on
which to apply the line filtering and multiline settings. If specified, the key
must be at the top level in the JSON object and the value associated with the
key must be a string. Otherwise, no filtering or multiline aggregation will
occur.

*`document_id`*:: An optional configuration setting that specifies the JSON key
to use as the document ID. If configured, the field is removed from the original
JSON document and stored in `@metadata._id`.

*`ignore_decoding_error`*:: An optional configuration setting that specifies if
JSON decoding errors should be logged or not. If set to true, errors will not
be logged. The default is false.
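
The settings above can be combined in one `ndjson` parser. The following is a
sketch under stated assumptions, not a verified configuration: `event_id` is a
hypothetical key name chosen for illustration.

[source,yaml]
----
parsers:
  - ndjson:
      keys_under_root: true    # copy decoded keys to the top level
      expand_keys: true        # {"a.b.c": 123} becomes {"a":{"b":{"c":123}}}
      add_error_key: true      # report decoding problems on the event
      document_id: event_id    # hypothetical key, moved to @metadata._id
----
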
32 changes: 26 additions & 6 deletions filebeat/docs/inputs/input-filestream.asciidoc
@@ -10,10 +10,30 @@ experimental[]
++++

Use the `filestream` input to read lines from active log files. It is the
new, improved alternative to the `log` input. However, a few feature are
missing from it, e.g. `multiline` or other special parsing capabilities.
These missing options are probably going to be added again. We strive to
achieve feature parity, if possible.
new, improved alternative to the `log` input. It comes with various improvements
to the existing input:

1. Checking of `close_*` options happens out of band. Thus, if an output is blocked,
{beatname_uc} can still close the reader and avoid keeping too many files open.

2. Detailed metrics are available for all files that match the `paths` configuration
regardless of the `harvester_limit`. This way, you can keep track of all files,
even ones that are not actively read.

3. The order of `parsers` is configurable. So it is possible to parse JSON lines and then
aggregate the contents into a multiline event.

4. Some position updates and metadata changes no longer depend on the publishing pipeline.
If the pipeline is blocked, some changes are still applied to the registry.

5. Only the most recent updates are serialized to the registry. In contrast, the `log` input
has to serialize the complete registry on each ACK from the outputs. This makes the registry updates
much quicker with this input.

6. The input ensures that only offset updates are written to the registry's append-only log.
The `log` input writes the complete file state.

7. Stale entries can be removed from the registry, even if there is no active input.
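
To illustrate the configurable parser order from item 3, the sketch below
reverses the order: it aggregates lines first and decodes JSON afterwards. It
assumes, hypothetically, that the file contains pretty-printed JSON objects
spanning several lines, each starting with `{` in the first column. The path
and pattern are illustrative, and the `pattern`, `negate`, and `match` settings
come from the general multiline settings.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  paths:
    - /var/log/app/*.json      # hypothetical path
  parsers:
    # 1. Join every line that does not start with "{" to the previous one.
    - multiline:
        type: pattern
        pattern: '^\{'
        negate: true
        match: after
    # 2. Decode the aggregated message as a single JSON object.
    - ndjson:
        keys_under_root: true
----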

To configure this input, specify a list of glob-based <<filestream-input-paths,`paths`>>
that must be crawled to locate and fetch the log lines.
@@ -158,10 +178,10 @@ on. If enabled, it expands a single `**` into an 8-level deep `*` pattern.
This feature is enabled by default. Set `prospector.scanner.recursive_glob` to false to
disable it.

include::../inputs/input-filestream-reader-options.asciidoc[]

include::../inputs/input-filestream-file-options.asciidoc[]

include::../inputs/input-filestream-reader-options.asciidoc[]

[id="{beatname_lc}-input-{type}-common-options"]
include::../inputs/input-common-options.asciidoc[]
