Generalization of reader pipeline in Beats #16137

Closed
kvch opened this issue Feb 6, 2020 · 2 comments
Assignees
Labels
discuss Issue needs further discussion. Team:Services (Deprecated) Label for the former Integrations-Services team

Comments

@kvch
Contributor

kvch commented Feb 6, 2020

Goal

Filebeat has various readers that are strictly tied to the log input. However, users often ask to use these readers (most commonly multiline and syslog) from other inputs as well. The goal is to generalize the readers so that more inputs can use them.

Current state

The reader pipeline consists of the following readers:

  • readfile.EncoderReader: reads an input with the configured encoding
  • readfile.LineReader: reads a line from a file
  • readfile.LimitReader: truncates message if it is too long
  • readfile.StripNewLine: removes the configured newline characters from the end of the message
  • readfile.TimeoutReader: flushes the message if the configured time has elapsed
  • multiline.Reader: creates a single message from multiple ones based on patterns and timeout
  • readjson.DockerJSON: parses JSON events from Docker
  • readjson.JSONReader: parses arbitrary JSON events

All of the readers above implement the following reader.Reader interface:

type Reader interface {
    Next() (Message, error)
}

All readers except for readfile.EncoderReader read a reader.Message from an underlying reader.Reader.
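
To illustrate how such readers stack, here is a minimal sketch rather than the actual libbeat code. Message is a simplified stand-in for reader.Message, and sliceReader, stripNewline, limit, drain and buildPipeline are hypothetical names made up for this example. The wrapper order is only illustrative.

```go
package main

import (
	"io"
	"strings"
)

// Message is a simplified stand-in for libbeat's reader.Message (assumption).
type Message struct {
	Content []byte
}

// Reader mirrors the interface quoted above.
type Reader interface {
	Next() (Message, error)
}

// sliceReader is a hypothetical source that returns pre-split lines.
type sliceReader struct {
	lines []string
	pos   int
}

func (r *sliceReader) Next() (Message, error) {
	if r.pos >= len(r.lines) {
		return Message{}, io.EOF
	}
	m := Message{Content: []byte(r.lines[r.pos])}
	r.pos++
	return m, nil
}

// stripNewline wraps another Reader and trims trailing line endings.
type stripNewline struct{ in Reader }

func (r *stripNewline) Next() (Message, error) {
	m, err := r.in.Next()
	if err != nil {
		return m, err
	}
	m.Content = []byte(strings.TrimRight(string(m.Content), "\r\n"))
	return m, nil
}

// limit wraps another Reader and truncates Content to at most max bytes.
type limit struct {
	in  Reader
	max int
}

func (r *limit) Next() (Message, error) {
	m, err := r.in.Next()
	if err != nil {
		return m, err
	}
	if len(m.Content) > r.max {
		m.Content = m.Content[:r.max]
	}
	return m, nil
}

// buildPipeline stacks the wrappers on top of the in-memory source.
func buildPipeline(lines []string, max int) Reader {
	return &limit{in: &stripNewline{in: &sliceReader{lines: lines}}, max: max}
}

// drain reads messages until io.EOF and returns their contents as strings.
func drain(r Reader) ([]string, error) {
	var out []string
	for {
		m, err := r.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, string(m.Content))
	}
}
```

Because every wrapper only depends on the Reader interface, the source at the bottom of the stack can be swapped without touching the wrappers, which is the property the proposal relies on.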

These readers are exposed and used only in the log input, plus DockerJSON in the container input. However, they could be reused in more inputs (e.g. journal, tcp, etc.). For example, someone may need to read multiline messages from a systemd journal.

But right now adding this feature to more inputs is not straightforward because the reader pipeline is too tightly coupled with the log input. The readers are part of libbeat, but they can only be used from Filebeat. In order to provide a similar experience for more Beats, the pipeline has to be generalized.

Proposal

The "source" parts of inputs have to be decoupled. TCP and UDP inputs were created like this. The concrete data sources are part of the package inputsource. More sources can be extracted from inputs to provide flexibility:

  • log: reads from a log file
  • journal: reads a journal

The inputsources should implement either io.Reader or reader.Reader in order to plug into the pipeline.
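
As a rough sketch of what plugging a source into the pipeline could look like, the adapter below turns any io.Reader into a line-oriented Reader. It assumes a simplified Message type, and lineSource, newLineSource, readAll and demoSource are hypothetical names, not part of inputsource or libbeat.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// Message is a simplified stand-in for libbeat's reader.Message (assumption).
type Message struct{ Content []byte }

// Reader mirrors the reader.Reader interface from the issue.
type Reader interface {
	Next() (Message, error)
}

// lineSource adapts any io.Reader (file, socket, in-memory buffer)
// to the Reader interface by emitting one Message per line.
type lineSource struct{ sc *bufio.Scanner }

// Compile-time check that lineSource satisfies Reader.
var _ Reader = (*lineSource)(nil)

func newLineSource(src io.Reader) *lineSource {
	return &lineSource{sc: bufio.NewScanner(src)}
}

func (s *lineSource) Next() (Message, error) {
	if s.sc.Scan() {
		// Copy the bytes: the Scanner reuses its internal buffer.
		content := append([]byte(nil), s.sc.Bytes()...)
		return Message{Content: content}, nil
	}
	if err := s.sc.Err(); err != nil {
		return Message{}, err
	}
	return Message{}, io.EOF
}

// readAll drains a Reader until io.EOF.
func readAll(r Reader) ([]string, error) {
	var out []string
	for {
		m, err := r.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, string(m.Content))
	}
}

// demoSource reads two lines from an in-memory source.
func demoSource() ([]string, error) {
	return readAll(newLineSource(strings.NewReader("first\nsecond\n")))
}
```

A journal source would not fit this line-based adapter directly, since journal entries are structured records; it would more likely implement the Reader interface itself.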

Progress

So far, the filestream input is the only one that leverages the same reader.Reader functionality. While adopting the readers, a few things have been sorted out. In the interim, a new parser interface was introduced so that the existing reader.Reader structures can keep working as before in the log input. The new interface already has improved JSON handling. It is not exposed yet.

// parser transforms or translates the Content attribute of a Message.
// They are able to aggregate two or more Messages into a single one.
type parser interface {
    io.Closer
    Next() (reader.Message, error)
}
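
To show the aggregation case mentioned in the comment, here is a simplified sketch of a parser that merges indented continuation lines into the preceding message. It is not the real multiline.Reader, which works with configurable patterns and a timeout. Message is a simplified stand-in for reader.Message, and listSource, indentJoiner and collect are hypothetical names.

```go
package main

import "io"

// Message is a simplified stand-in for libbeat's reader.Message (assumption).
type Message struct{ Content []byte }

// parser mirrors the interface above, using the local Message type.
type parser interface {
	io.Closer
	Next() (Message, error)
}

// listSource is a hypothetical in-memory source of single-line messages.
type listSource struct {
	lines  []string
	closed bool
}

func (s *listSource) Next() (Message, error) {
	if s.closed || len(s.lines) == 0 {
		return Message{}, io.EOF
	}
	m := Message{Content: []byte(s.lines[0])}
	s.lines = s.lines[1:]
	return m, nil
}

func (s *listSource) Close() error {
	s.closed = true
	return nil
}

// indentJoiner appends lines that start with a space or tab
// to the message that precedes them.
type indentJoiner struct {
	in      parser
	pending *Message // first line of the next event, read ahead
}

func (j *indentJoiner) Next() (Message, error) {
	var cur Message
	if j.pending != nil {
		cur, j.pending = *j.pending, nil
	} else {
		m, err := j.in.Next()
		if err != nil {
			return Message{}, err
		}
		cur = m
	}
	for {
		m, err := j.in.Next()
		if err == io.EOF {
			// Flush what we have; the next call reports io.EOF.
			return cur, nil
		}
		if err != nil {
			return Message{}, err
		}
		if len(m.Content) > 0 && (m.Content[0] == ' ' || m.Content[0] == '\t') {
			cur.Content = append(cur.Content, '\n')
			cur.Content = append(cur.Content, m.Content...)
			continue
		}
		// m starts a new event: keep it for the next call.
		j.pending = &m
		return cur, nil
	}
}

// Close closes the underlying parser.
func (j *indentJoiner) Close() error { return j.in.Close() }

// collect drains a parser until io.EOF and closes it afterwards.
func collect(p parser) ([]string, error) {
	defer p.Close()
	var out []string
	for {
		m, err := p.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, string(m.Content))
	}
}
```

Embedding io.Closer lets each layer release its resources and propagate Close down the chain, which a timeout-based aggregator needs in order to stop its background work.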

Next steps

  • Decide if the new inputs should use reader.Reader or beat.Event
  • Create a transformer that converts a reader.Message to a beat.Event
  • Come up with interface between a reader.Reader and inputs to support e.g. journals
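
One possible shape for the transformer from the list above, as a sketch only: Message and Event are simplified local stand-ins for reader.Message and beat.Event, their field sets are assumptions, and toEvent and sampleEvent are hypothetical names. Storing the content under a "message" key is likewise a choice made for this example.

```go
package main

import "time"

// Message is a simplified stand-in for reader.Message (assumed fields).
type Message struct {
	Ts      time.Time
	Content []byte
	Fields  map[string]interface{}
}

// Event is a simplified stand-in for beat.Event (assumed fields).
type Event struct {
	Timestamp time.Time
	Fields    map[string]interface{}
}

// toEvent copies the message fields into a new map and stores the
// raw content under the "message" key.
func toEvent(m Message) Event {
	fields := make(map[string]interface{}, len(m.Fields)+1)
	for k, v := range m.Fields {
		fields[k] = v
	}
	fields["message"] = string(m.Content)
	return Event{Timestamp: m.Ts, Fields: fields}
}

// sampleEvent converts one hand-built message, for demonstration.
func sampleEvent() Event {
	return toEvent(Message{
		Ts:      time.Date(2020, time.February, 6, 0, 0, 0, 0, time.UTC),
		Content: []byte("hello"),
		Fields:  map[string]interface{}{"input": "journal"},
	})
}
```

Copying the fields keeps the event independent of the message, so later readers in the pipeline cannot mutate an event that has already been handed off.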
@kvch kvch added discuss Issue needs further discussion. [zube]: Meta Team:Services (Deprecated) Label for the former Integrations-Services team [zube]: In Progress and removed [zube]: Meta labels Feb 6, 2020
@kvch kvch mentioned this issue Mar 10, 2020
23 tasks
@kvch kvch self-assigned this May 5, 2020
@kvch kvch changed the title Proposal: Generalization of reader pipeline in Beats Generalization of reader pipeline in Beats Apr 28, 2021
@urso

urso commented Apr 28, 2021

We should also check whether there are other parsers in Filebeat inputs that we might want to generalize. E.g. syslog or XML come to mind.

@kvch
Contributor Author

kvch commented Oct 7, 2021

This has been done. The new processor is called parsers. There is an open issue for adding this to all inputs in Filebeat: #26130

@kvch kvch closed this as completed Oct 7, 2021
@zube zube bot removed the [zube]: Done label Jan 6, 2022