[Filebeat] Kafka input, json payload #26833

mjmbischoff · 2021-07-11T01:32:43Z

What does this PR do?

It allows the Filebeat Kafka input to handle JSON payloads. Specifically, this enables picking up structured data and exposing it under top-level fields instead of having escaped JSON in the message field.

Why is it important?

Kafka is often used to pull data away from the log sources as fast as possible, which avoids disks filling up and allows the backend of the pipeline to be serviced, or incidents to be handled, without dropping events on the floor.

This avoids the need to apply the decode_json_fields processor immediately after the input in order to process any of the fields in the structured data.
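To illustrate the difference, here is a minimal Python sketch. It is not Filebeat code: the function names and the event shape are hypothetical, and the fallback to a plain message for non-object payloads is an assumption rather than a description of what this PR does.

```python
import json


def event_with_raw_message(payload: bytes) -> dict:
    """Without JSON handling: the payload ends up as one string in `message`."""
    return {"message": payload.decode("utf-8")}


def event_with_json_payload(payload: bytes) -> dict:
    """With JSON handling: keys of the decoded object become top-level fields.

    Assumption for this sketch: anything that is not a JSON object
    falls back to the raw `message` form.
    """
    text = payload.decode("utf-8")
    try:
        decoded = json.loads(text)
    except json.JSONDecodeError:
        return {"message": text}
    if not isinstance(decoded, dict):
        return {"message": text}
    return decoded


# Hypothetical Kafka message value.
payload = b'{"level": "error", "service": "checkout", "latency_ms": 412}'

raw_event = event_with_raw_message(payload)
structured_event = event_with_json_payload(payload)
```

In the first form, a later processor has to decode `raw_event["message"]` before `level` or `service` can be used; in the second, they are directly addressable as fields.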

In the context of modules this change can be a big advantage: right now we can override the input, but we cannot easily change the processors used by the module or inject a processor between the input and the module. While this doesn't solve the issue of mismatched structure, it at least allows one to transform the data before it's stored in Kafka so that modules can be used post-Kafka.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

elasticmachine · 2021-07-11T03:13:51Z

💚 Build Succeeded

Build stats

Start Time: 2021-07-11T01:33:02.013+0000
Duration: 101 min 31 sec
Commit: ad02f7b

Test stats 🧪

Test	Results
Failed	0
Passed	14806
Skipped	2312
Total	17118

💚 Flaky test report

Tests succeeded.

kvch · 2021-07-14T11:15:14Z

@mjmbischoff We have a Filebeat-wide initiative to expose the same parsers in all inputs that we have in log/filestream (multiline, json and container), tracked here: #26130

Do you mind looking into it and implementing the support for the input? We want to have a uniform parsing experience for all Filebeat inputs, so I am afraid this PR cannot go in as is.

mjmbischoff · 2021-07-27T10:11:09Z

Still on my radar; I'm looking into implementing this based on parsers. I did hit some snags, as there doesn't seem to be an easy way to avoid the string -> json -> string -> parser (ndjson) dance. Also, the parsers seem to be pull-based, while the Kafka input is set up as push-based, code-wise. It's going to take some changes.
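As a rough illustration of the push/pull mismatch, here is a minimal Python sketch of one way to bridge the two models with a queue. It is not the Beats code and not necessarily how #27335 approaches it; `PushToPullAdapter`, `on_message`, `next` and `pull_and_parse` are hypothetical names.

```python
import json
import queue


class PushToPullAdapter:
    """Buffers messages pushed by a consumer so a parser can pull them."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[bytes]" = queue.Queue()

    def on_message(self, payload: bytes) -> None:
        """Push side: a consumer callback hands over each message."""
        self._queue.put(payload)

    def next(self, timeout: float = 1.0) -> bytes:
        """Pull side: blocks until a message is available.

        Raises queue.Empty if nothing arrives within `timeout` seconds.
        """
        return self._queue.get(timeout=timeout)


def pull_and_parse(reader: PushToPullAdapter) -> dict:
    """Pull-based parser step: asks the reader for one payload and decodes it."""
    return json.loads(reader.next().decode("utf-8"))


adapter = PushToPullAdapter()
adapter.on_message(b'{"level": "info", "msg": "started"}')  # pushed by the consumer
parsed = pull_and_parse(adapter)                            # pulled by the parser
```

The queue decouples the two sides: the consumer never waits on the parser beyond enqueueing, and the parser decides when to ask for the next message.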

botelastic · 2021-08-26T10:58:04Z

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

mjmbischoff · 2021-08-26T16:25:09Z

#27335 supersedes this PR; I expect it will be the one to get merged. Removing stale.

mergify · 2021-09-06T06:43:36Z

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b kafka-input-json-payload upstream/kafka-input-json-payload
git merge upstream/master
git push upstream kafka-input-json-payload

mjmbischoff · 2021-09-06T06:45:45Z

Closing this, as #27335 got merged.

mjmbischoff added 2 commits July 11, 2021 03:02

032dbac Adding the option to use json payload as structured data on kafka input vs string with json

ad02f7b Improving test to ensure we get at least the number of messages we're expecting.

mjmbischoff added the enhancement label Jul 11, 2021

botelastic bot added the needs_team label Jul 11, 2021

mjmbischoff added the Team: Ingest label Jul 11, 2021

botelastic bot removed the needs_team label Jul 11, 2021

mjmbischoff changed the title ~~filebeat: Kafka input json payload~~ [Filebeat] Kafka input, json payload Jul 11, 2021

mjmbischoff added the Filebeat label Jul 11, 2021

mjmbischoff mentioned this pull request Jul 13, 2021

run processors, defined in the block overriding the input, before the module. #26862

Closed

8 tasks

mjmbischoff mentioned this pull request Aug 22, 2021

[Filebeat] kafka v2 using parsers #27335

Merged

6 tasks

botelastic bot added the Stalled label Aug 26, 2021

mjmbischoff removed the Stalled label Aug 26, 2021

mjmbischoff closed this Sep 6, 2021
