Filebeat modules: Always send message by default #14708

Closed
exekias opened this issue Nov 22, 2019 · 12 comments
Labels
breaking change discuss Issue needs further discussion. enhancement Stalled Team:Integrations Label for the Integrations team

Comments

@exekias
Contributor

exekias commented Nov 22, 2019

Currently Kibana Logs UI needs a mechanism to rebuild the original message from events coming from Filebeat modules. This doesn't scale very well, as every time we add/update a new integration, changes need to happen on the Kibana side to support this.

For this reason, in order to provide a good experience, we can change the current behavior of modules to always send the original log line. This would mean:

@exekias exekias added enhancement discuss Issue needs further discussion. Team:Integrations Label for the Integrations team breaking change labels Nov 22, 2019
@exekias exekias changed the title Filebeat modules: Always send original message by default Filebeat modules: Always send message by default Nov 22, 2019
@exekias
Contributor Author

exekias commented Nov 22, 2019

We also have log.original, which contains the raw original log line, whereas message is the same line without the initial timestamp. We need to come up with a solution that accounts for both.

@weltenwort
Member

Prior related issues, for completeness' sake: #8950, #8083

@exekias
Contributor Author

exekias commented Nov 26, 2019

I've done some testing, sending the NASA access logs from Jul 95 and tweaking the apache pipeline to allow keeping the message and log.original fields. In this example log.original is an exact copy of message, as the timestamp is not located at the beginning of the line:

filebeat.inputs:
  # Don't keep anything
  - type: log
    paths:
      - ./bench/NASA_access_log_Jul95.1
    index: filebeat-8.0.0-keep_nothing

  - type: log
    paths:
      - ./bench/NASA_access_log_Jul95.2
    fields:
      keep_message: true
    fields_under_root: true
    index: filebeat-8.0.0-keep_message

  - type: log
    paths:
      - ./bench/NASA_access_log_Jul95.3
    fields:
      keep_message: true
      keep_original: true
    fields_under_root: true
    index: filebeat-8.0.0-keep_all

setup.ilm.enabled: false
filebeat.overwrite_pipelines: true

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: filebeat-8.0.0-apache-access-default

After force merge:

yellow open filebeat-8.0.0-keep_all     dGQKyg4TQROa9GvSHlxYhA 1 1 1891714 0 763.5mb 763.5mb
yellow open filebeat-8.0.0-keep_nothing NJinYOw9RYip_bb0nHc_Iw 1 1 1891714 0 310.1mb 310.1mb
yellow open filebeat-8.0.0-keep_message -iL4OMRrTyaO8x9iUmV3QQ 1 1 1891714 0 406.5mb 406.5mb

It seems to me that we should make an effort to store only one of log.original or message, if possible.
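
To put the listing above in relative terms, here is a small arithmetic sketch. The sizes and the document count are copied from the listing; treating "mb" as 1024 * 1024 bytes is an assumption about how the sizes were reported, and the helper name `overhead` is only illustrative.

```python
# Index sizes in MB after force merge, copied from the listing above.
sizes_mb = {
    "keep_nothing": 310.1,
    "keep_message": 406.5,
    "keep_all": 763.5,
}
DOC_COUNT = 1_891_714  # documents per index, from the same listing


def overhead(variant, baseline="keep_nothing"):
    """Return (growth over the baseline in percent, extra bytes per document)."""
    extra_mb = sizes_mb[variant] - sizes_mb[baseline]
    percent = 100.0 * extra_mb / sizes_mb[baseline]
    # Assumption: 1 mb in the listing means 1024 * 1024 bytes.
    bytes_per_doc = extra_mb * 1024 * 1024 / DOC_COUNT
    return percent, bytes_per_doc


for name in ("keep_message", "keep_all"):
    percent, per_doc = overhead(name)
    print(f"{name}: +{percent:.1f}% over keep_nothing, "
          f"about {per_doc:.0f} extra bytes per document")
```

In this run, keeping message alone grows the index by roughly a third, while keeping both fields more than doubles it.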

@weltenwort have we considered in the past using log.original plus perhaps an offset field that tells you where the timestamp ends and the message starts? I'm talking about the general case, leaving out the ones where the message is actually inside some JSON field or similar.
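
A minimal sketch of the offset idea, to make it concrete. The field name `message_offset` and the helper `split_original` are hypothetical, nothing like them exists in ECS or Filebeat today, and the offset is counted in characters here rather than bytes.

```python
def split_original(original, message_offset):
    """Split a raw log line into (prefix, message) at a character offset."""
    if not 0 <= message_offset <= len(original):
        raise ValueError("offset is outside of the original line")
    return original[:message_offset], original[message_offset:]


# Hypothetical event: only log.original is stored, plus the offset at which
# the timestamp prefix ends and the message begins.
event = {
    "log": {"original": "2019-11-26T10:15:00Z connection refused by upstream"},
    "message_offset": 21,  # hypothetical field name
}

prefix, message = split_original(event["log"]["original"], event["message_offset"])
print(repr(prefix))
print(repr(message))
```

The trade-off is that the message is derived at read time, so it would not be searchable as its own field unless it is indexed separately.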

@exekias
Contributor Author

exekias commented Nov 26, 2019

Noting also a previous conversation about this very same thing: #8448

@kaiyan-sheng
Contributor

Seems like we also have a field event.original for storing original log message hmm

@weltenwort
Member

Ignoring the redundancy of log.original and event.original for now, for log viewing both an indexed version of message and the original version without the timestamp would be valuable. With that combination, the log entry can be both searched for and displayed correctly.

The full original would be valuable for lossless reindexing, which we don't have a UI for but probably want to have at some point.

using log.original + perhaps an offset field that tells you where the timestamp ends and the message starts

@exekias that's an interesting idea, which I haven't heard mentioned before. It shouldn't be a problem from the UI perspective, but I wonder how we would reasonably integrate that into ECS.

@exekias
Contributor Author

exekias commented Jan 23, 2020

Just checked the status of existing filesets. 35 out of 91 report a message field; in many of them it doesn't contain the original message, only a subset of it:

for f in $(find filebeat/module/*/*/test/*-expected* x-pack/filebeat/module/*/*/test/*-expected*); do grep -q '"message' "$f" && echo "$f" | sed -e 's/x-pack\///' | cut -d/ -f3,4; done | uniq
apache/error
auditd/log
elasticsearch/audit
elasticsearch/deprecation
elasticsearch/gc
elasticsearch/server
elasticsearch/slowlog
icinga/debug
icinga/main
icinga/startup
kafka/log
kibana/log
logstash/log
logstash/slowlog
mongodb/log
mysql/error
nats/log
nginx/error
postgresql/log
redis/log
system/auth
system/syslog
activemq/audit
activemq/log
azure/signinlogs
cef/log
cisco/ftd
cisco/ios
coredns/log
envoyproxy/log
ibmmq/errorlog
misp/threat
mssql/log
rabbitmq/log
suricata/eve

One possible option to avoid big breaking changes is:

  • 7.x: Always report log.original in all filesets
  • Also: Stop adding message field to new filesets, as log.original should be enough.
  • 7.x: Come up with a plan to use it from the UI (instead of message). For instance, show the first match in this list:
    • UI can compile the message from structured data (only existing modules implemented in the UI)
    • message field
    • log.original with maybe some stripping rules to remove the date
  • 8.0: remove message field from filesets, UI should fall back to log.original
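
The "first match in this list" rule from the 7.x step could look roughly like the sketch below. All names (`display_message`, `get_path`, `LEADING_TIMESTAMP`) are hypothetical, events are assumed to be nested dictionaries, and the timestamp pattern is only one example of a stripping rule; it is not how the Logs UI is implemented.

```python
import re

# Hypothetical stripping rule: drop one leading ISO 8601 style timestamp.
LEADING_TIMESTAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?\s+"
)


def get_path(event, dotted):
    """Look up a dotted field name such as 'log.original' in a nested dict."""
    node = event
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def display_message(event, compile_from_fields=None):
    """Return the first available candidate, in the order of the list above."""
    # 1. Message compiled from structured data (only for known modules).
    if compile_from_fields is not None:
        compiled = compile_from_fields(event)
        if compiled:
            return compiled
    # 2. The message field as shipped by the fileset.
    message = get_path(event, "message")
    if message:
        return message
    # 3. log.original with the leading timestamp stripped.
    original = get_path(event, "log.original")
    if original:
        return LEADING_TIMESTAMP.sub("", original, count=1)
    return None


print(display_message({"log": {"original": "2020-01-23T10:00:00Z worker started"}}))
```

Note that in cases 1 and 3 the displayed text is not stored as a field of its own, so a search for it may not match what the user sees.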

Thoughts?

@exekias
Contributor Author

exekias commented Jan 23, 2020

Also pinging @urso @ruflin

@weltenwort
Member

weltenwort commented Jan 23, 2020

UI can compile the message from structured data (only existing modules implemented in the UI)

This is what the Logs UI already does and it seems to be very unintuitive for our users. It also doesn't scale and makes search/highlighting very complicated.

8.0: remove message field from filesets, UI should fall back to log.original

As I wrote before, the UI wants to display the message without the timestamp, which is handled separately. As such, being able to rely on sensible message content would be better and would make the search experience more predictable.

@exekias
Contributor Author

exekias commented Jan 23, 2020

Ok, I had a good chat with @weltenwort, we discussed some things:

  • We can forget about log.original for now because we don't really need to index it for log reindexing. The message field needs to be searchable by users.
  • We could think of all kinds of tricks to show a message from somewhere else (including log.original or the structured fields), but it all boils down to one problem: if we introduce this kind of magic, users will be confused when they try to search over the message they are seeing.

So I think the safe play here is the simplest (as usual):

  • 7.7: Start sending message field for all modules, make it mandatory for the new ones
  • At some point: the UI can rely on it and stop using its heuristics to reconstruct the message
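
One way to enforce "mandatory for the new ones" would be a check over the expected test events of each fileset. This is only a sketch under assumptions: the function name `events_missing_message` is made up, and events are assumed to be dictionaries with a top-level `message` key, which may not match how the `*-expected*` test files are laid out.

```python
def events_missing_message(events):
    """Return the indexes of events without a non-empty string message field."""
    missing = []
    for index, event in enumerate(events):
        value = event.get("message")
        if not isinstance(value, str) or not value.strip():
            missing.append(index)
    return missing


# Hypothetical expected events for a fileset under test.
sample = [
    {"message": "connection accepted", "event.module": "mongodb"},
    {"event.module": "nginx"},               # no message at all
    {"message": "", "event.module": "kafka"},  # empty message
]

print(events_missing_message(sample))
```

A fileset would pass such a check only when the returned list is empty.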

@andrewkroh
Member

Relates: elastic/ecs#841

@botelastic

botelastic bot commented Oct 19, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 19, 2022
@botelastic botelastic bot closed this as completed Apr 17, 2023
@zube zube bot removed the [zube]: Done label Jul 17, 2023

5 participants