
Keep message field intact when using modules #8950

Closed
rocketraman opened this issue Nov 6, 2018 · 9 comments
Labels: enhancement, Filebeat, Filebeat module, Team:Integrations

Comments

@rocketraman

Creating an issue as requested by @kvch in this discussion: https://discuss.elastic.co/t/keeping-message-field-intact-with-module-parsing/155452

I use Filebeat modules via autodiscover with hints enabled. Currently, the message field itself is destroyed after parsing. This means that casually viewing logs in Kibana or in tools like elktail will just show an empty log message. Here is an example from a Kibana dashboard:

[Kibana dashboard screenshot: the message column is empty for module-parsed logs]

With the current behavior, one has to either add several of the destructured fields to the output view or query, or click into the details of each one. This is not appealing when just trying to get an overall view of a set of logs before digging into the details.

In addition, if I search for message:(something) I won't find it. I have to know which destructured field contains something to do a search.

Can filebeat be configured to parse data out of the message, but leave the message field as-is rather than destroying it?

The example above used the apache2 module via the annotations:

co.elastic.logs/module: apache2
co.elastic.logs/fileset: access

but I suspect other modules have similar behavior.
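For reference, the hints-based autodiscover setup mentioned above looks roughly like this (a minimal sketch, assuming the Kubernetes provider; the docker provider accepts the same hints):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true   # picks up the co.elastic.logs/* annotations shown above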

@jsoriano
Member

jsoriano commented Nov 7, 2018

Related to #8448

@ruflin added the Team:Integrations label on Nov 27, 2018
@Randy-312

I opened a similar case, elastic/ecs#543, which led me to configuration options that do what you seek:

A new configuration option named keep_original has been added for the docker input and for the json settings of the log input. By default it is set to false.
If the option is set, the raw content is published under the log.original field. The length of the original message is limited by the max_bytes option of the input (defaults to 10MB).

So, while keep_original doesn't appear to be documented, those two items should allow us to do what we need!
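A minimal sketch of how that might look in filebeat.yml, going only by the description quoted above (json.keep_original is taken from that quote rather than from the documentation; the path is illustrative):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/containers/*.log   # illustrative path
    json.keep_original: true        # per the quote: raw content is published under log.original
    max_bytes: 10485760             # limits how much of the original message is kept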

@jsoriano
Member

jsoriano commented Sep 10, 2019

The copy_fields processor can be used to keep the original message:

processors:
- copy_fields:
    fields:
      - from: message
        to: log.original
    fail_on_error: false
    ignore_missing: true

Since processors run in Filebeat before the event reaches the module's ingest pipeline in Elasticsearch, the copy is made before the pipeline drops message. I am closing this for now; we can consider reopening if this option is not enough.

@rocketraman
Author

@jsoriano Thanks for the update. I do note that this approach isn't ideal: in Kibana one has to show both the message field for logs that are not parsed and the event.original field for logs that are parsed. Because both of these fields are very wide, this view in Kibana is super-ugly and hard to read. I'm honestly not sure why lots of other people don't have an issue with this -- it's terrible.

Perhaps the fix for this is in Kibana -- the ability to say:

give me a column "message" that reads from "message" if it exists, or "event.original" if it does not

However, that is clunky as well.

Another alternative is to have a "post-processor" which runs after the parsing logic, which I can use to copy event.original back to message. That seems hackish, though. To me, the simplest option is to just have a flag on the parser that doesn't destroy the message field in the first place, but hey, why simplify things?
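For what it's worth, if such a step lived at the end of the module's ingest pipeline, it might look something like this set processor (a hypothetical sketch of the idea, not an existing Filebeat option):

{
  "set": {
    "field": "message",
    "value": "{{event.original}}",
    "if": "ctx.event?.original != null"
  }
}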

@jsoriano
Member

Another alternative is to have a "post-processor" which runs after the parsing logic, which I can use to copy event.original back to message. That seems hackish, though. To me, the simplest option is to just have a flag on the parser that doesn't destroy the message field in the first place, but hey, why simplify things?

This could be something to evaluate case by case. The thing is that it would need to be done per fileset, as each log file has a different nature (some have something like a "message", and others do not).

@rocketraman
Author

This could be something to evaluate case by case. The thing is that it would need to be done per fileset, as each log file has a different nature (some have something like a "message", and others do not).

Fair enough. That's a valid argument against the keepField parser flag approach, but it bolsters the case for some kind of post-processing ability that creates a single unified field (whether that's message or something else, I don't care) from the various types of inputs -- a field that contains a tailable, easily viewable, consistent log without jumping through hoops.

I can create a separate issue / feature request for this if you like.

@Randy-312

So... that's an interesting thought.
Today, due to limitations in some of our buffering tools, we avoid duplicating message fields, i.e. we wouldn't carry the same content in both log.original and message and pass it over the wire / process it / pay for it.

BUT, if the receiver knew that a certain 'type' of message was made up of constituent fields that were already passed in, then it could reconstruct the message, e.g. nginx.message = ${field.1} ${field.2}, etc. (I was not energetic enough to make that accurate, but you get the idea.)
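A rough sketch of that reconstruction as an ingest pipeline set processor, using mustache templates over fields that were already parsed out (the field names here are illustrative, not the actual nginx fieldset):

{
  "set": {
    "field": "nginx.message",
    "value": "{{source.address}} - {{user.name}} \"{{http.request.method}} {{url.original}}\" {{http.response.status_code}}"
  }
}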

While that would be useful in the UI alone, we MAY also need to do this in our 'logstash' module so that our downstream systems that are not Elastic (e.g. S3) have the 'original' message (audit use case). Although for most, just being able to use it in the Logs UI is enough.

So, this capability could cut AWS Kinesis costs by at least 10% for everyone who can use it.

@soju22

soju22 commented Nov 6, 2019

To me, the simplest option is to just have a flag on the parser that doesn't destroy the message field in the first place, but hey, why simplify things?

I agree, there should be an easy way to keep the message field intact.

@soju22

soju22 commented Nov 6, 2019

One easy workaround is to modify the ingest pipeline:

  • delete the existing pipeline first if needed, e.g.: curl -X DELETE "localhost:9200/_ingest/pipeline/filebeat-7.4.1-apache-access-default",
  • modify /usr/share/filebeat/module/apache/access/ingest/default.json and remove the processor that removes the message field (shown below),
  • restart Filebeat so it reloads the modified pipeline.
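The processor to delete from default.json looks roughly like this (the exact layout may differ between versions):

{
  "remove": {
    "field": "message"
  }
}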
