Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raw log message to log handler #7207

Closed
wants to merge 8 commits into from
Closed

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented May 30, 2018

By default filebeat adds log.message to each event which contains the raw unprocessed message. This can be disabled by setting raw_message to false to save disk space.

The message in log.message contains a full multiline message, is encoded and the new line at the end is stripped.

@ruflin ruflin added in progress Pull request is currently in progress. discuss Issue needs further discussion. review Filebeat Filebeat labels May 30, 2018
@ruflin
Copy link
Contributor Author

ruflin commented May 30, 2018

@tsg I'm not sure if we should enable this by default or only have it for example in all our modules enabled.

@ruflin ruflin mentioned this pull request May 30, 2018
@ruflin ruflin force-pushed the raw-message branch 2 times, most recently from e5e67b8 to 9eedc54 Compare June 4, 2018 07:03
Current default for ignore_above is 1024. This is too short for some command line entries on Windows. This increases it to 2048.

Closes elastic#8076
@urso urso requested a review from kvch August 27, 2018 15:01
@@ -178,6 +178,8 @@ https://github.com/elastic/beats/compare/v6.2.3...master[Check the HEAD diff]
- Correctly join partial log lines when using `docker` input. {pull}6967[6967]
- Add support for TLS with client authentication to the TCP input {pull}7056[7056]
- Converted part of pipeline from treafik/access metricSet to dissect to improve efficeny. {pull}7209[7209]
- Add support for TLS with client authentication to the TCP input. {pull}7056[7056]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entry does not belong here.

@@ -178,6 +178,8 @@ https://github.com/elastic/beats/compare/v6.2.3...master[Check the HEAD diff]
- Correctly join partial log lines when using `docker` input. {pull}6967[6967]
- Add support for TLS with client authentication to the TCP input {pull}7056[7056]
- Converted part of pipeline from treafik/access metricSet to dissect to improve efficeny. {pull}7209[7209]
- Add support for TLS with client authentication to the TCP input. {pull}7056[7056]
- Add log.message to each log event. {pull}[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add the number of the PR.

Copy link
Contributor

@kvch kvch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed that this feature is going to be disabled by default, because it has a big impact on users. If it is enabled silently on update for users, their whole installation could break. The size of the events are doubled, so they run out of storage. At this point this must be a conscious decision of users to enable it. Thus, we ensure they can prepare for/acknowledge the increased storage requirements.
If we decide later to enable it by default, we can do it in 7.x.

@kvch
Copy link
Contributor

kvch commented Aug 28, 2018

I've just looked at the issue which references it. I still believe that on input level, raw should not be added to the event. But I am fine with adding this to the modules. I am keeping my review as request changes, because currently it enables keeping the raw message for log input, not just for modules.

@@ -18,5 +18,9 @@ func (p *Limit) Next() (Message, error) {
if len(message.Content) > p.maxBytes {
message.Content = message.Content[:p.maxBytes]
}

if len(message.Raw) > p.maxBytes {
message.Raw = message.Raw[:p.maxBytes]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am on the fence with this. Raw message should not be truncated, because it is a form of processing. But this could protect users from injecting too long messages to their pipelines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with truncating the raw message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually go with your first proposal and not truncate it. At the same time I would not index log.original.

Can you elaborate on what you mean with too long message to the pipeline?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean such a big message which could disrupt the output of Filebeat.

ruflin added 6 commits August 28, 2018 13:40
By default filebeat adds log.message to each event which contains the raw unprocessed message. This can be disabled by setting `raw_message` to false to save disk space.

The message in `log.message` contains a full multiline message, is encoded and the new line at the end is stripped.
Copy link
Contributor Author

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to see what the actual impact on enabling it would mean on the storage level. My hope is that it has a smaller impact as the same data exists twice it "could" compress well.

@kvch @ph This change currently only applies to the log input. Should we to something similar for the other inputs?

@@ -18,5 +18,9 @@ func (p *Limit) Next() (Message, error) {
if len(message.Content) > p.maxBytes {
message.Content = message.Content[:p.maxBytes]
}

if len(message.Raw) > p.maxBytes {
message.Raw = message.Raw[:p.maxBytes]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually go with your first proposal and not truncate it. At the same time I would not index log.original.

Can you elaborate on what you mean with too long message to the pipeline?

@kvch
Copy link
Contributor

kvch commented Aug 30, 2018

I think we should. In case of log input reprocessing can be achieved by rereading the data (if it's not yet deleted). But for other inputs it might not be straightforward to replay the incoming messages. So it might be more valuable for other inputs than for log.

@ruflin
Copy link
Contributor Author

ruflin commented Sep 27, 2018

Closing in favor of #8448

@ruflin ruflin closed this Sep 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. Filebeat Filebeat in progress Pull request is currently in progress. review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants