Keep original messages in case of Filebeat modules #8448
Added entry to our Changelog as a breaking change, as it is going to have a big impact on users' Elasticsearch instances.
We renamed this to log.original in ECS. I think it also fits quite well with your code as you often use original: https://github.com/elastic/ecs#log
filebeat/filebeat.reference.yml
Outdated
@@ -81,6 +93,9 @@ filebeat.modules:
# Input configuration (advanced). Any input configuration option
# can be added under this section.
#input:
#Keeps the original message, so the data can be processed again on Ingest Node
#It requires increased storage size, because the sizes of events are approximately doubled.
Let's not make statements about doubling the size without measuring it. I also think it's not necessary to have this in the config file, but we could add a note in the docs that there is increased storage use.
Thanks for taking this on. I'm closing #7207 in favor of this one. Questions:
CHANGELOG.asciidoc
Outdated
@@ -19,6 +19,8 @@ https://github.com/elastic/beats/compare/v6.4.0...master[Check the HEAD diff]

*Filebeat*

- Keep original messages in case of Filebeat modules. {pull}8448[8448]
I would not put it under breaking changes as it's an addition, but we should definitely have a note in the migration guide about the additional storage use.
The contents of |
I think there is a difference between the two implementations. As the original message was copied here (https://github.com/elastic/beats/pull/7207/files#diff-8c78b3728628b1d2ecbd6a3a066629e8R47), it is not affected afterwards by the limit reader. That is also why I added it in multiline.
I haven't tested it, but I assume that if MaxBytes is set to 10 and the line is 20 bytes long, with the current code the original will also be 10 bytes, which makes it impossible to reprocess.
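The difference between the two orderings can be illustrated with a minimal sketch. The helpers `limit`, `copyBeforeLimit`, and `copyAfterLimit` are hypothetical names and are not taken from the Beats code base; the sketch only assumes that a MaxBytes-style limit cuts the line to a fixed number of bytes.

```go
package main

import "fmt"

// limit mimics a MaxBytes-style cap by cutting the line to at most maxBytes bytes.
// Hypothetical helper, not the actual Filebeat limit reader.
func limit(line []byte, maxBytes int) []byte {
	if maxBytes > 0 && len(line) > maxBytes {
		return line[:maxBytes]
	}
	return line
}

// copyBeforeLimit saves the original first and applies the limit afterwards,
// so the saved copy keeps the full line.
func copyBeforeLimit(line []byte, maxBytes int) (original, message string) {
	original = string(line)
	message = string(limit(line, maxBytes))
	return original, message
}

// copyAfterLimit applies the limit first, so the saved "original" is already truncated.
func copyAfterLimit(line []byte, maxBytes int) (original, message string) {
	limited := limit(line, maxBytes)
	return string(limited), string(limited)
}

func main() {
	line := []byte("0123456789abcdefghij") // 20 bytes

	o1, m1 := copyBeforeLimit(line, 10)
	fmt.Println(len(o1), len(m1)) // 20 10

	o2, m2 := copyAfterLimit(line, 10)
	fmt.Println(len(o2), len(m2)) // 10 10
}
```

With the copy taken after the limit, the stored original is as short as the message, which is the case where reprocessing is no longer possible.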
Yes, that's exactly how it works. I'll do refactoring as requested. For the record, in this case it's even riskier to enable this feature by default, because users cannot limit the size of |
We should definitely mention that in the docs to make users aware of it.
@kvch I like that it is just another processor that runs before any user-defined processors.
I agree with you, we should not set it to true by default. Let's take a common scenario: reading a Java log file. By default, when nothing bad happens, a document will be at least double the original size. This will have the following direct impact:
The above should be under control when the system is in a happy state, but as soon as a stack trace happens it will generate really big multiline events, and this is where things will start to explode. For that reason, I would prefer that we don't enable that option by default. Maybe we could add a section about reprocessing events to our documentation that points to this option but also exposes any caveats we could have, like the possibility of truncated values.
This makes me think: what if we provided a truncate processor? I believe this could be handy and useful to allow users to truncate a field, e.g. stack traces in the multiline context, or to limit content after doing a dissect. Note: I am aware of the default limit of 500 for multiline, but maybe we could provide more flexibility.
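To make the truncate idea concrete, here is a rough sketch of what the core of such a processor might do. `truncateField` is a hypothetical helper written for illustration and is not an existing Beats API; it assumes the limit is expressed in bytes and that cutting in the middle of a multi-byte UTF-8 character should be avoided.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateField cuts value to at most maxBytes bytes without splitting a
// multi-byte UTF-8 character. It reports whether anything was removed.
// Hypothetical helper for illustration; not an existing Beats API.
func truncateField(value string, maxBytes int) (string, bool) {
	if maxBytes < 0 || len(value) <= maxBytes {
		return value, false
	}
	cut := maxBytes
	// Step back until cut lands on the first byte of a character.
	for cut > 0 && !utf8.RuneStart(value[cut]) {
		cut--
	}
	return value[:cut], true
}

func main() {
	trace := "java.lang.IllegalStateException: boom\n\tat com.example.Main.run(Main.java:42)"
	short, truncated := truncateField(trace, 32)
	fmt.Printf("%q truncated=%v\n", short, truncated)
}
```

Returning a flag alongside the value would let a processor mark truncated events, so that users can tell which stored values are incomplete.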
var defaultConfig = inputOutletConfig{
	KeepOriginalMsg: true,
}
I strongly believe this should be an opt-in feature.
@ph I enabled the option by default for modules, because we agreed to do so. I think this solution is an acceptable middle ground, because it still keeps the original message read by Filebeat before it is parsed by an Ingest node. But users still have control over the message sizes. @ph @urso @ruflin do we need to have another conversation regarding this change?
@ruflin Do you also agree to have it on by default for modules?
For me the goal of keeping the original message around is to allow reprocessing. If we truncate it by default, we lose this capability. As for on by default: what if we release this option in 6.x but off by default, so we can start playing around with it? Then we can have the discussion again for 7.0 on whether we should enable it by default or not. This also buys us time to do some size tests.
We still need that upper bound check for memory control, but I would say that in the majority of cases that should be fine.
This looks more reasonable to me. I +1 that strategy.
I agree -- no setting this on by default in 6.x. I'd consider it a breaking change as it could change the size of data sent dramatically. Before setting it on by default in the future, I think we'd need to do benchmarking to see what the impact is (including increase to size on disk) and have a conversation about it. cc @cdahlqvist
If the main reason for keeping it is reprocessing, we can have it not being indexed. If it is just in the source it should compress quite well.
@cdahlqvist The indexing is off by default: https://github.com/elastic/beats/pull/8448/files#diff-a0e7c7d7619e2c513393f5c49a980d11R119
Closing in favour of new processor PRs:
From now on it is possible to keep the original messages of Filebeat modules. This makes it possible to process the raw messages again if required.
Configuration
This is an input-level configuration, so it must be configured in the input section of each module. This also means that it is valid to configure this option from inputs. However, it is suppressed if an event is not coming from a Filebeat module.
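The suppression rule can be sketched as follows. This is only an illustration under stated assumptions: the `event` type, the `FromModule` flag, and the `keepOriginal` function are hypothetical stand-ins rather than the actual Beats types, and the flat `log.original` key follows the field name suggested in the review above.

```go
package main

import "fmt"

// event is a simplified stand-in for a Filebeat event; the real type differs.
type event struct {
	Fields     map[string]interface{}
	FromModule bool // hypothetical flag: true when the event comes from a module
}

// keepOriginal copies the raw message to log.original only when the option is
// enabled and the event comes from a module. Otherwise it does nothing.
func keepOriginal(ev *event, enabled bool) {
	if !enabled || !ev.FromModule {
		return
	}
	if msg, ok := ev.Fields["message"].(string); ok {
		ev.Fields["log.original"] = msg
	}
}

func main() {
	fromModule := &event{Fields: map[string]interface{}{"message": "raw line"}, FromModule: true}
	plainInput := &event{Fields: map[string]interface{}{"message": "raw line"}, FromModule: false}

	keepOriginal(fromModule, true)
	keepOriginal(plainInput, true)

	_, kept := fromModule.Fields["log.original"]
	_, suppressed := plainInput.Fields["log.original"]
	fmt.Println(kept, suppressed) // true false
}
```

The option is accepted on any input in this sketch, but it only has an effect for events that originate from a module, which matches the behaviour described above.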
This is the first step of implementing #8083 and the alternative to #7207.