[Azure] Sanitize message in case of malformed json #34874
The function … Therefore, the expected result has the JSON keys in a different order (the sanitization itself works fine).
// clean up the message for known issues producing a malformed JSON
if a.config.SanitizeOptions != nil {
	for _, opt := range a.config.SanitizeOptions {
		bMessage = sanitize(string(bMessage), opt)
	}
}
What if an unknown/invalid opt is passed here?
My idea is to handle this in the sanitize function. If an unknown option is passed, it should just return the JSON string as it was.
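A minimal sketch of that idea, assuming a lookup table like the getSanitizationFuncs map from the diff: a known option dispatches to its function, and an unknown option falls through and returns the input unchanged. The option name NEW_LINES and the newline-stripping behavior are illustrative assumptions, not necessarily what the PR ships.

```go
package main

import (
	"fmt"
	"strings"
)

type sanitizationFunc func(jsonStr string) []byte

// getSanitizationFuncs maps option names to sanitizers.
// "NEW_LINES" is a hypothetical option name used for illustration.
func getSanitizationFuncs() map[string]sanitizationFunc {
	return map[string]sanitizationFunc{
		// Strip raw newlines (a simplification: it also removes newlines
		// inside string values instead of escaping them).
		"NEW_LINES": func(jsonStr string) []byte {
			return []byte(strings.ReplaceAll(jsonStr, "\n", ""))
		},
	}
}

// sanitize applies the sanitizer registered for opt. Unknown or invalid
// options are not an error: the JSON string is returned as it was.
func sanitize(jsonStr string, opt string) []byte {
	if fn, ok := getSanitizationFuncs()[opt]; ok {
		return fn(jsonStr)
	}
	return []byte(jsonStr)
}

func main() {
	msg := "{\"a\":\n1}"
	fmt.Printf("%q\n", sanitize(msg, "NEW_LINES")) // "{\"a\":1}"
	fmt.Printf("%q\n", sanitize(msg, "BOGUS"))     // "{\"a\":\n1}"
}
```

Returning the input unchanged keeps a typo in the configuration from dropping data; the trade-off is that the typo goes unnoticed unless it is logged somewhere.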
type sanitizationFunc func(jsonStr string) []byte

func getSanitizationFuncs() map[string]sanitizationFunc {
	return map[string]sanitizationFunc{
		// … (entries not shown in this excerpt)
	}
}
Wouldn't a switch/case be more readable here? It would also allow for a default behavior.
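For comparison, a sketch of the switch/case variant. The default branch is what makes it attractive: here it collects unrecognized options so the caller could log them, instead of silently ignoring them. The variadic signature, the second return value, and the NEW_LINES option name are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize applies each option in order and reports any option it does
// not recognize. The message itself is never rejected.
func sanitize(jsonStr string, opts ...string) ([]byte, []string) {
	out := jsonStr
	var unknown []string
	for _, opt := range opts {
		switch opt {
		case "NEW_LINES":
			// Simplification: drop raw newlines entirely.
			out = strings.ReplaceAll(out, "\n", "")
		default:
			// Leave the message untouched and remember the bad option.
			unknown = append(unknown, opt)
		}
	}
	return []byte(out), unknown
}

func main() {
	msg, unknown := sanitize("{\"a\":\n1}", "NEW_LINES", "TYPO")
	fmt.Printf("%q %v\n", msg, unknown) // "{\"a\":1}" [TYPO]
}
```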
- Should the default behaviour be to apply all sanitizations for Azure?
- And how about exposing the sanitization options directly to end users in Kibana?
What's your opinion on this? @zmoog
Here's my take:
- The default behavior must be no sanitization; these malformed JSON documents are edge cases.
- We need to make this option available in the agent template file and on the integration settings page.
For example, we can turn it on by default for FunctionApp logs but also give the option to turn it off as an escape hatch.
This raises a question: we will probably release this change with stack version 8.8 and backport it to 8.7.2. How can we make this fix available without raising the minimum required version for the Azure Integration logs to 8.7.2? Today the minimum required version is ^7.16.0 || ^8.0.0.
That's a big bump just to address an edge case.
It seems we must bump the minimum version to 8.9.
Since unmarshalling happens via a map, I think we only need to worry about newlines in the keys/values themselves.
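To illustrate the point: encoding/json accepts newlines between tokens but rejects a raw newline inside a string literal, so only newlines within keys or values break decoding. The sketch below uses a hypothetical helper, escapeNewlinesInStrings, that escapes newlines only while inside a string, which preserves the value's content instead of deleting it. It is an alternative technique, not necessarily the one this PR implements.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// escapeNewlinesInStrings rewrites raw '\n' bytes that appear inside JSON
// string literals as the two-character escape `\n`, leaving newlines
// between tokens untouched.
func escapeNewlinesInStrings(in []byte) []byte {
	out := make([]byte, 0, len(in))
	inString := false
	escaped := false
	for _, b := range in {
		switch {
		case escaped:
			// The previous byte was a backslash inside a string: copy verbatim.
			escaped = false
		case inString && b == '\\':
			escaped = true
		case b == '"':
			inString = !inString
		case inString && b == '\n':
			out = append(out, '\\', 'n')
			continue
		}
		out = append(out, b)
	}
	return out
}

func main() {
	raw := []byte("{\n\"msg\": \"line one\nline two\"\n}")

	var doc map[string]interface{}
	// The raw newline inside "msg" makes the document invalid JSON.
	fmt.Println(json.Unmarshal(raw, &doc) != nil) // true

	if err := json.Unmarshal(escapeNewlinesInStrings(raw), &doc); err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", doc["msg"]) // "line one\nline two"
}
```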
@lucianpy, please make sure the settings are easy to use in the Agent template and in the integration settings UI. Mark this PR as ready for review when you're ready. I will ask around about our options for making these new configuration settings available to the integration without bumping the min version from 7.17.x to 8.7.2 for an edge case.
Hey @lucianpy, how do you plan to set up the integration settings UI to configure the sanitization options?
@lucianpy, we're almost ready to go! The last thing to try is setting up the new sanitization options in the integration settings UI to ensure we don't need to make any changes to the config struct.
Co-authored-by: Maurizio Branca <[email protected]>
@zmoog I've set up the settings. They will be separate toggles for each sanitization option under 'Advanced options'.
@lucianpy, successfully tested: we have everything we need to use this from the Agent integration settings, so we're good to go!
Hey @elastic/elastic-agent-data-plane, this PR from @lucianpy also needs your review, since you are the default code owner in the Beats repo.
Looks like the data plane team is only being pinged because there is no owner set for the azureeventhub input. Rather than requiring us to review, change the code owner for this input to the team that actually owns it, and then I'll approve that. That way we aren't needlessly pinged every time this code changes.
A classic example of the wrong team being pinged because of an outdated CODEOWNERS entry, which delays the PR review process.
You are totally right, @cmacknz. My apologies for the noise. I opened a PR to assign ownership of this input to my team.
* Add sanitization function and test for azure input (cherry picked from commit 4096f9b)
* Add sanitization function and test for azure input
…json (#35622)
* [Azure] Sanitize message in case of malformed json (#34874)
* Add sanitization function and test for azure input
(cherry picked from commit 4096f9b)
* Remove extra changelog items from cherry-pick
---------
Co-authored-by: lucianpy <[email protected]>
Co-authored-by: Maurizio Branca <[email protected]>
… json (#35623)
* [Azure] Sanitize message in case of malformed json (#34874)
* Add sanitization function and test for azure input
(cherry picked from commit 4096f9b)
* Fix logger import
* fix linter
* Fix changelog
---------
Co-authored-by: lucianpy <[email protected]>
Co-authored-by: lucian-ioan <[email protected]>
Co-authored-by: Denis <[email protected]>
What does this PR do?
This PR adds the function sanitize, which is used in parseMultipleMessages to clean up the message string before attempting to unmarshal the JSON. It also adds the field SanitizeOptions, which must be set manually in integrations for the data streams that require sanitization.
Why is it important?
As pointed out by @zmoog, some Azure logs contain problematic newlines, while others contain single quotes.
This can cause failures when processing the data in integrations (e.g., the pipeline receiving one document containing two records).
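A sketch of the single-quote case, under the assumption that the malformed payload uses single quotes where JSON requires double quotes. The hypothetical helper replaceSingleQuotes swaps a single quote for a double quote only when it appears outside a double-quoted string, so apostrophes in valid values survive; it does not handle double quotes or escaped quotes inside single-quoted strings. The records envelope follows the usual Azure diagnostic-log shape; the field values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replaceSingleQuotes rewrites ' as " whenever it appears outside a
// double-quoted JSON string.
func replaceSingleQuotes(in []byte) []byte {
	out := make([]byte, 0, len(in))
	inDouble := false
	escaped := false
	for _, b := range in {
		switch {
		case escaped:
			// Byte after a backslash: copy as-is.
			escaped = false
		case b == '\\':
			escaped = true
		case b == '"':
			inDouble = !inDouble
		case b == '\'' && !inDouble:
			b = '"'
		}
		out = append(out, b)
	}
	return out
}

func main() {
	msg := []byte(`{'records': [{'category': 'FunctionAppLogs', 'text': "it's fine"}, {'category': 'FunctionAppLogs'}]}`)

	var payload struct {
		Records []map[string]interface{} `json:"records"`
	}
	if err := json.Unmarshal(msg, &payload); err == nil {
		panic("expected the single-quoted payload to be rejected")
	}
	if err := json.Unmarshal(replaceSingleQuotes(msg), &payload); err != nil {
		panic(err)
	}
	// Two records are decoded, so each can become its own event
	// (assuming the caller emits one event per record).
	fmt.Println(len(payload.Records)) // 2
}
```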
Filebeat sample configuration
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
Related issues