Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[collector][emitter] Ignore all non-unicode-decodable characters #2941

Merged
merged 1 commit into from
Oct 24, 2016

Conversation

olivielpeau
Copy link
Member

What does this PR do?

On Japanese and Korean versions of Windows, We've seen cases where
the json encoding would fail on a UnicodeDecodeError because of
non-utf8-decodable characters in strings, which causes the whole emitter to fail.

This fixes the issue by making another pass on the
whole payload to remove all the non-decodable characters and log the
affected strings.

Testing Guidelines

Added a mock test on the function that removes non-decodable chars.

We've seen cases (on Japanese and Korean versions of Windows) where
the json encoding would fail on a UnicodeDecodeError because of
non-utf8-decodable characters in strings.

To work around this, when this happens, we make another pass on the
whole payload to remove all the non-decodable characters and log the
affected strings.
Copy link
Contributor

@gmmeyer gmmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird that the json encoding doesn't allow that.

How will this affect events with non roman characters in the body?

Otherwise, LGTM.

@olivielpeau
Copy link
Member Author

Basically, when json.dumps can't encode the payload to json (and only in that case), this removes the non-ascii characters that are in python strings and that can't be decoded using the utf-8 encoding (instead of failing completely)

So events with non-roman characters should be fine as long as they're utf8-decodable.

@gmmeyer Let me know if that's clear enough for you, if so I'll merge this :)

@gmmeyer
Copy link
Contributor

gmmeyer commented Oct 24, 2016

Yes, merge this if you feel it's ready! I was just curious what the outcome would be.

@olivielpeau
Copy link
Member Author

Ok, thanks @gmmeyer! Merging

@olivielpeau olivielpeau merged commit 10de835 into master Oct 24, 2016
@olivielpeau olivielpeau deleted the olivielpeau/emitter-encoding branch October 24, 2016 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants