forked from protocolbuffers/protobuf
Commit
proto: reject invalid UTF-8 in strings (protocolbuffers#499)
The proto2 and proto3 specifications explicitly say that strings must be composed of valid UTF-8 characters. The C++ implementation prints an error whenever a string with invalid UTF-8 is present during parsing or serializing. For proto3, it explicitly errors out when parsing an invalid string.

Thus, we cause marshal/unmarshal to error if any string field contains invalid UTF-8. The error returned is fail-fast and indistinguishable. Fail-fast means that we stop the unmarshal process the moment we detect an invalid string, as opposed to finishing the unmarshal. Indistinguishable means that we provide no API for the user to programmatically check specifically for invalid UTF-8 errors so that they can ignore them.

In conversations with the protobuf team, they felt strongly that there should be no ability for users to ignore the UTF-8 validation error. However, if this change is overly problematic for users, we can consider a workaround for users already depending on strings containing invalid UTF-8.
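The fail-fast, indistinguishable behavior described above can be illustrated with a minimal Go sketch. This is not the code from this commit: `checkStrings`, `errInvalidUTF8`, and the error text are hypothetical names chosen for illustration, and the sketch validates a plain slice of strings rather than real message fields. Only `utf8.ValidString` from the standard library is a real API.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical, deliberately unexported error value.
// Because it is unexported and has no distinct exported type, code outside
// this package has nothing to compare against, which is one way to make
// the error "indistinguishable".
var errInvalidUTF8 = errors.New("proto: invalid UTF-8 string")

// checkStrings is a hypothetical helper standing in for the validation a
// marshal or unmarshal routine would run over string fields.
func checkStrings(fields []string) error {
	for _, s := range fields {
		if !utf8.ValidString(s) {
			// Fail fast: return at the first invalid string instead of
			// processing the remaining fields.
			return errInvalidUTF8
		}
	}
	return nil
}

func main() {
	// All strings are valid UTF-8, so no error is returned.
	fmt.Println(checkStrings([]string{"hello", "世界"}))

	// "\xff" is never valid in UTF-8; the third string is not examined.
	fmt.Println(checkStrings([]string{"ok", "bad\xff", "never checked"}))
}
```

In this sketch a caller can only observe that an error occurred; there is no exported sentinel or type it could use to single out and suppress the UTF-8 failure.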
Showing 4 changed files with 41 additions and 0 deletions.