json.Unmarshal fails to process UTF-16 encoded JSON #36686
Comments
Go expects the input to be UTF-8 bytes, so you can use golang.org/x/text/transform to convert it first. https://play.golang.org/p/CDbNzxBB63Z BTW, JSON should be UTF-8 encoded (https://tools.ietf.org/html/rfc8259#section-8.1). This is not a bug in Go.
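To make the suggestion above concrete, here is a minimal sketch of converting UTF-16 input to UTF-8 before unmarshalling. The comment recommends golang.org/x/text/transform; this sketch swaps in the standard library's unicode/utf16 package so that it runs without extra modules. The helper name utf16ToUTF8 is hypothetical, and the sketch assumes the input starts with a byte order mark.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf16"
)

// utf16ToUTF8 is a hypothetical helper (not part of any library).
// It converts BOM-prefixed UTF-16 bytes to UTF-8 bytes.
func utf16ToUTF8(b []byte) ([]byte, error) {
	if len(b) < 2 || len(b)%2 != 0 {
		return nil, errors.New("input is not BOM-prefixed UTF-16")
	}
	var order binary.ByteOrder
	switch {
	case b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	case b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	default:
		return nil, errors.New("missing UTF-16 byte order mark")
	}
	b = b[2:] // drop the BOM
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = order.Uint16(b[2*i:])
	}
	// utf16.Decode substitutes U+FFFD for invalid surrogate pairs
	// instead of returning an error.
	return []byte(string(utf16.Decode(units))), nil
}

func main() {
	// {"a":1} encoded as UTF-16LE with a BOM.
	raw := []byte{0xFF, 0xFE, '{', 0, '"', 0, 'a', 0, '"', 0, ':', 0, '1', 0, '}', 0}
	utf8Bytes, err := utf16ToUTF8(raw)
	if err != nil {
		panic(err)
	}
	var v map[string]int
	if err := json.Unmarshal(utf8Bytes, &v); err != nil {
		panic(err)
	}
	fmt.Println(v["a"])
}
```

For large files, a streaming approach such as the transform.Reader from the linked playground example avoids holding both copies of the data in memory.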
Fair enough. I suspected that might be the case. I wonder if maybe the json library should be updated to produce a more helpful error message though:
unicode.Decoder#Decode returns an error when it is given an invalid sequence. json.Decoder#Decode stops reading the content if transform.Reader#Read returns an error.
I would argue that this is worth fixing in Go. The older RFC (4627) did allow UTF-16 and UTF-32, and there might be servers still producing those payloads. (Evidently, people are still stumbling upon files written by such servers.) Given that a JSON payload must begin with an ASCII character, it is possible to auto-detect UTF-16 / UTF-32 by looking at the first couple of bytes and checking for zero bytes. In other words, it seems possible to make Go handle those legacy payloads gracefully without changing anything else in the contract.
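The detection idea in the comment above could look roughly like the sketch below. It follows the zero-byte patterns listed in RFC 4627, section 3. The function name detectJSONEncoding is hypothetical, and the sketch assumes the input has no byte order mark and that its first two characters are ASCII. It is not something encoding/json does today.

```go
package main

import "fmt"

// detectJSONEncoding is a hypothetical helper. It guesses the encoding
// of a JSON text from which of its first four bytes are zero.
func detectJSONEncoding(b []byte) string {
	if len(b) < 4 {
		return "UTF-8" // too short to tell; assume the default
	}
	zero := [4]bool{b[0] == 0, b[1] == 0, b[2] == 0, b[3] == 0}
	switch zero {
	case [4]bool{true, true, true, false}: // 00 00 00 xx
		return "UTF-32BE"
	case [4]bool{false, true, true, true}: // xx 00 00 00
		return "UTF-32LE"
	case [4]bool{true, false, true, false}: // 00 xx 00 xx
		return "UTF-16BE"
	case [4]bool{false, true, false, true}: // xx 00 xx 00
		return "UTF-16LE"
	}
	return "UTF-8"
}

func main() {
	fmt.Println(detectJSONEncoding([]byte{'{', 0, '"', 0}))
	fmt.Println(detectJSONEncoding([]byte(`{"a":1}`)))
}
```

A real implementation would also need to handle a leading BOM, which this pattern check does not cover.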
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
What did you do?
Attempted to json.Unmarshal bytes extracted from a file using ioutil.ReadFile. The bytes contained what appears to be valid JSON in any text editor, but viewed under a hex editor it's apparent that the data is encoded with UTF-16.
What did you expect to see?
Successful unmarshal. I think? I haven't found an authoritative statement about whether this is a valid way to encode JSON.
What did you see instead?
invalid character 'ÿ' looking for beginning of value
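The 'ÿ' in this message is most likely the first byte of a UTF-16 little-endian byte order mark (0xFF 0xFE), which the decoder reports as the character U+00FF. A minimal reproduction sketch follows. The sample bytes and the helper name unmarshalErr are assumptions made for illustration, since the original file is not included in the report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unmarshalErr is a hypothetical helper that returns whatever error
// json.Unmarshal reports for the given bytes.
func unmarshalErr(data []byte) error {
	var v interface{}
	return json.Unmarshal(data, &v)
}

func main() {
	// "{}" encoded as UTF-16LE, preceded by the BOM bytes 0xFF 0xFE.
	raw := []byte{0xFF, 0xFE, '{', 0, '}', 0}
	fmt.Println(unmarshalErr(raw))
}
```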