json.Unmarshal fails to process UTF-16 encoded JSON #36686

Closed
alankm opened this issue Jan 22, 2020 · 4 comments

Comments

@alankm

alankm commented Jan 22, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.1 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\alanm\AppData\Local\go-build
set GOENV=C:\Users\alanm\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\alanm\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\alanm\AppData\Local\Temp\go-build387333814=/tmp/go-build -gno-record-gcc-switches

What did you do?

Attempted to json.Unmarshal bytes extracted from a file using ioutil.ReadFile. The bytes contained what appears to be valid JSON in any text editor, but viewed under a hex-editor it's apparent that the data is encoded with UTF-16.

What did you expect to see?

Successful unmarshal. I think? I haven't found an authoritative statement about whether this is a valid way to encode JSON.

What did you see instead?

invalid character 'ÿ' looking for beginning of value

@mattn
Member

mattn commented Jan 22, 2020

Go expects the input to be UTF-8 bytes, so you can convert it first with golang.org/x/text/transform.

https://play.golang.org/p/CDbNzxBB63Z

BTW, JSON should be UTF-8 encoded.

https://tools.ietf.org/html/rfc8259#section-8.1

This is not a bug in Go.
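
For readers without access to the playground link, here is a minimal sketch of the "convert to UTF-8 first" idea. It deliberately swaps in the standard library (`unicode/utf16`, `encoding/binary`) instead of golang.org/x/text/transform, so it is not the code behind the link above. The helper name `decodeUTF16` is hypothetical, and the sketch assumes the input starts with a byte order mark, as the 'ÿ' (0xFF) in the reported error suggests.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 is a hypothetical helper (not part of any library).
// It converts BOM-prefixed UTF-16 bytes into UTF-8 bytes.
func decodeUTF16(b []byte) ([]byte, error) {
	if len(b) < 2 || len(b)%2 != 0 {
		return nil, errors.New("input is not BOM-prefixed UTF-16")
	}
	var order binary.ByteOrder
	switch {
	case b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	case b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	default:
		return nil, errors.New("missing UTF-16 byte order mark")
	}
	b = b[2:] // drop the BOM
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = order.Uint16(b[2*i:])
	}
	// utf16.Decode handles surrogate pairs; invalid ones become U+FFFD.
	return []byte(string(utf16.Decode(units))), nil
}

func main() {
	// {"a":1} encoded as UTF-16LE with a BOM.
	raw := []byte{0xFF, 0xFE, '{', 0, '"', 0, 'a', 0, '"', 0, ':', 0, '1', 0, '}', 0}

	utf8Bytes, err := decodeUTF16(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	var v map[string]int
	if err := json.Unmarshal(utf8Bytes, &v); err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Println(v) // prints map[a:1]
}
```

For streaming input, the golang.org/x/text packages mentioned above are likely the better fit, since this sketch needs the whole file in memory.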

@alankm
Author

alankm commented Jan 22, 2020

Fair enough. I suspected that might be the case.

I wonder if maybe the json library should be updated to produce a more helpful error message though: invalid character 'ÿ' isn't always helpful. The function could check if it failed by running into a UTF replacement sequence, perhaps?

@alankm alankm closed this as completed Jan 22, 2020
@mattn
Member

mattn commented Jan 22, 2020

unicode.Decoder#Decode returns an error when it is given invalid sequences. json.Decoder#Decode stops reading if transform.Reader#Read returns an error.

@dejwk

dejwk commented Jun 18, 2020

I would argue that this is worth fixing in Go. The older RFC (4627) did allow UTF-16 and UTF-32, and there might be servers still producing those payloads. (Evidently, people are still stumbling upon files written by such servers.) Given that a JSON payload must begin with an ASCII character, it is possible to auto-detect UTF-16 / UTF-32 by looking at the first couple of bytes and checking for zero bytes.

In other words, it seems possible to make Go handle those legacy payloads gracefully without changing anything else in the contract.
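
The zero-byte check described above can be sketched as follows, using the pattern table from RFC 4627, section 3. The function name `detectEncoding` is hypothetical, and the sketch makes two assumptions: there is no byte order mark, and the first two characters are ASCII (which RFC 4627 guaranteed, but a top-level string under RFC 8259 may not). Inputs shorter than four bytes fall back to UTF-8 here for simplicity.

```go
package main

import "fmt"

// detectEncoding is a hypothetical helper that guesses the encoding of a
// BOM-less JSON text from the pattern of zero bytes in its first four bytes.
func detectEncoding(b []byte) string {
	if len(b) < 4 {
		return "UTF-8" // too short to tell; assume the default
	}
	z := func(i int) bool { return b[i] == 0 }
	switch {
	case z(0) && z(1) && z(2) && !z(3): // 00 00 00 xx
		return "UTF-32BE"
	case z(0) && !z(1) && z(2) && !z(3): // 00 xx 00 xx
		return "UTF-16BE"
	case !z(0) && z(1) && z(2) && z(3): // xx 00 00 00
		return "UTF-32LE"
	case !z(0) && z(1) && !z(2) && z(3): // xx 00 xx 00
		return "UTF-16LE"
	}
	return "UTF-8"
}

func main() {
	fmt.Println(detectEncoding([]byte{'{', 0, '"', 0, 'a', 0})) // prints UTF-16LE
	fmt.Println(detectEncoding([]byte(`{"a":1}`)))              // prints UTF-8
}
```

A file that starts with a byte order mark, like the one in the original report, would need a BOM check before this one.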

@golang golang locked and limited conversation to collaborators Jun 18, 2021