encoding/json: mangled unmarshal string result #38105

AllenX2018 · 2020-03-27T03:59:25Z

What version of Go are you using (`go version`)?

$ go version
go version go1.14.1 windows/amd64

Does this issue reproduce with the latest release?

Yes, it reproduces with the latest release.

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows

What did you do?

I created a type whose underlying type is string and implemented encoding.TextMarshaler and encoding.TextUnmarshaler on it. I use this type as a map's key type, then do a simple round-trip test with encoding/json's Marshal and Unmarshal functions. When the input JSON contains non-ASCII characters, the result is unexpected.

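package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// KeyString is a string-based map key with custom text (un)marshaling.
type KeyString string

func (k KeyString) MarshalText() ([]byte, error) {
	return []byte("Prefix__" + k), nil
}

func (k *KeyString) UnmarshalText(text []byte) error {
	*k = KeyString(strings.TrimPrefix(string(text), "Prefix__"))
	return nil
}

// test round-trips a JSON object with a non-ASCII key through Unmarshal and Marshal.
func test() {
	b := []byte(`{"开源":"12345开源"}`)
	m := make(map[KeyString]string)

	if err := json.Unmarshal(b, &m); err == nil {
		fmt.Println(m)
		s, _ := json.Marshal(m)
		fmt.Println(string(s))
	}
}

func main() {
	test()
}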

What did you expect to see?

Output:
map[开源:12345开源]
{"开源":"12345开源"}

What did you see instead?

Output:
map[开��:12345开源]
{"开\ufffd\ufffd�":"12345开源"}
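A compact way to check for this regression is to decode the same input and look the key up directly, rather than comparing printed output. The sketch below is only an illustration: the `Key` type and the `hasKey` helper are hypothetical names, and `UnmarshalText` is deliberately a no-op copy, which should be enough to route key decoding through encoding.TextUnmarshaler.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Key is a hypothetical string-based map key type. Its UnmarshalText only
// copies the input, so any difference in the decoded key comes from the decoder.
type Key string

func (k *Key) UnmarshalText(text []byte) error {
	*k = Key(text)
	return nil
}

// hasKey decodes input into a map[Key]string and reports whether the
// key want is present in the result.
func hasKey(input, want string) (bool, error) {
	m := make(map[Key]string)
	if err := json.Unmarshal([]byte(input), &m); err != nil {
		return false, err
	}
	_, ok := m[Key(want)]
	return ok, nil
}

func main() {
	ok, err := hasKey(`{"开源":"12345开源"}`, "开源")
	fmt.Println(ok, err)
}
```

On an affected release the lookup is expected to fail because the stored key is mangled; on a release without the bug it should succeed.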


dsnet · 2020-03-27T05:53:49Z

\cc @mvdan

Git bisect indicates 5469770.

Since this is a regression in go1.14, the fix should probably be cherry-picked to the go1.14 release branch.

dsnet · 2020-03-27T05:55:31Z

@gopherbot, please open a backport issue for 1.14.

gopherbot · 2020-03-27T05:55:39Z

Backport issue(s) opened: #38106 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

mvdan · 2020-03-27T18:11:16Z

Thanks @dsnet - I'll try to have a look at it this weekend.

gopherbot · 2020-03-28T00:14:12Z

Change https://golang.org/cl/226218 mentions this issue: encoding/json: don't mangle strings in an edge case when decoding

gopherbot · 2020-05-08T21:20:16Z

Change https://golang.org/cl/233057 mentions this issue: [release-branch.go1.14] encoding/json: don't mangle strings in an edge case when decoding

…e case when decoding

The added comment contains some context. The original optimization assumed that each call to unquoteBytes (or unquote) followed its corresponding call to rescanLiteral. Otherwise, unquoting a literal might use d.safeUnquote from another re-scanned literal.

Unfortunately, this assumption is wrong. When decoding {"foo": "bar"} into a map[T]string where T implements TextUnmarshaler, the sequence of calls would be as follows:

1) rescanLiteral "foo"
2) unquoteBytes "foo"
3) rescanLiteral "bar"
4) unquoteBytes "foo" (for UnmarshalText)
5) unquoteBytes "bar"

Note that the call to UnmarshalText happens in literalStore, which repeats the work to unquote the input string literal. But, since that happens after we've re-scanned "bar", we're using the wrong safeUnquote field value.

In the added test case, the second string had a non-zero number of safe bytes, and the first string had none since it was all non-ASCII. Thus, "safely" unquoting a number of the first string's bytes could cut a rune in half, and thus mangle the runes.

A rather simple fix, without a full revert, is to only allow one use of safeUnquote per call to unquoteBytes. Each call to rescanLiteral when we have a string is soon followed by a call to unquoteBytes, so it's no longer possible for us to use the wrong index.

Also add a test case from #38126, which is the same underlying bug, but affecting the ",string" option.

Before the fix, the test would fail, just like in the original two issues:

    --- FAIL: TestUnmarshalRescanLiteralMangledUnquote (0.00s)
        decode_test.go:2443: Key "开源" does not exist in map: map[开��:12345开源]
        decode_test.go:2458: Unmarshal unexpected error: json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string

Fixes #38106.
For #38105.
For #38126.

Change-Id: I761e54924e9a971a4f9eaa70bbf72014bb1476e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/226218
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Joe Tsai <[email protected]>
(cherry picked from commit 55361a2)
Reviewed-on: https://go-review.googlesource.com/c/go/+/233057
Run-TryBot: Dmitri Shuralyov <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>

dsnet changed the title ~~Unexpected marshal/unmarshal result with encoding/json~~ encoding/json: mangled unmarshal string result Mar 27, 2020

gopherbot mentioned this issue Mar 27, 2020

encoding/json: mangled unmarshal string result [1.14 backport] #38106

Closed

rkolp mentioned this issue Mar 27, 2020

encoding/json: encoding.UnmarshalText() produces unexpected values when json.Unmarshaling. #38046

Closed

mvdan self-assigned this Mar 27, 2020

mvdan added this to the Go1.15 milestone Mar 27, 2020

mvdan added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 27, 2020

mvdan mentioned this issue Mar 28, 2020

encoding/json: unexpected result when json.Unmarshaling #38126

Closed

gopherbot closed this as completed in 55361a2 May 8, 2020

mvdan mentioned this issue May 8, 2020

encoding/json: Unintuitive behavior when using custom UnmarshalText #38947

Closed

mvdan mentioned this issue May 9, 2020

encoding/json: possible bug when unmarshaling maps with a custom struct as their key #38771

Closed

urso mentioned this issue May 29, 2020

Update go version to go 1.14.3 elastic/beats#18829

Closed

5 tasks

mrwonko mentioned this issue Jun 12, 2020

encoding/json: incorrect object key unmarshaling when using custom TextUnmarshaler as Key with string values #39555

Closed

golang locked and limited conversation to collaborators May 8, 2021

gopherbot added the FrozenDueToAge label May 8, 2021

rsc unassigned mvdan Jun 23, 2022
