Skip to content

Commit

Permalink
[release-branch.go1.14] encoding/json: properly encode strings with "…
Browse files Browse the repository at this point in the history
…,string" again

golang.org/cl/193604 fixed one bug when one encodes a string with the
",string" option: if SetEscapeHTML(false) is used, we should not be
using HTML escaping for the inner string encoding. The CL correctly
fixed that.

The CL also tried to speed up this edge case. By avoiding an entire new
call to Marshal, the new Issue34127 benchmark reduced its time/op by
45%, and lowered the allocs/op from 3 to 2.

However, that last optimization wasn't correct:

	Since Go 1.2 every string can be marshaled to JSON without error
	even if it contains invalid UTF-8 byte sequences. Therefore
	there is no need to use Marshal again for the only reason of
	enclosing the string in double quotes.

JSON string encoding isn't just about adding quotes and taking care of
invalid UTF-8. We also need to escape some characters, like tabs and
newlines.

The new code failed to do that. The bug resulted in the added test case
failing to roundtrip properly; before our fix here, we'd see an error:

	invalid use of ,string struct tag, trying to unmarshal "\"\b\f\n\r\t\"\\\"" into string

If you pay close attention, you'll notice that the special characters
like tab and newline are only encoded once, not twice. When decoding
with the ",string" option, the outer string decode works, but the inner
string decode fails, as we are now decoding a JSON string with unescaped
special characters.

The fix we apply here isn't to go back to Marshal, as that would
re-introduce the bug with SetEscapeHTML(false). Instead, we can use a
new encode state from the pool - it results in minimal performance
impact, and even reduces allocs/op further. The performance impact seems
fair, given that we need to check the entire string for characters that
need to be escaped.

	name          old time/op    new time/op    delta
	Issue34127-8    89.7ns ± 2%   100.8ns ± 1%  +12.27%  (p=0.000 n=8+8)

	name          old alloc/op   new alloc/op   delta
	Issue34127-8     40.0B ± 0%     32.0B ± 0%  -20.00%  (p=0.000 n=8+8)

	name          old allocs/op  new allocs/op  delta
	Issue34127-8      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=8+8)

Instead of adding another standalone test, we convert an existing
"string tag" test to be table-based, and add another test case there.

One test case from the original CL also had to be amended, due to the
same problem - when escaping '<' due to SetEscapeHTML(true), we need to
end up with double escaping, since we're using ",string".

Fixes #38178.
For #38173.

Change-Id: I2b0df9e4f1d3452fff74fe910e189c930dde4b5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/226498
Run-TryBot: Daniel Martí <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Joe Tsai <[email protected]>
(cherry picked from commit b1a48af)
Reviewed-on: https://go-review.googlesource.com/c/go/+/233037
Reviewed-by: Daniel Martí <[email protected]>
Run-TryBot: Dmitri Shuralyov <[email protected]>
  • Loading branch information
mvdan authored and dmitshur committed May 27, 2020
1 parent 5112209 commit 8ca58ff
Show file tree
Hide file tree
Showing 3 changed files with 69 additions and 37 deletions.
11 changes: 6 additions & 5 deletions src/encoding/json/encode.go
Original file line number Diff line number Diff line change
Expand Up @@ -635,11 +635,12 @@ func stringEncoder(e *encodeState, v reflect.Value, opts encOpts) {
return
}
if opts.quoted {
b := make([]byte, 0, v.Len()+2)
b = append(b, '"')
b = append(b, []byte(v.String())...)
b = append(b, '"')
e.stringBytes(b, opts.escapeHTML)
e2 := newEncodeState()
// Since we encode the string twice, we only need to escape HTML
// the first time.
e2.string(v.String(), opts.escapeHTML)
e.stringBytes(e2.Bytes(), false)
encodeStatePool.Put(e2)
} else {
e.string(v.String(), opts.escapeHTML)
}
Expand Down
87 changes: 58 additions & 29 deletions src/encoding/json/encode_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -79,37 +79,66 @@ type StringTag struct {
NumberStr Number `json:",string"`
}

var stringTagExpected = `{
"BoolStr": "true",
"IntStr": "42",
"UintptrStr": "44",
"StrStr": "\"xzbit\"",
"NumberStr": "46"
}`

func TestStringTag(t *testing.T) {
var s StringTag
s.BoolStr = true
s.IntStr = 42
s.UintptrStr = 44
s.StrStr = "xzbit"
s.NumberStr = "46"
got, err := MarshalIndent(&s, "", " ")
if err != nil {
t.Fatal(err)
}
if got := string(got); got != stringTagExpected {
t.Fatalf(" got: %s\nwant: %s\n", got, stringTagExpected)
func TestRoundtripStringTag(t *testing.T) {
tests := []struct {
name string
in StringTag
want string // empty to just test that we roundtrip
}{
{
name: "AllTypes",
in: StringTag{
BoolStr: true,
IntStr: 42,
UintptrStr: 44,
StrStr: "xzbit",
NumberStr: "46",
},
want: `{
"BoolStr": "true",
"IntStr": "42",
"UintptrStr": "44",
"StrStr": "\"xzbit\"",
"NumberStr": "46"
}`,
},
{
// See golang.org/issues/38173.
name: "StringDoubleEscapes",
in: StringTag{
StrStr: "\b\f\n\r\t\"\\",
NumberStr: "0", // just to satisfy the roundtrip
},
want: `{
"BoolStr": "false",
"IntStr": "0",
"UintptrStr": "0",
"StrStr": "\"\\u0008\\u000c\\n\\r\\t\\\"\\\\\"",
"NumberStr": "0"
}`,
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
// Indent with a tab prefix to make the multi-line string
// literals in the table nicer to read.
got, err := MarshalIndent(&test.in, "\t\t\t", "\t")
if err != nil {
t.Fatal(err)
}
if got := string(got); got != test.want {
t.Fatalf(" got: %s\nwant: %s\n", got, test.want)
}

// Verify that it round-trips.
var s2 StringTag
err = NewDecoder(bytes.NewReader(got)).Decode(&s2)
if err != nil {
t.Fatalf("Decode: %v", err)
}
if !reflect.DeepEqual(s, s2) {
t.Fatalf("decode didn't match.\nsource: %#v\nEncoded as:\n%s\ndecode: %#v", s, string(got), s2)
// Verify that it round-trips.
var s2 StringTag
if err := Unmarshal(got, &s2); err != nil {
t.Fatalf("Decode: %v", err)
}
if !reflect.DeepEqual(test.in, s2) {
t.Fatalf("decode didn't match.\nsource: %#v\nEncoded as:\n%s\ndecode: %#v", test.in, string(got), s2)
}
})
}
}

Expand Down
8 changes: 5 additions & 3 deletions src/encoding/json/stream_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -144,22 +144,24 @@ func TestEncoderSetEscapeHTML(t *testing.T) {
},
{
"stringOption", stringOption,
`{"bar":"\"\u003chtml\u003efoobar\u003c/html\u003e\""}`,
`{"bar":"\"\\u003chtml\\u003efoobar\\u003c/html\\u003e\""}`,
`{"bar":"\"<html>foobar</html>\""}`,
},
} {
var buf bytes.Buffer
enc := NewEncoder(&buf)
if err := enc.Encode(tt.v); err != nil {
t.Fatalf("Encode(%s): %s", tt.name, err)
t.Errorf("Encode(%s): %s", tt.name, err)
continue
}
if got := strings.TrimSpace(buf.String()); got != tt.wantEscape {
t.Errorf("Encode(%s) = %#q, want %#q", tt.name, got, tt.wantEscape)
}
buf.Reset()
enc.SetEscapeHTML(false)
if err := enc.Encode(tt.v); err != nil {
t.Fatalf("SetEscapeHTML(false) Encode(%s): %s", tt.name, err)
t.Errorf("SetEscapeHTML(false) Encode(%s): %s", tt.name, err)
continue
}
if got := strings.TrimSpace(buf.String()); got != tt.want {
t.Errorf("SetEscapeHTML(false) Encode(%s) = %#q, want %#q",
Expand Down

0 comments on commit 8ca58ff

Please sign in to comment.