decode_json_field: process objects and arrays only #11312
@@ -146,6 +146,10 @@ func unmarshal(maxDepth int, text string, fields *interface{}, processArray bool
	return v, false
}

if !strings.HasPrefix(str, "[") && !strings.HasPrefix(str, "{") {
You check here for [ but I couldn't find an example of it in the tests?
There's a bug with arrays that is solved in PR #11318, which also includes tests.
I added it. The mentioned bug is about processing when it should not process, so it should be fine to test it with processing enabled.
It's quite a simple heuristic, but it should be fine. It can still fail, though. But decodeJSON should have failed in the first place. For example, given the string "2016-20-10", the lexer should find that - is an invalid character. The dec.More() call only returns false (as there is no more valid data in the stream), but More also returns false if there was an error. We can improve decodeJSON by checking the dec.Token() error code after dec.More().
For the heuristics (we can still guard decodeJSON using some heuristics), we should remove whitespace when testing as well:
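// isStructured reports whether s, ignoring surrounding whitespace, looks
// like a JSON array or object. Requires the "strings" package.
func isStructured(s string) bool {
	s = strings.TrimSpace(s)
	end := len(s) - 1
	return end > 0 &&
		((s[0] == '[' && s[end] == ']') ||
			(s[0] == '{' && s[end] == '}'))
}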
@michalpristas Should #11318 be merged before this one?
Change LGTM.
@michalpristas I see you marked it as a breaking change. I would rather think of it as a bug fix, even though I see your point that for some users it potentially changes the data format. But my thinking is that the previous behavior was not what users expected.
@urso @andrewkroh Could you chime in here as you were working / looking into the initial issue?
@ruflin Thanks, good eye with the changelog. I must admit it was not intentional.
Problem
The processor unmarshals fields incorrectly, causing invalid decoding.
E.g.:
"2017 some string" => 2017
"2016-09-28T01:40:26.760+0000" => 2016
"123" => 123
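The truncation above can be reproduced with a small sketch. It assumes the processor decodes with encoding/json's streaming Decoder, which the review discussion of dec.More() and dec.Token() suggests. firstValue is a hypothetical helper, not a function from the Beats code base.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// firstValue (hypothetical helper) decodes only the first JSON value found
// in s and returns it formatted as text. A streaming decoder stops after one
// complete value and does not complain about what follows it.
func firstValue(s string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber() // keep numbers exactly as written

	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	return fmt.Sprint(v), nil
}

func main() {
	for _, in := range []string{
		"2017 some string",
		"2016-09-28T01:40:26.760+0000",
		"123",
	} {
		got, err := firstValue(in)
		fmt.Printf("%q => %s (err: %v)\n", in, got, err)
	}
}
```

Each input yields its leading number (2017, 2016, 123) with a nil error, so the rest of the string is silently dropped.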
Solution
Process only objects and arrays to avoid data loss.
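A minimal sketch of this guard, using the same prefix check as the diff, might look like the following. decodeIfStructured is a hypothetical name, and json.Unmarshal stands in for the processor's own decoding routine, so treat it as an illustration of the idea and not as the actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeIfStructured (hypothetical helper) decodes s only when it starts
// like a JSON object or array. Any other input, including plain numbers
// and timestamps, is returned unchanged so no data is lost.
func decodeIfStructured(s string) (interface{}, bool) {
	if !strings.HasPrefix(s, "[") && !strings.HasPrefix(s, "{") {
		return s, false
	}

	var v interface{}
	// Assumption: json.Unmarshal replaces the processor's decoder here.
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return s, false
	}
	return v, true
}

func main() {
	for _, in := range []string{"2017 some string", "123", `{"year": 2017}`, `[1, 2]`} {
		v, ok := decodeIfStructured(in)
		fmt.Printf("%q -> %v (decoded: %t)\n", in, v, ok)
	}
}
```

The strings from the Problem section pass through untouched, and only the object and the array are decoded.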
Fixes #3534