decode_json_field: process objects and arrays only #11312

michalpristas · 2019-03-19T15:34:17Z

Problem
The processor unmarshals fields that are not JSON objects or arrays, which results in invalid decoding.
E.g.:
"2017 some string" => 2017
"2016-09-28T01:40:26.760+0000" => 2016
"123" => 123

Solution
Process only objects and arrays to avoid data loss.
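
The data loss described above can be reproduced with the standard library alone. The following is a minimal sketch, assuming the field is read through an encoding/json Decoder; decodeFirst is a hypothetical helper, not code from this PR. Decode stops after the first complete JSON value, so a leading number is accepted and the rest of the string is left unread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeFirst is a hypothetical helper: it decodes only the first JSON
// value found in text and ignores whatever follows it.
func decodeFirst(text string) (interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(text))
	var v interface{}
	err := dec.Decode(&v)
	return v, err
}

func main() {
	inputs := []string{
		"2017 some string",
		"2016-09-28T01:40:26.760+0000",
		"123",
	}
	for _, in := range inputs {
		v, err := decodeFirst(in)
		fmt.Printf("%q => %v (err: %v)\n", in, v, err)
	}
}
```

In all three cases Decode returns a number without an error, which is why the caller has to decide up front whether the string should be decoded at all.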

Fixes #3534

libbeat/processors/actions/decode_json_fields_test.go

ruflin · 2019-03-20T08:33:49Z

libbeat/processors/actions/decode_json_fields.go

@@ -146,6 +146,10 @@ func unmarshal(maxDepth int, text string, fields *interface{}, processArray bool
 			return v, false
 		}

+		if !strings.HasPrefix(str, "[") && !strings.HasPrefix(str, "{") {


You check for [ here, but I couldn't find an example of it in the tests.

There's a bug with arrays that is solved in PR #11318, which also includes tests.

I added it. The bug mentioned is about processing when it should not process, so it should be fine to test it with processing enabled.

ruflin · 2019-03-21T10:04:32Z

@michalpristas Should #11318 be merged before this one?

michalpristas · 2019-03-21T10:42:03Z

@ruflin I added arrays with processArray explicitly set to true, so this PR is not influenced by #11318 and there's no need to favor one over the other.

ruflin

Change LGTM.

@michalpristas I see you marked it as a breaking change. I would rather think of it as a bug fix, even though I see your point that for some users it potentially changes the data format. My thinking is that the previous behavior was not what users expected.

@urso @andrewkroh Could you chime in here as you were working / looking into the initial issue?

michalpristas · 2019-03-22T07:15:19Z

@ruflin thanks, good eye on the changelog. I must admit it was not intentional.

urso · 2019-03-25T11:20:36Z

libbeat/processors/actions/decode_json_fields.go

@@ -146,6 +146,10 @@ func unmarshal(maxDepth int, text string, fields *interface{}, processArray bool
 			return v, false
 		}

+		if !strings.HasPrefix(str, "[") && !strings.HasPrefix(str, "{") {


It's quite a simple heuristic, but it should be fine. It can still fail, though, and decodeJSON should have failed in the first place. For example, given the string "2016-20-10", the lexer should find that - is an invalid character. The dec.More() call returns false when there is no more valid data in the stream, but More also returns false if there was an error. We can improve decodeJSON by checking the error returned by dec.Token() after dec.More().
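
A rough sketch of the suggested check, assuming the standard encoding/json Decoder. decodeStrict and its error messages are hypothetical and are not the decodeJSON function from this PR. After the first value is decoded, More reports whether another element follows, and Token is then used to tell a clean end of input (io.EOF) apart from an error or a stray closing delimiter.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// decodeStrict is a hypothetical helper: it decodes one JSON value and
// rejects input that has anything but whitespace after that value.
func decodeStrict(text string) (interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(text))
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	// More reports true when another element follows the decoded value.
	if dec.More() {
		return nil, errors.New("trailing data after JSON value")
	}
	// More also reports false on a read error or a stray ']' or '}',
	// so ask Token: only io.EOF means the input ended cleanly.
	if _, err := dec.Token(); err != io.EOF {
		if err == nil {
			err = errors.New("unexpected trailing token")
		}
		return nil, err
	}
	return v, nil
}

func main() {
	for _, in := range []string{`{"a":1}`, "2016-20-10", `{"a":1}}`, "123"} {
		v, err := decodeStrict(in)
		fmt.Printf("%q => %v (err: %v)\n", in, v, err)
	}
}
```

Note that "123" is valid JSON on its own and still decodes to a number here, so a stricter decoder does not replace the object-or-array guard.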

As for the heuristic (we can still guard decodeJSON with one), we should also trim whitespace before testing:

func isStructured(s string) bool {
	s = strings.TrimSpace(s)
	end := len(s) - 1
	return end > 0 && ((s[0] == '[' && s[end] == ']') ||
		(s[0] == '{' && s[end] == '}'))
}
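
The heuristic above can be exercised against the strings from the PR description. This is a self-contained sketch; the only change from the snippet as posted is that end is declared with :=, which it needs in order to compile. The sample inputs in main are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// isStructured reports whether s, ignoring surrounding whitespace,
// starts and ends with a matching pair of [] or {}.
func isStructured(s string) bool {
	s = strings.TrimSpace(s)
	end := len(s) - 1
	return end > 0 && ((s[0] == '[' && s[end] == ']') ||
		(s[0] == '{' && s[end] == '}'))
}

func main() {
	inputs := []string{
		"2017 some string",
		"2016-09-28T01:40:26.760+0000",
		"123",
		`  {"log": "ok"}  `,
		"[1, 2, 3]",
	}
	for _, in := range inputs {
		fmt.Printf("%q structured: %v\n", in, isStructured(in))
	}
}
```

The check only looks at the first and last characters, so a string such as "[not json]" still passes it and has to be rejected by the decoder itself.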

process objects and arrays only

eff9c5a

michalpristas added the review and libbeat labels Mar 19, 2019

michalpristas requested a review from a team as a code owner March 19, 2019 15:34

changelog

cf94c09

michalpristas added the :Processors label Mar 20, 2019

ruflin reviewed Mar 20, 2019


michalpristas added 3 commits March 21, 2019 08:19

Merge branch 'master' of https://github.com/elastic/beats into fix-3534

4d2bffa

table tests for depth

acc9bbc

added array into testcase

58cf873

ruflin approved these changes Mar 22, 2019


michalpristas added 2 commits March 22, 2019 08:13

moved to fixes

ca70995

Merge branch 'master' into fix-3534

b5d5534

Merge branch 'master' into fix-3534

2f9c61d

ruflin approved these changes Mar 25, 2019


michalpristas added 2 commits March 25, 2019 11:58

conflicts with 11318

0b517d5

Merge branch 'fix-3534' of github.com:michalpristas/beats into fix-3534

9b7785d

urso suggested changes Mar 25, 2019


added structure check & decoder error check

502caba

urso approved these changes Mar 25, 2019


Merge branch 'master' into fix-3534

6dce24b

michalpristas merged commit 6bff9a6 into elastic:master Mar 26, 2019
