
Regression: [ and ] in field names still cause parse failures #14821

Open

garethhumphriesgkc opened this issue Dec 19, 2022 · 4 comments
Logstash information:

$ docker run --rm -p 6996:6996 docker.elastic.co/logstash/logstash:8.5.3 -e '' --version
Using bundled JDK: /usr/share/logstash/jdk
Sending Logstash logs to /usr/share/logstash/logs which is now configured via log4j2.properties
[2022-12-19T20:20:49,552][INFO ][logstash.runner          ] Log4j configuration path used is: /usr/share/logstash/config/log4j2.properties
logstash 8.5.3
jruby 9.3.9.0 (2.6.8) 2022-10-24 537cd1f8bc OpenJDK 64-Bit Server VM 17.0.5+8 on 17.0.5+8 +indy +jit [x86_64-linux]
java 17.0.5 (Eclipse Adoptium)
jvm OpenJDK 64-Bit Server VM / 17.0.5+8

OS version (uname -a if on a Unix-like system):

$ uname -a ; docker --version
Linux workstation 5.19.0-26-generic #27-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 23 20:44:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Docker version 20.10.16, build 20.10.16-0ubuntu1

Description of the problem including expected versus actual behaviour:
I raised issue #13606 about Logstash's failure to handle incoming records with embedded `[` characters in field names. It was closed under PR #14044, but I tested with 8.5.3 today and the original reproduction still triggers the failure:

$ docker run --rm -p 6996:6996 docker.elastic.co/logstash/logstash:8.5.3 -e 'input {tcp { port => 6996 } }filter { json { source => "message" } }output { stdout {} }' --field-reference-escape-style ampersand
$ echo '{"str[substr]": true}' | nc -w1 127.0.0.1 6996
...
[2022-12-19T20:17:12,486][WARN ][logstash.filters.json    ][main][a700f2c38ad0ec2c2223fd9c1342d70b4cac73422170d6d5c123552c5bca662b] Exception caught in json filter {:source=>"message", :raw=>"{\"str[substr]\": true}", :exception=>#<RuntimeError: Invalid FieldReference: `str[substr]`>}
{
      "@version" => "1",
       "message" => "{\"str[substr]\": true}",
    "@timestamp" => 2022-12-19T20:17:12.344138503Z,
          "tags" => [
        [0] "_jsonparsefailure"
    ],
         "event" => {
        "original" => "{\"str[substr]\": true}"
    }
}

The CSV filter also still exhibits parse failures:

$ docker run --rm -p 6996:6996 docker.elastic.co/logstash/logstash:8.5.3 -e 'input {tcp { port => 6996 } }filter { csv{autodetect_column_names => true }}output { stdout {} }' --field-reference-escape-style ampersand
$ echo -e 'field[name],value\nstr[substr],true' | nc -w1 127.0.0.1 6996
...
[2022-12-19T20:40:59,187][WARN ][logstash.filters.csv     ][main][128ac161fdf34f40fbbcb2d3638ef1f2a3ddef55622bee33ab4926a03304276f] Error parsing csv {:field=>"message", :source=>"field[name],value", :exception=>#<RuntimeError: Invalid FieldReference: `[str[substr]]`>}
{
         "event" => {
        "original" => "field[name],value"
    },
       "message" => "field[name],value",
    "@timestamp" => 2022-12-19T20:40:58.959832635Z,
      "@version" => "1",
          "tags" => [
        [0] "_csvparsefailure"
    ]
}

I haven't tested any other filters.

garethhumphriesgkc commented Dec 19, 2022

Tagging @andsel, who looked at the original ticket, and @yaauie, who wrote the PR.

yaauie commented Dec 19, 2022

🤔 #14044 solved the ability to create events verbatim as structured, and to set fields whose values are key/value maps containing invalid field references, but the Event API's accessor methods (Event#get(field_reference), Event#set(field_reference, value), Event#include?(field_reference), Event#remove(field_reference)) still require the field references they are given to be valid. This covers the codec case in the title of the linked #13606 (since the codec creates the event whole), but not the example that uses a filter (since the filter calls setters with parsed keys).

If the JSON filter is configured with a target, the solution from #14044 also works because the filter is using the event API to set the (valid field reference) target's value to be a key/value map (and is agnostic to its contents).

At first glance, I cannot think of a way to make the existing Event API's accessor methods (which require valid field references) work with top-level fields whose names are not valid field references, and therefore cannot see a way for filters like the JSON filter to set such top-level fields.

The Event API does have an Event#append(Event) method, which merges a provided Event into the receiving Event, but using it would require the filter to first create a temporary Event and then remove "default" fields that did not come from the parsed payload before merging back into the source event, which adds both overhead and edge cases.
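The constraint described above can be illustrated with a small sketch. The regex here is a hypothetical approximation written for illustration only, not Logstash's actual field-reference grammar:

```ruby
# Hypothetical approximation of the rule the Event API enforces: a field
# reference is either a bare name containing no brackets, or a chain of
# bracketed path segments like [a][b]. A stray `[` inside a name fails,
# which is exactly the "Invalid FieldReference" error in the repro above.
def valid_field_reference?(ref)
  ref.match?(/\A(?:[^\[\]]+|(?:\[[^\[\]]+\])+)\z/)
end

valid_field_reference?('message')      # => true
valid_field_reference?('[a][b]')       # => true
valid_field_reference?('str[substr]')  # => false
```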

garethhumphriesgkc (the issue author) commented

Would a suitable workaround be to use the codec to parse into a target, and then use a ruby filter to copy everything from there into the root? Or would the ruby filter hit all the same API issues, just a step later?

In a similar vein, could/should the filters always go via a target, but if none was specified use a default one and copy the contents to root afterward?

Or, is the correct fix a new (or changes to an existing) API call in logstash?

I'm a long way out of my depth here; I'll help where I can, but the internal Logstash API is one (if not several) layers beyond me.

yaauie commented Dec 21, 2022

Or would the ruby filter hit all the same API issues, just a step later?

Bingo. We have no generic way to set a top-level field whose name is not a valid field reference.

With escapes enabled (still opt-in), assuming that we had set the parsed contents with target => "[@metadata][temp]", the ruby filter could escape them with something like (untested):

filter {
  ruby {
    init => "@repl = {'['=>'&#91;', ']'=>'&#93;'}"
    code => "
      # Pull the parsed map out of [@metadata][temp]; if it was present,
      # copy each entry to the top level with `[` and `]` escaped in the key.
      # Regexp.union builds a properly-escaped pattern from the hash keys,
      # and gsub with a Hash replaces each match by lookup.
      event.remove('[@metadata][temp]')&.each do |key, value|
        event.set(key.gsub(Regexp.union(@repl.keys), @repl), value)
      end
    "
  }
}
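The key-rewriting step can be exercised outside Logstash. Here is a sketch using a plain Hash in place of the Event, with the `&#91;`/`&#93;` replacement table (ampersand escape style) from the snippet above:

```ruby
# Stand-in for the parsed payload that the json filter would have placed
# under [@metadata][temp].
parsed = { 'str[substr]' => true, 'plain' => 1 }

# Same replacement table as the ruby filter's init block.
repl = { '[' => '&#91;', ']' => '&#93;' }

# Escape `[` and `]` in each key: Regexp.union(repl.keys) yields a pattern
# matching either bracket, and gsub with a Hash replaces each match by lookup.
escaped = parsed.to_h do |key, value|
  [key.gsub(Regexp.union(repl.keys), repl), value]
end

escaped # => { 'str&#91;substr&#93;' => true, 'plain' => 1 }
```

Note that `key.gsub(@repl.keys, @repl)` would raise a TypeError, since String#gsub does not accept an Array as its pattern; the pattern-plus-Hash form used here is the supported idiom.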
