multiline value enclosed in doublequotes cannot be parsed #75

Open
0asp0 opened this issue Aug 23, 2019 · 1 comment

Comments

0asp0 commented Aug 23, 2019

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.

For all general issues, please provide the following details for fast resolution:

  • Version: logstash 7.3.0
  • Operating System: Windows

The CSV RFC says that a value can span multiple lines, broken by CRLF, as long as the value is enclosed in double quotes:

https://tools.ietf.org/html/rfc4180

 6.  Fields containing line breaks (CRLF), double quotes, and commas
       should be enclosed in double-quotes.  For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx
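
For comparison, an RFC-compliant parser accepts such a record as-is. A quick sketch in Python (not Logstash, just to show the expected behavior), using the example rows from the RFC excerpt above:

    import csv
    import io

    # The example from the RFC excerpt: the second field of the first
    # record contains a CRLF line break inside the double quotes.
    data = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n'

    for row in csv.reader(io.StringIO(data)):
        print(row)
    # ['aaa', 'b\r\nbb', 'ccc']
    # ['zzz', 'yyy', 'xxx']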

We have multiline fields, such as stack traces, stored in Elasticsearch.
I exported them in Discover via CSV export, then tried to import them via Logstash into another Elasticsearch instance.

The csv filter throws an exception that a quote is missing, because the closing quote is only found on a following line.

Here is my pipeline configuration:

    input
    {
      file
      {
        path => ['C:/work/elastic/input/csv/*.csv']
        sincedb_path => "C:/work/elastic/input/csv/db"
        start_position => "beginning"
        codec => multiline
        {
          pattern   => '(^\"\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}.\d{3}\")|(^\"\@timestamp\")'
          negate    => "true"
          what      => "previous"
          max_bytes => "200 MiB"
          max_lines => 10000
          auto_flush_interval => 2
        }
      }
    }

    filter
    {
      # workaround. Without gsub it will fail
      mutate
      {
        gsub => ["message", '\n"', '\\n"']
      }
      csv
      {
        autodetect_column_names => true
        autogenerate_column_names => true
        separator => ";"
        source => "message"
        skip_empty_columns => "true"
        target => "mycsv"
      }
    }

I found a workaround using mutate's gsub to replace the embedded newlines with a literal \n, but I would still consider this a bug that should be fixed.
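
To illustrate what the gsub does to a joined event, here is a rough Python sketch (the record below is made up, and the exact escaping of the Logstash gsub arguments may differ slightly):

    import re

    # Made-up joined event from the multiline codec: the last field ends
    # with a literal newline right before its closing double quote.
    message = '"2019-08-23 10:15:00.123";"Error: boom\n  at Foo.bar()\n"'

    # Roughly equivalent to: gsub => ["message", '\n"', '\\n"']
    # Only a newline immediately followed by a double quote is replaced
    # with the two characters \ and n.
    patched = re.sub('\n"', r'\\n"', message)
    print(patched)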

blacksudoku commented Dec 16, 2020

I also have the same case. Removing the \r from the end of each line solved my problem.

  mutate {
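    # drop the trailing carriage return (\r) that CRLF line endings leave on each line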
    gsub => [
      "message", "\r$", ""
    ]
  }
