
Logstash CSV Filter - Quote character parse failure #64

Open
nachiket-lab opened this issue Jan 24, 2018 · 5 comments

@nachiket-lab

Hi
I have a CSV file and the format is something like this:

"102","60","Open","I hope this works out for \"[email protected]\""

When I parse this using the CSV filter, I get the following error:

[2018-01-23T13:11:58,523][WARN ][logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>"\"102\",\"60\",\"Open\",\"I hope this works out for \\\"[email protected]\\\"\"", :exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1>}

The quote characters seem to be malformed in the error message. I have currently worked around this issue by using gsub on the field before passing the data to the csv filter. Is this a known bug with the csv filter?

https://discuss.elastic.co/t/csv-filter-quote-character-parse-failure/116611
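For what it's worth, the csv filter is backed by Ruby's stdlib CSV parser, so the failure can be reproduced (and one possible workaround sketched) in plain Ruby. The gsub preprocessing below is just an illustration, not the filter's own behaviour:

```ruby
require 'csv'

# The sample line from the report, with backslash-escaped quotes.
line = '"102","60","Open","I hope this works out for \"[email protected]\""'

# Ruby's stdlib CSV (which backs the Logstash csv filter) rejects it:
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  puts e.message  # e.g. "Missing or stray quote in line 1"
end

# Rewriting \" as the RFC 4180 doubled quote "" makes it parseable:
fixed = line.gsub('\"', '""')
CSV.parse_line(fixed)
# => ["102", "60", "Open", "I hope this works out for \"[email protected]\""]
```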

@chnsh

chnsh commented Feb 13, 2018

I too have the same issue!

@SHSauler

Could you please show how you worked around this issue with gsub? I tried removing quote characters and whitespace (as per issue #44), but it still leads to a _csvparsefailure.

@nachiket-lab
Author

Hi,

My issue was that I had extra double quotes inside a field, whereas the issue you referenced has spaces between two fields. I had a similar problem once with a CSV and ended up using the pandas Python library to parse and index that data.

You could do the following and check if it works:

mutate {
    gsub => [
      'fieldname', '\"', '',
      'fieldname', ',\s+', ','
    ]
  }
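In case it helps, here is a plain-Ruby sketch of what those two substitutions do; the field name and sample line are placeholders, not from a real event:

```ruby
# Plain-Ruby equivalent of the two mutate/gsub substitutions above.
message = '"102", "60", "Open", "I hope this works out for \"[email protected]\""'

cleaned = message
  .gsub('\"', '')      # drop backslash-escaped quotes inside fields
  .gsub(/,\s+/, ',')   # remove whitespace after field separators (the issue #44 case)

puts cleaned
# "102","60","Open","I hope this works out for [email protected]"
```

After this, the line is regular quoted CSV that the csv filter should accept.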

Open a post on the forum (discuss.elastic.co) and ping me. We can continue over there if needed.

@jsvd
Member

jsvd commented Apr 27, 2018

WRT the initial issue, the example doesn't seem to be well-formed CSV: https://csvlint.io/validation/5ae2c74704a9ea0004000048

Also, a CSV linter written in Go only accepts the file with a flag to "try to parse improperly escaped quotes":

% cat txt.csv
"102","60","Open","I hope this works out for \"[email protected]\""
% ./go/bin/csvlint txt.csv
Record #0 has error: extraneous or missing " in quoted-field

unable to parse any further
% ./go/bin/csvlint -lazyquotes txt.csv
Warning: not using defaults, may not validate CSV to RFC 4180
file is valid

@droberts195
Copy link

The RFC for CSV says "a double-quote appearing inside a field must be escaped by preceding it with another double quote" (it's rule 7 in section 2).

A non-standard alternative is of course to escape quotes using some other character, usually a backslash.
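For illustration, here is how the sample row would look if its interior quotes were escaped per rule 7 (Ruby's stdlib CSV writer, which follows RFC 4180, doubles them; `force_quotes` is used only to match the all-quoted style of the original line):

```ruby
require 'csv'

# The problematic field written out per RFC 4180: interior quotes are
# doubled, so the row round-trips through any conformant parser.
row  = ['102', '60', 'Open', 'I hope this works out for "[email protected]"']
line = CSV.generate_line(row, force_quotes: true)
puts line
# "102","60","Open","I hope this works out for ""[email protected]"""
```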

Other CSV parsers have also had the dilemma over whether to support these non-standard CSV formats. For example, SuperCSV agonised over it for a while in super-csv/super-csv#14 before eventually adding an option to support it in super-csv/super-csv#103.

I've added this comment to (a) make sure there's a clear statement of the dilemma and (b) subscribe to the issue, because other parts of the Elastic Stack also use CSV now, and it would be nice if there were consistency about which escaping options are supported. Currently the find_file_structure endpoint added in ML 6.5.0 doesn't support non-standard escaping of quotes, but this could be added to find_file_structure in a future release if Logstash and/or Filebeat ever support it.


6 participants