New decode_csv_fields processor #11753

Merged: 17 commits, Apr 26, 2019

Conversation

Contributor

@adriansr adriansr commented Apr 10, 2019

This patch introduces a new processor, decode_csv_fields, that decodes
rows of CSV-formatted data into a string array, one element per column.

processors:
- decode_csv_fields:
    fields:
      message: csv
    separator: ,
    overwrite_keys: false
    ignore_missing: false
    trim_leading_space: false
    fail_on_error: true
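
To illustrate what "one element per column" means for the `separator` and `trim_leading_space` settings, here is a minimal sketch built on Go's standard `encoding/csv` package. The helper name `decodeCSVRow` is hypothetical and this is not the processor's actual code; it only shows the kind of decoding the description refers to.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// decodeCSVRow parses a single CSV row into a slice of strings,
// one element per column. Hypothetical helper for illustration only.
func decodeCSVRow(row string, separator rune, trimLeadingSpace bool) ([]string, error) {
	r := csv.NewReader(strings.NewReader(row))
	r.Comma = separator                   // corresponds to the `separator` setting
	r.TrimLeadingSpace = trimLeadingSpace // corresponds to `trim_leading_space`
	return r.Read()
}

func main() {
	// A quoted column may contain the separator without being split.
	cols, err := decodeCSVRow(`alice,"Smith, Bob",42`, ',', false)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%q\n", cols)
}
```

With `trimLeadingSpace` set to true, a row such as `a, b` would decode to the columns `a` and `b` rather than `a` and ` b`.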

This patch introduces a new processor, `decode_csv_field`, that decodes
rows of CSV-formatted data into a string array, one element per column.

processors:
- decode_csv_field:
    field: message
    target: csv
    separator: ,
    overwrite_keys: false
    ignore_missing: false
    trim_leading_space: false
@adriansr adriansr requested review from a team as code owners April 10, 2019 21:57
@adriansr adriansr added discuss Issue needs further discussion. enhancement needs_docs review labels Apr 10, 2019
Member

@andrewkroh andrewkroh left a comment

A while back I had hacked together a lookup table processor that could load data from a CSV file. I found it useful to have a setting that allowed me to directly name specific numbered columns like

- processors:
    - decode_csv_field:
        field: message
        target: "user"
        columns:
          # Target Field Name -> CSV Column Number
          email: 0 # Write column 0 to user.email.
          name:  2 # Write column 2 to user.name. (Column 1 is ignored.)

When columns is not specified then I'd have it write an array of strings to the target like you have.
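
The column mapping suggested above could look roughly like the following sketch. The function `mapColumns` and the dotted key layout are assumptions made for illustration; they are not part of the merged processor, which writes a plain string array.

```go
package main

import "fmt"

// mapColumns copies selected CSV columns into named fields under a target
// prefix. Hypothetical helper: `columns` maps target field name -> column number.
func mapColumns(row []string, target string, columns map[string]int) (map[string]string, error) {
	out := make(map[string]string, len(columns))
	for name, idx := range columns {
		if idx < 0 || idx >= len(row) {
			return nil, fmt.Errorf("column %d for field %q is out of range (row has %d columns)",
				idx, name, len(row))
		}
		out[target+"."+name] = row[idx]
	}
	return out, nil
}

func main() {
	// Column 1 is not listed in the mapping, so it is ignored.
	row := []string{"jane@example.com", "ignored", "Jane"}
	fields, err := mapColumns(row, "user", map[string]int{"email": 0, "name": 2})
	if err != nil {
		fmt.Println("mapping error:", err)
		return
	}
	fmt.Println(fields["user.email"], fields["user.name"])
}
```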

Member

@andrewkroh andrewkroh left a comment

The code LGTM.

Can you add this processor to here.

And also add a changelog entry.

libbeat/processors/actions/decode_csv_field.go (outdated, resolved)
libbeat/processors/actions/decode_csv_field_test.go (outdated, resolved)
@adriansr
Contributor Author

@andrewkroh
I originally devised it using the columns mapping too. The problem is that it won't easily fit my current use case, as I have 50+ columns and need to inspect one of the first columns (a.k.a. "type") to decide which mapping to use for the rest.

So I thought it might be better to have this processor decode to an array, and then either add a generic "extract_array" processor or do the extraction inside an ingest pipeline. I'm planning to discuss this in today's sync with the Beats team.
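
As a rough sketch of the use case described here, the following shows how a decoded array could be mapped to field names based on a "type" column. The record types, column names, and the `extractByType` helper are all made up for illustration; no such code is part of this PR.

```go
package main

import "fmt"

// mappingsByType lists, per record type, the field name for each column.
// Both the types and the names are hypothetical.
var mappingsByType = map[string][]string{
	"login":    {"type", "user", "source_ip"},
	"transfer": {"type", "user", "bytes", "destination"},
}

// extractByType inspects the column at typeColumn and uses its value to
// pick the mapping applied to the whole row.
func extractByType(row []string, typeColumn int) (map[string]string, error) {
	if typeColumn < 0 || typeColumn >= len(row) {
		return nil, fmt.Errorf("type column %d is out of range", typeColumn)
	}
	names, ok := mappingsByType[row[typeColumn]]
	if !ok {
		return nil, fmt.Errorf("no mapping for record type %q", row[typeColumn])
	}
	if len(row) < len(names) {
		return nil, fmt.Errorf("row has %d columns, mapping needs %d", len(row), len(names))
	}
	out := make(map[string]string, len(names))
	for i, name := range names {
		out[name] = row[i]
	}
	return out, nil
}

func main() {
	fields, err := extractByType([]string{"transfer", "jane", "2048", "10.0.0.7"}, 0)
	if err != nil {
		fmt.Println("extract error:", err)
		return
	}
	fmt.Println(fields["user"], fields["bytes"], fields["destination"])
}
```

Keeping the decode step separate from this selection step is what makes the array output reusable: the same decoded row can feed different mappings.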

@adriansr adriansr force-pushed the feature_csv_processor branch from 4e87509 to 9e0a78b Compare April 12, 2019 09:39
@andrewkroh
Member

This is looking good. I think it just needs a section added to the asciidocs now.

@adriansr adriansr changed the title New decode_csv_field processor New decode_csv_fields processor Apr 16, 2019
@adriansr
Contributor Author

adriansr commented Apr 16, 2019

@andrewkroh I've modified the processor a little bit to help align with the rest. Do you mind reviewing again? Now it has docs.

The main change is the rename to *_fields and the ability to process more than one field at the same time (I've copied the conf style from your dns processor). This also adds the new flag fail_on_error, which is common in other processors.
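
A minimal sketch of how processing several fields with a `fail_on_error` flag might behave is shown below. The semantics are an assumption for illustration (with the flag set, the first error aborts and the event is left untouched; otherwise failing fields are skipped), and the helpers `decodeFields` and `decodeOne` are hypothetical rather than the processor's real code.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// decodeOne reads event[src] as a string and decodes it as one CSV row.
func decodeOne(event map[string]interface{}, src string) ([]string, error) {
	text, ok := event[src].(string)
	if !ok {
		return nil, errors.New("field is missing or not a string")
	}
	return csv.NewReader(strings.NewReader(text)).Read()
}

// decodeFields decodes every source field in `fields` (source -> target).
// Assumed semantics: results are written only after all fields were
// attempted, so an error with failOnError=true leaves the event unchanged.
func decodeFields(event map[string]interface{}, fields map[string]string, failOnError bool) error {
	decoded := make(map[string][]string, len(fields))
	for src, dst := range fields {
		cols, err := decodeOne(event, src)
		if err != nil {
			if failOnError {
				return fmt.Errorf("decoding field %q: %w", src, err)
			}
			continue // skip this field and keep going
		}
		decoded[dst] = cols
	}
	for dst, cols := range decoded {
		event[dst] = cols
	}
	return nil
}

func main() {
	event := map[string]interface{}{"message": "a,b,c"}
	if err := decodeFields(event, map[string]string{"message": "csv"}, true); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", event["csv"])
}
```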

There is currently no mechanism to inject this reference config on
selected Beats.
Member

@andrewkroh andrewkroh left a comment

LGTM

libbeat/docs/processors-using.asciidoc (outdated, resolved)
@adriansr adriansr removed discuss Issue needs further discussion. needs_docs labels Apr 26, 2019
@adriansr adriansr merged commit e03993f into elastic:master Apr 26, 2019
adriansr added a commit to adriansr/beats that referenced this pull request Apr 26, 2019
adriansr added a commit that referenced this pull request Apr 26, 2019
* Missing changelog entry for #11753

* Update csv processor to support `when` clause

* Docs fixes for csv processor
DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
This patch introduces a new processor, `decode_csv_fields` that decodes
rows of CSV-formatted data into a string array, one element per column.

processors:
- decode_csv_fields:
    fields:
      message: csv
    separator: ,
    overwrite_keys: false
    ignore_missing: false
    trim_leading_space: false