Make CSV encoder configurable #17261
Comments
Would the proposed approach be reasonable? I may find some time to try implementing this. If so, could someone point me to where new config options should be added?
I'll see if I can get @vectordotdev/ux-team to respond by the end of the week. We have started adding some options on codecs like … So I could see the …
Agreed, the options can be nested as suggested like:
We'd be happy to see a PR for this if you feel motivated @scMarkus. The options would be added here: vector/lib/codecs/src/encoding/format/csv.rs, lines 51 to 60 (at commit c21f892).
Use Cases
Currently, the CSV encoder used in multiple places, such as the S3 sink, uses a standard configuration for the column separator (`,`) and quotation escaping (`"`). This conforms to RFC 4180. However, our current setup uses a tab-delimited (TSV: `\t`) variant. Furthermore, the data is read by Apache Spark, which uses `\` as its default quotation escape character. We would like to move our logging pipeline to Vector transparently, without disrupting downstream processes, which makes these configuration options mandatory for us when writing the data.
Attempted Solutions
As far as I can see, there is no documented configuration option to change this behavior. Looking at the Vector source, it seems to me that the relevant part is:
vector/lib/codecs/src/encoding/format/csv.rs, line 80 (at commit 4b80c71)
This uses `from_writer` from the `csv` crate, which applies default settings. Alternatively, the same crate's `WriterBuilder` could be used; it has options to set the delimiter and the quotation escape character, among other settings.
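To illustrate what such builder-style configuration changes in the output, here is a minimal stdlib-only Rust sketch. `DelimitedWriterBuilder` and its methods are hypothetical stand-ins loosely modeled on the `csv` crate's `WriterBuilder`, not its actual implementation, and the quoting rules are simplified. The defaults follow RFC 4180 (comma delimiter, quotes escaped by doubling); setting a tab delimiter and a backslash escape produces the TSV dialect described in the use case. Note that in the `csv` crate itself, according to its documentation, the escape byte is only used when `double_quote(false)` is also set.

```rust
use std::io::{self, Write};

/// Hypothetical, stdlib-only stand-in for a configurable CSV writer.
/// Defaults mirror RFC 4180: ',' delimiter, '"' quote, quotes escaped by doubling.
#[derive(Debug, Clone)]
struct DelimitedWriterBuilder {
    delimiter: u8,
    quote: u8,
    /// `None` = escape quotes by doubling them (RFC 4180);
    /// `Some(b)` = prefix quotes with `b` instead (e.g. b'\\', Spark's default).
    escape: Option<u8>,
}

impl DelimitedWriterBuilder {
    fn new() -> Self {
        Self { delimiter: b',', quote: b'"', escape: None }
    }

    fn delimiter(&mut self, delimiter: u8) -> &mut Self {
        self.delimiter = delimiter;
        self
    }

    fn escape(&mut self, escape: u8) -> &mut Self {
        self.escape = Some(escape);
        self
    }

    /// Writes one record followed by '\n'.
    fn write_record<W: Write>(&self, out: &mut W, fields: &[&str]) -> io::Result<()> {
        for (i, field) in fields.iter().enumerate() {
            if i > 0 {
                out.write_all(&[self.delimiter])?;
            }
            let bytes = field.as_bytes();
            // Quote only when needed: the field contains the delimiter, the quote, or a line break.
            let needs_quotes = bytes
                .iter()
                .any(|&b| b == self.delimiter || b == self.quote || b == b'\n' || b == b'\r');
            if !needs_quotes {
                out.write_all(bytes)?;
                continue;
            }
            out.write_all(&[self.quote])?;
            for &b in bytes {
                if b == self.quote {
                    // Either double the quote or prefix it with the escape byte.
                    out.write_all(&[self.escape.unwrap_or(self.quote), b])?;
                } else {
                    out.write_all(&[b])?;
                }
            }
            out.write_all(&[self.quote])?;
        }
        out.write_all(b"\n")
    }
}

fn main() -> io::Result<()> {
    let record = ["id-1", "say \"hi\"", "a,b"];

    // Default dialect (RFC 4180).
    let mut rfc: Vec<u8> = Vec::new();
    DelimitedWriterBuilder::new().write_record(&mut rfc, &record)?;
    print!("{}", String::from_utf8_lossy(&rfc));

    // TSV with backslash-escaped quotes.
    let mut tsv: Vec<u8> = Vec::new();
    DelimitedWriterBuilder::new()
        .delimiter(b'\t')
        .escape(b'\\')
        .write_record(&mut tsv, &record)?;
    print!("{}", String::from_utf8_lossy(&tsv));
    Ok(())
}
```

With the defaults, the record comes out as `id-1,"say ""hi""","a,b"`; with the TSV settings, the same record is tab-separated, the middle field becomes `"say \"hi\""`, and `a,b` no longer needs quoting because the comma is not the delimiter.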
Proposal
What does already exist in the documentation is the required `encoding.csv.fields` option. It seems a natural fit to extend `encoding.csv` with `encoding.csv.delimiter` and `encoding.csv.quotation_escape` and use those to configure the `csv::WriterBuilder`.
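Since `csv::WriterBuilder` takes the delimiter and escape character as single bytes, the proposed options would need to be validated before use. The following is a minimal stdlib-only sketch, assuming the options arrive as strings and that a `quotation_escape` equal to `"` means "keep RFC 4180 quote doubling". `CsvEncodingOptions`, `WriterSettings`, `parse_single_byte`, and `from_strs` are hypothetical names for illustration, not Vector's actual config types.

```rust
/// Hypothetical options mirroring the proposed `encoding.csv.delimiter`
/// and `encoding.csv.quotation_escape` settings.
#[derive(Debug, PartialEq)]
struct CsvEncodingOptions {
    delimiter: u8,
    quotation_escape: u8,
}

impl Default for CsvEncodingOptions {
    fn default() -> Self {
        // RFC 4180: comma-separated, quotes escaped by doubling them.
        Self { delimiter: b',', quotation_escape: b'"' }
    }
}

/// Settings a byte-oriented CSV writer builder would need.
#[derive(Debug, PartialEq)]
struct WriterSettings {
    delimiter: u8,
    double_quote: bool,
    escape: Option<u8>,
}

/// Accepts exactly one ASCII character, since the writer works on single bytes.
fn parse_single_byte(option: &str, value: &str) -> Result<u8, String> {
    match value.as_bytes() {
        [b] if b.is_ascii() => Ok(*b),
        _ => Err(format!("`{option}` must be a single ASCII character, got {value:?}")),
    }
}

impl CsvEncodingOptions {
    /// Unset options fall back to the RFC 4180 defaults.
    fn from_strs(delimiter: Option<&str>, quotation_escape: Option<&str>) -> Result<Self, String> {
        let mut opts = Self::default();
        if let Some(d) = delimiter {
            opts.delimiter = parse_single_byte("encoding.csv.delimiter", d)?;
        }
        if let Some(e) = quotation_escape {
            opts.quotation_escape = parse_single_byte("encoding.csv.quotation_escape", e)?;
        }
        Ok(opts)
    }

    /// An escape equal to the quote character means "double the quote";
    /// anything else disables doubling and prefixes quotes with that byte.
    fn writer_settings(&self) -> WriterSettings {
        if self.quotation_escape == b'"' {
            WriterSettings { delimiter: self.delimiter, double_quote: true, escape: None }
        } else {
            WriterSettings {
                delimiter: self.delimiter,
                double_quote: false,
                escape: Some(self.quotation_escape),
            }
        }
    }
}

fn main() {
    // TSV with Spark-style backslash escaping.
    let tsv = CsvEncodingOptions::from_strs(Some("\t"), Some("\\")).unwrap();
    println!("{:?}", tsv.writer_settings());

    // Multi-character delimiters are rejected.
    let err = CsvEncodingOptions::from_strs(Some("||"), None).unwrap_err();
    println!("{err}");
}
```

The resulting `WriterSettings` would presumably map onto `WriterBuilder::delimiter`, `WriterBuilder::double_quote`, and `WriterBuilder::escape`. Defaulting `quotation_escape` to `"` keeps the current output unchanged for users who set neither option.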
References
feat(codec): Add csv encoding for sinks #16603
Version
vector 0.29.0 (x86_64-unknown-linux-gnu 33b3868 2023-04-11 22:19:54.512366263)