Disable line terminator config #331

Closed
scMarkus opened this issue Jul 31, 2023 · 5 comments
Comments

@scMarkus

What version of the csv crate are you using?

csv: 1.2.2
csv-core: 0.1.10

Briefly describe the question, bug or feature request.

When using wtr.write_field(), the following code fails because wtr.write_record(None::<&[u8]>)?; has not been called, as the documentation requires. Adding that line writes the missing closing quote but also appends \n to the line. Both the escape character and the terminator are configurable in csv::WriterBuilder, but is there a way to disable them entirely?

Include a complete program demonstrating a problem.

no closing quote

use std::error::Error;

use csv::{Writer, WriterBuilder};

fn example1() -> Result<(), Box<dyn Error>> {
    let mut wtr = Writer::from_writer(vec![]);
    wtr.write_field("hallo \" world")?;

    let data = String::from_utf8(wtr.into_inner()?)?;
    // Fails: the closing quote is missing, actual output is `"\"hallo \"\" world"`
    assert_eq!(data, "\"hallo \"\" world\"");
    Ok(())
}

additional line feed

fn example2() -> Result<(), Box<dyn Error>> {
    let mut wtr = Writer::from_writer(vec![]);
    wtr.write_field("hallo \" world")?;
    wtr.write_record(None::<&[u8]>)?;

    let data = String::from_utf8(wtr.into_inner()?)?;
    // Fails: a line feed is appended, actual output is `"\"hallo \"\" world\"\n"`
    assert_eq!(data, "\"hallo \"\" world\"");
    Ok(())
}

desired behavior

fn example3() -> Result<(), Box<dyn Error>> {
    let mut wtr = WriterBuilder::new()
        .terminator(None::<&[u8]>) // or similar ??? (not an existing API)
        .from_writer(vec![]);

    wtr.write_field("hallo \" world")?;
    wtr.write_record(None::<&[u8]>)?;

    let data = String::from_utf8(wtr.into_inner()?)?;
    assert_eq!(data, "\"hallo \"\" world\""); // \"hallo \"\" world\"
    Ok(())
}

some background for this request

At the moment I am trying to contribute to the open source tool vector, specifically #17261. The tricky part is that the csv crate is only used as a line encoder, since the so-called framing is done independently by that tool.

https://github.com/vectordotdev/vector/blob/4b80c714b68bb9acc2869c48b71848d11954c6aa/lib/codecs/src/encoding/format/csv.rs#L78-L100

Therefore I am curious if there is a way to disable the terminator? If not I would be quite happy if the Terminator enum could add an additional special value implementing this behavior.

Regarding the escape() character: from what I am seeing, csv::QuoteStyle::Never would likely disable all quoting / escaping? Disabling it would have been my second request, to handle corner cases where such behavior is desired.

/// This *never* writes quotes, even if it would produce invalid CSV data.
Never,

@BurntSushi
Owner

I don't understand why you would not want to write a terminator. It is part of writing data in a CSV format.

@BurntSushi
Owner

You're going to need to spell this out in more detail for me. I understand you've mentioned some tool called vector, but your motivation is described in terms of that project's vocabulary. It's high context and I do not have the time to obtain that context. So you're going to need to break this down at a more fundamental level before I can understand the actual problem being solved here.

@scMarkus
Author

scMarkus commented Aug 15, 2023

@BurntSushi
Agreed. Let me try to state the issue in more general terms, better highlighting my intention in the context of handling CSV data:

When serializing a stream of events in a larger context, the termination of such events may be handled in a different place. rust-csv can still be used for encoding a single event, but at the moment this forcibly adds a termination character to the event which is not part of that event's data.

As the grammar of RFC4180 shows, the terminator is not part of a record but of a file. This is further emphasized by paragraph 2.2 of said RFC, which states that the last line does not need a terminator.

The ABNF grammar [[2](https://datatracker.ietf.org/doc/html/rfc4180#ref-2)] appears as follows:
   file = [header CRLF] record *(CRLF record) [CRLF]
   header = name *(COMMA name)
   record = field *(COMMA field)
   name = field
   field = (escaped / non-escaped)
   escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
   non-escaped = *TEXTDATA
   COMMA = %x2C
   CR = %x0D ;as per [section 6.1 of RFC 2234](https://datatracker.ietf.org/doc/html/rfc2234#section-6.1)
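
To make the reading of the grammar above concrete, here is a minimal std-only sketch of the `record` and `field` productions. It is not the csv or csv-core implementation; `encode_field`, `encode_record` and `frame` are hypothetical names chosen for illustration, and quoting is applied only when a field contains a comma, a double quote, CR or LF. The record encoder never emits a terminator, and a separate framing step places CRLF between records.

```rust
/// Encode one field following `field = (escaped / non-escaped)`.
/// Hypothetical helper, not part of the csv crate.
fn encode_field(field: &str) -> String {
    let needs_quotes = field
        .chars()
        .any(|c| matches!(c, ',' | '"' | '\r' | '\n'));
    if needs_quotes {
        // escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        // non-escaped = *TEXTDATA
        field.to_string()
    }
}

/// Encode one record following `record = field *(COMMA field)`.
/// No CRLF is appended here.
fn encode_record(fields: &[&str]) -> String {
    fields
        .iter()
        .map(|f| encode_field(f))
        .collect::<Vec<_>>()
        .join(",")
}

/// Hypothetical framer: `record *(CRLF record)`, without the optional
/// trailing CRLF.
fn frame(records: &[String]) -> String {
    records.join("\r\n")
}

fn main() {
    let first = encode_record(&["hallo \" world", "plain"]);
    let second = encode_record(&["a,b"]);
    // Each record is complete on its own: quotes are closed, no terminator.
    assert_eq!(first, "\"hallo \"\" world\",plain");
    assert_eq!(second, "\"a,b\"");
    // The framing step decides how records are separated.
    assert_eq!(
        frame(&[first, second]),
        "\"hallo \"\" world\",plain\r\n\"a,b\""
    );
}
```

In this sketch the event encoder and the framer are independent, which is the split described above: the encoder produces one complete record and something else decides how records are delimited.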

@BurntSushi
Owner

The terminators are record terminators. They aren't file terminators. The last terminator being optional doesn't change that.

I still don't understand the need here. It doesn't make sense to me to use csv in a context where it writes some part of the format and something else writes another part of the format.

If you need this level of control, you can use csv-core and just not call Writer::terminator.

@BurntSushi closed this as not planned on Aug 15, 2023
@scMarkus
Author

I tinkered around with csv-core and it seems to work out for me. Thanks a lot for that hint @BurntSushi
Furthermore, I have come across the finish() method, which I am using now. If this were made available in the csv API, it would similarly solve my initial issue, where each event in a stream is effectively its own last CSV line every time. Anyhow, thanks for your time.
