Allow escaping of non-printable characters in CSV output/input #124
Yes, I think this makes sense, and yes, CSV is a mess, which makes this tricky. The main challenge is figuring out which API extensions would be needed. There is also the problem that although an escape character may be specified, there is no real standard for CSV that says which escapes are allowed, so the parser/decoder does not support handling them either.
Understood; there are different flavors of CSV. Our customers would probably be more comfortable with CSV output that at least had predictable lines, even though they would now have to parse the fields (or, more likely, use some technology that interprets
On the plus side, the CSV format backend is relatively simple... well, there's the state machine, but with respect to quoting it should be possible to see how that works. So I think there are 3 approaches:
I wish I had more time to work on these, as this is a highly doable thing, and very useful too.
Would you take a change for Jackson 2.9 as well? We are still somewhat locked in the Java 8 world, and upgrading to Java 9+ is a scary thought for a number of teams. I would try my hand at it in that case.
This is a proposed solution for FasterXML#124. It introduces a new Feature, `ESCAPE_CONTROL_CHARS_WITH_ESCAPE_CHAR`, which will apply the standard ASCII escapes from JSON to all characters that the CSV generator writes. If this solution is workable, I will add tests.
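As a rough illustration of what "applying the standard ASCII escapes from JSON" to a field value could look like, here is a minimal standalone sketch. It is not the code from the pull request and does not use Jackson; the class name `CsvEscapeSketch`, the method `escapeControlChars`, and the choice to double the escape character itself are all assumptions made for this example.

```java
// Hypothetical sketch, not Jackson's implementation: rewrites control
// characters in a single CSV field value using JSON-style escapes.
final class CsvEscapeSketch {

    // Returns a copy of value in which CR, LF, TAB, BS and FF are written as
    // the escape character followed by a letter, the escape character itself
    // is doubled (an assumption), and any other character below 0x20 is
    // written as the escape character, 'u', and four hex digits.
    static String escapeControlChars(String value, char escapeChar) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\r': out.append(escapeChar).append('r'); break;
                case '\n': out.append(escapeChar).append('n'); break;
                case '\t': out.append(escapeChar).append('t'); break;
                case '\b': out.append(escapeChar).append('b'); break;
                case '\f': out.append(escapeChar).append('f'); break;
                default:
                    if (c == escapeChar) {
                        out.append(escapeChar).append(escapeChar);
                    } else if (c < 0x20) {
                        out.append(escapeChar).append(String.format("u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The embedded CR and LF come out as two-character escape sequences,
        // so the field stays on a single physical line.
        System.out.println(escapeControlChars("line one\r\nline two", '\\'));
    }
}
```

With escaping like this, every record occupies exactly one physical line, which is the "predictable lines" property asked for earlier in the thread. The reader then has to undo the escapes.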
@cowtowncoder any chance you could look at #125?
@hgschmie Will check it out now. As to 2.10: note that the JDK baseline does NOT require Java 9: the runtime minimum is still JDK 6, and the build now requires JDK 8. So 2.10 should be fine with respect to Java 8, even with the added module-info. That's the beauty of the Moditect approach.
OK, so my main concern is performance. I'll see if I can quickly see what effect it might have on
With a quick serialization check I don't see a significant change (there might be a 1-2% slowdown, but that's within the margin of error unless I run a longer test), so that's probably just fine. I'll merge this into 2.9(.9), where it will be a sort of undocumented feature, and it will be officially included in 2.10.
Seems I forgot to close this. Officially in 2.10, although technically already in 2.9.9.
We have a use case where we replace an existing CSV writer (based on Apache Commons CSV) with Jackson. The old CSV writer was configured to write "special characters" such as CR and LF as `\r` and `\n`. Jackson does not support this but adheres to RFC 4180 (where no escaping exists). This causes a lot of pain for our customers, as the data we write often contains CR and LF characters.

Here is some test code:
Which produces
I can get it to produce
by adding
But what I am actually looking for is
There seems to be no way to get the generator (and probably also the parser) to write and read control characters in escaped form. Having CR and LF within quotes is legal from the RFC 4180 point of view; however, most of the CSV that our systems produce gets parsed by legacy ("brain dead") tools that assume that every LF is a record separator.
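To make the problem with such tools concrete, the following sketch compares a naive reader that treats every LF as a record separator with one that ignores LFs inside quoted fields. Both counters, and the class name `NaiveLineSplitSketch`, are hypothetical helpers written for this illustration; neither is taken from Jackson or Apache Commons CSV, and the quote handling is deliberately simplified.

```java
// Hypothetical sketch: shows how a quoted, embedded LF confuses a reader
// that assumes every LF ends a record.
final class NaiveLineSplitSketch {

    // What a line-oriented legacy tool effectively does: one record per line.
    static int countRecordsNaively(String csv) {
        return csv.split("\n").length;
    }

    // Counts records while tracking whether we are inside a quoted field, so
    // an LF between double quotes does not end the record. A doubled quote
    // toggles the flag twice and therefore leaves it unchanged.
    static int countRecordsQuoteAware(String csv) {
        int records = 0;
        boolean inQuotes = false;
        boolean pending = false; // true if the current record has content
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                pending = true;
            } else if (c == '\n' && !inQuotes) {
                records++;
                pending = false;
            } else {
                pending = true;
            }
        }
        return pending ? records + 1 : records;
    }

    public static void main(String[] args) {
        // A header plus one data record whose second field contains an LF.
        String csv = "id,comment\n1,\"first line\nsecond line\"\n";
        System.out.println("naive: " + countRecordsNaively(csv));
        System.out.println("quote-aware: " + countRecordsQuoteAware(csv));
    }
}
```

The naive count is one higher than the quote-aware count for this input, because the quoted LF is mistaken for a record boundary. Escaping the LF on output removes that ambiguity for line-oriented consumers.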
Apache Commons CSV has a nice summary on their Javadoc page for `CSVFormat`: https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#DEFAULT (and below).

(God, CSV is such a mess. And that is the standard format for enterprise data???)