default splitter in dlmread #1852
Comments
I definitely think that dlmread should default to comma separation, matching dlmwrite, and I tend to think that empty fields should not be discarded in either case.
I think whitespace-separated input and tab-separated output would be sensible. If you want to read CSV, you should use a real CSV reader.
OK. Worth noting here that DataFrames.jl has a souped-up CSV reader that now supports escaped quotes, newlines in quoted fields, UTF-8, etc. It might be worth replacing the existing reader with it: https://github.com/HarlanH/DataFrames.jl/blob/master/src/io.jl
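To see why a plain `split` is not enough for "real" CSV, here is a minimal quote-aware splitter in Julia. `split_quoted` is a hypothetical helper, not the DataFrames.jl reader; it assumes double-quoted fields, `""` as an escaped quote inside a quoted field, and no newlines inside fields.

```julia
# Hypothetical quote-aware splitter for ONE CSV line (sketch only).
# Assumptions: fields may be wrapped in double quotes, "" inside a quoted
# field is an escaped quote, and the line has no embedded newlines.
function split_quoted(line::AbstractString; dlm::Char=',')
    fields = String[]
    buf = IOBuffer()
    inquotes = false
    chars = collect(line)          # Vector{Char}, safe for UTF-8
    i = 1
    while i <= length(chars)
        c = chars[i]
        if inquotes
            if c == '"'
                if i < length(chars) && chars[i+1] == '"'
                    write(buf, '"')    # "" -> literal quote
                    i += 1             # skip the second quote
                else
                    inquotes = false   # closing quote
                end
            else
                write(buf, c)          # delimiters are literal inside quotes
            end
        elseif c == '"'
            inquotes = true            # opening quote
        elseif c == dlm
            push!(fields, String(take!(buf)))  # end of field (may be empty)
        else
            write(buf, c)
        end
        i += 1
    end
    push!(fields, String(take!(buf)))  # last field
    return fields
end

line = "1,\"Smith, John\",\"say \"\"hi\"\"\""
naive  = split(line, ','; keepempty=true)  # breaks "Smith, John" apart: 4 fields
quoted = split_quoted(line)                # 3 fields, quotes resolved
println(repr(String.(naive)))
println(repr(quoted))
```

The naive split treats the comma inside `"Smith, John"` as a separator, which is exactly the kind of case that pushes CSV toward a dedicated reader.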
+1 for @HarlanH's idea. If our …
Is there a clear difference between TSV and CSV, e.g. that TSV is just tab-separated numbers while CSV might include various other formatting? We might want a separate function for simple delimited data if "real CSV" is complex.
My understanding is that it's all a holy mess with no standards and no …
I think delimited files are like HTML: there are standards, but also lots of buggy examples that systems like Excel have evolved to accept. TSV can carry all of the complexity of CSV, but this is needed less often because fields rarely contain a tab that would have to be escaped. Let's open another issue. I'll prepare a pull request for …
Sounds like the resolution to the original issue is as Stefan suggests, …
For future reference: the IANA standard for TSV disallows tabs within fields. See http://www.iana.org/assignments/media-types/text/tab-separated-values
Presumably newlines are also not allowed in fields, although it doesn't say that. |
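Because the IANA registration forbids tabs inside fields, a TSV reader can split each line on `'\t'` with no quoting or escaping logic. A minimal Julia sketch follows; `read_tsv` is hypothetical (not a Base function), and skipping blank lines and rejecting rows whose field count differs from the first row are assumptions of this sketch.

```julia
# Hypothetical TSV reader (sketch): one record per line, fields separated
# by a single tab, no quoting needed since fields cannot contain tabs.
# Assumptions: blank lines are skipped; all rows must have the same
# number of fields as the first row.
function read_tsv(io::IO)
    rows = Vector{Vector{String}}()
    for line in eachline(io)          # eachline strips the trailing newline
        isempty(line) && continue
        fields = String.(split(line, '\t'; keepempty=true))  # keep empty fields
        if !isempty(rows) && length(fields) != length(first(rows))
            error("inconsistent field count: expected $(length(first(rows))), got $(length(fields))")
        end
        push!(rows, fields)
    end
    return rows
end

data = read_tsv(IOBuffer("name\tage\nAda\t36\nBob\t\n"))
println(data)
```

Keeping empty fields matters here too: `"Bob\t"` yields `["Bob", ""]`, so the missing value still occupies its column.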
Somebody brought this up on the list. `dlmwrite` uses `','` by default, but `dlmread` calls `split(line)` by default, which splits on whitespace and also discards empty fields. If a delimiter is specified, it uses `split(line, dlm, true)`. I don't remember why we did this, but perhaps the default should just be `split(line, ',', true)`. The empty-fields handling can be changed, but if so, then for both cases.
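A small Julia sketch of the two behaviours, written against the current `Base.split` API, where empty-field handling is the `keepempty` keyword rather than the positional `true` used in 2013. `split_fields` is a hypothetical helper illustrating the proposed default; it is not part of `dlmread`.

```julia
# Compare the splitting behaviours described in the issue, using the
# current Base.split API (keyword `keepempty`) rather than the 2013
# positional form `split(line, dlm, true)`.

line_ws  = "1  2   3"
line_csv = "1,,3"

# No delimiter given: runs of whitespace count as one separator,
# so empty fields are discarded.
ws_fields = split(line_ws)                           # ["1", "2", "3"]

# Same line split on a single space with empty fields kept: shows what
# the whitespace default silently drops.
space_fields = split(line_ws, ' '; keepempty=true)   # ["1", "", "2", "", "", "3"]

# Explicit comma delimiter, empty fields kept (the proposed default).
csv_fields = split(line_csv, ','; keepempty=true)    # ["1", "", "3"]

# Hypothetical helper for the proposal: comma by default, empty fields
# always kept, matching dlmwrite's ',' default. Not part of dlmread.
split_fields(line::AbstractString; dlm::Char=',') = split(line, dlm; keepempty=true)

for f in (ws_fields, space_fields, csv_fields, split_fields("a,b,"))
    println(repr(String.(f)))
end
```

Keeping empty fields preserves column positions: `"1,,3"` yields three fields, not two.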