support csv files ending with empty last line #150
Comments
This is intended behavior. An error on an empty last line is probably because you have header row enabled, and my guess is that your header row has more than 1 field. With a header row, all rows need to have the same number of fields. The main problem is that a line break in CSV is significant. The standard says, "The last record in the file may or may not have an ending line break." - and the problem is that there's no way to know whether or not that file does actually have an ending line break or whether it doesn't and the last field is actually missing values. I've erred on the side of caution in handling those cases. Anyway, I guess we could add a config option, but if possible, I'd like to avoid adding more. Do you happen to have any suggestions for knowing whether the file has an ending line break? |
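The ambiguity described in this comment can be illustrated with a deliberately naive splitter. This is only a sketch and not how Papa Parse works internally: `naiveRows` and `mismatchedRows` are hypothetical helpers that split on line breaks and commas and ignore quoting entirely.

```javascript
// Hypothetical helpers for illustration only; NOT Papa Parse's implementation.
// Splits on line breaks and commas, ignoring quoting and escaping.
function naiveRows(text) {
  return text.split(/\r\n|\n|\r/).map((line) => line.split(","));
}

// Returns the indexes of data rows whose field count differs from the header's.
function mismatchedRows(text) {
  const [header, ...rows] = naiveRows(text);
  return rows
    .map((row, i) => (row.length === header.length ? -1 : i))
    .filter((i) => i !== -1);
}

const withTrailingBreak = "name,age\nAda,36\n";
const withoutTrailingBreak = "name,age\nAda,36";

// With a trailing line break, the text after it looks like a row with one empty field.
console.log(mismatchedRows(withTrailingBreak));
// Without it, every row matches the header.
console.log(mismatchedRows(withoutTrailingBreak));
```

From the text alone, the splitter cannot tell a line terminator from a truncated final record, which is the dilemma raised above.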
I haven't looked at the internals of papaParse, so I don't know if it's feasible, but how about adding a check when a line is not parsed correctly:
|
That assumes that the last line is supposed to be empty and isn't missing values. Suppose that a process generating the CSV file was about to write the next line but had some silent error and didn't finish (it happens). I would be OK with doing what you suggest if it wasn't potentially covering up a problem with the CSV file. Then again, Papa Parse is not a CSV validator (which is a separate task entirely). Right now, the behavior of So I'm not sure what to do yet. Maybe having the user assert the empty last line is OK is the best way to do it (via a config setting) -- although it could be annoying. I'll leave this issue open until I'm more decided or hear from more users that this would be the desired behavior. |
ok |
It might be worth adding this precision inside the documentation ( |
I think that's a good idea! |
skipEmptyLines would be one solution here, no? |
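A minimal sketch of that suggestion, assuming the `header` and `skipEmptyLines` options of `Papa.parse`. The wrapper name `parseSkippingEmptyLines` is hypothetical, and the `parse` argument is expected to be `Papa.parse`; it is passed in only so that the snippet carries no dependency of its own.

```javascript
// Hypothetical wrapper: `parse` is assumed to be Papa.parse, e.g.
//   const Papa = require("papaparse");
//   const result = parseSkippingEmptyLines(Papa.parse, csvText);
function parseSkippingEmptyLines(parse, csvText) {
  return parse(csvText, {
    header: true,         // first row supplies the field names
    skipEmptyLines: true, // skip completely empty lines, including a trailing one
  });
}
```

Note that this skips empty lines anywhere in the input, not only at the end, so it trades one kind of strictness for another.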
What if the file has an empty line before the header row, i.e. the first line is empty? Papa parses the rest of the file, giving the "TooManyFields" error for every single row. |
FWIW, this happens not only on files that do not end with an empty line but also on files that end with a line break, as most files do (or should). |
@jseabold Ending with an "empty line" and ending with a "line break" are the same thing. |
Sure. I guess I misread the example in the other issue. Isn't every line supposed to end with a newline? Anyway, fwiw, found this confusing for a bit until I came across these issues. |
Hi, stumbled upon this topic while looking for a different problem. Just have one comment for CSV parsing: Therefore, according to the RFC 4180 CSV standard, whether the last line is empty (the previous line contains a line ending) or the last content line ends with just EOF does not break CSV compatibility. |
@s2131 IIUC the last line is not empty as it contains the line values but not the line break. That's not an empty line. |
On Unix and Linux systems it has been standard for decades for the last character of a text-type file to be a newline. This behavior fails on such files, and, worse, fails to parse files that comply with the spec. I don't think the notion that a spec-compliant file might have missing data is at all sufficient reason for breaking the standard. |
So according to the format definition, the trailing line ending belongs to the previous record. Its presence (intentional or not) shouldn't imply one more record. But after Papa Parse it will be there: the empty record, header mode or not. |
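One way to apply that reading before the text reaches the parser is to drop a single trailing line break. This is a sketch of a caller-side workaround, not something Papa Parse provides: `stripFinalLineBreak` is a hypothetical helper, and it assumes the final line break is a record terminator rather than a deliberately empty single-field record.

```javascript
// Hypothetical pre-processing step; not part of Papa Parse.
// Removes at most one trailing line break (CRLF, LF, or CR) so that a file
// ending with a terminator does not appear to contain an extra empty record.
function stripFinalLineBreak(text) {
  return text.replace(/(\r\n|\n|\r)$/, "");
}

console.log(JSON.stringify(stripFinalLineBreak("a,b\n1,2\n")));
console.log(JSON.stringify(stripFinalLineBreak("a,b\n1,2")));
```

Only one line break is removed, so a file that ends with two of them still shows an empty final line, and the concern about silently hiding a truncated file remains visible to the caller.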
Delete empty last line, to make the file compatible with PapaParse: mholt/PapaParse#150
The parser sees an EOL on the final line as denoting an invalid blank line, when in fact that's kind of normal for CSVs, at least on Unix. So we will get an error on every CSV we parse with such a final EOL. We probably care about spurious blank lines less than spurious errors like this, so let's use this option, which seems to be the only (simple?) workaround. Discussed here: mholt/PapaParse#150
It is not uncommon to have files ending with an empty line.
Papa Parse fails to handle these, returning errors about a malformed file. How about adding support for this edge case?