support csv files ending with empty last line #150

Closed
mef opened this issue Jan 6, 2015 · 16 comments

Comments

@mef

mef commented Jan 6, 2015

It is not uncommon to have files ending with an empty line.

Papaparse fails in handling these, returning errors about malformed file. How about adding support for this edge case?

@mholt
Owner

mholt commented Jan 6, 2015

This is intended behavior. An error on an empty last line is probably because you have header row enabled, and my guess is that your header row has more than 1 field. With a header row, all rows need to have the same number of fields.

The main problem is that a line break in CSV is significant. The standard says, "The last record in the file may or may not have an ending line break." - and the problem is that there's no way to know whether or not that file does actually have an ending line break or whether it doesn't and the last field is actually missing values.

I've erred on the side of caution in handling those cases.
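To illustrate the ambiguity described above, here is a minimal sketch. It assumes a deliberately simplified parser (split on "\n", then on ",", no quoting) and is not Papa Parse's actual code; `splitNaively` and `mismatchedRows` are hypothetical names.

```typescript
// Simplified model: split on "\n", then on ",". Quoted fields are not handled.
function splitNaively(csv: string): string[][] {
  return csv.split("\n").map((line) => line.split(","));
}

// Returns the indexes of data rows whose field count differs from the header's.
function mismatchedRows(rows: string[][]): number[] {
  let expected = -1;
  const bad: number[] = [];
  rows.forEach((row, i) => {
    if (i === 0) {
      expected = row.length; // the first row acts as the header
    } else if (row.length !== expected) {
      bad.push(i);
    }
  });
  return bad;
}

const withTrailingBreak = splitNaively("a,b\n1,2\n");
// withTrailingBreak is [["a","b"], ["1","2"], [""]]
const flagged = mismatchedRows(withTrailingBreak);
// flagged is [2]: the final "row" has one empty field instead of two
```

In this model, a trailing line break and a truncated last record both surface as a final row with too few fields, which is why the two cases are hard to tell apart.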

Anyway, I guess we could add a config option, but if possible, I'd like to avoid adding more. Do you happen to have any suggestions for knowing whether the file has an ending line break?

@mef
Author

mef commented Jan 6, 2015

I haven't looked at the internals of papaParse, so I don't know if it's feasible, but how about adding a check when a line is not parsed correctly:

  • if the line is empty and it's the last line of the file, just ignore the error and return the results
  • otherwise, return the error (as is currently done).
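The check proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not Papa Parse internals: the input is already split into lines, fields are separated by plain commas, and `checkFieldCounts` and `tolerateEmptyLastLine` are hypothetical names.

```typescript
interface RowError {
  row: number;
  message: string;
}

// Validates field counts against the first line (the header).
// When tolerateEmptyLastLine is true, an empty final line is not reported.
function checkFieldCounts(lines: string[], tolerateEmptyLastLine: boolean): RowError[] {
  const errors: RowError[] = [];
  let expected = -1;
  lines.forEach((line, i) => {
    const fields = line.split(",");
    if (i === 0) {
      expected = fields.length;
      return;
    }
    if (fields.length === expected) return;
    const isEmptyLastLine = line === "" && i === lines.length - 1;
    if (tolerateEmptyLastLine && isEmptyLastLine) return;
    errors.push({ row: i, message: `expected ${expected} fields, got ${fields.length}` });
  });
  return errors;
}
```

An empty line in the middle of the file is still reported in this sketch; only the final line gets the exception.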

@mholt
Owner

mholt commented Jan 6, 2015

That assumes that the last line is supposed to be empty and isn't missing values. Suppose that a process generating the CSV file was about to write the next line but had some silent error and didn't finish (it happens).

I would be OK with doing what you suggest if it wasn't potentially covering up a problem with the CSV file. Then again, Papa Parse is not a CSV validator (which is a separate task entirely). Right now, the behavior of header is very consistent. If we add this exception, it'll complicate things.

So I'm not sure what to do yet. Maybe having the user assert the empty last line is OK is the best way to do it (via a config setting) -- although it could be annoying. I'll leave this issue open until I'm more decided or hear from more users that this would be the desired behavior.

@mef
Author

mef commented Jan 6, 2015

ok

mholt added the deferred label and removed the discussion label Jan 6, 2015

@mef
Author

mef commented Jan 8, 2015

"With a header row, all rows need to have the same number of fields"

It might be worth adding this clarification to the documentation (The Parse Config Object > Config Options > header)

@mholt
Owner

mholt commented Jan 8, 2015

I think that's a good idea!

@bluej100
Contributor

skipEmptyLines would be one solution here, no?

@vbenso

vbenso commented Jan 12, 2015

What if the file has an empty line before the header's row? First line empty? Papa parses the rest of the file giving the "TooManyFields" error to every single row.
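One plausible reading of this report is that the empty first line gets treated as a one-field header, so every later row looks too wide. The sketch below models that under simplified assumptions (plain comma splitting, no quoting); it is not Papa Parse's code. `countMismatches` is a hypothetical name, and "TooFewFields" is included only as an assumed counterpart to the "TooManyFields" error quoted above.

```typescript
// Hypothetical model of header-based field counting.
type MismatchCode = "TooManyFields" | "TooFewFields";

interface Mismatch {
  row: number;
  code: MismatchCode;
}

function countMismatches(csv: string): Mismatch[] {
  const found: Mismatch[] = [];
  let headerWidth = 0;
  csv.split("\n").forEach((line, i) => {
    const width = line.split(",").length;
    if (i === 0) {
      headerWidth = width; // an empty first line yields a one-field "header"
    } else if (width > headerWidth) {
      found.push({ row: i, code: "TooManyFields" });
    } else if (width < headerWidth) {
      found.push({ row: i, code: "TooFewFields" });
    }
  });
  return found;
}

const leadingBlank = countMismatches("\na,b\n1,2\n3,4");
// Rows 1, 2 and 3 are all reported as "TooManyFields".
```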

@mholt
Owner

mholt commented Jan 12, 2015

Oh, yes, skipEmptyLines should do the trick. @vbenso I saw your issue #154 - we'll have to look into that to make sure it's a bug, since combining header and skipEmptyLines should do what you want.

I think that config option should do the trick then.
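As a rough illustration of why skipping empty lines resolves the trailing-line case, the sketch below drops empty lines before the header is read and before field counts are compared. It is a simplified stand-in (plain comma splitting, no quoting), not Papa Parse's implementation; `parseWithHeader` and its `skipEmpty` flag are hypothetical names that mirror the `header` and `skipEmptyLines` options discussed in the thread.

```typescript
interface ParseOutcome {
  data: Record<string, string>[];
  errorRows: number[];
}

function parseWithHeader(csv: string, skipEmpty: boolean): ParseOutcome {
  let lines = csv.split("\n");
  if (skipEmpty) {
    lines = lines.filter((line) => line !== ""); // drop empty lines up front
  }
  const outcome: ParseOutcome = { data: [], errorRows: [] };
  let header: string[] = [];
  lines.forEach((line, i) => {
    const fields = line.split(",");
    if (i === 0) {
      header = fields;
      return;
    }
    if (fields.length !== header.length) {
      outcome.errorRows.push(i); // field count differs from the header
      return;
    }
    const record: Record<string, string> = {};
    header.forEach((key, j) => {
      record[key] = fields[j] ?? "";
    });
    outcome.data.push(record);
  });
  return outcome;
}

const strictOutcome = parseWithHeader("a,b\n1,2\n", false);
// strictOutcome.errorRows is [2]: the empty trailing line is reported
const lenientOutcome = parseWithHeader("a,b\n1,2\n", true);
// lenientOutcome.errorRows is [] and lenientOutcome.data is [{ a: "1", b: "2" }]
```

The trade-off raised earlier in the thread still applies: skipping empty lines also hides a final line that was meant to carry data but was never written.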

@jseabold

FWIW, this happens not only on files that end with an empty line but also on files that end with a line break, as most files do (or should).

@mholt
Owner

mholt commented Jul 16, 2017

@jseabold Ending with an "empty line" and ending with a "line break" are the same thing.

@jseabold

Sure. I guess I misread the example in the other issue. Isn't every line supposed to end with a newline? Anyway, fwiw, found this confusing for a bit until I came across these issues.

@s2131

s2131 commented Dec 14, 2017

Hi, I stumbled upon this topic while looking into a different problem. Just one comment on CSV parsing:
According to the CSV specification at https://tools.ietf.org/html/rfc4180, in a valid CSV:
"The last record in the file may or may not have an ending line break."

Therefore, according to the RFC 4180 CSV standard, a file is valid either way: whether the last line is empty (the previous line ends with a line break) or the last content line ends directly at EOF.

@pokoli
Collaborator

pokoli commented Dec 14, 2017

@s2131 IIUC the last line is not empty as it contains the line values but not the line break. That's not an empty line.

@chrislong

chrislong commented Nov 25, 2019

On Unix and Linux systems it has been standard for decades for the last character of a text-type file to be a newline. This behavior fails on such files and, worse, fails to parse files that comply with the spec. I don't think the notion that a spec-compliant file might have missing data is anywhere near sufficient reason to break the standard.

@pavloDeshko

pavloDeshko commented Feb 24, 2020

"The last record in the file may or may not have an ending line break."

So according to the format definition, a trailing line ending belongs to the previous record. Its presence (intentional or not) shouldn't imply one more record. But after Papa Parse it will be there: the empty record, header mode or not.
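The reading argued for here, where a final line break terminates the last record instead of starting a new one, could be sketched like this. It is a hypothetical helper (`splitRecords`), not a Papa Parse API, and it ignores quoted fields that contain line breaks. Accepting a bare LF in addition to CRLF is a leniency assumed for the example; RFC 4180 itself specifies CRLF.

```typescript
// Splits text into records, treating one trailing line break as the
// terminator of the last record and not as the start of an empty one.
function splitRecords(text: string): string[] {
  if (text === "") return [];
  const body = text.replace(/\r?\n$/, ""); // drop exactly one final terminator
  return body.split(/\r?\n/);
}

const terminated = splitRecords("a,b\r\n1,2\r\n");
// terminated is ["a,b", "1,2"]
const unterminated = splitRecords("a,b\n1,2");
// unterminated is ["a,b", "1,2"]
const extraBlank = splitRecords("a,b\n1,2\n\n");
// extraBlank is ["a,b", "1,2", ""]: a second trailing break is a real empty line
```

Under this approach both spellings of the same file produce the same records, while a genuinely blank extra line is still visible to later validation.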

aahna-ashina added a commit to nation3/nationcred-datasets that referenced this issue Nov 3, 2022
Delete empty last line, to make the file compatible with PapaParse:  mholt/PapaParse#150
wu-lee pushed a commit to DigitalCommons/mykomap that referenced this issue Mar 13, 2024
The parser sees an EOL on the final line as denoting an invalid
blank line. When in fact that's kind of normal for CSVs, at least on
Unix. And so we will get an error on every CSV we parse with such a
final EOL. We probably care about spurious blank lines less than
spurious errors like this, so let's use this option, which seems to be
the only (simple?) workaround.

Discussed here: mholt/PapaParse#150

9 participants