UTF-8-BOM string parsing - header first name incorrectly enclosed in a double quote #840
Comments
This seems to be related to the fact that I read the input file using fs.readFile(filename, 'utf-8'), which apparently doesn't strip the BOM. I found #407 after posting. It would be useful if PapaParse handled this itself instead.
Solved by removing the first character of the readFile output. It may be useful to include this in PapaParse, to save many people from running into and struggling with this repeatedly.
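For anyone landing here, a minimal sketch of that workaround. The helper name `stripBom` and the sample CSV content are made up for illustration, and the string stands in for what fs.readFile(filename, 'utf-8') would return for a BOM-prefixed file. It only removes the character when it really is U+FEFF, so files without a BOM pass through unchanged.

```javascript
// Hypothetical helper (not part of PapaParse): drop a leading U+FEFF
// byte order mark if one is present, otherwise return the text as-is.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Stand-in for the result of fs.readFile(filename, 'utf-8') on a file
// saved as UTF-8 with BOM. The CSV content is invented sample data.
const raw = '\uFEFFname,age\nAlice,30\n';
const csvData = stripBom(raw);

console.log(JSON.stringify(csvData.split('\n')[0])); // "name,age"
```

The cleaned string can then be passed to Papa.parse as before.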
Could you please submit a pull request that adds your code to PapaParse, along with a test to ensure the behaviour? We should read the first character before setting the encoding and, if it is the BOM, remove it and force the encoding to UTF-8.
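A rough sketch of what that suggestion could look like at the byte level, under the assumption that the input is available as a Node Buffer. This is not PapaParse's actual code; `decodeCsvBuffer` and `fallbackEncoding` are hypothetical names.

```javascript
// Hypothetical sketch, not PapaParse's implementation.
// The UTF-8 BOM is the three bytes EF BB BF at the start of the file.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

function decodeCsvBuffer(buffer, fallbackEncoding = 'utf8') {
  const hasBom = buffer.length >= 3 && buffer.subarray(0, 3).equals(UTF8_BOM);
  if (hasBom) {
    // A BOM is present: skip it and force UTF-8, ignoring the fallback.
    return buffer.subarray(3).toString('utf8');
  }
  // No BOM: decode with whatever encoding the caller asked for.
  return buffer.toString(fallbackEncoding);
}

// Invented sample bytes: BOM followed by "name,age".
const sampleBytes = Buffer.concat([UTF8_BOM, Buffer.from('name,age', 'utf8')]);
console.log(decodeCsvBuffer(sampleBytes, 'latin1')); // name,age
```

Checking the raw bytes rather than the decoded string means the BOM is detected even when the caller configured a different encoding.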
Hi. Is there any update regarding this issue? I believe I've also encountered it. Here is my case: csv file content:
my code:
Output:
I've run an additional check:
In the output picture a whitespace character can be seen before 'Id', but it gets lost when I copy the output.
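Since the character disappears on copy and paste, one way to make it visible is to print the code points of the first header. A small sketch, assuming the stray character is the BOM (U+FEFF); `describeCodePoints` is a made-up helper name.

```javascript
// Hypothetical diagnostic helper: list each character of a string
// as a U+XXXX code point so invisible characters show up.
function describeCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Assumed first header as PapaParse would return it for a BOM-prefixed file.
const firstHeader = '\uFEFFId';
console.log(firstHeader.length); // 3
console.log(describeCodePoints(firstHeader)); // [ 'U+FEFF', 'U+0049', 'U+0064' ]
```

With real data, the first key of a parsed row (for example `Object.keys(row)[0]`) could be passed in instead of the hard-coded string.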
I'm having the same issue in 2022. I was given an external CSV file, probably edited or written on Windows. When processing it on Linux with papaparse, I was unable to access the first row property defined by the header. When I
I edited the original CSV and simply retyped the first character in the head, then reran:
I'm using |
I went with this approach, not the most efficient:
Since csvFile is a read stream, not a pre-read file, I just tossed it in there for each step. I could do it only for the 1st step and skip it for anything but the 1st row.
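A possible variant that does the work only once: a small stateful function that strips the BOM from the first non-empty chunk and passes everything else through. This is a sketch under the assumption that the stream delivers decoded string chunks; `createBomStripper` is a hypothetical name and the chunks below are invented sample data.

```javascript
// Hypothetical helper: returns a function that strips a leading U+FEFF
// from the first non-empty chunk only and leaves later chunks untouched.
function createBomStripper() {
  let atStart = true;
  return function stripFirstChunk(chunk) {
    if (!atStart || chunk.length === 0) {
      return chunk;
    }
    atStart = false;
    return chunk.charCodeAt(0) === 0xfeff ? chunk.slice(1) : chunk;
  };
}

// Simulated string chunks from a read stream.
const strip = createBomStripper();
const chunks = ['\uFEFFId,Name\n', '1,Alice\n'];
const cleaned = chunks.map((chunk) => strip(chunk)).join('');
console.log(JSON.stringify(cleaned)); // "Id,Name\n1,Alice\n"
```

The same check could instead run inside the per-row callback, guarded by a "first row" flag, if chunks are not accessible.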
When a file is encoded as UTF-8-BOM, PapaParse's CSV-to-JSON parsing incorrectly returns the records with the first object key name enclosed in single quotes. One cannot then reference the field called name (example below): record.name doesn't exist. The field shows up as record.'name', which is not easily accessible in JavaScript using record.name or record['name']. You can only see it by printing the record to the console or by using a for-in loop.
The subsequent object keys are correct without quotes.
Change the file encoding to UTF-8 and the keys are normal, without a quote.
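To illustrate what is probably going on, here is a sketch that assumes the first key actually starts with the invisible BOM character (U+FEFF). The quotes are likely just how the console prints a key that is not a plain identifier. The record and its values are invented for the example.

```javascript
// Invented record mimicking the parsed output of a BOM-prefixed file:
// the first key is "\uFEFFname", not "name".
const record = { '\uFEFFname': 'Alice', age: '30' };

console.log(record.name);                   // undefined
console.log(record['\uFEFFname']);          // Alice
console.log(Object.keys(record)[0].length); // 5
console.log(record.age);                    // 30
```

So the field is still reachable with bracket notation if the key includes the BOM, although stripping the BOM before parsing is the cleaner fix.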
Papa.parse(csvData, PapaConfig)
csvData (subset):
UTF-8-BOM encoding:
UTF-8 encoding:
Excel exports CSV files as UTF-8-BOM, possibly because that encoding is supposedly faster and more reliable.
Can PapaParse be changed to handle UTF-8-BOM correctly?