Can't read xls with non UTF-8 encoding #564
Comments
I can reproduce what you're seeing. It has nothing to do with the loop. It's the files, which seem to have Latin-1 encoded contents, and readxl (libxls, maybe?) can't deal with that. I assume these xls files were written by some 3rd party tool. Sometimes 3rd party tools write files that technically comply with the spec but are very unusual. Excel itself can read them, but other xls-reading tools, like readxl/libxls, may struggle. Opening them in Excel normalizes the encoding problem away, and subsequent attempts to read them with readxl work. I suspect you see success with your first file, then not with your next 5, because someone had already done this once with the first file. I'll report this upstream. You could report your problems to whatever (instrument vendor?) is producing these xls files. If you have a way to get this data in a less exotic format, e.g. csv, I would take it.
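One way to check that the failure follows the file rather than the loop is to read every file independently and record what happened to each. The sketch below is only an illustration, not the code from the original post: `read_each` is a hypothetical helper, and the `"data"` directory is an assumed location for the files. Note that `tryCatch()` only catches R-level errors; it cannot recover from a crash of the R session such as a segfault.

```r
# Hypothetical helper: apply a reader to each path and record the outcome.
# `reader` defaults to readxl::read_xls; any function taking a path works.
read_each <- function(paths, reader = readxl::read_xls) {
  results <- lapply(paths, function(p) {
    tryCatch(
      list(path = p, ok = TRUE, data = reader(p), message = NA_character_),
      error = function(e) {
        # Keep going after a failure and remember why this file failed.
        list(path = p, ok = FALSE, data = NULL, message = conditionMessage(e))
      }
    )
  })
  names(results) <- basename(paths)
  results
}

# Assumed layout: the xls files live in a directory called "data".
paths <- list.files("data", pattern = "\\.xls$", full.names = TRUE)
results <- read_each(paths)

# Error messages for the files that could not be read, named by file.
failed <- Filter(function(r) !r$ok, results)
vapply(failed, function(r) r$message, character(1))
```

If the same files fail regardless of their position in `paths`, the loop is not the cause.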
Having updated libxls in 2c20f5c, I tried these files again and now I get a segfault:
Given the claim in libxls that it can read these files (libxls/libxls#55), I need to build xls2csv and see if that still holds / holds for me. Then I can figure out if the problem lies in readxl.
I can read these files with xls2csv built from libxls v1.6.2, so the problem lies in readxl.
Just replicated both the segfault with readxl and the success with xls2csv (no great surprise).
When I open so I suspect this is truly a malformed file, i.e. a file written by some "creative" 3rd party tool that would never be created by Excel and that, after opening in Excel, does not retain its weirdness. In the end, I don't think this was really about encoding, but rather a malformed cell format. However, I have found a relatively harmless null pointer check that allows the file, in all of its peculiar glory, to be read without a segfault.
I want to read xls files in a loop, but read_xls seems able to read only the first xls file. However, if I open the second (or any other) xls file in Excel 2016 (no "file broken" or other message appears) and then close it without making any changes,
and then rerun the code below, read_xls can read the second xls file. I am so confused.
I have put the data here: Data
My code is shown below:
Rerunning the code after I open the second xls file: