Automatic character type encoding? #804
Comments
Can you explain this a little more? I think the values probably are |
You are right, here is the output
|
There are two separate issues here:
|
It doesn't seem like there's anything actionable for CSV.jl here, right?
Well, you could add a dependency on StringEncodings and on an encoding detector, but I guess we don't want to do that. Maybe the day optional dependencies will be supported...
Well, I was going off the comment of "auto string encoding detection isn't very reliable", so I figured we wouldn't want to get into that.
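To make the reliability concern concrete, here is a minimal sketch in Python (standard library only, chosen purely for illustration; it is not CSV.jl, StringEncodings, or any detector's actual behavior). It shows that the same bytes can decode without error under more than one encoding, so "it decoded successfully" does not identify the right one. The helper name `try_decode` is hypothetical.

```python
# Illustration only: why guessing an encoding from bytes alone is ambiguous.
# `try_decode` is a hypothetical helper, not part of any library discussed here.

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

text = "日本語"
raw = text.encode("shift_jis")  # bytes as they would sit in a Shift_JIS file

as_sjis = try_decode(raw, "shift_jis")   # the intended text
as_latin1 = try_decode(raw, "latin-1")   # also decodes, but to mojibake
as_utf8 = try_decode(raw, "utf-8")       # rejected: not valid UTF-8

for name, result in [("shift_jis", as_sjis), ("latin-1", as_latin1), ("utf-8", as_utf8)]:
    print(name, repr(result))
```

Single-byte encodings such as Latin-1 accept any byte sequence, which is why detectors have to rely on statistical heuristics rather than validity checks alone.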
I'm working with Japanese data right now, and it's encoded in SHIFT_JIS, except the metadata of the file doesn't know that. So it's gibberish when you try to open it in, say, a text editor. CSV fails to read it, giving columns of byte arrays instead of strings. But uchardet worked successfully for at least one of the files. So maybe there is room for improvement. It failed in R on Linux but not on Windows, curiously.
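For reference, once the encoding is known (whether from a detector or from knowledge of the data source), the usual workaround is to transcode to Unicode text before parsing. The sketch below shows that idea in Python with only the standard library. It is an illustration of the general approach, not CSV.jl's API; the function name `read_csv_with_encoding` and the sample data are made up for the example.

```python
import csv
import io

# Illustration only: decode with a known encoding first, then parse as CSV.
# `read_csv_with_encoding` is a hypothetical helper, not a CSV.jl function.

def read_csv_with_encoding(raw, encoding):
    """Decode raw bytes with the given encoding and return the parsed CSV rows."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError on invalid bytes
    return list(csv.reader(io.StringIO(text, newline="")))

# Made-up sample standing in for a Shift_JIS file read in binary mode.
sample = "名前,都市\n太郎,東京\n".encode("shift_jis")

rows = read_csv_with_encoding(sample, "shift_jis")
for row in rows:
    print(row)
```

Decoding before parsing matters for multi-byte encodings like Shift_JIS, because a parser that works on undecoded bytes cannot produce proper strings for the cells.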