unicode: encoding/decoding errors in python2 #315
Comments
This is relevant because it concerns non-normalized unicode filenames. It also hints at the problems of unicode non-normalization in general. Or should I say: co-normalization? As a matter of fact, valid unicode has several equivalent normal forms, and identical characters (glyphs) may be represented equivalently by different sequences of codepoints. Yes, the same unicode text, in the same UTF encoding and byte order, can be stored as different byte sequences which don't compare equal. For example:
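A minimal sketch of the effect, using only the standard `unicodedata` module. The sample character and the helper name `same_name` are assumptions chosen for illustration; they are not taken from cherrymusic:

```python
import unicodedata

# The same visible character "é", stored in two equivalent ways.
composed = u"\u00e9"      # one code point (the NFC form)
decomposed = u"e\u0301"   # "e" + combining acute accent (the NFD form)

# Same encoding, different byte sequences.
composed_bytes = composed.encode("utf-8")      # b"\xc3\xa9"
decomposed_bytes = decomposed.encode("utf-8")  # b"e\xcc\x81"


def same_name(a, b, form="NFC"):
    """Compare two unicode strings after normalizing both (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed == decomposed)           # False
print(same_name(composed, decomposed))  # True
```

Normalizing only for the comparison leaves the stored name untouched, so the original byte sequence can still be used to open the file.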
So, it's on us to be aware of such things. One significant consequence is that an HFS+ filename generated by OS X may differ from the same name as handled by Linux (see original link above). This means that, no, we cannot enforce a certain normal form without potentially invalidating stuff like filenames. More generally, string data generated by the program in one environment is not always interoperable with the same program in another environment. More stuff to watch out for, hooh yeah. |
This issue is really bugging me and preventing nearly 50% of the songs in my library from even appearing in the web front end. My server.log is simply overflowing with errors about decoding various filenames. I'll paste a few here for reference.

I'm going to be installing convmv and simply adding a hack to python so these errors run convmv on the files that throw exceptions. That tool will be able to fix the filenames and I really wish it was just something cherrymusic ran by itself or supported.

ERROR [2014-10-07 14:43:48,447] : cherrypy.error.139977052412496 : from line (201) at
ERROR [2014-10-07 14:43:41,425] : cherrymusicserver.sqlitecache : from line (688) at
ERROR [2014-10-07 14:43:41,427] : cherrymusicserver.sqlitecache : from line (688) at |
Hi @acidtonic! Thanks for your report! I just checked what could be wrong using the information you gave us, and it seems that the encoding of the file names and your file system encoding differ from one another. The file is encoded using UTF-8:
but your filesystem seems to be using ASCII, or maybe windows-1252 (but that would probably rather lead to mojibake in this case), as far as I can tell. Can you please post the output of the following command to find out if my theory is correct?
$ python2 -c "import sys; print sys.getfilesystemencoding()"
|
I ended up solving the problem by downloading a tool called "detox" instead of the one I mentioned before. The command I used was (-n is a dry run that makes no changes):
detox -s utf_8 -r -v -n *
I do wish, however, that there was some screen that listed all the tracks that were skipped because of import errors, so I can catch if anyone else using the server is accidentally missing tracks because of this without babysitting it. |
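Until such a screen exists, a list of problematic names can be approximated outside of cherrymusic. The following is only a sketch under stated assumptions: the function names are hypothetical, the library root is passed as a byte string so that `os.walk` yields raw byte names on both Python 2 and Python 3, and "problematic" is taken to mean "not valid in the expected encoding" (UTF-8 here).

```python
import os


def undecodable(names, encoding="utf-8"):
    """Return the byte-string names that cannot be decoded with `encoding`."""
    bad = []
    for name in names:
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def find_undecodable(root, encoding="utf-8"):
    """Yield full paths (as bytes) under `root` whose names fail to decode.

    `root` must be a byte string, e.g. b"/path/to/music" (a placeholder).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in undecodable(dirnames + filenames, encoding):
            yield os.path.join(dirpath, name)
```

Like the detox dry run, this changes nothing on disk; it only reports which names would need attention.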
you can get that information from the error log file, which is probably located in ~/.local/share/cherrymusic/error.log
should do the trick |
For the 🍶 of Python 2, we need to ensure we're handling non-ASCII string data in the same uniform way that Python 3 includes by design. Internally, all strings should have the same representation.
Create a central place that takes strings (`str`, `bytes` or `unicode`) of various encodings and returns them in standard unicode form (py2 -> `unicode`, py3 -> `str`). All external string data needs to be passed through it as soon as possible, probably the moment it enters the `cherrymusicserver` namespace. There should probably be two different implementations: one for each major version of Python.
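To make the proposal concrete, here is a rough sketch of what such a central function might look like. The names `to_text` and `text_type`, the default list of candidate encodings, and the lossy fallback are all assumptions for illustration. For brevity it uses one implementation with a version switch rather than the two separate implementations suggested above.

```python
import sys

PY2 = sys.version_info[0] == 2
# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = unicode if PY2 else str  # noqa: F821 (only evaluated on py2)


def to_text(value, encodings=None):
    """Return `value` as a unicode text string (hypothetical helper).

    Text is returned unchanged. Byte strings are decoded with the first
    candidate encoding that works; as a last resort, undecodable bytes
    are replaced so the caller always gets text back.
    """
    if isinstance(value, text_type):
        return value
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("expected text or bytes, got %r" % type(value))
    raw = bytes(value)
    candidates = encodings or ("utf-8", sys.getfilesystemencoding() or "utf-8")
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", "replace")
```

A caller at the boundary would then write something like `name = to_text(raw_name)` once and work with text everywhere else. Whether a lossy fallback is acceptable for filenames is an open design question, since the replaced name can no longer be mapped back to the file on disk.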
External sources:
- `os` module: filesystem, text file contents (config!), ...
- `sys` ❓

Reset or consider as external:
Please feel free to add other relevant sources, observations or notable consequences that are still missing.