Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled. #91156
Comments
Currently,
I am not sure whether UTF-8 mode will become the default or not. So I think Currently, UTF-8 mode affects |
I created a related topic on discuss.python.org. If we recommend If we don't change |
There are multiple "locale encodings":
Include/pyport.h:
#if defined(__ANDROID__) || defined(__VXWORKS__)
// Use UTF-8 as the locale encoding, ignore the LC_CTYPE locale.
// See _Py_GetLocaleEncoding(), PyUnicode_DecodeLocale()
// and PyUnicode_EncodeLocale().
# define _Py_FORCE_UTF8_LOCALE
#endif
#if defined(_Py_FORCE_UTF8_LOCALE) || defined(__APPLE__)
// Use UTF-8 as the filesystem encoding.
// See PyUnicode_DecodeFSDefaultAndSize(), PyUnicode_EncodeFSDefault(),
// Py_DecodeLocale() and Py_EncodeLocale().
# define _Py_FORCE_UTF8_FS_ENCODING
#endif

See bpo-43552 "Add locale.get_locale_encoding() and locale.get_current_locale_encoding()" (rejected). Marc-Andre Lemburg dislikes the locale.getpreferredencoding(False) API and suggested adding a new function, locale.getencoding(), with no argument: |
If you want to change the default, would it be possible to add a function to get this encoding? |
I created another topic related to this issue. If we add another option (e.g. legacy_text_encoding), we do not need to change UTF-8 mode behavior. |
FWIW: I don't think the "locale" encoding is a good idea. Instead of When it comes to encodings, explicit is better than implicit. If an application wants to work with some user defined locale settings, There are too many ways this can be done and trying to build |
I would like to deprecate getlocale(), see bpo-43557. |
IMO it's a different use case and it should be a different thing. Changing encoding="locale" today is too late, since it's already shipped in Python 3.10 (PEP-597). I proposed the "current locale" name to distinguish it from the existing "locale":
What is unclear to me is whether "current locale" must change when the LC_CTYPE locale is changed, or whether it should be read once at startup and then never change. There *are* use cases that really need to read the *current* LC_CTYPE locale encoding. There is already a C API for that:
See also the "current_locale" parameter of the private API _Py_EncodeLocaleEx() and _Py_DecodeLocaleEx(). |
I propose:
None of these functions call locale.setlocale(locale.LC_CTYPE, "") to get the user preferred encoding. Only the locale.getpreferredencoding() function uses locale.setlocale(locale.LC_CTYPE, "").

Usage of locale.getpreferredencoding() should be discouraged in the documentation, but I don't think that it can be deprecated and scheduled for removal right now: too much code relies on it :-(

---

So we have 3 encodings:
Examples of usage:
|
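The difference between the two getpreferredencoding() modes discussed above can be sketched as follows. This is only an illustration: the function name compare_preferred_encoding() is hypothetical, and the values returned depend on the platform, the environment, and UTF-8 mode. Calling locale.setlocale(locale.LC_CTYPE) with no second argument only queries the current setting, which lets us check that the process-wide LC_CTYPE looks the same before and after the calls.

```python
import locale


def compare_preferred_encoding() -> dict:
    """Call getpreferredencoding() both ways and record LC_CTYPE around it.

    Hypothetical helper for illustration only.
    """
    # Query-only call: returns the current LC_CTYPE setting, changes nothing.
    before = locale.setlocale(locale.LC_CTYPE)

    # False: do not call setlocale(LC_CTYPE, "") first.
    without_setlocale = locale.getpreferredencoding(False)
    # True (the default): may temporarily switch LC_CTYPE to the user
    # preferred locale in order to read its encoding.
    with_setlocale = locale.getpreferredencoding(True)

    after = locale.setlocale(locale.LC_CTYPE)
    return {
        "lc_ctype_before": before,
        "lc_ctype_after": after,
        "do_setlocale_false": without_setlocale,
        "do_setlocale_true": with_setlocale,
    }


if __name__ == "__main__":
    for key, value in compare_preferred_encoding().items():
        print(f"{key}: {value}")
```

The two encodings can differ when the process is still in the "C" locale but the environment requests another one.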
Yes, although PYTHONLEGACYWINDOWSFSENCODING takes priority.
I proposed
Hmm, I didn't add it to PEP 686 because it is not related to UTF-8 mode or EncodingWarning. Since
Note that we have |
sys.getlocaleencoding() versus locale.getencoding(). For me, the Python locale module should use the C API to access the Unix locales like LC_CTYPE, nl_langinfo(CODESET), etc. The sys module is more for things specific to Python, like sys.getfilesystemencoding(). Since sys.getlocaleencoding() would be a fixed value for the whole process lifetime, I agree that the sys module is a better place. I can write a PR adding sys.getlocaleencoding() if we agree on the API. |
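For readers comparing the APIs named in this comment, a small sketch of what can be queried from Python today. It assumes locale.getencoding() is available (it was added in Python 3.11) and falls back to locale.getpreferredencoding(False) otherwise; note the fallback is not equivalent, because getpreferredencoding() is affected by UTF-8 mode while locale.getencoding() ignores it. sys.getlocaleencoding() is only a proposal in this thread, so it is not called. The function name describe_encodings() is hypothetical.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings discussed in this thread (illustrative helper)."""
    if hasattr(locale, "getencoding"):
        # Python 3.11+: current locale encoding, ignoring UTF-8 mode.
        current = locale.getencoding()
    else:
        # Assumption for older versions: closest available call, but it
        # returns "utf-8" when UTF-8 mode is enabled.
        current = locale.getpreferredencoding(False)
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),
        "locale_encoding": current,
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```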
I am not sure whether we really need "locale encoding at Python startup". For this issue, I don't want to change On the other hand, I know Eryk wants to support locale on Windows. So |
@vstiner Since UTF-8 mode affects If there are no objections, I will choose |
Please see https://bugs.python.org/issue47000#msg415769 for what Victor In particular, the locale module uses the "no underscore" convention. I would like to reiterate my concern with the "locale" encoding, though. As mentioned earlier, I believe it adds too much magic. It would be better It's better to expose easy to use APIs to access the various different After all, Mojibake potentially corrupts important data, without the |
Of course, I read it.
Victor didn't mention the "no underscore" convention.
I don't recommend that users use the "locale" encoding.
In some cases, users need to decide "not to change the encoding for now".
Changing the default encoding will temporarily increase this risk. |
…H-91732) Co-authored-by: Victor Stinner <[email protected]>
io.text_encoding() respects UTF-8 mode #32003
locale.getencoding() #32068
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.