-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-91156: Use locale.getencoding()
instead of getpreferredencoding
#91732
Conversation
instead of locale.getpreferredencoding()
Lib/xml/etree/ElementTree.py
Outdated
@@ -737,7 +737,7 @@ def write(self, file_or_filename, | |||
if enc_lower == "unicode": | |||
# Retrieve the default encoding for the xml declaration | |||
import locale | |||
declared_encoding = locale.getpreferredencoding() | |||
declared_encoding = locale.getpreferredencoding(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep using getpreferredencoding(False) here because this function should use UTF-8 in UTF-8 mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to write a separated PR for this change? It's not directly unrelated, and you should document it with a separated NEWS entry.
self.addCleanup(setattr, locale, 'getpreferredencoding', | ||
getpreferredencoding) | ||
locale.getpreferredencoding = lambda: 'ascii' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hack doesn't work for most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove import locale
at the top.
This code is correct. I don't think that the locale encoding is still used: MimeType.read() calls open(filename, encoding="utf-8")
since Python 3.3: commit 82ac9bc.
Lib/xml/etree/ElementTree.py
Outdated
@@ -737,7 +737,7 @@ def write(self, file_or_filename, | |||
if enc_lower == "unicode": | |||
# Retrieve the default encoding for the xml declaration | |||
import locale | |||
declared_encoding = locale.getpreferredencoding() | |||
declared_encoding = locale.getpreferredencoding(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to write a separated PR for this change? It's not directly unrelated, and you should document it with a separated NEWS entry.
Co-authored-by: Victor Stinner <[email protected]>
Doc/library/curses.rst
Outdated
@@ -37,7 +37,7 @@ Linux and the BSD variants of Unix. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark unrelated to your PR.
you have to call:func:
locale.setlocale
in the application
This is wrong: Python now always call setlocale(LC_CTYPE, "")
at startup.
Calling locale.setlocale(locale.LC_ALL, '')
is no longer needed.
Moreover, curses likely use mbstowcs() and wcstombs() functions, rather than nl_langinfo() (nl_langinfo(CODESET)?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's remove this note.
Doc/library/curses.rst
Outdated
@@ -924,7 +924,7 @@ the following methods and attributes: | |||
Encoding used to encode method arguments (Unicode strings and characters). | |||
The encoding attribute is inherited from the parent window when a subwindow | |||
is created, for example with :meth:`window.subwin`. By default, the locale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to replace "locale encoding" with "current locale encoding"? Just to remind that it can be changed at runtime.
For example, the encoding used by the readline module is the currrent encoding, the encoding is not stored anywere. Its C code uses PyUnicode_EncodeLocale() and PyUnicode_DecodeLocale(): current LC_CTYPE locale encoding.
@@ -533,6 +533,12 @@ def test_defaults_UTF8(self): | |||
if orig_getlocale is not None: | |||
_locale._getdefaultlocale = orig_getlocale | |||
|
|||
def test_getencoding(self): | |||
# Invoke getencoding to make sure it does not cause exceptions. | |||
enc = locale.getencoding() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to test the type: add self.assertIsInstance(enc, str)
.
Maybe also ensure that the string is not empty? add self.assertNotEqual(enc, "")
.
# Invoke getencoding to make sure it does not cause exceptions. | ||
enc = locale.getencoding() | ||
# make sure it is valid | ||
codecs.lookup(enc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PyUnicode_InitEncodings() fails it config.filesystem_encoding or config.stdio_encoding is not known by codecs.lookup(name). So this call should not fail.
If tomorrow this test fails, I suggest to remove it and only check that the string is non-empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When UTF-8 mode is enabled, both of stdio encoding and filesystem encoding are UTF-8, not locale encoding.
self.addCleanup(setattr, locale, 'getpreferredencoding', | ||
getpreferredencoding) | ||
locale.getpreferredencoding = lambda: 'ascii' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove import locale
at the top.
This code is correct. I don't think that the locale encoding is still used: MimeType.read() calls open(filename, encoding="utf-8")
since Python 3.3: commit 82ac9bc.
Co-authored-by: Victor Stinner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It seems like this PR fix multiple bugs in the doc and mojibake issues when the UTF-8 mode is used, nice!
patchcheck failed on the Azure Ubuntu job:
|
No description provided.