Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-91156: Use locale.getencoding() instead of getpreferredencoding #91732

Merged
merged 7 commits into from
Apr 22, 2022

Conversation

methane
Copy link
Member

@methane methane commented Apr 20, 2022

No description provided.

instead of locale.getpreferredencoding()
@methane methane requested a review from vstinner April 20, 2022 09:05
@methane methane requested a review from a team as a code owner April 20, 2022 09:05
@@ -737,7 +737,7 @@ def write(self, file_or_filename,
if enc_lower == "unicode":
# Retrieve the default encoding for the xml declaration
import locale
declared_encoding = locale.getpreferredencoding()
declared_encoding = locale.getpreferredencoding(False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep using getpreferredencoding(False) here because this function should use UTF-8 in UTF-8 mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to write a separated PR for this change? It's not directly unrelated, and you should document it with a separated NEWS entry.

self.addCleanup(setattr, locale, 'getpreferredencoding',
getpreferredencoding)
locale.getpreferredencoding = lambda: 'ascii'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hack doesn't work for most cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove import locale at the top.

This code is correct. I don't think that the locale encoding is still used: MimeType.read() calls open(filename, encoding="utf-8") since Python 3.3: commit 82ac9bc.

Lib/test/pythoninfo.py Outdated Show resolved Hide resolved
@@ -737,7 +737,7 @@ def write(self, file_or_filename,
if enc_lower == "unicode":
# Retrieve the default encoding for the xml declaration
import locale
declared_encoding = locale.getpreferredencoding()
declared_encoding = locale.getpreferredencoding(False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to write a separated PR for this change? It's not directly unrelated, and you should document it with a separated NEWS entry.

@@ -37,7 +37,7 @@ Linux and the BSD variants of Unix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark unrelated to your PR.

you have to call:func:locale.setlocale in the application

This is wrong: Python now always call setlocale(LC_CTYPE, "") at startup.

Calling locale.setlocale(locale.LC_ALL, '') is no longer needed.

Moreover, curses likely use mbstowcs() and wcstombs() functions, rather than nl_langinfo() (nl_langinfo(CODESET)?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's remove this note.

@@ -924,7 +924,7 @@ the following methods and attributes:
Encoding used to encode method arguments (Unicode strings and characters).
The encoding attribute is inherited from the parent window when a subwindow
is created, for example with :meth:`window.subwin`. By default, the locale
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to replace "locale encoding" with "current locale encoding"? Just to remind that it can be changed at runtime.

For example, the encoding used by the readline module is the currrent encoding, the encoding is not stored anywere. Its C code uses PyUnicode_EncodeLocale() and PyUnicode_DecodeLocale(): current LC_CTYPE locale encoding.

Lib/test/support/__init__.py Outdated Show resolved Hide resolved
@@ -533,6 +533,12 @@ def test_defaults_UTF8(self):
if orig_getlocale is not None:
_locale._getdefaultlocale = orig_getlocale

def test_getencoding(self):
# Invoke getencoding to make sure it does not cause exceptions.
enc = locale.getencoding()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to test the type: add self.assertIsInstance(enc, str).

Maybe also ensure that the string is not empty? add self.assertNotEqual(enc, "").

# Invoke getencoding to make sure it does not cause exceptions.
enc = locale.getencoding()
# make sure it is valid
codecs.lookup(enc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyUnicode_InitEncodings() fails it config.filesystem_encoding or config.stdio_encoding is not known by codecs.lookup(name). So this call should not fail.

If tomorrow this test fails, I suggest to remove it and only check that the string is non-empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When UTF-8 mode is enabled, both of stdio encoding and filesystem encoding are UTF-8, not locale encoding.

self.addCleanup(setattr, locale, 'getpreferredencoding',
getpreferredencoding)
locale.getpreferredencoding = lambda: 'ascii'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove import locale at the top.

This code is correct. I don't think that the locale encoding is still used: MimeType.read() calls open(filename, encoding="utf-8") since Python 3.3: commit 82ac9bc.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It seems like this PR fix multiple bugs in the doc and mojibake issues when the UTF-8 mode is used, nice!

@vstinner
Copy link
Member

patchcheck failed on the Azure Ubuntu job:

Getting the list of files that have been added/changed ... 14 files
Fixing Python file whitespace ... 1 file:
  Lib/test/test_mimetypes.py
Fixing C file whitespace ... 0 files
Fixing docs whitespace ... 0 files
Please fix the 1 file(s) with whitespace issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants