gh-91156: Use `locale.getencoding()` instead of getpreferredencoding #91732

methane · 2022-04-20T09:05:34Z

instead of locale.getpreferredencoding()

methane · 2022-04-20T09:08:20Z

Lib/xml/etree/ElementTree.py

@@ -737,7 +737,7 @@ def write(self, file_or_filename,
                if enc_lower == "unicode":
                    # Retrieve the default encoding for the xml declaration
                    import locale
-                    declared_encoding = locale.getpreferredencoding()
+                    declared_encoding = locale.getpreferredencoding(False)


I keep using getpreferredencoding(False) here because this function should use UTF-8 in UTF-8 mode.
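
To make the distinction concrete, here is a minimal sketch that prints both values side by side. It assumes Python 3.11 or later for `locale.getencoding()` and reports `None` on older versions; the helper name `describe_encodings` is hypothetical and not part of this PR.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings discussed here (hypothetical helper)."""
    # locale.getencoding() was added in Python 3.11 and ignores UTF-8 mode.
    getencoding = getattr(locale, "getencoding", None)
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),
        # getpreferredencoding(False) honors UTF-8 mode and does not call setlocale().
        "preferred": locale.getpreferredencoding(False),
        "locale": getencoding() if getencoding is not None else None,
    }


info = describe_encodings()
print(info)
```

Running this once normally and once with `python -X utf8` should show `preferred` switching to UTF-8 while `locale` keeps reporting the locale encoding.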

methane · 2022-04-20T09:09:29Z

Lib/test/test_mimetypes.py

-        self.addCleanup(setattr, locale, 'getpreferredencoding',
-                                 getpreferredencoding)
-        locale.getpreferredencoding = lambda: 'ascii'
-


This hack doesn't work for most cases.

vstinner · 2022-04-20T09:36:57Z

Lib/xml/etree/ElementTree.py

@@ -737,7 +737,7 @@ def write(self, file_or_filename,
                if enc_lower == "unicode":
                    # Retrieve the default encoding for the xml declaration
                    import locale
-                    declared_encoding = locale.getpreferredencoding()
+                    declared_encoding = locale.getpreferredencoding(False)


Would you mind writing a separate PR for this change? It's not directly related, and you should document it with a separate NEWS entry.

vstinner · 2022-04-20T11:01:35Z

Doc/library/curses.rst

@@ -37,7 +37,7 @@ Linux and the BSD variants of Unix.



Remark unrelated to your PR.

"you have to call :func:`locale.setlocale` in the application"

This is wrong: Python now always calls setlocale(LC_CTYPE, "") at startup.

Calling locale.setlocale(locale.LC_ALL, '') is no longer needed.

Moreover, curses likely uses the mbstowcs() and wcstombs() functions rather than nl_langinfo() (nl_langinfo(CODESET)?).

OK. Let's remove this note.
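
One way to observe the startup behaviour described above is to query LC_CTYPE without calling `setlocale()` in the application first. This is only a sketch: on a typical system with a user locale configured it should report that locale rather than "C", but embedded or isolated interpreter configurations may behave differently (an assumption, not verified in this thread).

```python
import locale

# Calling setlocale() with only a category queries the current setting;
# it does not change the locale.
current_ctype = locale.setlocale(locale.LC_CTYPE)
print("current LC_CTYPE locale:", current_ctype)
```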

vstinner · 2022-04-20T11:07:04Z

Doc/library/curses.rst

@@ -924,7 +924,7 @@ the following methods and attributes:
   Encoding used to encode method arguments (Unicode strings and characters).
   The encoding attribute is inherited from the parent window when a subwindow
   is created, for example with :meth:`window.subwin`. By default, the locale


Would you mind replacing "locale encoding" with "current locale encoding"? Just as a reminder that it can be changed at runtime.

For example, the encoding used by the readline module is the current encoding; the encoding is not stored anywhere. Its C code uses PyUnicode_EncodeLocale() and PyUnicode_DecodeLocale(), which use the current LC_CTYPE locale encoding.
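
The point that the locale encoding can change at runtime can be sketched as follows. The helper `current_locale_encoding` is hypothetical; it uses `locale.getencoding()` where available (Python 3.11+) and falls back to `locale.getpreferredencoding(False)` otherwise. On some platforms, such as Windows, the reported value may not change when LC_CTYPE changes.

```python
import locale


def current_locale_encoding():
    """Encoding of the current locale (hypothetical helper)."""
    # locale.getencoding() exists on Python 3.11+; fall back otherwise.
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        return getencoding()
    return locale.getpreferredencoding(False)


saved = locale.setlocale(locale.LC_CTYPE)        # query the current setting
before = current_locale_encoding()
try:
    locale.setlocale(locale.LC_CTYPE, "C")       # the "C" locale is always available
    during = current_locale_encoding()
finally:
    locale.setlocale(locale.LC_CTYPE, saved)     # restore the original locale
after = current_locale_encoding()

print("before:", before, "| in C locale:", during, "| restored:", after)
```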

vstinner · 2022-04-20T11:10:24Z

Lib/test/test_locale.py

@@ -533,6 +533,12 @@ def test_defaults_UTF8(self):
            if orig_getlocale is not None:
                _locale._getdefaultlocale = orig_getlocale

+    def test_getencoding(self):
+        # Invoke getencoding to make sure it does not cause exceptions.
+        enc = locale.getencoding()


I suggest testing the type: add self.assertIsInstance(enc, str).

Maybe also ensure that the string is not empty? Add self.assertNotEqual(enc, "").
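
Put together, the two suggested assertions could look like the standalone sketch below. The class name `GetEncodingSketch` is made up for illustration and this is not the test as merged; the skip decorator is only there so the snippet also runs on versions older than Python 3.11.

```python
import locale
import unittest


class GetEncodingSketch(unittest.TestCase):
    @unittest.skipUnless(hasattr(locale, "getencoding"), "needs Python 3.11+")
    def test_getencoding(self):
        # Invoke getencoding to make sure it does not raise.
        enc = locale.getencoding()
        # The result should be a non-empty string.
        self.assertIsInstance(enc, str)
        self.assertNotEqual(enc, "")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GetEncodingSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```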

vstinner · 2022-04-20T11:13:39Z

Lib/test/test_locale.py

+        # Invoke getencoding to make sure it does not cause exceptions.
+        enc = locale.getencoding()
+        # make sure it is valid
+        codecs.lookup(enc)


_PyUnicode_InitEncodings() fails if config.filesystem_encoding or config.stdio_encoding is not known by codecs.lookup(name). So this call should not fail.

If this test fails in the future, I suggest removing it and only checking that the string is non-empty.

When UTF-8 mode is enabled, both the stdio encoding and the filesystem encoding are UTF-8, not the locale encoding.
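
The difference between these encodings can be inspected with a short sketch like the one below. It assumes Python 3.11+ for `locale.getencoding()` and prints `None` for it on older versions. The stdio value is read defensively because `sys.stdout` may have been replaced or be `None`.

```python
import locale
import sys

fs_encoding = sys.getfilesystemencoding()
# sys.stdout may be None or an object without an encoding attribute.
stdio_encoding = getattr(sys.stdout, "encoding", None)
getencoding = getattr(locale, "getencoding", None)  # Python 3.11+
locale_encoding = getencoding() if getencoding is not None else None

print("UTF-8 mode:", bool(sys.flags.utf8_mode))
print("filesystem:", fs_encoding)
print("stdio:     ", stdio_encoding)
print("locale:    ", locale_encoding)
```

With `python -X utf8` under a non-UTF-8 locale, the filesystem line should read UTF-8 while the locale line still shows the locale's own encoding, which is why a successful startup does not strictly guarantee that `codecs.lookup()` knows the locale encoding.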

vstinner · 2022-04-20T11:18:04Z

Lib/test/test_mimetypes.py

-        self.addCleanup(setattr, locale, 'getpreferredencoding',
-                                 getpreferredencoding)
-        locale.getpreferredencoding = lambda: 'ascii'
-


Please remove `import locale` at the top.

This code is correct. I don't think that the locale encoding is still used: MimeTypes.read() has called open(filename, encoding="utf-8") since Python 3.3: commit 82ac9bc.

vstinner

LGTM. It seems like this PR fixes multiple bugs in the docs and mojibake issues when UTF-8 mode is used, nice!

vstinner · 2022-04-21T11:27:25Z

patchcheck failed on the Azure Ubuntu job:

Getting the list of files that have been added/changed ... 14 files
Fixing Python file whitespace ... 1 file:
  Lib/test/test_mimetypes.py
Fixing C file whitespace ... 0 files
Fixing docs whitespace ... 0 files
Please fix the 1 file(s) with whitespace issues

Use locale.getencoding()

0bf6bfc

instead of locale.getpreferredencoding()

methane added the skip news label Apr 20, 2022

methane requested a review from vstinner April 20, 2022 09:05

methane requested a review from a team as a code owner April 20, 2022 09:05

bedevere-bot added the awaiting core review label Apr 20, 2022

methane commented Apr 20, 2022

vstinner reviewed Apr 20, 2022

methane and others added 2 commits April 20, 2022 19:10

Revert etree changes.

8cd88b9

Apply suggestions from code review

3005858

Co-authored-by: Victor Stinner <[email protected]>

vstinner reviewed Apr 20, 2022

methane and others added 3 commits April 21, 2022 11:44

Apply suggestions from code review

951875a

Co-authored-by: Victor Stinner <[email protected]>

Apply suggested changes.

38601b0

Merge branch 'main' into locale-getencoding

b9f088b

vstinner approved these changes Apr 21, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Apr 21, 2022

Remove trailing spaces

d8e8c26

methane merged commit 1317b70 into python:main Apr 22, 2022

methane deleted the locale-getencoding branch April 22, 2022 01:39

bedevere-bot removed the awaiting merge label Apr 22, 2022

iritkatriel mentioned this pull request Nov 7, 2022

bpo-41091: Remove recommendation in curses module documentation to initialize LC_ALL and encode strings #21159

Closed
