gh-91156: Use locale.getencoding() instead of getpreferredencoding #91732

Merged
merged 7 commits into from
Apr 22, 2022
Changes from 3 commits
3 changes: 1 addition & 2 deletions Doc/howto/curses.rst
@@ -299,8 +299,7 @@ The :meth:`~curses.window.addstr` method takes a Python string or
bytestring as the value to be displayed. The contents of bytestrings
are sent to the terminal as-is. Strings are encoded to bytes using
the value of the window's :attr:`encoding` attribute; this defaults to
-the default system encoding as returned by
-:func:`locale.getpreferredencoding`.
+the default system encoding as returned by :func:`locale.getencoding`.

The :meth:`~curses.window.addch` methods take a character, which can be
either a string of length 1, a bytestring of length 1, or an integer.
2 changes: 1 addition & 1 deletion Doc/library/csv.rst
@@ -542,7 +542,7 @@ The corresponding simplest possible writing example is::

Since :func:`open` is used to open a CSV file for reading, the file
will by default be decoded into unicode using the system default
-encoding (see :func:`locale.getpreferredencoding`). To decode a file
+encoding (see :func:`locale.getencoding`). To decode a file
using a different encoding, use the ``encoding`` argument of open::

import csv
4 changes: 2 additions & 2 deletions Doc/library/curses.rst
@@ -37,7 +37,7 @@ Linux and the BSD variants of Unix.

Member

Remark unrelated to your PR.

"you have to call :func:`locale.setlocale` in the application"

This is wrong: Python now always calls setlocale(LC_CTYPE, "") at startup.

Calling locale.setlocale(locale.LC_ALL, '') is no longer needed.

Moreover, curses likely uses the mbstowcs() and wcstombs() functions, rather than nl_langinfo() (nl_langinfo(CODESET)?).
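
A minimal sketch of the point made in this comment: the LC_CTYPE locale can be queried without the application calling `locale.setlocale(locale.LC_ALL, '')` first. The helper name `get_locale_encoding` is hypothetical, and the fallback to `getpreferredencoding(False)` on Python versions before 3.11 is an approximation (it differs from `getencoding()` when UTF-8 mode is enabled).

```python
import locale


def get_locale_encoding():
    """Hypothetical helper: return the locale encoding without changing the locale."""
    getencoding = getattr(locale, "getencoding", None)  # available in Python 3.11+
    if getencoding is not None:
        return getencoding()
    # Approximation for older versions; returns "utf-8" in UTF-8 mode.
    return locale.getpreferredencoding(False)


# Calling setlocale() with no second argument only reads the current setting.
ctype_at_startup = locale.setlocale(locale.LC_CTYPE)
encoding = get_locale_encoding()

print("LC_CTYPE:", ctype_at_startup)
print("locale encoding:", encoding)
```

The printed values depend on the platform and environment, so no expected output is shown.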

Member Author

OK. Let's remove this note.

import locale
locale.setlocale(locale.LC_ALL, '')
-code = locale.getpreferredencoding()
+code = locale.getencoding()

Then use *code* as the encoding for :meth:`str.encode` calls.

@@ -924,7 +924,7 @@ the following methods and attributes:
Encoding used to encode method arguments (Unicode strings and characters).
The encoding attribute is inherited from the parent window when a subwindow
is created, for example with :meth:`window.subwin`. By default, the locale
Member

Would you mind replacing "locale encoding" with "current locale encoding"? Just as a reminder that it can be changed at runtime.

For example, the encoding used by the readline module is the current encoding; the encoding is not stored anywhere. Its C code uses PyUnicode_EncodeLocale() and PyUnicode_DecodeLocale(): the current LC_CTYPE locale encoding.
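
As a rough illustration of "current" locale encoding, the sketch below switches LC_CTYPE to the "C" locale at runtime and queries the encoding before, during, and after. The helper `current_locale_encoding` is hypothetical; the values are platform-dependent (on Windows the result may not follow the LC_CTYPE locale at all), so none are asserted here.

```python
import locale


def current_locale_encoding():
    """Hypothetical helper: encoding of the current LC_CTYPE locale."""
    getencoding = getattr(locale, "getencoding", None)  # Python 3.11+
    if getencoding is not None:
        return getencoding()
    # Approximation for older versions; does not call setlocale().
    return locale.getpreferredencoding(False)


saved = locale.setlocale(locale.LC_CTYPE)  # remember the current setting
before = current_locale_encoding()
try:
    locale.setlocale(locale.LC_CTYPE, "C")  # change the locale at runtime
    after = current_locale_encoding()
finally:
    locale.setlocale(locale.LC_CTYPE, saved)  # always restore the locale
restored = current_locale_encoding()

print("before:", before)
print("under the C locale:", after)
print("restored:", restored)
```

Because nothing caches the value, code that needs a stable encoding should read it once and pass it explicitly.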

-encoding is used (see :func:`locale.getpreferredencoding`).
+encoding is used (see :func:`locale.getencoding`).

.. versionadded:: 3.3

11 changes: 5 additions & 6 deletions Doc/library/functions.rst
@@ -1123,8 +1123,8 @@ are always available. They are listed here in alphabetical order.
(which on *some* Unix systems, means that *all* writes append to the end of
the file regardless of the current seek position). In text mode, if
*encoding* is not specified the encoding used is platform-dependent:
-``locale.getpreferredencoding(False)`` is called to get the current locale
-encoding. (For reading and writing raw bytes use binary mode and leave
+:func:`locale.getencoding()` is called to get the current locale encoding.
+(For reading and writing raw bytes use binary mode and leave
*encoding* unspecified.) The available modes are:

.. _filemodes:
@@ -1179,10 +1179,9 @@ are always available. They are listed here in alphabetical order.

*encoding* is the name of the encoding used to decode or encode the file.
This should only be used in text mode. The default encoding is platform
-dependent (whatever :func:`locale.getpreferredencoding` returns), but any
-:term:`text encoding` supported by Python
-can be used. See the :mod:`codecs` module for
-the list of supported encodings.
+dependent (whatever :func:`locale.getencoding` returns), but any
+:term:`text encoding` supported by Python can be used.
+See the :mod:`codecs` module for the list of supported encodings.

*errors* is an optional string that specifies how encoding and decoding
errors are to be handled—this cannot be used in binary mode.
6 changes: 3 additions & 3 deletions Doc/library/os.rst
@@ -105,15 +105,15 @@ of the UTF-8 encoding:

* Use UTF-8 as the :term:`filesystem encoding <filesystem encoding and error
handler>`.
-* :func:`sys.getfilesystemencoding()` returns ``'UTF-8'``.
-* :func:`locale.getpreferredencoding()` returns ``'UTF-8'`` (the *do_setlocale*
+* :func:`sys.getfilesystemencoding()` returns ``'utf-8'``.
+* :func:`locale.getpreferredencoding()` returns ``'utf-8'`` (the *do_setlocale*
argument has no effect).
* :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr` all use
UTF-8 as their text encoding, with the ``surrogateescape``
:ref:`error handler <error-handlers>` being enabled for :data:`sys.stdin`
and :data:`sys.stdout` (:data:`sys.stderr` continues to use
``backslashreplace`` as it does in the default locale-aware mode)
-* On Unix, :func:`os.device_encoding` returns ``'UTF-8'`` rather than the
+* On Unix, :func:`os.device_encoding` returns ``'utf-8'`` rather than the
device encoding.

Note that the standard stream settings in UTF-8 mode can be overridden by
3 changes: 1 addition & 2 deletions Lib/test/libregrtest/main.py
@@ -482,8 +482,7 @@ def display_header(self):
if cpu_count:
print("== CPU count:", cpu_count)
print("== encodings: locale=%s, FS=%s"
-% (locale.getpreferredencoding(False),
-sys.getfilesystemencoding()))
+% (locale.getencoding(), sys.getfilesystemencoding()))

def get_tests_result(self):
result = []
2 changes: 1 addition & 1 deletion Lib/test/pythoninfo.py
@@ -155,7 +155,7 @@ def collect_platform(info_add):
def collect_locale(info_add):
import locale

-info_add('locale.encoding', locale.getpreferredencoding(False))
+info_add('locale.getencoding', locale.getencoding())


def collect_builtins(info_add):
2 changes: 1 addition & 1 deletion Lib/test/support/__init__.py
@@ -1445,7 +1445,7 @@ def skip_if_buggy_ucrt_strfptime(test):
global _buggy_ucrt
if _buggy_ucrt is None:
if(sys.platform == 'win32' and
-locale.getpreferredencoding(False) == 'cp65001' and
+locale.getencoding() == 'cp65001' and
time.localtime().tm_zone == ''):
_buggy_ucrt = True
else:
2 changes: 1 addition & 1 deletion Lib/test/test__locale.py
@@ -43,7 +43,7 @@ def setUpModule():
locale.setlocale(locale.LC_ALL, loc)
except Error:
continue
-encoding = locale.getpreferredencoding(False)
+encoding = locale.getencoding()
try:
localeconv()
except Exception as err:
2 changes: 1 addition & 1 deletion Lib/test/test_builtin.py
@@ -1204,7 +1204,7 @@ def test_open_default_encoding(self):
del os.environ[key]

self.write_testfile()
-current_locale_encoding = locale.getpreferredencoding(False)
+current_locale_encoding = locale.getencoding()
with warnings.catch_warnings():
warnings.simplefilter("ignore", EncodingWarning)
fp = open(TESTFN, 'w')
2 changes: 1 addition & 1 deletion Lib/test/test_cmd_line.py
@@ -216,7 +216,7 @@ def test_undecodable_code(self):
code = (
b'import locale; '
b'print(ascii("' + undecodable + b'"), '
-b'locale.getpreferredencoding())')
+b'locale.getencoding())')
p = subprocess.Popen(
[sys.executable, "-c", code],
stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
2 changes: 1 addition & 1 deletion Lib/test/test_io.py
@@ -2726,7 +2726,7 @@ def test_default_encoding(self):
if key in os.environ:
del os.environ[key]

-current_locale_encoding = locale.getpreferredencoding(False)
+current_locale_encoding = locale.getencoding()
b = self.BytesIO()
with warnings.catch_warnings():
warnings.simplefilter("ignore", EncodingWarning)
8 changes: 7 additions & 1 deletion Lib/test/test_locale.py
@@ -363,7 +363,7 @@ class TestEnUSCollation(BaseLocalizedTest, TestCollation):
locale_type = locale.LC_ALL

def setUp(self):
-enc = codecs.lookup(locale.getpreferredencoding(False) or 'ascii').name
+enc = codecs.lookup(locale.getencoding() or 'ascii').name
if enc not in ('utf-8', 'iso8859-1', 'cp1252'):
raise unittest.SkipTest('encoding not suitable')
if enc != 'iso8859-1' and (sys.platform == 'darwin' or is_android or
@@ -533,6 +533,12 @@ def test_defaults_UTF8(self):
if orig_getlocale is not None:
_locale._getdefaultlocale = orig_getlocale

+def test_getencoding(self):
+# Invoke getencoding to make sure it does not cause exceptions.
+enc = locale.getencoding()
Member

I suggest testing the type: add self.assertIsInstance(enc, str).

Maybe also ensure that the string is not empty: add self.assertNotEqual(enc, "").
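
The two suggested assertions could look roughly like the sketch below. The class name `GetEncodingSketch` is hypothetical and this is not the test as merged; the skip guard is an addition so the sketch also loads on Python versions without `locale.getencoding()`.

```python
import locale
import unittest


class GetEncodingSketch(unittest.TestCase):
    """Hypothetical stand-alone version of the suggested checks."""

    def test_getencoding(self):
        getencoding = getattr(locale, "getencoding", None)
        if getencoding is None:
            self.skipTest("locale.getencoding() requires Python 3.11+")
        enc = getencoding()
        # Suggested in review: check the type and reject an empty string.
        self.assertIsInstance(enc, str)
        self.assertNotEqual(enc, "")
```

Saved to a file, this can be run with `python -m unittest`.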

+# make sure it is valid
+codecs.lookup(enc)
Member

_PyUnicode_InitEncodings() fails if config.filesystem_encoding or config.stdio_encoding is not known by codecs.lookup(name). So this call should not fail.

If this test fails in the future, I suggest removing it and only checking that the string is non-empty.

Member Author

When UTF-8 mode is enabled, both the stdio encoding and the filesystem encoding are UTF-8, not the locale encoding.
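
A small sketch for comparing the encodings discussed in this thread. It only reports values; which ones differ depends on the platform and on whether UTF-8 mode is enabled. The variable names are illustrative, and `locale_encoding` is `None` on Python versions before 3.11.

```python
import locale
import sys

utf8_mode = bool(sys.flags.utf8_mode)
fs_encoding = sys.getfilesystemencoding()
# sys.stdout may be replaced or missing in some environments.
stdout_encoding = getattr(sys.stdout, "encoding", None)

getencoding = getattr(locale, "getencoding", None)  # Python 3.11+
locale_encoding = getencoding() if getencoding is not None else None

print("UTF-8 mode:         ", utf8_mode)
print("filesystem encoding:", fs_encoding)
print("stdout encoding:    ", stdout_encoding)
print("locale encoding:    ", locale_encoding)
```

Running it once normally and once with `python -X utf8` shows whether the locale encoding differs from the other two on a given system.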


def test_getpreferredencoding(self):
# Invoke getpreferredencoding to make sure it does not cause exceptions.
enc = locale.getpreferredencoding()
5 changes: 0 additions & 5 deletions Lib/test/test_mimetypes.py
@@ -145,11 +145,6 @@ def test_guess_all_types(self):
self.assertNotIn('.no-such-ext', all)

def test_encoding(self):
-getpreferredencoding = locale.getpreferredencoding
-self.addCleanup(setattr, locale, 'getpreferredencoding',
-getpreferredencoding)
-locale.getpreferredencoding = lambda: 'ascii'
-
Member Author

This hack doesn't work for most cases.

Member

Please remove import locale at the top.

This code is correct. I don't think that the locale encoding is still used: MimeTypes.read() has called open(filename, encoding="utf-8") since Python 3.3: commit 82ac9bc.
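
To illustrate the comment above, the sketch below writes a small UTF-8 file in mime.types format and loads it with `MimeTypes.read()`. The MIME type `application/x-sketch-example` and the extension `sketchext` are made up for the example.

```python
import mimetypes
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mime.types")
    # Write the file as UTF-8, including a non-ASCII comment line.
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("# comment with non-ASCII text: caf\u00e9\n")
        fp.write("application/x-sketch-example sketchext\n")

    mimes = mimetypes.MimeTypes()
    mimes.read(path)  # parses the file and registers its types
    extensions = mimes.guess_all_extensions("application/x-sketch-example")

print(extensions)
```

If `read()` decodes the file as UTF-8, as the comment states, the non-ASCII comment line is parsed without depending on the locale encoding.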

filename = support.findfile("mime.types")
mimes = mimetypes.MimeTypes([filename])
exts = mimes.guess_all_extensions('application/vnd.geocube+xml',