bpo-47000: Add locale.getencoding() #32068

Merged
merged 13 commits into from
Apr 9, 2022
12 changes: 6 additions & 6 deletions Doc/glossary.rst
@@ -706,15 +706,15 @@ Glossary

locale encoding
On Unix, it is the encoding of the LC_CTYPE locale. It can be set with
``locale.setlocale(locale.LC_CTYPE, new_locale)``.
:func:`locale.setlocale(locale.LC_CTYPE, new_locale) <locale.setlocale>`.

On Windows, it is the ANSI code page (ex: ``cp1252``).
On Windows, it is the ANSI code page (ex: ``"cp1252"``).

``locale.getpreferredencoding(False)`` can be used to get the locale
encoding.
On Android and VxWorks, Python uses ``"utf-8"`` as the locale encoding.

Python uses the :term:`filesystem encoding and error handler` to convert
between Unicode filenames and bytes filenames.
``locale.getencoding()`` can be used to get the locale encoding.

See also the :term:`filesystem encoding and error handler`.

list
A built-in Python :term:`sequence`. Despite its name it is more akin
24 changes: 22 additions & 2 deletions Doc/library/locale.rst
@@ -327,17 +327,37 @@ The :mod:`locale` module defines the following exception and functions:
is not necessary or desired, *do_setlocale* should be set to ``False``.

On Android or if the :ref:`Python UTF-8 Mode <utf8-mode>` is enabled, always
return ``'UTF-8'``, the :term:`locale encoding` and the *do_setlocale*
return ``'utf-8'``, the :term:`locale encoding` and the *do_setlocale*
argument are ignored.

The :ref:`Python preinitialization <c-preinit>` configures the LC_CTYPE
locale. See also the :term:`filesystem encoding and error handler`.
Review comment (Member): I suggest copying this paragraph into the getencoding() doc.


.. versionchanged:: 3.7
The function now always returns ``UTF-8`` on Android or if the
The function now always returns ``"utf-8"`` on Android or if the
:ref:`Python UTF-8 Mode <utf8-mode>` is enabled.


.. function:: getencoding()

Get the current :term:`locale encoding`:

* On Android and VxWorks, return ``"utf-8"``.
* On Unix, return the encoding of the current :data:`LC_CTYPE` locale.
Return ``"utf-8"`` if ``nl_langinfo(CODESET)`` returns an empty string:
for example, if the current LC_CTYPE locale is not supported.
* On Windows, return the ANSI code page.

The :ref:`Python preinitialization <c-preinit>` configures the LC_CTYPE
locale. See also the :term:`filesystem encoding and error handler`.

This function is similar to
:func:`getpreferredencoding(False) <getpreferredencoding>` except this
function ignores the :ref:`Python UTF-8 Mode <utf8-mode>`.

.. versionadded:: 3.11


.. function:: normalize(localename)

Returns a normalized locale code for the given locale name. The returned locale
7 changes: 7 additions & 0 deletions Doc/whatsnew/3.11.rst
@@ -274,6 +274,13 @@ inspect
* Add :func:`inspect.ismethodwrapper` for checking if the type of an object is a
:class:`~types.MethodWrapperType`. (Contributed by Hakan Çelik in :issue:`29418`.)

locale
------

* Add :func:`locale.getencoding` to get the current locale encoding. It is similar to
``locale.getpreferredencoding(False)`` but ignores the
:ref:`Python UTF-8 Mode <utf8-mode>`.

math
----

24 changes: 12 additions & 12 deletions Lib/locale.py
@@ -28,7 +28,7 @@
"setlocale", "resetlocale", "localeconv", "strcoll", "strxfrm",
"str", "atof", "atoi", "format", "format_string", "currency",
"normalize", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MONETARY",
"LC_NUMERIC", "LC_ALL", "CHAR_MAX"]
"LC_NUMERIC", "LC_ALL", "CHAR_MAX", "getencoding"]

def _strcoll(a,b):
""" strcoll(string,string) -> int.
@@ -637,45 +637,45 @@ def resetlocale(category=LC_ALL):


try:
from _locale import _get_locale_encoding
from _locale import getencoding
except ImportError:
def _get_locale_encoding():
def getencoding():
if hasattr(sys, 'getandroidapilevel'):
# On Android langinfo.h and CODESET are missing, and UTF-8 is
# always used in mbstowcs() and wcstombs().
return 'UTF-8'
if sys.flags.utf8_mode:
return 'UTF-8'
return 'utf-8'
encoding = getdefaultlocale()[1]
if encoding is None:
# LANG not set, default conservatively to ASCII
encoding = 'ascii'
# LANG not set, default to UTF-8
encoding = 'utf-8'
return encoding

try:
CODESET
except NameError:
def getpreferredencoding(do_setlocale=True):
"""Return the charset that the user is likely using."""
return _get_locale_encoding()
if sys.flags.utf8_mode:
return 'utf-8'
return getencoding()
else:
# On Unix, if CODESET is available, use that.
def getpreferredencoding(do_setlocale=True):
"""Return the charset that the user is likely using,
according to the system configuration."""
if sys.flags.utf8_mode:
return 'UTF-8'
return 'utf-8'

if not do_setlocale:
return _get_locale_encoding()
return getencoding()

old_loc = setlocale(LC_CTYPE)
try:
try:
setlocale(LC_CTYPE, "")
except Error:
pass
return _get_locale_encoding()
return getencoding()
finally:
setlocale(LC_CTYPE, old_loc)

6 changes: 3 additions & 3 deletions Lib/test/test_utf8_mode.py
@@ -203,12 +203,12 @@ def test_pyio_encoding(self):
def test_locale_getpreferredencoding(self):
code = 'import locale; print(locale.getpreferredencoding(False), locale.getpreferredencoding(True))'
out = self.get_output('-X', 'utf8', '-c', code)
self.assertEqual(out, 'UTF-8 UTF-8')
self.assertEqual(out, 'utf-8 utf-8')

for loc in POSIX_LOCALES:
with self.subTest(LC_ALL=loc):
out = self.get_output('-X', 'utf8', '-c', code, LC_ALL=loc)
self.assertEqual(out, 'UTF-8 UTF-8')
self.assertEqual(out, 'utf-8 utf-8')
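
A standalone approximation of what this test checks, for readers without the CPython test helpers. It is a sketch under assumptions: `preferred_encoding_in_utf8_mode` is a hypothetical helper, it spawns the current interpreter with `-X utf8`, and the comparison is case-insensitive because versions before this change print `UTF-8` rather than `utf-8`.

```python
import subprocess
import sys


def preferred_encoding_in_utf8_mode():
    """Run a child interpreter in UTF-8 Mode and return its preferred encoding."""
    code = "import locale; print(locale.getpreferredencoding(False))"
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Compare case-insensitively so the check holds before and after the rename.
    print(preferred_encoding_in_utf8_mode().lower() == "utf-8")
```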

@unittest.skipIf(MS_WINDOWS, 'test specific to Unix')
def test_cmd_line(self):
@@ -276,7 +276,7 @@ def test_device_encoding(self):
# In UTF-8 Mode, device_encoding(fd) returns "UTF-8" if fd is a TTY
with open(filename, encoding="utf8") as fp:
out = fp.read().rstrip()
self.assertEqual(out, 'True UTF-8')
self.assertEqual(out, 'True utf-8')


if __name__ == "__main__":
@@ -0,0 +1,3 @@
Add :func:`locale.getencoding` to get the current locale encoding.
It is similar to ``locale.getpreferredencoding(False)`` but ignores the
:ref:`Python UTF-8 Mode <utf8-mode>`.
8 changes: 7 additions & 1 deletion Modules/_io/textio.c
@@ -1145,7 +1145,13 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
}
}
if (encoding == NULL && self->encoding == NULL) {
self->encoding = _Py_GetLocaleEncodingObject();
if (_PyRuntime.preconfig.utf8_mode) {
_Py_DECLARE_STR(utf_8, "utf-8");
self->encoding = Py_NewRef(&_Py_STR(utf_8));
}
else {
self->encoding = _Py_GetLocaleEncodingObject();
}
if (self->encoding == NULL) {
goto error;
}
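
From the Python side, the effect of this hunk is visible in the encoding a `TextIOWrapper` picks when none is passed. The following is a rough sketch; `default_text_encoding` is a hypothetical helper, and the exact string returned outside UTF-8 Mode depends on the platform and locale.

```python
import io
import sys


def default_text_encoding():
    """Return the encoding TextIOWrapper selects when none is given."""
    # No encoding argument: the wrapper falls back to UTF-8 in UTF-8 Mode,
    # otherwise to the locale encoding.
    wrapper = io.TextIOWrapper(io.BytesIO())
    try:
        return wrapper.encoding
    finally:
        wrapper.close()


if __name__ == "__main__":
    print("UTF-8 Mode enabled:", bool(sys.flags.utf8_mode))
    print("default text encoding:", default_text_encoding())
```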
8 changes: 4 additions & 4 deletions Modules/_localemodule.c
@@ -773,14 +773,14 @@ _locale_bind_textdomain_codeset_impl(PyObject *module, const char *domain,


/*[clinic input]
_locale._get_locale_encoding
_locale.getencoding
Get the current locale encoding.
[clinic start generated code]*/

static PyObject *
_locale__get_locale_encoding_impl(PyObject *module)
/*[clinic end generated code: output=e8e2f6f6f184591a input=513d9961d2f45c76]*/
_locale_getencoding_impl(PyObject *module)
/*[clinic end generated code: output=86b326b971872e46 input=6503d11e5958b360]*/
{
return _Py_GetLocaleEncodingObject();
}
@@ -811,7 +811,7 @@ static struct PyMethodDef PyLocale_Methods[] = {
_LOCALE_BIND_TEXTDOMAIN_CODESET_METHODDEF
#endif
#endif
_LOCALE__GET_LOCALE_ENCODING_METHODDEF
_LOCALE_GETENCODING_METHODDEF
{NULL, NULL}
};

16 changes: 8 additions & 8 deletions Modules/clinic/_localemodule.c.h

Some generated files are not rendered by default.

18 changes: 9 additions & 9 deletions Python/fileutils.c
@@ -93,6 +93,10 @@ _Py_device_encoding(int fd)

return PyUnicode_FromFormat("cp%u", (unsigned int)cp);
#else
if (_PyRuntime.preconfig.utf8_mode) {
_Py_DECLARE_STR(utf_8, "utf-8");
return Py_NewRef(&_Py_STR(utf_8));
}
return _Py_GetLocaleEncodingObject();
#endif
}
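
`_Py_device_encoding()` backs `os.device_encoding()`, so the behaviour can be probed from Python. This is a small sketch with a hypothetical helper name, `stdin_device_encoding`; it returns `None` when standard input is missing or is not a terminal, which is the usual case under a test runner or a pipe.

```python
import os
import sys


def stdin_device_encoding():
    """Return the device encoding of stdin, or None if it is not a terminal."""
    try:
        fd = sys.stdin.fileno()
    except (AttributeError, OSError, ValueError):
        # stdin may be None, closed, or replaced by an object without a fd.
        return None
    # os.device_encoding() returns None when fd is not connected to a terminal.
    return os.device_encoding(fd)


if __name__ == "__main__":
    print("stdin device encoding:", stdin_device_encoding())
```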
@@ -873,10 +877,10 @@ _Py_EncodeLocaleEx(const wchar_t *text, char **str,

// Get the current locale encoding name:
//
// - Return "UTF-8" if _Py_FORCE_UTF8_LOCALE macro is defined (ex: on Android)
// - Return "UTF-8" if the UTF-8 Mode is enabled
// - Return "utf-8" if _Py_FORCE_UTF8_LOCALE macro is defined (ex: on Android)
// - Return "utf-8" if the UTF-8 Mode is enabled
// - On Windows, return the ANSI code page (ex: "cp1250")
// - Return "UTF-8" if nl_langinfo(CODESET) returns an empty string.
// - Return "utf-8" if nl_langinfo(CODESET) returns an empty string.
// - Otherwise, return nl_langinfo(CODESET).
//
// Return NULL on memory allocation failure.
@@ -888,12 +892,8 @@ _Py_GetLocaleEncoding(void)
#ifdef _Py_FORCE_UTF8_LOCALE
// On Android langinfo.h and CODESET are missing,
// and UTF-8 is always used in mbstowcs() and wcstombs().
return _PyMem_RawWcsdup(L"UTF-8");
return _PyMem_RawWcsdup(L"utf-8");
#else
const PyPreConfig *preconfig = &_PyRuntime.preconfig;
if (preconfig->utf8_mode) {
return _PyMem_RawWcsdup(L"UTF-8");
}

#ifdef MS_WINDOWS
wchar_t encoding[23];
@@ -906,7 +906,7 @@ _Py_GetLocaleEncoding(void)
if (!encoding || encoding[0] == '\0') {
// Use UTF-8 if nl_langinfo() returns an empty string. It can happen on
// macOS if the LC_CTYPE locale is not supported.
return _PyMem_RawWcsdup(L"UTF-8");
return _PyMem_RawWcsdup(L"utf-8");
}

wchar_t *wstr;
8 changes: 7 additions & 1 deletion Python/initconfig.c
@@ -1779,7 +1779,13 @@ static PyStatus
config_get_locale_encoding(PyConfig *config, const PyPreConfig *preconfig,
wchar_t **locale_encoding)
{
wchar_t *encoding = _Py_GetLocaleEncoding();
wchar_t *encoding;
if (preconfig->utf8_mode) {
encoding = _PyMem_RawWcsdup(L"utf-8");
}
else {
encoding = _Py_GetLocaleEncoding();
}
if (encoding == NULL) {
return _PyStatus_NO_MEMORY();
}
Expand Down