From 72f4666aca21e695fb21a2aaea3b6df81c567a3d Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Fri, 18 Mar 2022 18:47:25 +0900
Subject: [PATCH 1/4] PEP 686: Make UTF-8 mode default

---
 pep-0686.rst | 149 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)
 create mode 100644 pep-0686.rst

diff --git a/pep-0686.rst b/pep-0686.rst
new file mode 100644
index 00000000000..c1fe6523495
--- /dev/null
+++ b/pep-0686.rst
@@ -0,0 +1,149 @@
+PEP: 686
+Title: Make UTF-8 mode default
+Author: Inada Naoki
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: XX-Mar-2022
+Python-Version: 3.12
+
+
+Abstract
+========
+
+This PEP proposes making UTF-8 mode [1]_ on by default.
+
+With this change, Python uses UTF-8 for default encoding of files, stdio, and
+pipes consistently.
+
+
+Motivation
+==========
+
+UTF-8 becomes de-facto standard text encoding.
+
+* Default encoding of Python source files is UTF-8.
+* JSON, TOML, YAML uses UTF-8.
+* Most text editors including VS Code and Windows notepad use UTF-8 by default.
+* Most websites and text data on the internet uses UTF-8.
+* And many other popular programming languages including node.js, Go, Rust,
+  Ruby, and Java uses UTF-8 by default.
+
+Changing the default encoding to UTF-8 makes Python easier to interoperate with them.
+
+
+
+Specification
+=============
+
+Changes to UTF-8 mode
+---------------------
+
+Currently, UTF-8 mode affects to ``locale.getpreferredencoding()``.
+
+This PEP proposes to remove this override. UTF-8 mode will not affect to
+``locale`` module.
+
+After this change, UTF-8 mode affects to:
+
+* stdio
+
+  * User can override stdio encoding with ``PYTHONIOENCODING``.
+
+* filesystem encoding
+
+* ``TextIOWrapper`` and APIs using it including ``open()``,
+  ``Path.read_text()``, ``subprocess.Popen(cmd, text=True)``, etc...
+
+This change will be introduced in Python 3.11 if possible.
+
+
+Enable UTF-8 mode by default
+----------------------------
+
+Python enables UTF-8 mode by default.
+
+User can disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
+
+
+Backward Compatibility
+======================
+
+Most Unix systems use UTF-8 locale and Python enables UTF-8 mode when its
+locale is C or POSIX. So this change mostly affects Windows users.
+
+When a Python program depends on the default encoding, this change may cause
+``UnicodeError``, mojibake, or even silent data corruption. So this change
+should be announced very loudly.
+
+To resolve this backward incompatibility, users can do:
+
+* Disable UTF-8 mode
+* Use ``EncodingWarning`` to find where the default encoding is used and use
+  ``encoding="locale"`` option to keep using locale encoding. [2]_
+
+
+Preceding examples
+==================
+
+* Ruby changed the default ``external_encoding`` to UTF-8 on Windows in Ruby
+  3.0 (2020). [3]_
+* Java changed the default text encoding to UTF-8 in JDK 18. (2022). [4]_
+
+Both Ruby and Java have an option for backward compatibility.
+They don't provide any warning like ``EncodingWarning`` [2]_ in Python for use
+of the default encoding.
+
+
+Rejected Alternative
+====================
+
+Deprecate implicit encoding
+---------------------------
+
+Deprecating use of the default encoding is considered.
+
+But there are many cases user uses the default encoding when just they need
+ASCII. And some users use Python only on Unix with UTF-8 locale.
+
+So forcing users to specify the ``encoding`` option everywhere is too painful.
+
+Java also rejected this idea [4]_.
+
+
+How to teach this
+=================
+
+For new users, this change reduces things that need to teach.
+
+Users can delay learning about text encoding until they need to handle
+non-UTF-8 text files.
+
+For existing users, see `Backward compatibility`_ section.
+
+
+Resources
+=========
+
+.. [1] `PEP 540 – Add a new UTF-8 Mode`__
+
+   __ https://peps.python.org/pep-0540/
+
+.. [2] `PEP 597 – Add optional EncodingWarning`__
+
+   __ https://peps.python.org/pep-0597/
+
+.. [3] `Set default for Encoding.default_external to UTF-8 on Windows`__
+
+   __ https://bugs.ruby-lang.org/issues/16604
+
+.. [4] `JEP 400: UTF-8 by Default`__
+
+   __ https://openjdk.java.net/jeps/400
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.

From bd4a5dde8350ff41547ba069afe3956d840a0697 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Fri, 18 Mar 2022 18:59:32 +0900
Subject: [PATCH 2/4] wrap by 79 column

---
 pep-0686.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index c1fe6523495..b0a4cc67f87 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -24,12 +24,14 @@ UTF-8 becomes de-facto standard text encoding.
 
 * Default encoding of Python source files is UTF-8.
 * JSON, TOML, YAML uses UTF-8.
-* Most text editors including VS Code and Windows notepad use UTF-8 by default.
+* Most text editors including VS Code and Windows notepad use UTF-8 by
+  default.
 * Most websites and text data on the internet uses UTF-8.
 * And many other popular programming languages including node.js, Go, Rust,
   Ruby, and Java uses UTF-8 by default.
 
-Changing the default encoding to UTF-8 makes Python easier to interoperate with them.
+Changing the default encoding to UTF-8 makes Python easier to interoperate
+with them.
 
 
 
From 9d51523c5180c6a41e7a1aab2d4a1235fadb6838 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Fri, 18 Mar 2022 19:01:52 +0900
Subject: [PATCH 3/4] Set Created field

---
 pep-0686.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index b0a4cc67f87..76bf2e95248 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -4,7 +4,7 @@ Author: Inada Naoki
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
-Created: XX-Mar-2022
+Created: 18-Mar-2022
 Python-Version: 3.12
 
 

From 3f0922a9b8a0be0c79cf1d2d592b4968aea841f0 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Fri, 18 Mar 2022 19:55:04 +0900
Subject: [PATCH 4/4] small fixup

---
 pep-0686.rst | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index 76bf2e95248..a53d994b1ae 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -33,6 +33,10 @@ UTF-8 becomes de-facto standard text encoding.
 Changing the default encoding to UTF-8 makes Python easier to interoperate
 with them.
 
+Additionally, many Python developers using Unix forget that the default
+encoding is platform dependant. They omit to specify ``encoding="utf-8"`` when
+they read text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, and Python
+source files). Inconsistent default encoding caused many bugs.
 
 
 Specification
@@ -48,9 +52,9 @@ This PEP proposes to remove this override. UTF-8 mode will not affect to
 
 After this change, UTF-8 mode affects to:
 
-* stdio
+* stdin, stdout, stderr
 
-  * User can override stdio encoding with ``PYTHONIOENCODING``.
+  * User can override it with ``PYTHONIOENCODING``.
 
 * filesystem encoding
 
@@ -65,7 +69,7 @@ Enable UTF-8 mode by default
 
 Python enables UTF-8 mode by default.
 
-User can disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
+User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
 
 
 Backward Compatibility
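
Note for readers of this series: the draft's Backward Compatibility section names two ways to cope with the new default (disable UTF-8 mode, or pass ``encoding="locale"``). The following is a minimal sketch of what that might look like in user code, not part of the patch. The helper names ``utf8_mode_enabled``, ``locale_encoding_argument`` and ``roundtrip`` are hypothetical; the sketch assumes ``sys.flags.utf8_mode`` (available since PEP 540) and that ``encoding="locale"`` is only accepted on Python 3.10 or later (PEP 597).

```python
import locale
import os
import sys
import tempfile


def utf8_mode_enabled() -> bool:
    """Hypothetical helper: report whether the interpreter runs in UTF-8 mode."""
    return bool(sys.flags.utf8_mode)


def locale_encoding_argument() -> str:
    """Hypothetical helper: an ``encoding`` value that keeps the locale encoding.

    ``encoding="locale"`` is accepted from Python 3.10 (PEP 597). On older
    versions this falls back to the name reported by the ``locale`` module.
    """
    if sys.version_info >= (3, 10):
        return "locale"
    return locale.getpreferredencoding(False)


def roundtrip(text: str, encoding: str) -> str:
    """Write ``text`` to a temporary file and read it back with ``encoding``."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Passing ``encoding`` explicitly makes the result independent of
        # whether UTF-8 mode is on or off.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("UTF-8 mode:", utf8_mode_enabled())
    print(roundtrip("héllo", "utf-8"))
    print(roundtrip("plain ascii", locale_encoding_argument()))
```

Running the same script with ``PYTHONUTF8=0`` or ``-X utf8=0`` should change only the first line of output. Call sites that still rely on the implicit default can be located with the opt-in ``EncodingWarning`` from PEP 597, for example via ``-X warn_default_encoding`` on Python 3.10 or later.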