From 96bfcd082d0d8f6820e204bd8a830153cd40311e Mon Sep 17 00:00:00 2001 From: Inada Naoki Date: Fri, 18 Mar 2022 19:57:52 +0900 Subject: [PATCH] PEP 686: Make UTF-8 mode default (#2443) --- pep-0686.rst | 155 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 155 insertions(+) create mode 100644 pep-0686.rst diff --git a/pep-0686.rst b/pep-0686.rst new file mode 100644 index 00000000000..a53d994b1ae --- /dev/null +++ b/pep-0686.rst @@ -0,0 +1,155 @@ +PEP: 686 +Title: Make UTF-8 mode default +Author: Inada Naoki +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 18-Mar-2022 +Python-Version: 3.12 + + +Abstract +======== + +This PEP proposes making UTF-8 mode [1]_ on by default. + +With this change, Python uses UTF-8 for default encoding of files, stdio, and +pipes consistently. + + +Motivation +========== + +UTF-8 becomes de-facto standard text encoding. + +* Default encoding of Python source files is UTF-8. +* JSON, TOML, YAML uses UTF-8. +* Most text editors including VS Code and Windows notepad use UTF-8 by + default. +* Most websites and text data on the internet uses UTF-8. +* And many other popular programming languages including node.js, Go, Rust, + Ruby, and Java uses UTF-8 by default. + +Changing the default encoding to UTF-8 makes Python easier to interoperate +with them. + +Additionally, many Python developers using Unix forget that the default +encoding is platform dependant. They omit to specify ``encoding="utf-8"`` when +they read text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, and Python +source files). Inconsistent default encoding caused many bugs. + + +Specification +============= + +Changes to UTF-8 mode +--------------------- + +Currently, UTF-8 mode affects to ``locale.getpreferredencoding()``. + +This PEP proposes to remove this override. UTF-8 mode will not affect to +``locale`` module. + +After this change, UTF-8 mode affects to: + +* stdin, stdout, stderr + + * User can override it with ``PYTHONIOENCODING``. + +* filesystem encoding + +* ``TextIOWrapper`` and APIs using it including ``open()``, + ``Path.read_text()``, ``subprocess.Popen(cmd, text=True)``, etc... + +This change will be introduced in Python 3.11 if possible. + + +Enable UTF-8 mode by default +---------------------------- + +Python enables UTF-8 mode by default. + +User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``. + + +Backward Compatibility +====================== + +Most Unix systems use UTF-8 locale and Python enables UTF-8 mode when its +locale is C or POSIX. So this change mostly affects Windows users. + +When a Python program depends on the default encoding, this change may cause +``UnicodeError``, mojibake, or even silent data corruption. So this change +should be announced very loudly. + +To resolve this backward incompatibility, users can do: + +* Disable UTF-8 mode +* Use ``EncodingWarning`` to find where the default encoding is used and use + ``encoding="locale"`` option to keep using locale encoding. [2]_ + + +Preceding examples +================== + +* Ruby changed the default ``external_encoding`` to UTF-8 on Windows in Ruby + 3.0 (2020). [3]_ +* Java changed the default text encoding to UTF-8 in JDK 18. (2022). [4]_ + +Both Ruby and Java have an option for backward compatibility. +They don't provide any warning like ``EncodingWarning`` [2]_ in Python for use +of the default encoding. + + +Rejected Alternative +==================== + +Deprecate implicit encoding +--------------------------- + +Deprecating use of the default encoding is considered. + +But there are many cases user uses the default encoding when just they need +ASCII. And some users use Python only on Unix with UTF-8 locale. + +So forcing users to specify the ``encoding`` option everywhere is too painful. + +Java also rejected this idea [4]_. + + +How to teach this +================= + +For new users, this change reduces things that need to teach. + +Users can delay learning about text encoding until they need to handle +non-UTF-8 text files. + +For existing users, see `Backward compatibility`_ section. + + +Resources +========= + +.. [1] `PEP 540 – Add a new UTF-8 Mode`__ + + __ https://peps.python.org/pep-0540/ + +.. [2] `PEP 597 – Add optional EncodingWarning`__ + + __ https://peps.python.org/pep-0597/ + +.. [3] `Set default for Encoding.default_external to UTF-8 on Windows`__ + + __ https://bugs.ruby-lang.org/issues/16604 + +.. [4] `JEP 400: UTF-8 by Default`__ + + __ https://openjdk.java.net/jeps/400 + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.