diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 9f05d6c3302..ae557e92a1f 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -555,6 +555,7 @@ pep-0677.rst @gvanrossum pep-0678.rst @iritkatriel pep-0679.rst @pablogsal pep-0680.rst @encukou +pep-0681.rst @jellezijlstra # ... # pep-0754.txt # ... diff --git a/pep-0681.rst b/pep-0681.rst new file mode 100644 index 00000000000..bb4b4a49262 --- /dev/null +++ b/pep-0681.rst @@ -0,0 +1,734 @@ +PEP: 681 +Title: Data Class Transforms +Author: Erik De Bonte , + Eric Traut +Sponsor: Jelle Zijlstra +Discussions-To: https://mail.python.org/archives/list/typing-sig@python.org/thread/EAALIHA3XEDFDNG2NRXTI3ERFPAD65Z4/ +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 02-Dec-2021 +Python-Version: 3.11 +Post-History: + + +Abstract +======== + +:pep:`557` introduced the dataclass to the Python stdlib. Several popular +libraries have behaviors that are similar to dataclasses, but these +behaviors cannot be described using standard type annotations. Such +projects include attrs, pydantic, and object relational mapper (ORM) +packages such as Django and EdgeDB. + +Most type checkers, linters and language servers have full support for +dataclasses. This proposal aims to generalize this functionality and +provide a way for third-party libraries to indicate that certain +decorator functions, classes, and metaclasses provide behaviors +similar to dataclasses. + +These behaviors include: + +* Synthesizing an ``__init__`` method based on declared + data fields. +* Optionally synthesizing ``__eq__``, ``__ne__``, ``__lt__``, + ``__le__``, ``__gt__`` and ``__ge__`` methods. +* Supporting "frozen" classes, a way to enforce immutability during + static type checking. +* Supporting "field descriptors", which describe attributes of + individual fields that a static type checker must be aware of, + such as whether a default value is provided for the field. 
+ +Motivation +========== + +There is no existing, standard way for libraries with dataclass-like +semantics to declare their behavior to type checkers. To work around +this limitation, Mypy custom plugins have been developed for many of +these libraries, but these plugins don't work with other type +checkers, linters or language servers. They are also costly to +maintain for library authors, and they require that Python developers +know about the existence of these plugins and download and configure +them within their environment. + + +Rationale +========= + +The intent of this proposal is not to support every feature of every +library with dataclass-like semantics, but rather to make it possible +to use the most common features of these libraries in a way that is +compatible with static type checking. If a user values these libraries +and also values static type checking, they may need to avoid using +certain features or make small adjustments to the way they use them. +That's already true for the Mypy custom plugins, which +don't support every feature of every dataclass-like library. + + +Specification +============= + +The ``dataclass_transform`` decorator +------------------------------------- + +This specification introduces a new decorator function in +the ``typing`` module named ``dataclass_transform``. This decorator +can be applied to either a function that is itself a decorator, +a class, or a metaclass. The presence of +``dataclass_transform`` tells a static type checker that the decorated +function, class, or metaclass performs runtime "magic" that transforms +a class, endowing it with dataclass-like behaviors. + +If ``dataclass_transform`` is applied to a function, using the decorated +function as a decorator is assumed to apply dataclass-like semantics. +If ``dataclass_transform`` is applied to a class, dataclass-like +semantics will be assumed for any class that derives from the +decorated class or uses the decorated class as a metaclass. 
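To make the decorator-function case concrete, here is a minimal runnable sketch. It is not part of the proposal. It assumes a hypothetical library decorator ``create_model`` that simply delegates to the stdlib ``dataclasses.dataclass``. It also uses a local identity stand-in, ``dataclass_transform_sketch`` (an assumed name), in place of the proposed ``typing.dataclass_transform`` so that it runs on interpreters that do not yet provide that decorator.

```python
import dataclasses
from typing import Callable, Type, TypeVar

_T = TypeVar("_T")


def dataclass_transform_sketch() -> Callable[[_T], _T]:
    # Hypothetical stand-in for the proposed ``typing.dataclass_transform``.
    # It returns the decorated object unchanged and carries no meaning for
    # a type checker; it only keeps this sketch self-contained.
    return lambda obj: obj


@dataclass_transform_sketch()
def create_model(cls: Type[_T]) -> Type[_T]:
    # Assumption: the library reuses the stdlib machinery to synthesize
    # ``__init__`` and ``__eq__``. A real library may do this differently.
    return dataclasses.dataclass(cls)


@create_model
class CustomerModel:
    id: int
    name: str


# Positional and keyword construction both go through the synthesized
# ``__init__`` method.
c1 = CustomerModel(327, "John Smith")
c2 = CustomerModel(id=327, name="John Smith")
```

With the real decorator in place of the stand-in, a type checker supporting this proposal would be expected to accept both calls above and to reject ``CustomerModel()``, which also fails at runtime in this sketch.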
+ +Examples of each approach are shown in the following sections. Each +example creates a ``CustomerModel`` class with dataclass-like semantics. +The implementation of the decorated objects is omitted for brevity, +but we assume that they modify classes in the following ways: + +* They synthesize an ``__init__`` method using data fields declared + within the class and its parent classes. +* They synthesize ``__eq__`` and ``__ne__`` methods. + +Type checkers supporting this PEP will recognize that the +``CustomerModel`` class can be instantiated using the synthesized +``__init__`` method: + +.. code-block:: python + + # Using positional arguments + c1 = CustomerModel(327, "John Smith") + + # Using keyword arguments + c2 = CustomerModel(id=327, name="John Smith") + + # These calls will generate runtime errors and should be flagged as + # errors by a static type checker. + c3 = CustomerModel() + c4 = CustomerModel(327, first_name="John") + c5 = CustomerModel(327, "John Smith", 0) + +Decorator function example +`````````````````````````` + +.. code-block:: python + + _T = TypeVar("_T") + + # The ``create_model`` decorator is defined by a library. + # This could be in a type stub or inline. + @typing.dataclass_transform() + def create_model(cls: Type[_T]) -> Type[_T]: + cls.__init__ = ... + cls.__eq__ = ... + cls.__ne__ = ... + return cls + + + # The ``create_model`` decorator can now be used to create new model + # classes, like this: + @create_model + class CustomerModel: + id: int + name: str + +Class example +````````````` + +.. code-block:: python + + # The ``ModelBase`` class is defined by a library. This could be in + # a type stub or inline. + @typing.dataclass_transform() + class ModelBase: ... + + + # The ``ModelBase`` class can now be used to create new model + # subclasses, like this: + class CustomerModel(ModelBase): + id: int + name: str + +Metaclass example +````````````````` + +.. 
code-block:: python + + # The ``ModelMeta`` metaclass and ``ModelBase`` class are defined by + # a library. This could be in a type stub or inline. + @typing.dataclass_transform() + class ModelMeta(type): ... + + class ModelBase(metaclass=ModelMeta): ... + + + # The ``ModelBase`` class can now be used to create new model + # subclasses, like this: + class CustomerModel(ModelBase): + id: int + name: str + +Decorator function and class/metaclass parameters +------------------------------------------------- + +A decorator function, class, or metaclass that provides dataclass-like +functionality may accept parameters that modify certain behaviors. +This specification defines the following parameters that static type +checkers must honor if they are used by a dataclass transform. Each of +these parameters accepts a bool argument, and it must be possible for +the bool value (``True`` or ``False``) to be statically evaluated. + +* ``eq``, ``order``, ``frozen``, ``init`` and ``unsafe_hash`` are parameters + supported in the stdlib dataclass, with meanings defined in :pep:`557 <557#id7>`. +* ``hash`` is an alias for the ``unsafe_hash`` parameter. +* ``kw_only`` and ``slots`` are parameters supported in the stdlib dataclass, + first introduced in Python 3.10. + +``dataclass_transform`` parameters +---------------------------------- + +Parameters to ``dataclass_transform`` allow for some basic +customization of default behaviors: + +.. code-block:: python + + _T = TypeVar("_T") + + def dataclass_transform( + *, + eq_default: bool = True, + order_default: bool = False, + kw_only_default: bool = False, + field_descriptors: Tuple[Union[type, Callable[..., Any]], ...] = (()), + ) -> Callable[[_T], _T]: ... + +* ``eq_default`` indicates whether the ``eq`` parameter is assumed to + be True or False if it is omitted by the caller. If not specified, + ``eq_default`` will default to True (the default assumption for + dataclass). 
+* ``order_default`` indicates whether the ``order`` parameter is + assumed to be True or False if it is omitted by the caller. If not + specified, ``order_default`` will default to False (the default + assumption for dataclass). +* ``kw_only_default`` indicates whether the ``kw_only`` parameter is + assumed to be True or False if it is omitted by the caller. If not + specified, ``kw_only_default`` will default to False (the default + assumption for dataclass). +* ``field_descriptors`` specifies a static list of supported classes + that describe fields. Some libraries also supply functions to + allocate instances of field descriptors, and those functions may + also be specified in this tuple. If not specified, + ``field_descriptors`` will default to an empty tuple (no field + descriptors supported). The standard dataclass behavior supports + only one type of field descriptor called ``Field`` plus a helper + function (``field``) that instantiates this class, so if we were + describing the stdlib dataclass behavior, we would provide the + tuple argument ``(dataclasses.Field, dataclasses.field)``. + +The following sections provide additional examples showing how these +parameters are used. + +Decorator function example +`````````````````````````` + +.. code-block:: python + + # Indicate that the ``create_model`` function assumes keyword-only + # parameters for the synthesized ``__init__`` method unless it is + # invoked with ``kw_only=False``. It always synthesizes order-related + # methods and provides no way to override this behavior. + @typing.dataclass_transform(kw_only_default=True, order_default=True) + def create_model( + *, + frozen: bool = False, + kw_only: bool = True, + ) -> Callable[[Type[_T]], Type[_T]]: ... + + + # Example of how this decorator would be used by code that imports + # from this library: + @create_model(frozen=True, kw_only=False) + class CustomerModel: + id: int + name: str + +Class example +````````````` + +.. 
code-block:: python + + # Indicate that classes that derive from this class default to + # synthesizing comparison methods. + @typing.dataclass_transform(eq_default=True, order_default=True) + class ModelBase: + def __init_subclass__( + cls, + *, + init: bool = True, + frozen: bool = False, + eq: bool = True, + order: bool = True, + ): + ... + + + # Example of how this class would be used by code that imports + # from this library: + class CustomerModel( + ModelBase, + init=False, + frozen=True, + eq=False, + order=False, + ): + id: int + name: str + +Metaclass example +````````````````` + +.. code-block:: python + + # Indicate that classes that use this metaclass default to + # synthesizing comparison methods. + @typing.dataclass_transform(eq_default=True, order_default=True) + class ModelMeta(type): + def __new__( + cls, + name, + bases, + namespace, + *, + init: bool = True, + frozen: bool = False, + eq: bool = True, + order: bool = True, + ): + ... + + class ModelBase(metaclass=ModelMeta): + ... + + + # Example of how this class would be used by code that imports + # from this library: + class CustomerModel( + ModelBase, + init=False, + frozen=True, + eq=False, + order=False, + ): + id: int + name: str + + +Field descriptors +----------------- + +Most libraries that support dataclass-like semantics provide one or +more "field descriptor" types that allow a class definition to provide +additional metadata about each field in the class. This metadata can +describe, for example, default values, or indicate whether the field +should be included in the synthesized ``__init__`` method. + +Field descriptors can be omitted in cases where additional metadata is +not required: + +.. 
code-block:: python + + @dataclass + class Employee: + # Field with no descriptor + name: str + + # Field that uses field descriptor class instance + age: Optional[int] = field(default=None, init=False) + + # Field with type annotation and simple initializer to + # describe default value + is_paid_hourly: bool = True + + # Not a field (but rather a class variable) because type + # annotation is not provided. + office_number = "unassigned" + + +Field descriptor parameters +``````````````````````````` + +Libraries that support dataclass-like semantics and support field +descriptor classes typically use common parameter names to construct +these field descriptors. This specification formalizes the names and +meanings of the parameters that must be understood for static type +checkers. These standardized parameters must be keyword-only. +Field descriptor classes are allowed to use other +parameters in their constructors, and those parameters can be +positional and may use other names. + +* ``init`` is an optional bool parameter that indicates whether the + field should be included in the synthesized ``__init__`` method. If + unspecified, ``init`` defaults to True. Field descriptor functions + can use overloads that implicitly specify the value of ``init`` + using a literal bool value type + (``Literal[False]`` or ``Literal[True]``). +* ``default`` is an optional parameter that provides the default value + for the field. +* ``default_factory`` is an optional parameter that provides a runtime + callback that returns the default value for the field. If neither + ``default`` nor ``default_factory`` are specified, the field is + assumed to have no default value and must be provided a value when + the class is instantiated. +* ``factory`` is an alias for ``default_factory``. Stdlib dataclasses + use the name ``default_factory``, but attrs uses the name ``factory`` + in many scenarios, so this alias is necessary for supporting attrs. 
+* ``alias`` is an optional str parameter that provides an alternative + name for the field. This alternative name is used in the synthesized + ``__init__`` method. + +It is an error to specify more than one of ``default``, +``default_factory`` and ``factory``. + +This example demonstrates the above: + +.. code-block:: python + + # Library code (within type stub or inline) + # In this library, passing a resolver means that init must be False, + # and the overload with Literal[False] enforces that. + @overload + def model_field( + *, + default: Optional[Any] = ..., + resolver: Callable[[], Any], + init: Literal[False] = False, + ) -> Any: ... + + @overload + def model_field( + *, + default: Optional[Any] = ..., + resolver: None = None, + init: bool = True, + ) -> Any: ... + + @typing.dataclass_transform( + kw_only_default=True, + field_descriptors=(model_field,)) + def create_model( + *, + init: bool = True, + ) -> Callable[[Type[_T]], Type[_T]]: ... + + + # Code that imports this library: + @create_model(init=False) + class CustomerModel: + id: int = model_field(resolver=lambda: 0) + name: str + + +Runtime behavior +---------------- + +At runtime, the ``dataclass_transform`` decorator's only effect is to +set an attribute named ``__dataclass_transform__`` on the +decorated function or class to support introspection. The value of the +attribute should be a dict mapping the names of the +``dataclass_transform`` parameters to their values. + +For example: + +.. code-block:: python + + { + "eq_default": True, + "order_default": False, + "kw_only_default": False, + "field_descriptors": () + } + + +Dataclass semantics +------------------- + +The following dataclass semantics are implied when a function or class +decorated with ``dataclass_transform`` is in use. + +* Frozen dataclasses cannot inherit from non-frozen dataclasses. 
A + class that has been decorated with ``dataclass_transform`` is + considered neither frozen nor non-frozen, thus allowing frozen + classes to inherit from it. Similarly, a class that directly + specifies a metaclass that is decorated with ``dataclass_transform`` + is considered neither frozen nor non-frozen. + + Consider these class examples: + + .. code-block:: python + + # ModelBase is not considered either "frozen" or "non-frozen" + # because it is decorated with ``dataclass_transform`` + @typing.dataclass_transform() + class ModelBase(): ... + + # Vehicle is considered non-frozen because it does not specify + # "frozen=True". + class Vehicle(ModelBase): + name: str + + # Car is a frozen class that derives from Vehicle, which is a + # non-frozen class. This is an error. + class Car(Vehicle, frozen=True): + wheel_count: int + + And these similar metaclass examples: + + .. code-block:: python + + @typing.dataclass_transform() + class ModelMeta(type): ... + + # ModelBase is not considered either "frozen" or "non-frozen" + # because it directly specifies ModelMeta as its metaclass. + class ModelBase(metaclass=ModelMeta): ... + + # Vehicle is considered non-frozen because it does not specify + # "frozen=True". + class Vehicle(ModelBase): + name: str + + # Car is a frozen class that derives from Vehicle, which is a + # non-frozen class. This is an error. + class Car(Vehicle, frozen=True): + wheel_count: int + +* Field ordering and inheritance is assumed to follow the rules + specified in :pep:`557 <557#inheritance>`. This includes the effects of + overrides (redefining a field in a child class that has already been + defined in a parent class). + +* :pep:`PEP 557 indicates <557#post-init-parameters>` that + all fields without default values must appear before + fields with default values. Although not explicitly + stated in PEP 557, this rule is ignored when ``init=False``, and + this specification likewise ignores this requirement in that + situation. 
Likewise, there is no need to enforce this ordering when + keyword-only parameters are used for ``__init__``, so the rule is + not enforced if ``kw_only`` semantics are in effect. + +* As with dataclass, method synthesis is skipped if it would + overwrite a method that is explicitly declared within the class. + For example, if a class declares an ``__init__`` method explicitly, + an ``__init__`` method will not be synthesized for that class. + +* KW_ONLY sentinel values are supported as described in `the Python + docs <#kw-only-docs_>`_ and `bpo-43532 <#kw-only-issue_>`_. + +* ClassVar attributes are not considered dataclass fields and are + `ignored by dataclass mechanisms <#class-var_>`_. + + +Alternate form +-------------- + +To avoid delaying adoption of this proposal until after +``dataclass_transform`` has been added to the ``typing`` module, type +checkers may support the alternative form ``__dataclass_transform__``. +This form can be defined locally without any reliance on the +``typing`` or ``typing_extensions`` modules, and allows immediate +adoption of this specification by library authors. Type checkers that +have not yet adopted this specification will retain their current +behavior. + +To use this alternate form, library authors should include the +following declaration within their type stubs or source files: + +.. code-block:: python + + _T = TypeVar("_T") + + def __dataclass_transform__( + *, + eq_default: bool = True, + order_default: bool = False, + kw_only_default: bool = False, + field_descriptors: Tuple[Union[type, Callable[..., Any]], ...] = (()), + ) -> Callable[[_T], _T]: + # If used within a stub file, the following implementation can + # be replaced with "...". + return lambda a: a + +Undefined behavior +------------------ + +If multiple ``dataclass_transform`` decorators are found, either on a +single function/class or within a class hierarchy, the resulting +behavior is undefined. Library authors should avoid these scenarios. 
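The runtime behavior specified earlier can be sketched as follows. This is an illustrative implementation that uses the parameter names from this draft. It is not the reference implementation, and the name ``dataclass_transform_sketch`` is hypothetical.

```python
from typing import Any, Callable, Tuple, TypeVar, Union

_T = TypeVar("_T")


def dataclass_transform_sketch(
    *,
    eq_default: bool = True,
    order_default: bool = False,
    kw_only_default: bool = False,
    field_descriptors: Tuple[Union[type, Callable[..., Any]], ...] = (),
) -> Callable[[_T], _T]:
    def decorator(obj: _T) -> _T:
        # The only runtime effect: record the arguments on the decorated
        # function or class so that they can be introspected later.
        obj.__dataclass_transform__ = {  # type: ignore[attr-defined]
            "eq_default": eq_default,
            "order_default": order_default,
            "kw_only_default": kw_only_default,
            "field_descriptors": field_descriptors,
        }
        return obj

    return decorator


# Hypothetical library base class marked with the sketch decorator.
@dataclass_transform_sketch(kw_only_default=True)
class ModelBase: ...


recorded = ModelBase.__dataclass_transform__
```

In this sketch the decorated object is returned unchanged apart from the new attribute, so any dataclass-like behavior still has to come from the library's own implementation.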
+ + +Reference Implementation +======================== + +The `Pyright <#pyright_>`_ type checker supports the +``__dataclass_transform__`` `alternate form`_. Pyright's +``dataClasses.ts`` `source file <#pyright-impl_>`_ would be a good +starting point for understanding the implementation. + +The `attrs <#attrs-usage_>`_ and `pydantic <#pydantic-usage_>`_ +libraries are using the ``__dataclass_transform__`` `alternate form`_. + + +Rejected Ideas +============== + +``auto_attribs`` parameter +-------------------------- + +The attrs library supports an ``auto_attribs`` parameter that +indicates whether class members decorated with :pep:`526` variable +annotations but with no assignment should be treated as data fields. + +We considered supporting ``auto_attribs`` and a corresponding +``auto_attribs_default`` parameter, but decided against this because it +is specific to attrs and appears to be a legacy behavior. Instead of +supporting this in the new standard, we recommend that the maintainers +of attrs move away from the legacy semantics and adopt +``auto_attribs`` behaviors by default. + +Django does not support declaring fields using type annotations only, +so Django users who leverage ``dataclass_transform`` should be aware +that they should always supply assigned values. + +``cmp`` parameter +----------------- + +The attrs library supports a bool parameter ``cmp`` that is equivalent +to setting both ``eq`` and ``order`` to True. We chose not to support +a ``cmp`` parameter, since it only applies to attrs. Attrs users +should use the dataclass-standard ``eq`` and ``order`` parameter names +instead. + +``kw_only`` field descriptor parameter +-------------------------------------- + +The attrs library supports a ``kw_only`` parameter for individual +fields. We chose not to support a ``kw_only`` parameter, since it is +specific to attrs. 
+ +Automatic field name aliasing +----------------------------- + +The attrs library performs `automatic aliasing <#attrs-aliasing_>`_ of +field names that start with a single underscore, stripping the +underscore from the name of the corresponding ``__init__`` parameter. + +This proposal omits that behavior since it is specific to attrs. Users +can manually alias these fields using the ``alias`` parameter. + + +Alternate field ordering algorithms +----------------------------------- + +The attrs library currently supports two approaches to ordering the +fields within a class: + +* Dataclass order: The same ordering used by dataclasses. This is the + default behavior of the older APIs (e.g. ``attr.s``). +* Method Resolution Order (MRO): This is the default behavior of the + newer APIs (e.g. define, mutable, frozen). Older APIs (e.g. ``attr.s``) + can opt into this behavior by specifying ``collect_by_mro=True``. + +The resulting field orderings can differ in certain diamond-shaped +multiple inheritance scenarios. + +For simplicity, this proposal does not support any field ordering +other than that used by dataclasses. + +Fields redeclared in subclasses +------------------------------- + +The attrs library differs from stdlib dataclasses in how it +handles inherited fields that are redeclared in subclasses. The +dataclass specification preserves the original order, but attrs +defines a new order based on subclasses. + +For simplicity, we chose to only support the dataclass behavior. +Users of attrs who rely on the attrs-specific ordering will not see +the expected order of parameters in the synthesized ``__init__`` +method. + +Django primary and foreign keys +------------------------------- + +Django applies `additional logic for primary and foreign keys +<#django-ids_>`_. For example, it automatically adds an ``id`` field +(and ``__init__`` parameter) if there is no field designated as a +primary key. 
+ +As this is not broadly applicable to dataclass libraries, this +additional logic is not accommodated by this proposal, so +users of Django would need to explicitly declare the ``id`` field. + +This limitation may make it impractical to use the +``dataclass_transform`` mechanism with Django. + +Open Issues +=========== + +``converter`` field descriptor parameter +---------------------------------------- + +The attrs library supports a ``converter`` field descriptor parameter, +which is a callable that is called by the generated +``__init__`` method to convert the supplied value to some other +desired value. This is tricky to support since the parameter type in +the synthesized ``__init__`` method needs to accept unconverted values, but +the resulting field is typed according to the output of the converter. + +There may be no good way to support this because there's not enough +information to derive the type of the input parameter. We currently +have two ideas: + +1. Add support for a ``converter`` field descriptor parameter but then + use the ``Any`` type for the corresponding parameter in the ``__init__`` + method. + +2. Say that converters are unsupported and recommend that attrs users + avoid them. + +Some aspects of this issue are detailed in a +`Pyright discussion <#converters_>`_. + +References +========== +.. _#pyright: https://github.com/Microsoft/pyright +.. _#pyright-impl: https://github.com/microsoft/pyright/blob/main/packages/pyright-internal/src/analyzer/dataClasses.ts +.. _#attrs-usage: https://github.com/python-attrs/attrs/pull/796 +.. _#pydantic-usage: https://github.com/samuelcolvin/pydantic/pull/2721 +.. _#attrs-aliasing: https://www.attrs.org/en/stable/init.html#private-attributes +.. _#django-ids: https://docs.djangoproject.com/en/4.0/topics/db/models/#automatic-primary-key-fields +.. _#converters: https://github.com/microsoft/pyright/discussions/1782?sort=old#discussioncomment-653909 +.. 
_#kw-only-docs: https://docs.python.org/3/library/dataclasses.html#dataclasses.KW_ONLY +.. _#kw-only-issue: https://bugs.python.org/issue43532 +.. _#class-var: https://docs.python.org/3/library/dataclasses.html#class-variables + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: