diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 27acae546d8..bb98413d806 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -570,6 +570,7 @@ pep-0686.rst @methane pep-0687.rst @encukou pep-0688.rst @jellezijlstra pep-0689.rst @encukou +pep-0690.rst @warsaw # ... # pep-0754.txt # ... diff --git a/AUTHOR_OVERRIDES.csv b/AUTHOR_OVERRIDES.csv index 4d58942d445..72289408f75 100644 --- a/AUTHOR_OVERRIDES.csv +++ b/AUTHOR_OVERRIDES.csv @@ -9,3 +9,4 @@ Just van Rossum,"van Rossum, Just (JvR)",JvR Martin v. Löwis,"von Löwis, Martin",von Löwis Nathaniel Smith,"Smith, Nathaniel J.",Smith P.J. Eby,"Eby, Phillip J.",Eby +Germán Méndez Bravo,"Méndez Bravo, Germán",Méndez Bravo diff --git a/pep-0690.rst b/pep-0690.rst new file mode 100644 index 00000000000..bf57d9fc741 --- /dev/null +++ b/pep-0690.rst @@ -0,0 +1,412 @@ +PEP: 690 +Title: Lazy Imports +Author: Germán Méndez Bravo , Carl Meyer +Sponsor: Barry Warsaw +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 29-Apr-2022 +Python-Version: 3.12 + + +Abstract +======== + +This PEP proposes a feature to transparently defer the execution of imported +modules until the moment when an imported object is used. Since Python +programs commonly import many more modules than a single invocation of the +program is likely to use in practice, lazy imports can greatly reduce the +overall number of modules loaded, improving startup time and memory usage. Lazy +imports also mostly eliminate the risk of import cycles. + + +Motivation +========== + +Common Python code style :pep:`prefers <8#imports>` imports at module +level, so they don't have to be repeated within each scope the imported object +is used in, and to avoid the inefficiency of repeated execution of the import +system at runtime. This means that importing the main module of a program +typically results in an immediate cascade of imports of most or all of the +modules that may ever be needed by the program. + +Consider the example of a Python command line program with a number of +subcommands. Each subcommand may perform different tasks, requiring the import +of different dependencies. But a given invocation of the program will only +execute a single subcommand, or possibly none (i.e. if just ``--help`` usage +info is requested). Top-level eager imports in such a program will result in +the import of many modules that will never be used at all; the time spent +(possibly compiling and) executing these modules is pure waste. + +In an effort to improve startup time, some large Python CLIs tools make imports +lazy by manually placing imports inline into functions to delay imports of +expensive subsystems. This manual approach is labor-intensive and fragile; one +misplaced import or refactor can easily undo painstaking optimization work. + +Existing import-hook-based solutions such as `demandimport +`_ or `importlib.util.LazyLoader +`_ +are limited in that only certain styles of import can be made truly lazy +(imports such as ``from foo import a, b`` will still eagerly import the module +``foo``) and they impose additional runtime overhead on every module attribute +access. + +This PEP proposes a more comprehensive solution for lazy imports that does not +impose detectable overhead in real-world use. The implementation in this PEP +has already `demonstrated +`_ +startup time improvements up to 70% and memory-use reductions up to +40% on real-world Python CLIs. + +Lazy imports also eliminate most import cycles. With eager imports, "false +cycles" can easily occur which are fixed by simply moving an import to the +bottom of a module or inline into a function, or switching from ``from foo +import bar`` to ``import foo``. With lazy imports, these "cycles" just work. +The only cycles which will remain are those where two modules actually each use +a name from the other at module level; these "true" cycles are only fixable by +refactoring the classes or functions involved. + + +Rationale +========= + +The aim of this feature is to make imports transparently lazy. "Lazy" means +that the import of a module (execution of the module body and addition of the +module object to ``sys.modules``) should not occur until the module (or a name +imported from it) is actually referenced during execution. "Transparent" means +that besides the delayed import (and necessarily observable effects of that, +such as delayed import side effects and changes to ``sys.modules``), there is +no other observable change in behavior: the imported object is present in the +module namespace as normal and is transparently loaded whenever first used: its +status as a "lazy imported object" is not directly observable from Python or +from C extension code. + +The requirement that the imported object be present in the module namespace as +usual, even before the import has actually occurred, means that we need some +kind of "lazy object" placeholder to represent the not-yet-imported object. +The transparency requirement dictates that this placeholder must never be +visible to Python code; any reference to it must trigger the import and replace +it with the real imported object. + +Given the possibility that Python (or C extension) code may pull objects +directly out of a module ``__dict__``, the only way to reliably prevent +accidental leakage of lazy objects is to have the dictionary itself be +responsible to ensure resolution of lazy objects on lookup. + +To avoid a performance penalty on the vast majority of dictionaries which never +contain any lazy objects, we install a specialized lookup function +(``lookdict_unicode_lazy``) for module namespace dictionaries when they first +gain a lazy-object value. When this lookup function finds that the key +references a lazy object, it resolves the lazy object immediately before +returning it. + +Some operations on dictionaries (e.g. iterating all values) don't go through +the lookup function; in these cases we have to add a check if the lookup +function is ``lookdict_unicode_lazy`` and if so, resolve all lazy values first. + +This implementation comprehensively prevents leakage of lazy objects, ensuring +they are always resolved to the real imported object before anyone can get hold +of them for any use, while avoiding any significant performance impact on +dictionaries in general. + + +Specification +============= + +Lazy imports are opt-in, and globally enabled via a new ``-L`` flag to the +Python interpreter, or a ``PYTHONLAZYIMPORTS`` environment variable. + +When enabled, the loading and execution of all (and only) top level imports is +deferred until the imported name is used. This could happen immediately (e.g. +on the very next line after the import statement) or much later (e.g. while +using the name inside a function being called by some other code at some later +time.) + +For these top level imports, there are two exceptions which will make them +eager (not lazy): imports inside ``try``/``except``/``finally`` or ``with`` +blocks, and star imports (``from foo import *``.) Imports inside +exception-handling blocks (this includes ``with`` blocks, since those can also +"catch" and handle exceptions) remain eager so that any exceptions arising from +the import can be handled. Star imports must remain eager since performing the +import is the only way to know which names should be added to the namespace. + +Imports inside class definitions or inside functions/methods are not "top +level" and are never lazy. + +Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are +also never lazy. + + +Debuggability +------------- + +The implementation will ensure that exceptions resulting from a deferred import +have metadata attached pointing the user to the original import statement, to +ease debuggability of errors from lazy imports. + +Additionally, debug logging from ``python -v`` will include logging when an +import statement has been encountered but execution of the import will be +deferred. + +Python's ``-X importtime`` feature for profiling import costs adapts naturally +to lazy imports; the profiled time is the time spent actually importing. + + +Per-module opt out +------------------ + +Due to the backwards compatibility issues mentioned below, it may be necessary +to force some imports to be eager. + +In first-party code, since imports inside a ``try`` or ``with`` block are never +lazy, this can be easily accomplished:: + + try: # force these imports to be eager + import foo + import bar + finally: + pass + +This PEP proposes to add a new ``importlib.eager_imports()`` context manager, +so the above technique can be less verbose and doesn't require comments to +clarify its intent:: + + with eager_imports(): + import foo + import bar + +Since imports within context managers are always eager, the ``eager_imports()`` +context manager can just be an alias to a null context manager. The context +manager does not force all imports to be recursively eager: ``foo`` and ``bar`` +will be imported eagerly, but imports within those modules will still follow +the usual laziness rules. + +The more difficult case can occur if an import in third-party code that can't +easily be modified must be forced to be eager. For this purpose, we propose to +add an API to ``importlib`` that can be called early in the process to specify +a list of module names within which all imports will be eager:: + + from importlib import set_eager_imports + + set_eager_imports(["one.mod", "another"]) + +The effect of this is also shallow: all imports within ``one.mod`` will be +eager, but not imports in all modules imported by ``one.mod``. + + +Backwards Compatibility +======================= + +This proposal preserves full backwards compatibility when the feature is +disabled, which is the default. + +Even when enabled, most code will continue to work normally without any +observable change (other than improved startup time and memory usage.) +Namespace packages are not affected: they work just as they do currently, +except lazily. + +In some existing code, lazy imports could produce currently unexpected results +and behaviors. The problems that we may see when enabling lazy imports in an +existing codebase are related to: + + +Import Side Effects +------------------- + +Import side effects that would otherwise be produced by the execution of +imported modules during the execution of import statements will be deferred at +least until the imported objects are used. + +These import side effects may include: + +* code executing any side-effecting logic during import; +* relying on imported submodules being set as attributes in the parent module. + +A relevant and typical affected case is the `click +`_ library for building Python command-line +interfaces. If e.g. ``cli = click.group()`` is defined in ``main.py``, and +``sub.py`` imports ``cli`` from ``main`` and adds subcommands to it via +decorator (``@cli.command(...)``), but the actual ``cli()`` call is in +``main.py``, then lazy imports may prevent the subcommands from being +registered, since in this case Click is depending on side effects of the import +of ``sub.py``. In this case the fix is to ensure the import of ``sub.py`` is +eager, e.g. by using the ``importlib.eager_imports()`` context manager. + + +Dynamic Paths +------------- + +There could be issues related to dynamic Python import paths; particularly, +adding (and then removing after the import) paths from ``sys.path``:: + + sys.path.insert(0, "/path/to/foo/module") + import foo + del sys.path[0] + foo.Bar() + +In this case, with lazy imports enabled, the import of ``foo`` will not +actually occur while the addition to ``sys.path`` is present. + + +Deferred Exceptions +------------------- + +All exceptions arising from import (including ``ModuleNotFoundError``) are +deferred from import time to first-use time, which could complicate debugging. +Accessing an object in the middle of any code could trigger a deferred import +and produce ``ImportError`` or any other exception resulting from the +resolution of the deferred object, while loading and executing the related +imported module. The implementation will provide debugging assistance in +lazy-import-triggered tracebacks to mitigate this issue. + + +Security Implications +===================== + +Deferred execution of code could produce security concerns if process owner, +path, ``sys.path``, or other sensitive environment or contextual states change +between the time the ``import`` statement is executed and the time where the +imported object is used. + + +Performance Impact +================== + +The reference implementation has shown that the feature has negligible +performance impact on existing real-world codebases (Instagram Server and other +several CLI programs at Meta), while providing substantial improvements to +startup time and memory usage. + +The reference implementation shows small performance regressions in a few +pyperformance benchmarks, but improvements in others. (TODO update with +detailed data from 3.11 port of implementation.) + + +How to Teach This +================= + +Since the feature is opt-in, beginners should not encounter it by default. +Documentation of the ``-L`` flag and ``PYTHONLAZYIMPORTS`` environment variable +can clarify the behavior of lazy imports. + +Some best practices to deal with some of the issues that could arise and to +better take advantage of lazy imports are: + +* Avoid relying on import side effects. Perhaps the most common reliance on + import side effects is the registry pattern, where population of some + external registry happens implicitly during the importing of modules, often + via decorators. Instead, the registry should be built via an explicit call + that perhaps does a discovery process to find decorated functions or classes. + +* Always import needed submodules explicitly, don't rely on some other import + to ensure a module has its submodules as attributes. That is, do ``import + foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only + works (unreliably) because the attribute ``foo.bar`` is added as a side + effect of ``foo.bar`` being imported somewhere else. With lazy imports this + may not always happen on time. + +* Avoid using star imports, as those are always eager. + +* When possible, do not import whole submodules. Import specific names instead; + i.e.: do ``from foo.bar import Baz``, not ``import foo.bar`` and then + ``foo.bar.Baz``. If you import submodules (such as ``foo.qux`` and + ``foo.fred``), with lazy imports enabled, when you access the parent module's + name (``foo`` in this case), that will trigger loading all of the sibling + submodules of the parent module (``foo.bar``, ``foo.qux`` and ``foo.fred``), + not only the one being accessed, because the parent module ``foo`` is the + actual deferred object name. + + +Reference Implementation +======================== + +The current reference implementation is available as part of +`Cinder `_. +Reference implementation is in use within Meta Platforms and has proven to +achieve improvements in startup time (and total runtime for some applications) +in the range of 40%-70%, as well as significant reduction in memory footprint +(up to 40%), thanks to not needing to execute imports that end up being unused +in the common flow. + + +Rejected Ideas +============== + +Explicit syntax for lazy imports +-------------------------------- + +If the primary objective of lazy imports were solely to work around import +cycles and forward references, an explicitly-marked syntax for particular +targeted imports to be lazy would make a lot of sense. But in practice it would +be very hard to get robust startup time or memory use benefits from this +approach, since it would require converting most imports within your code base +(and in third-party dependencies) to use the lazy import syntax. + +It would be possible to aim for a "shallow" laziness where only the top-level +imports of subsystems from the main module are made explicitly lazy, but then +imports within the subsystems are all eager. This is extremely fragile, though +-- it only takes one mis-placed import to undo the carefully constructed +shallow laziness. Globally enabling lazy imports, on the other hand, provides +in-depth robust laziness where you always pay only for the imports you use. + + +Half-lazy imports +----------------- + +It would be possible to eagerly run the import loader to the point of finding +the module source, but then defer the actual execution of the module and +creation of the module object. The advantage of this would be that certain +classes of import errors (e.g. a simple typo in the module name) would be +caught eagerly instead of being deferred to the use of an imported name. + +The disadvantage would be that the startup time benefits of lazy imports would +be significantly reduced, since unused imports would still require a filesystem +``stat()`` call, at least. It would also introduce a possibly non-obvious split +between *which* import errors are raised eagerly and which are delayed, when +lazy imports are enabled. + +This idea is rejected for now on the basis that in practice, confusion about +import typos has not been an observed problem with the reference +implementation. Generally delayed imports are not delayed forever, and errors +show up soon enough to be caught and fixed (unless the import is truly unused.) + + +Lazy dynamic imports +-------------------- + +It would be possible to add a ``lazy=True`` or similar option to +``__import__()`` and/or ``importlib.import_module()``, to enable them to +perform lazy imports. That idea is rejected in this PEP for lack of a clear +use case. Dynamic imports are already far outside the :pep:`8` code style +recommendations for imports, and can easily be made precisely as lazy as +desired by placing them at the desired point in the code flow. These aren't +commonly used at module top level, which is where lazy imports applies. + + +Deep eager-imports override +--------------------------- + +The proposed ``importlib.eager_imports()`` context manager and +``importlib.set_eager_imports()`` override both have shallow effects: they only +force eagerness for the location where they are applied, not transitively. It +would be possible (although not simple) to provide a deep/transitive version of +one or both. That idea is rejected in this PEP because experience with the +reference implementation has not shown it to be necessary, and because it +prevents local reasoning about laziness of imports. + +A deep override can lead to confusing behavior because the +transitively-imported modules may be imported from multiple locations, some of +which use the "deep eager override" and some of which don't. Thus those modules +may still be imported lazily initially, if they are first imported from a +location that doesn't have the override. + +With deep overrides it is not possible to locally reason about whether a given +import will be lazy or eager. With the behavior specified in this PEP, such +local reasoning is possible. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.