Allow lazily loaded modules to be imported multiple times without forced resolution #127036

Open
Sachaa-Thanasius opened this issue Nov 19, 2024 · 0 comments
Labels
topic-importlib type-feature A feature request or enhancement

Sachaa-Thanasius (Contributor) commented Nov 19, 2024

Feature or enhancement

Proposal:

tl;dr
I'd like importlib.util._LazyModule.__getattribute__ to pass __spec__ through without triggering the full load of the module. That way, the regular, internal import machinery won't accidentally trigger the full load when it fishes the lazy module out of sys.modules. This happens when the module is imported again (lazily or not), and the result, a fully loaded module, is pretty unexpected.

Full Story
I've been trying to use importlib.util.LazyLoader lately and found a way it could be made more ergonomic.

To start off, a demonstration of what I want to work, based on the lazy import recipe in the importlib docs:

import importlib.util
import sys

def lazy_import(name):
    # Personal addition to take advantage of the module cache.
    try:
        return sys.modules[name]
    except KeyError:
        pass

    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

lazy_typing = lazy_import("typing")

# Let's import it a second time before actually using it here.
# This could even happen in another file. Ideally, it *still* doesn't execute yet,
# because we pull from the sys.modules cache.
lazy_typing = lazy_import("typing")

lazy_typing.TYPE_CHECKING  # Only *now* does the actual module execute.

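The above recipe works, but without the sys.modules caching I added, the second import would cause the module to execute and populate, even though the user hasn't accessed any of its attributes yet. That's because importlib.util.find_spec reads __spec__ from a module that is already in sys.modules.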

Fair enough, it's just a recipe for the docs; it isn't meant to cover every edge case and use case. What about a snippet that doesn't manually perform every part of the import process, though?

Let's try again, but with an import hook: say, a custom finder on the meta path that wraps the found spec's loader with LazyLoader. That way, it takes advantage of all the thread locks importlib uses internally and can even affect normal import statements. Here's an example:

# NOTE: This is not as robust as it could be, but it serves well enough for demonstration.

import importlib.util
import sys

# threading is needed due to circular import issues from importlib.util importing it while
# LazyFinder is on the meta path. Not relevant to this issue.
import threading

class LazyFinder:
    """A module spec finder that wraps a spec's loader, if it exists, with LazyLoader."""

    @classmethod
    def find_spec(cls, fullname: str, path=None, target=None, /):
        for finder in sys.meta_path:
            if finder is not cls:
                spec = finder.find_spec(fullname, path, target)
                if spec is not None:
                    break
        else:
            return None  # Not found by any finder; let the import system raise ModuleNotFoundError.

        if spec.loader is not None:
            spec.loader = importlib.util.LazyLoader(spec.loader)

        return spec


class LazyFinderContext:
    """Temporarily "lazify" some types of import statements in the runtime context."""
    
    def __enter__(self):
        if LazyFinder not in sys.meta_path:
            sys.meta_path.insert(0, LazyFinder)

    def __exit__(self, *exc_info):
        try:
            sys.meta_path.remove(LazyFinder)
        except ValueError:
            pass

lazy_finder = LazyFinderContext()

with lazy_finder:
    import typing    # Does the same thing as the earlier snippet, but for a normal import statement.

Unfortunately, the above code has the same flaw as the original importlib recipe when used directly: it doesn't take advantage of the module cache. This time, however, the flaw can't be worked around from user code without a ton of copying.

Adding import typing again at the bottom causes typing to be fully executed. This can be demonstrated in two ways:

  1. Adding print statements checking the type of the module:

    ...
    with lazy_finder:
        import typing
    print(type(typing))
    
    # Doesn't matter if we're using the context manager again or not, the result is the same.
    # with lazy_finder:
    import typing
    print(type(typing))

    Output:

    > python scratch.py
    <class 'importlib.util._LazyModule'>
    <class 'module'>
  2. Putting the above code snippet in a file (e.g. scratch.py), then comparing the output of python -X importtime -c "import scratch" before and after adding a second import typing statement:

    Before

    import time: self [us] | cumulative | imported package
    ...
    import time:       566 |       2035 | site
    ...
    import time:       196 |        196 |     _weakrefset
    import time:       622 |       3045 |   threading
    import time:        86 |         86 |   typing
    import time:      1569 |       5583 | scratch
    

    After

    import time: self [us] | cumulative | imported package
    ...
    import time:       504 |       1868 | site
    ...
    import time:      1127 |       3662 |   threading
    import time:        75 |         75 |   typing
    ...
    import time:       394 |       3374 |   re
    import time:        53 |         53 |   _typing
    import time:      3539 |      12346 | scratch
    

In my eyes, the reason for this mismatch is a small implementation detail in importlib._bootstrap._find_and_load:

def _find_and_load(name, import_):
    """Find and load the module."""
    # Optimization: we avoid unneeded module locking if the module
    # already exists in sys.modules and is fully initialized.
    module = sys.modules.get(name, _NEEDS_LOADING)
    if (module is _NEEDS_LOADING or
        getattr(getattr(module, "__spec__", None), "_initializing", False)):
        ...

Because __spec__ is requested even when the module is already in the cache, and importlib.util._LazyModule makes no exceptions for attribute access, the original loader always executes and the module is populated. To get around this, a user would have to copy importlib.util._LazyModule and importlib.util.LazyLoader, modify them (see the suggested patch below), and use those local versions instead.
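
The same forced load can be reproduced without -X importtime by combining the first recipe (including its sys.modules shortcut) with a plain import statement, which, like the _find_and_load excerpt above, looks up __spec__ on the cached module. This is a rough sketch: the module name lazy_demo_reimport and the sys.lazy_reimport_runs counter are hypothetical, and the expected counts assume current, unpatched importlib behavior.

```python
import importlib
import importlib.util
import pathlib
import sys
import tempfile

# Hypothetical throwaway module whose body counts how many times it executes.
MODULE_NAME = "lazy_demo_reimport"
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, MODULE_NAME + ".py").write_text(
    "import sys\n"
    "sys.lazy_reimport_runs = getattr(sys, 'lazy_reimport_runs', 0) + 1\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()


def lazy_import(name):
    # The recipe from the top of this issue, including the sys.modules shortcut.
    try:
        return sys.modules[name]
    except KeyError:
        pass
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


module = lazy_import(MODULE_NAME)
lazy_import(MODULE_NAME)  # Cache hit: no attribute access, nothing executes.
runs_after_lazy = getattr(sys, "lazy_reimport_runs", 0)

# A plain import statement finds the module in sys.modules but still reads its
# __spec__ to check whether it is initializing, which triggers the full load.
import lazy_demo_reimport  # noqa: E402

runs_after_regular = getattr(sys, "lazy_reimport_runs", 0)
print(runs_after_lazy, runs_after_regular, type(module).__name__)
```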

Thus, I propose adding a small special case to _LazyModule to make it more ergonomic to use with sys.modules:
If __spec__ is requested, return it without loading the whole module. That way, _LazyModule instances in sys.modules won't be forced to resolve by the regular import machinery; they resolve only once the user actually accesses an attribute of the module. The diff would be quite small:

--- current_3.14.py     2024-11-19 15:35:57.218717430 -0500
+++ modified_3.14.py    2024-11-19 15:36:21.608717512 -0500
@@ -171,6 +171,10 @@
     def __getattribute__(self, attr):
         """Trigger the load of the module and return the attribute."""
         __spec__ = object.__getattribute__(self, '__spec__')
+
+        if "__spec__" == attr:
+            return __spec__
+
         loader_state = __spec__.loader_state
         with loader_state['lock']:
             # Only the first thread to get the lock should trigger the load
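
To experiment with the proposed special case before any change lands, one option is the local-copy workaround described above. The following is a stripped-down sketch, not CPython's code: SpecPassthroughLazyModule and SpecPassthroughLazyLoader are hypothetical names, the module class omits some of the bookkeeping the real _LazyModule performs, and it assumes the lazily loaded module starts out as a plain types.ModuleType. The throwaway module lazy_demo_passthrough and its sys.passthrough_runs counter exist only for the demonstration.

```python
import importlib
import importlib.abc
import importlib.util
import pathlib
import sys
import tempfile
import threading
import types


class SpecPassthroughLazyModule(types.ModuleType):
    """Hypothetical local copy of _LazyModule with the proposed __spec__ special case."""

    def __getattribute__(self, attr):
        __spec__ = object.__getattribute__(self, "__spec__")
        if attr == "__spec__":
            # The proposed change: hand back the spec without loading the module.
            return __spec__
        loader_state = __spec__.loader_state
        with loader_state["lock"]:
            # Only the first access (per module) runs the real loader.
            if object.__getattribute__(self, "__class__") is SpecPassthroughLazyModule:
                if loader_state["is_loading"]:
                    # Re-entrant access from exec_module() itself (e.g. module.__dict__).
                    return types.ModuleType.__getattribute__(self, attr)
                loader_state["is_loading"] = True
                __spec__.loader.exec_module(self)
                # Simplification: assumes the module started out as a plain ModuleType.
                self.__class__ = types.ModuleType
        return getattr(self, attr)


class SpecPassthroughLazyLoader(importlib.abc.Loader):
    """Hypothetical stand-in for importlib.util.LazyLoader using the class above."""

    def __init__(self, loader):
        self.loader = loader

    def create_module(self, spec):
        return self.loader.create_module(spec)

    def exec_module(self, module):
        # Point the spec back at the real loader, which runs on first attribute access.
        module.__spec__.loader = self.loader
        module.__loader__ = self.loader
        module.__spec__.loader_state = {"lock": threading.RLock(), "is_loading": False}
        module.__class__ = SpecPassthroughLazyModule


# Hypothetical throwaway module whose body counts how many times it executes.
MODULE_NAME = "lazy_demo_passthrough"
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, MODULE_NAME + ".py").write_text(
    "import sys\n"
    "sys.passthrough_runs = getattr(sys, 'passthrough_runs', 0) + 1\n"
    "VALUE = 42\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Same steps as the lazy_import() recipe, but with the loader above.
spec = importlib.util.find_spec(MODULE_NAME)
loader = SpecPassthroughLazyLoader(spec.loader)
spec.loader = loader
module = importlib.util.module_from_spec(spec)
sys.modules[MODULE_NAME] = module
loader.exec_module(module)

# The import statement now only peeks at __spec__, so nothing executes yet.
import lazy_demo_passthrough as reimported  # noqa: E402

runs_after_reimport = getattr(sys, "passthrough_runs", 0)
value = module.VALUE  # First real attribute access runs the module body.
runs_after_access = getattr(sys, "passthrough_runs", 0)
print(runs_after_reimport, value, runs_after_access)
```

Without the early return for __spec__, the import statement in this sketch would trigger the load, matching the behavior shown with -X importtime above.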

I hope this makes sense and isn't too long-winded.

EDIT: Added a tl;dr at the top.
EDIT2: Added an easier way to demonstrate the full load being triggered.
EDIT3: Adjusted phrasing.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

Sachaa-Thanasius added the type-feature label (A feature request or enhancement) on Nov 19, 2024
Sachaa-Thanasius changed the title from "Allow lazily loaded modules to be imported multiple times without resolving" to "Allow lazily loaded modules to be imported multiple times without forced resolution" on Nov 19, 2024

2 participants