Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package __path__ is an empty list #557

Closed
lesteve opened this issue Jan 5, 2024 · 12 comments
Closed

package __path__ is an empty list #557

lesteve opened this issue Jan 5, 2024 · 12 comments
Milestone

Comments

@lesteve
Copy link
Contributor

lesteve commented Jan 5, 2024

I noticed this when working on moving scikit-learn to Meson. I tried with numpy and get the same result so I am guessing this is not something due to our particular setup?

With editable install (pip install -e . --no-build-isolation)

❯ python -c 'import sklearn; print(sklearn.__path__)'
[]

When I use normal install (pip install . --no-build-isolation), I get something like this:

❯ python -c 'import sklearn; print(sklearn.__path__)'
['/home/lesteve/micromamba/envs/skimage-dev/lib/python3.12/site-packages/sklearn']

This is not a huge issue since we were able to use __file__ for our needs, but I thought I would open an issue anyway.

@dnicolodi
Copy link
Member

It is my understanding that the __path__ and __file__ attributes of module objects are implementation details of the import mechanism and should not be relied upon to access package resources. In particular, the __path__ attribute may be an empty list. The only supported way to access package resources is via the importlib.resources module.

Editable installs as implemented by meson-python use a customized module loader to load the module code from the source directory and the build directory, as needed. A consequence of this, is that we need to make the __path__ attribute of package modules empty, to have the custom loader be called for all imports.

@rgommers
Copy link
Contributor

Agreed. For the record, I haven't encountered any issues when switching to importlib.resources in NumPy/SciPy as far as I remember (aside from the API removals in importlib for Python 3.12 that require if/else conditional code).

@lesteve
Copy link
Contributor Author

lesteve commented Jan 15, 2024

Thanks for your answer, it seems like a side-effect of this is that pkgutil.walk_packages does not find subpackages.

This is a mild annoyance since we are using pkgutil.walk_packages to collect some of our tests. We call this kind of tests "common tests", because they check some invariant for all the scikit-learn models (e.g. that if you fit a model on 1d array, you will get an error of a certain kind). See scikit-learn/scikit-learn#28040 (comment) for more details.

@dnicolodi
Copy link
Member

dnicolodi commented Jan 15, 2024

Please refer to the documentation of pkgutil.walk_packages(). In particular:

Note: Only works for a finder which defines an iter_modules() method. This interface is non-standard, so the module also provides implementations for importlib.machinery.FileFinder and zipimport.zipimporter.

The fact that the custom loader exposes an empty __path__ has nothing to do with this other issue. We could implement this method for the import machinery used by editable installs, but as the interface is not standardized there is no guarantee that this will keep working for future Python releases. Thus, if anything, it is very low priority.

Looking at this a tiny bit more in depth, the interface of iter_modules() is not even documented. If we want to implement this we would need to base it on what the other module finders implement, rather than on a specification.

@lesteve
Copy link
Contributor Author

lesteve commented Jan 15, 2024

OK thanks for your pointers, I'll try to look into it a bit more. I have to admit I am not familiar at all with this kind of subtle import machinery.

@rgommers
Copy link
Contributor

I'll note that this usage of pkg_util.walk_packages works: https://github.com/scipy/scipy/blob/8e9d43838b7db8d40fc4567f3aafeed68466b464/scipy/_lib/tests/test_public_api.py#L241.
That test (pytest scipy/_lib -k test_all_modules_are_expected) passes for me with an editable install.

It'd be better to avoid __path__ and __file__ completely though, because it's indeed not clear when this will or won't work.

@lesteve
Copy link
Contributor Author

lesteve commented Jan 15, 2024

OK weird, just to make 100% sure, can you confirm that scipy.__path__ is not [] then when using meson editable install?

If that's the case, there may be something that is sklearn specific, I need to have a closer look.

@rgommers
Copy link
Contributor

Eh oh, good catch. The test is silently passing, but it is an empty list and hence the test does not work as intended in editable mode:

scipy.__path__:  []
scipy.__file__:  /Users/rgommers/code/bldscipy/scipy/__init__.py
scipy.__name__:  scipy

@lesteve
Copy link
Contributor Author

lesteve commented Jan 15, 2024

Eh oh, good catch. The test is silently passing, but it is an empty list and hence the test does not work as intended in editable mode

OK so at least this is not scikit-learn specific 😌

@ogrisel
Copy link

ogrisel commented Jan 15, 2024

Thanks for the feedback @dnicolodi. After a bit of trial and error, here is the only way I could get to write an alternative to pkgutil.walk_packages that never relies on the __path__ attribute of inspected packages.

from pathlib import Path
from importlib.util import find_spec
from pkgutil import iter_modules

# Alternative to sklearn.__path__ that does not rely on sklearn.__file__:

def get_package_path(package):
    spec = find_spec(package)
    if spec is None:
        raise ImportError(f"No module named {package!r}")
    return [Path(spec.origin).parent]


print(f"{get_package_path('sklearn') = }")


# Alternative to pkgutil.walk_packages that does not rely on pkg.__path__:

def walk_packages(path=None, prefix='', onerror=None):
    """Yields ModuleInfo for all modules recursively
    on path, or, if path is None, all accessible modules.

    'path' should be either None or a list of paths to look for
    modules in.

    'prefix' is a string to output on the front of every module name
    on output.

    Note that this function must import all *packages* (NOT all
    modules!) on the given path, in order to access the __path__
    attribute to find submodules.

    'onerror' is a function which gets called with one argument (the
    name of the package which was being imported) if any exception
    occurs while trying to import a package.  If no onerror function is
    supplied, ImportErrors are caught and ignored, while all other
    exceptions are propagated, terminating the search.

    Examples:

    # list all modules python can access
    walk_packages()

    # list all submodules of ctypes
    walk_packages(ctypes.__path__, ctypes.__name__+'.')
    """

    def seen(p, m={}):
        if p in m:
            return True
        m[p] = True

    for info in iter_modules(path, prefix):
        yield info

        if info.ispkg:
            try:
                __import__(info.name)
            except ImportError:
                if onerror is not None:
                    onerror(info.name)
            except Exception:
                if onerror is not None:
                    onerror(info.name)
                else:
                    raise
            else:
                path = get_package_path(info.name)

                # don't traverse path items we've seen before
                path = [p for p in path if not seen(p)]

                yield from walk_packages(path, info.name+'.', onerror)


for info in walk_packages(get_package_path('sklearn'), 'sklearn.'):
    print(info.name)

The walk_packages function is just a copy of the original pkgutil.walk_packages in the standard library (https://github.com/python/cpython/blob/d457345bbc6414db0443819290b04a9a4333313d/Lib/pkgutil.py#L39). The only line that I changed is the definition of the local path variable that relies on my get_package_path helper to use the loader retrieved by find_spec instead of using the value found in __path__.

It seems to fix the problem for scikit-learn when installed in meson-python editable mode, but it's a shame not to be able to use the standard pkg.walk_packages when using the editable mode.

Shall this be considered a bug in pkgutil.walk_packages? I am not familiar enough with the package import machinery to tell whether or not its use of the __path__ attribute is unsafe.

ogrisel added a commit to ogrisel/meson-python that referenced this issue Jan 16, 2024
This solves the problem of empty __path__ attributes for packages
imported from an editable install.

Fixes mesonbuild#557.
@ogrisel
Copy link

ogrisel commented Jan 16, 2024

By reading more about import machinery in CPython and the source code of mesonpy, I think this is just a missing feature in mesonpy's custom module finder for editable install and the fix is quite straightforward, see #562.

@dnicolodi
Copy link
Member

The implementation of get_package_path() above is a complex way to get to os.path.dirname(__file__). The walk_packages() implementation is still based on pkgutil.iter_modules() that does not work for module finders that implement loading of modules from anything that is not a filesystem structure that mirrors the logical organization of the modules. The editable install module finder does just that. I haven't had time yet to investigate closely if there is a way to make pkgutil.iter_modules() work for custom module finders. If there is, it is not documented and it is not part of the interfaces that a standard module finder is supposed to be implemented. This is hinted from the documentation I quoted in a previous comment.

dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 4, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 4, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 4, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 4, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 5, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 5, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 5, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 5, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 5, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 6, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557, mesonbuild#568.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 6, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
Fixes mesonbuild#568.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 27, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
Fixes mesonbuild#568.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Feb 27, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes mesonbuild#557.
Fixes mesonbuild#568.
rgommers pushed a commit that referenced this issue Feb 28, 2024
Set the __path__ module attribute to a placeholder path.  Because how
editable installs are implemented, this cannot correspond to the
filesystem path for the package.  And add a sys.path_hook that
recognizes this paths and returns a path loader that implements
iter_modules(). This allows pkgutil.iter_packages() to work properly.

Fixes #557.
Fixes #568.
@rgommers rgommers added this to the v0.16.0 milestone Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants