Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Same distribution can appear multiple times on Python 3.8 #91

Closed
jaraco opened this issue Oct 22, 2020 · 15 comments
Closed

Same distribution can appear multiple times on Python 3.8 #91

jaraco opened this issue Oct 22, 2020 · 15 comments
Milestone

Comments

@jaraco
Copy link
Member

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Sep 30, 2019, 03:09

With Python 3.8, if one imports importlib_metadata, both the PathFinder and the MetadataPathFinder will appear on sys.meta_path with find_distributions methods... causing every package to be reported twice. This behavior is fine for some calls like find_distribution(name) which will just find the first one, but less desirable for ones like find_distributions(path=['.']), where one might expect to find exactly one (distribution on the current path), but instead finds two (the same one twice).

Here are some of the goals that led us here:

  • there should be an extensible interface whereby other package providers (like pyinstaller or some tool that imports from a database or the network but not from a typical file system) to provide metadata for those packages
  • the importlib_metadata behavior should be available on Python 3.8 and later even though importlib.metadata is available (possibly providing forward compatibility and feature backports).

I'm not sure what should happen here, but there are a few options that seem possible to me:

  1. Discourage users from using importlib_metadata on Python 3.8 or later. That would prevent importlib_metadata from installing its duplicate finder. This approach goes directly counter to the second goal above... and doesn't scale well. If one library imports importlib_metadata, they all get the new global state (MetadataPathFinder on sys.meta_path).
  2. On importlib_metadata don't install the sys.meta_path hook if on Python 3.8 or later (always rely on the system-provided one). That's probably suitable but limits the scope of backward compatibility that can be provided.
  3. If using importlib_metadata, bypass the system PathFinder for find_distributions. This approach would allow the backport to avoid interactions with finders as found in CPython but would still honor other distribution finders. This approach would address the issue when using importlib_metadata but not when using importlib.metadata (but some other library has imported importlib_metadata).
  4. Don't do anything and require all the clients to be aware that a single package might be advertised twice by different distribution finders.
  5. Always de-duplicate packages by name when returning multiple results, allowing duplicates to be advertised but giving precedence to the first one found.
@jaraco jaraco added this to the 1.0 milestone Oct 22, 2020
@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Sep 30, 2019, 03:12

changed the description

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Sep 30, 2019, 03:13

changed the description

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Sep 30, 2019, 03:16

See jaraco/pytest-checkdocs#2 where I encountered this issue.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @RonnyPfannschmidt on Sep 30, 2019, 12:07

i am also observing this on a python 3.7 virtualenv where lib64 is symlinked to lib

i tested this with a ipython started in site-packages and eded up with triples there,

as a distribution was found in lib, lib64 and .

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @warsaw on Sep 30, 2019, 17:54

On importlib_metadata don't install the sys.meta_path hook if on Python 3.8 or later (always rely on the system-provided one). That's probably suitable but limits the scope of backward compatibility that can be provided.

I'm inclined toward adding logic to the standalone backport that knows about Python version peculiarities, rather than the other way around. The reasoning being that importlib_metadata can be rev'd much more quickly than can be importlib.metadata. So if for example, something changes in a 3.9 alpha, the standalone version can be quickly adapted.

Can you explain a bit more about the limits this approach imposes?

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Oct 26, 2019, 18:25

To illustrate, this patch corrects the test failure reported in #93 by disabling the stdlib distribution resolver:

diff --git a/importlib_metadata/_compat.py b/importlib_metadata/_compat.py
index 4f54864..f1439eb 100644
--- a/importlib_metadata/_compat.py
+++ b/importlib_metadata/_compat.py
@@ -52,6 +52,7 @@ __all__ = [
 
 def install(cls):
     """Class decorator for installation on sys.meta_path."""
+    del sys.meta_path[2].find_distributions
     sys.meta_path.append(cls())
     return cls
 

This patch is a rough implementation of option (3) and isn't viable, but begins to illustrate how the issue might be addressed.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 02:57

In #100, I learned that there's another issue at play here. The fact that both importlib_metadata and importlib.metadata both define Distribution classes means that each implementation is exporting objects that have no common base class, invalidating that particular check (that all results from find_distributions returns one class of object).

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 03:17

I'm inclined toward adding logic to the standalone backport that knows about Python version peculiarities, rather than the other way around.

Agreed.

Can you explain a bit more about the limits this approach imposes?

The main limitation is that from the perspective of a DistributionFinder implementation, one must register their finder with one implementation (stdlib or backport), but then have your Distribution classes discovered by either implementation. So in some sense, neither implementation can have primacy. Both implementations have to honor the presence of the other.

The patch above has one nice behavior - when it installs its own MetaPathFinder, it disables the stdlib MetaPathFinder, which will cause subsequent calls to find_distributions to pull backport Distributions. In a sense, (any behavior) importing the backport means the backport takes precedence (even by clients that only used the stdlib implementations).

I'm thinking more and more that's okay and maybe even preferable, but I can imagine risks with that approach, mainly that the code can't guarantee that the Distribution objects are of a single (super) type.

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 09:50

mentioned in commit 73427ca

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 10:10

mentioned in merge request !98

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 19:29

closed via commit 73427ca

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 19:29

mentioned in commit c854e9a

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Nov 30, 2019, 19:29

closed via merge request !98

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @layday on Jan 27, 2020, 18:50

@jaraco FYI this patch broke compatibility with PyOxidizer whose finder is not attached to a module:

Traceback (most recent call last):
  [...]
  File "importlib_metadata", line 471, in <module>
  File "importlib_metadata._compat", line 62, in install
  File "importlib_metadata._compat", line 79, in disable_stdlib_finder
  File "importlib_metadata._compat", line 76, in matches
AttributeError: 'PyOxidizerFinder' object has no attribute '__module__'

Is there an expectation that a finder should have a __module__ attribute?

@jaraco
Copy link
Member Author

jaraco commented Oct 22, 2020

In GitLab by @jaraco on Jan 28, 2020, 03:07

Is there an expectation that a finder should have a __module__ attribute?

Not necessarily. In fact, probably not if the finders spec doesn't require it. The relevant specs, like PEP 302, don't specifically call for a __module__, but they do reference the finder as a "object", and the __module__ is defined on user-defined functions and classes where they're defined. It's a little surprising that the PyOxidizerFinder doesn't have an __module__, but that's no reason importlib_metadata can't support that case. I filed an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant