-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
find_packages() doesn't find PEP 420 packages #97
Comments
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): I'd like to say that Setuptools should support PEP 420 packages. The harder question is how should I imagine setuptools could include all suitable directories, possibly omitting explicitly excluded paths. This change would be backwards-incompatible, but could provide packagers with options to produce compatible dists. I considered using Python's import machinery to find packages, but I don't believe the import machinery provides any discovery tools. The other thing to consider - if setuptools supports this mode, it will not work on other versions of Python, so will require Python 3.3+. This gives me an idea - perhaps setuptools can use the package metadata to determine when PEP 420 package discovery is enabled. It could inspect the Trove classifiers and if Python versions are specified and only include versions >= 3.3, the feature would be enabled. That technique would be somewhat implicit, but properly-documented, it could provide an elegant way to allow a packager to both include PEP 420 packages and declare the Python requirement at the same time. |
Original comment by gwideman (Bitbucket: gwideman, GitHub: gwideman): Yes, the more I look at PEP 420-style packages, the less I am able to see what distinction there is between a package, and simply a directory tree containing python files at any level, a tree that import+sys.path can navigate arbitrarily. Packages become whatever directories can be reached by import combined with sys.path. And neither the imports or sys.path are readily available to packaging software. It may be that "packages" as a special level of modularity were an unnecessary idea in the first place, and it's as well to discard them. But if packages are thought to be an important level of modularity, then it should be up to the package developer to positively ordain them (as with So I'm a bit mystified what the big plan is regarding packages. |
Original comment by wyatt (Bitbucket: wyatt, GitHub: wyatt): I ran into this issue yesterday and took a quick stab at a patch. It works pretty much the same as before, but it's necessary to specify what to
It's slightly more complicated than I'd prefer because of the need to specify On the other hand, maybe allowing includes to be specified is generally useful. It also makes Anyway, here's the patch:
|
Original comment by wyatt (Bitbucket: wyatt, GitHub: wyatt): I just pushed two more alternative approaches. The first uses |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): Add support for PEP 420 namespace packages to find_packages() On Python 3.3+, The other way this supports PEP 420 is by making sure Fixes issue #97 |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): I've merged the pull request, but I find that the output includes quite a few directories not previously covered. Consider the comparison on setuptools itself:
I think perhaps the example above using the heuristic of testing for the presence of *.py files may be better. On the other hand, Python will import any directory.
If we stick with the generous (and probably more correct) implementation to accept any directory as a package, package maintainers are going to have to be a lot more explicit about which directories are packages. It almost renders find_packages useless. Some more consideration is going to have to go into this idea. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): Add support for PEP 420 namespace packages to find_packages() On Python 3.3+,
The other way this supports PEP 420 is by making sure Refs issue #97 |
Original comment by gwideman (Bitbucket: gwideman, GitHub: gwideman): Jason:
I think you are reaching a level of head scratching similar to my 2013-11-05 comment. It seems to me that PEP 420's implications weren't fully thought through, as it undermines the ideas about what the package concept facilitates and what it protects against (not just causing problems for find_packages() ). Perhaps someone at a higher pay grade could distill a little better what the big-picture goal is for packages, and figure out whether PEP 420 gets there. To that end, it might be worthwhile recording in more detail the implications that you've encountered. For example, your discussion of package maintainers needing "to be a lot more explicit" -- what form do you envision that taking? |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): By the way, I don't think there's a "big plan" regarding packages. PEP-420 was the plan and where it's deficient, more discussion should probably happen in python-dev. I was there for the implementation of PEP-420. I feel like it's a good thing to simply and directly allow folders to represent packages. In the past, I was always bothered by the fact that it took two entities (a directory and an Regardless, the implementation for Python 3.3 and 3.4 is set, so setuptools should support those models to the best of its ability. I think for find_packages, it might be worthwhile to do more heuristics, namely -- only inferring a directory is a package if it has modules somewhere within it. |
Original comment by arfrever (Bitbucket: arfrever, GitHub: arfrever): I suggest to skip directories matching some hardcoded patterns by default (unless such directories are matched by element of include parameter):
|
Original comment by wyatt (Bitbucket: wyatt, GitHub: wyatt): The heuristics are tricky. Here's what I came up with (my original patch included similar heuristics):
As far as excluding certain hard coded directories, it seems like that shouldn't be necessary, since that's basically the purpose of the new In addition, you can use a little trick to exclude non-package directories that are inside a package: add an extension to them (e.g., I also wrote a blog post about this: http://wyattbaldwin.com/2014/02/07/pep-420-namespace-packages-somewhat-surprising/. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): After further consideration on this issue, particularly around backward-compatibility, I'm inclined to back out the changes as published and instead create a new function, designed to supersede find_packages, with PEP-420 support. This approach would give opt-in capability for projects that use PEP-420 package structures, but would leave the existing functionality unchanged. I've been brainstorming over what this new function could be called. I'm leaning toward "discover_packages", "locate_packages", or "find_420_packages". The latter provides better clarity of its original purpose, but will appear like cruft in the future (once Python 3.2 is the baseline). |
Original comment by gwideman (Bitbucket: gwideman, GitHub: gwideman): @ Jason: Where function names suggest an overlap of functionality, I strongly favor that the distinctions in the names actually indicate what the functional distinctions are. Cases where functions are distinguished only by arbitrary differences in their names tend to present endless confusion, searches of docs, and so on. The difference between "find", "locate" and "discover" does not pertain to scope, which is the salient difference in functionality. Also, having two overlapping functions with names that are not alphabetically adjacent (in docs and autocomplete) misses the opportunity to alert the unwary programmer that there's a choice to be made. If the distinctness of meaning is: "find_distribution_and_namespace_packages", then I, for one, am in favor of calling it that. It's long, but we have copy/paste and IDE autocomplete. If that's considered too long, then "find_420_packages", while somewhat cryptic, is at least precise. If it suggests cruft, that's perhaps a good thing -- that here some complexity lies, please be fully informed of the nuances of this function. Side question: |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): @gwideman I do agree with your sentiment. I am concerned about the counter sentiment which is that this function is meant to be a convenience function, providing a shortcut for specifying the project names explicitly (and sometimes redundantly). A secondary advantage is that the function adds to the simplicity and elegance of the setup function. Having a function name like 'find_distribution_and_namespace_packages', while some environments might auto-complete, many (most?) won't. And furthermore, for a function like this, the aesthetic is relevant. A clumsy name, while functional, would detract from use (and thus the appeal of setuptools). Let me elaborate on my thoughts about the alternative names. Yes, they do mirror find_packages semantically without reflecting the disparity. This aspect was by design, with the intention that (a) it would be documented clearly (probably with a show-by-default deprecation warning), and (b) the new function would entirely supersede the old name. So, I'm somewhat reluctant to have a new function carry the baggage of a legacy function that's to be deprecated. That said, you make some good arguments, and you've pushed me closer to the 'find_420_packages' name. I should mention, I've considered 'find_all_packages' but rejected it because 'all' is vague yet still clumsy. I've also considered the Microsoft technique, find_packages_ex, but that's clumsy too and even more vague. Given this ticket/PR is your initiative, I'm inclined to defer some extra weight to your judgment, so please consider these aspects and give me your favorite (or two). Thanks again for your help on this. |
Original comment by gwideman (Bitbucket: gwideman, GitHub: gwideman): @jason R. Coombs: "defer some extra weight [to me]" Hahaha -- that's a lot of responsiblity! First, I thought I posted this additional comment, but perhaps not: Is adding an optional named argument to the existing function to select the 420-inclusive behavior a way forward that avoids the naming issue? Or is this beyond the pale for some reason? If a new function is the way to go, then I certainly see the reasonableness of all your points. If long names are out, then personally I lean strongly to the find_420_package name. Like I mentioned, it precisely captures the intent of the function, and its "technical" appearance correctly conveys that there's some nuances to understand. I strongly dislike the names that change just "find" to "locate" or "discover". I would also be quite happy with find_package_ex, or find_package2. These aren't beautiful, but at least they convey that this function is newer in some way than the old one. |
Original comment by wyatt (Bitbucket: wyatt, GitHub: wyatt): I was just thinking about the idea of adding a keyword arg, too. It would only be relevant/active on Python 3.3 and up. Perhaps it would be disabled by default on 3.3 too so that people who aren't using namespace packages--which is probably the common case--wouldn't need to think about it. Something like |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): In general, I avoid parameters, especially simple booleans, that select behavior. I'd prefer accepting/passing strings or actual functions to select varying behaviors. But even if one assumes a keyword argument as Wyatt suggests, I'd be inclined to have an alias where the default is True. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): Especially when I see this:
Of course, it's usually more complicated than that, but that function could be re-written to just have the caller call the function he cares about. More importantly, an important lesson I've taken from cleanroom development, functional programming, and test-driven design is that functions should be small, isolated, and with as few branches and side-effects as possible. Adding a boolean flag that creates in internal branch ties the external interface to a branch in the internal behavior. This means that to comprehend fully the meaning of the flag, one must comprehend the entire function and all of the states it might be in. Proper variable naming can help indicate intent, but it won't overcome the increasing cognitive burden of the behavior. This becomes increasingly apparent when one attempts to refactor such a function only to find that the single boolean parameter must be passed unused to reach its inner function. I realize my explanation probably sounds esoteric, but it stems from my attempt to explain what seems intuitive to me after reading books like Refactoring. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): Case in point is the recent refactorings I made to the find_packages function. I wanted to be rid of the 'stack' concept (which mutated a list for the purposes of iteration) and the 'out list' (which initialized an empty list and populated it incrementally for iteration). I wanted instead something that leveraged clean iteration over the essential objects. Once I did that though, it becomes obvious why a |
Original comment by pje (Bitbucket: pje, GitHub: pje): The easiest way to fix this problem is to simply auto-add namespace_packages to the packages list in the Distribution object. find_packages should not be used for this, because of the possibility of false positives. In other words, if you want to include namespace packages, list them in namespace_packages, and don't list them in the regular packages. This means that find_packages should only find traditional packages, not namespace packages. The explicit listing of namespace_packages needs to happen for backward compatibility anyway, and setuptools already installs those without |
Original comment by pje (Bitbucket: pje, GitHub: pje): I forgot to mention/may not have made this clear: currently, namespace packages must be explicitly listed in the |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): @pje excellent suggestion to use namespace_packages directly. I do think, however, that PEP420 doesn't just apply to namespace packages but to any package, in which case the use of namespace_packages only serves as another improvement in the heuristic. |
Has anyone tried using the interface currently presented by Setuptools, namely:
|
Most of my projects have migrated or are migrating their metadata and options to setup.cfg. How do I use that from setup.cfg? |
Good point. I think I prepared/exposed that interface for comment and experimentation. If it's a suitable solution, we should definitely present it in a more official manner. I think I'd intended to test it out with the |
@jaroco feel free to contribute to the PR |
@agronholm the official documentation https://docs.python.org/3/distutils/configfile.html lacks a lot, do you have more information? |
Regarding? |
@agronholm regardless, distutils is different from setuptools, unless both have been merged, of which i am unaware, see also https://stackoverflow.com/a/25372045. so this might not affect you at all. |
@silkentrance I still have no clue what you're asking me about. The way I see this going forward is by making use of the |
…space packages
…space packages
…space packages
…space packages
…space packages
…space packages
…space packages
…space packages
…space packages
…space packages
This fixes GH pypa#97 by introducing an alternate version of find_packages that works with PEP 420 namespace packages.
@pganssle How do I use |
@agronholm It was @silkentrance that did the implementation, I just merged it. Good point, though, I don't believe this is supported. I will create a new issue. |
tests: use sys.executable instead of hardcoding "python"
Originally reported by: gwideman (Bitbucket: gwideman, GitHub: gwideman)
on the developer machine will fail to find packages that lack a
__init__.py
file, as is allowed in Python 3.3. However, such packages listed explicitly: packages=['mypkg'] do appear to get included and later installed.Note: When testing this, before each test be sure to delete all generated metadata, including that which setup may previously have placed in the original source directory, as it seems that setup may use metadata created on a previous run in order to include files.
This is part of a general problem reported in issue #83, but I've logged it separately as it's specifically about setuptools.
The text was updated successfully, but these errors were encountered: