-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Classify imports correctly with isort
#93
Conversation
We are only interested in finding imports that refer to _external_ code (that is, modules/packages for which a corresponding dependency should be declared). Until now, we've used isort.place_module() to identify (and ignore) stdlib modules, but there are more module types we need to ignore: - from __future__ import ...: This is a special statement handled by the Python language itself (used to backport newer language features into older language versions). - from fawltydeps.utils import ...: I.e. self-imports, a module that is part of a package will often import other modules from the same package. - from utils import ...: Very similar to the above, a module will often import another module found in the same directory (or less commonly, found anywhere else in the configured $PYTHONPATH). We haven't tackled these kinds of imports until now, but as it happens, we have been sitting on the solution this whole time: isort.place_module() returns a string that classifies the kind of import passed to it. AFAICS[1] the possible return values are: - "FUTURE": from __future import ... - "STDLIB": Imports of stdlib modules, the one category we have used until now for ignoring modules. - "THIRDPARTY": AFAICS, this corresponds _exactly_ to the external imports that we are ultimately interested in. - "FIRSTPARTY": Corresponds to the last two categories above: self- imports, as well as imports of local modules/packages. - "LOCALFOLDER": AFAICS, this corresponds only to _relative_ imports (from .utils import ...) which we already ignore in our parser (see the latter half of parse_code()). So by flipping our test from: def is_stdlib_import(name: str) -> bool: return isort.place_module(name) == "STDLIB" to: def is_external_import(name: str) -> bool: return isort.place_module(name) == "THIRDPARTY" we can - in principle - begin to ignore both __future__ and first-party imports. However, there are some complications (to be solved in the following commits): - For STDLIB imports, isort must be told which _versions_ of the Python stdlib should be considered for valid stdlib import names. - For correctly differentiating between FIRSTPARTY and THIRDPARTY imports, isort must be told about the location of project itself, i.e. where to look for potential first-party imports. [1]: https://github.com/PyCQA/isort/blob/5.12.0/isort/sections.py and https://github.com/PyCQA/isort/blob/5.12.0/isort/place.py contain the declaration of these strings, as well as the code that implements isort.place_module(), respectively.
In the `requests` project, we find the following code: try: # Python 3 from http.server import HTTPServer, SimpleHTTPRequestHandler except ImportError: # Python 2 from BaseHTTPServer import HTTPServer from SimpleHTTPServer import SimpleHTTPRequestHandler where the try-block attempts to import things from the Python 3 stdlib, and the except clause falls back to attempting the same imports from their corresponding locations in the Python 2 stdlib. FawltyDeps, at this point in time, finds _all_ the above import statements, and has no logic to determine that some of them are dead code on Python 3. It merely asks isort whether each of them are STDLIB modules (by passing "http", "BaseHTTPServer", and "SimpleHTTPServer" to isort.place_module() in sequence). By default, isort only considers the superset of Python 3 versions when building its set of stdlib modules, which means that the latter two modules above are NOT considered part of the stdlib. As a result FD then reports them as undeclared dependencies. Fix this by configuring isort to consider _all_ versions (i.e. Python 2 _and_ Python 3) of the Python stdlib. There is a (small) risk associated with this change: Say that a project uses an external dependency whose import name overlaps with a stdlib module from an old version of Python (a version on which this particular project will never run). In this case FD will mistakenly assume that this external dependency is an stdlib import: - If the external dependency is not declared, FD will incorrectly omit it from the list of undeclared deps. - If the external dependency is declared, FD will incorrectly report it as an unused dependency. At this point in time we assume that the number of projects that contain fallback imports for Python 2 far outweighs the number of projects that include external dependencies that happen to overlap with outdated stdlib modules.
Recall that isort.place_module() returns one of these strings: "FUTURE" (for `__future__` imports), "STDLIB" (for stdlib imports, see parent commit for more details), "THIRDPARTY", "FIRSTPARTY", and "LOCALFOLDER" (relative imports). These categories are fairly clearly defined at this point, except for the exact difference between FIRSTPARTY and THIRDPARTY: In the Python syntax itself, there is nothing to differentiate these: `import foo` could refer to a foo.py in the same directory (a FIRSTPARTY import), or it could refer to a "foo" package from PyPI (a THIRDPARTY import). Likewise, `from foo_package import bar` could refer to the _current_ package (or a sub-package of the current package), or it could refer to an external dependency. It all comes down to the context in which the import is made. isort.place_module() is able to discern this context, as long as its `directory` parameter is configured correctly, i.e. to a directory where FIRSTPARTY imports can be found/resolved. Anything that isort is unable to resolve/find inside this directory defaults to being classified as THIRDPARTY. Since we're specifically looking for THIRDPARTY imports, it is therefore important that this `directory` is correctly configured, otherwise we will report first-party imports as undeclared external dependencies! For now, we fix this by setting `directory` to the path that is passed to parse_dir(), and otherwise defaulting back to the current directory. TODO/FIXME: There are some scenarios where this is probably not the correct solution, for example: - Passing --code as a single file or a stdin stream will bypass parse_dir() altogether, and the current directory will be used to find FIRSTPARTY imports. This may or may not be what the user intended. - The project may do trickery with $PYTHONPATH or other mechanisms to alter where it is appropriate to look for FIRSTPARTY imports. We may have to consider adding command-line (and configuration) options to FawltyDeps for directly specifying this `directory`, and/or for including/excluding specific names as FIRSTPARTY imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unsettling how many issues this PR manages to solve in a concise and elegant way 🤯
Very helpful commits as usual, and it was thrilling to see the dependencies of requests
disappear one by one.
Just a note that we should not forget to open an issue to hold the following:
TODO/FIXME: There are some scenarios where this is probably not the
correct solution, for example:
- Passing --code as a single file or a stdin stream will bypass
parse_dir() altogether, and the current directory will be used to find
FIRSTPARTY imports. This may or may not be what the user intended.- The project may do trickery with $PYTHONPATH or other mechanisms to
alter where it is appropriate to look for FIRSTPARTY imports.
Created #95 now. |
It almost feels like I am cheating, leaning so heavily on
isort
to correctly classify imports 😅. That said, correctly differentiating between first- and third-party imports (as well as the other easier-to-discern categories) is certainly part ofisort
's core logic, so we're better off reusing their solution rather than rolling our own from scratch.It does come at the price of having to configure
isort
correctly, which is not trivial: I took a look at the completeisort.Config
definition, and it is suprisingly large and complex. For now, I just hope that we can get away by setting only the minimal stuff we need, and that the defaults are otherwise working in our favor. Another - smaller - cost is that this brings us further away from removingisort
as a core dependency, something we were contemplating before this PR (issue #26).This PR should directly solve issue #82, as well as tasks/checkboxes 2 and 3 on issue #89, and the first two tasks/checkboxed on issue #90.
Commits:
extract_imports
: Useisort
to filter out more non-external importsextract_imports
: Configureisort
to consider ALL versions of the stdlibextract_imports
: Configureisort
to correctly find first-party imports