Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modules of files specified on the command line can be cached with the wrong name #7339

Open
gcbirzan-plutoflume opened this issue Aug 22, 2022 · 9 comments · May be fixed by #7357
Open

Modules of files specified on the command line can be cached with the wrong name #7339

gcbirzan-plutoflume opened this issue Aug 22, 2022 · 9 comments · May be fixed by #7357
Labels
Bug 🪲 Import system Needs PR This issue is accepted, sufficiently specified and now needs an implementation

Comments

@gcbirzan-plutoflume
Copy link

gcbirzan-plutoflume commented Aug 22, 2022

Bug description

When specifying on the command line files to check where the name of namespace package shares the name with a regular package in another namespace, the latter is cached in astroid with the name of the first package. This will cause any usage of the first one to break.

This happens because in two places sys.path is modified (once in place, once when calling astroid) to include the namespace package that contains the second namespace (I know it's confusing, see the example). But, the correct behaviour is to rely on sys.path and if the file is importable, not modify sys.path. This might cache the module with the wrong name (if sys.path contains, for example, foo and foo/bar, the stuff in foo/bar now is accessible by two names), but this only matters if it's imported, at which point it will be re-imported, with the correct name.

Setup

mkdir -p ns1/ ns2/ns1; touch ns1/m2.py ns2/ns1/__init__.py; echo 'import ns1.m2' > ns1/m1.py

ls -R (for visualising structure)

$ ls -R
ns1     ns2

./ns1:
m1.py   m2.py

./ns2:
ns1

./ns2/ns1:
__init__.py

Configuration

[MESSAGES CONTROL]

disable=C0114,W0611

Command used

pylint ns1/m1.py ns2/ns1/__init__.py

Pylint output

************* Module m1
ns1/m1.py:1:0: E0401: Unable to import 'ns1.m2' (import-error)
ns1/m1.py:1:0: E0611: No name 'm2' in module 'ns1' (no-name-in-module)

Expected behavior

This is perfectly valid code, which can be ran from the command line.

My suggested solution, as explained in the description is to check whether the file is importable normally and skip modifying sys.path. I don't think this would break anything, and it might fix other bugs I cannot imagine right now.

What I did was create a is_importable function that just calls astroid.modutils.modpath_from_file and returns False when it gets an import error, True otherwise. Then I skipped adding to sys.path when the file was directly importable.

I did this in here and here. This broke some tests, but still trying to figure exactly why and whether the tests are wrong or my code breaks some edge case. I'll update the issue once I have an answer on that.

Pylint version

pylint 2.15.0-a0
astroid 2.12.2
Python 3.7.4 (default, Oct 26 2021, 12:29:57) 
[Clang 12.0.0 (clang-1200.0.32.29)]

OS / Environment

No response

Additional dependencies

No response

@gcbirzan-plutoflume gcbirzan-plutoflume added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Aug 22, 2022
@gcbirzan-plutoflume
Copy link
Author

Ah. For 8 tests broke the module being displayed in the errors, because the original one was not the importable path. I would say that this is actually the correct behaviour. An example is test_wrong_import_position_when_others_disabled (here which expects the error to contain just Module wrong_import_position, but the path should be regrtest_data.wrong_import_position

@Pierre-Sassoulas
Copy link
Member

Could you open a merge request so we can see the tests result directly please ?

gcbirzan-plutoflume added a commit to gcbirzan-plutoflume/pylint that referenced this issue Aug 22, 2022
…e file is importable before adding its first non regular package dir to sys.path.
@gcbirzan-plutoflume
Copy link
Author

Done. That's just the code, I'll be doing tests and still need to figure out why that 9th test is failing (the test is wrong, but it shouldn't fail)

@Pierre-Sassoulas Pierre-Sassoulas added Bug 🪲 Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Aug 22, 2022
@gcbirzan-plutoflume
Copy link
Author

Okay, thinking more about the #6538 fix test (test added in #6553), I'm not entirely sure what the behaviour there should be.

Technically, the code is not importable with the current sys.path. Yes, if you run it, it will run, but that's because sys.path in that case gets set to the path of script. This seems like an edgecase that is debeatable (at least in my opinion) and that can be fixed by setting PYTHONPATH from the outside, vs having no way to fix this bug.

@DanielNoord
Copy link
Collaborator

DanielNoord commented Aug 22, 2022

Thanks for the detailed analysis @gcbirzan-plutoflume. Much appreciated!

This is perfectly valid code, which can be ran from the command line.

This triggered me to do some further exploring. Because although you can run python ns1/m1.py ns2/ns1/__init__.py only ns1/m1.py will be executed. So as far as I can see this is not a situation with which Python normally has to deal with. That also brings me to the solution I have been exploring: patching sys.path on a per file basis rather than on a per run basis. This seems to make more sense from the perspective of a normal Python run.

I tried out the following diff and I think I ran into the same test failures like you saw.

diff --git a/pylint/lint/expand_modules.py b/pylint/lint/expand_modules.py
index 27d4bd812..9ce8db966 100644
--- a/pylint/lint/expand_modules.py
+++ b/pylint/lint/expand_modules.py
@@ -11,6 +11,7 @@ from re import Pattern
 
 from astroid import modutils
 
+from pylint.lint.utils import fix_import_path
 from pylint.typing import ErrorDescriptionDict, ModuleDescriptionDict
 
 
@@ -144,9 +145,11 @@ def expand_modules(
                 ) or _is_in_ignore_list_re(subfilepath, ignore_list_paths_re):
                     continue
 
-                modpath = _modpath_from_file(
-                    subfilepath, is_namespace, path=additional_search_path
-                )
+                with fix_import_path(subfilepath):
+
+                    modpath = _modpath_from_file(
+                        subfilepath, is_namespace, path=additional_search_path
+                    )
                 submodname = ".".join(modpath)
                 result.append(
                     {
diff --git a/pylint/lint/pylinter.py b/pylint/lint/pylinter.py
index 472b3b030..bfda09310 100644
--- a/pylint/lint/pylinter.py
+++ b/pylint/lint/pylinter.py
@@ -35,7 +35,7 @@ from pylint.constants import (
 from pylint.interfaces import HIGH
 from pylint.lint.base_options import _make_linter_options
 from pylint.lint.caching import load_results, save_results
-from pylint.lint.expand_modules import _is_ignored_file, expand_modules
+from pylint.lint.expand_modules import _is_ignored_file, expand_modules, get_python_path
 from pylint.lint.message_state_handler import _MessageStateHandler
 from pylint.lint.parallel import check_parallel
 from pylint.lint.report_functions import (
@@ -670,12 +670,12 @@ class PyLinter(
 
         # The contextmanager also opens all checkers and sets up the PyLinter class
         with self._astroid_module_checker() as check_astroid_module:
-            with fix_import_path(files_or_modules):
-                # 4) Get the AST for each FileItem
-                ast_per_fileitem = self._get_asts(fileitems, data)
+            # with fix_import_path(files_or_modules):
+            # 4) Get the AST for each FileItem
+            ast_per_fileitem = self._get_asts(fileitems, data)
 
-                # 5) Lint each ast
-                self._lint_files(ast_per_fileitem, check_astroid_module)
+            # 5) Lint each ast
+            self._lint_files(ast_per_fileitem, check_astroid_module)
 
     def _get_asts(
         self, fileitems: Iterator[FileItem], data: str | None
@@ -731,7 +731,8 @@ class PyLinter(
             if module is None:
                 continue
             try:
-                self._lint_file(fileitem, module, check_astroid_module)
+                with fix_import_path(get_python_path(fileitem.filepath)):
+                    self._lint_file(fileitem, module, check_astroid_module)
             except Exception as ex:  # pylint: disable=broad-except
                 template_path = prepare_crash_report(
                     ex, fileitem.filepath, self.crash_file_path
diff --git a/pylint/lint/utils.py b/pylint/lint/utils.py
index ff2812e7e..2628e9cff 100644
--- a/pylint/lint/utils.py
+++ b/pylint/lint/utils.py
@@ -12,7 +12,8 @@ from datetime import datetime
 from pathlib import Path
 
 from pylint.config import PYLINT_HOME
-from pylint.lint.expand_modules import get_python_path
+
+# from pylint.lint.expand_modules import get_python_path
 
 
 def prepare_crash_report(ex: Exception, filepath: str, crash_file_path: str) -> Path:
@@ -76,7 +77,7 @@ def _patch_sys_path(args: Sequence[str]) -> list[str]:
     changes = []
     seen = set()
     for arg in args:
-        path = get_python_path(arg)
+        path = arg
         if path not in seen:
             changes.append(path)
             seen.add(path)

Obviously this solution would need to be cleaned up a little, but this does pass the namespace test that was still failing for you. Would you be willing to see if you can make this diff work? And if it solves your issues?
If so I'd be happy if you were to use it in #7340.
In that case, one point of self-review I already saw: what about putting the with fix_import_path in _lint_file instead of guarding a call by it.

@gcbirzan-plutoflume
Copy link
Author

gcbirzan-plutoflume commented Aug 22, 2022

This triggered me to do some further exploring. Because although you can run python ns1/m1.py ns2/ns1/init.py only ns1/m1.py will be executed.

The phrasing was bad, and I was secretly hoping nobody would notice (especially after I failed at markdown several times. Edit: and failing again here :) ). But, what I meant to say is that all of these files can be imported and they would work just fine.

I guess it depends on what you expect pylint to do: pretend that you're executing each file separately or you're running it on some files that should work together.

So as far as I can see this is not a situation with which Python normally has to deal with. That also brings me to the solution I have been exploring: patching sys.path on a per file basis rather than on a per run basis. This seems to make more sense from the perspective of a normal Python run.

While I am pretty sure that is doable, and while I don't know what's the most common use-case of running pylint with multiple files as arguments, I still think mine is (multiple files that should work together) is more useful.

Obviously this solution would need to be cleaned up a little, but this does pass the namespace test that was still failing for you. Would you be willing to see if you can make this diff work? And if it solves your issues?

I will try it out tomorrow morning (European time), but at first glance, it won't work because the issue surfaces since the wrong module name ends up in astroid's module cache, so that would need a bit of work too.

@DanielNoord
Copy link
Collaborator

I guess it depends on what you expect pylint to do: pretend that you're executing each file separately or you're running it on some files that should work together.

I think if the files are passed as arguments separately I think it makes most sense to lint them "separately".

While I am pretty sure that is doable, and while I don't know what's the most common use-case of running pylint with multiple files as arguments, I still think mine is (multiple files that should work together) is more useful.

Normally I think people just lint the directory/directories that include the files.

I will try it out tomorrow morning (European time), but at first glance, it won't work because the issue surfaces since the wrong module name ends up in astroid's module cache, so that would need a bit of work too.

Locally I got this to lint without the errors.

I forgot I had pylint-dev/astroid#1747 checked out as my astroid branch. That might actually be required for this to fully work, but I'm not 100% sure.

@gcbirzan-plutoflume
Copy link
Author

Sorry for the delay, I spent an enormous amount of time trying to figure out a good solution for this (trying to reason about the many places and ways in which the path can be modified is not fun), but I cannot really see a way this would work. I haven't tried your branch of astroid, will do it tomorrow, but with the current one, I added two tests that still fail (check latest code from the PR).

I think, maybe, a compromise solution would be to have an option to not add things to the path, which would keep the current behaviour for running separate files, but would allow people that want to run pylint on a repo with properly configured sys.path to do so.

@Pierre-Sassoulas
Copy link
Member

a compromise solution would be to have an option to not add things to the path

If think this last part is #5226. If you read the issue you'll see that we think that fixing the sys path patching would be better. But fixing the sys path patching is hard and noone worked on it long enough to understand it well and make it make sense, so we might have to re-consider this.

@DanielNoord DanielNoord linked a pull request Aug 25, 2022 that will close this issue
4 tasks
@DanielNoord DanielNoord added Import system Needs PR This issue is accepted, sufficiently specified and now needs an implementation and removed Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning labels Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🪲 Import system Needs PR This issue is accepted, sufficiently specified and now needs an implementation
Projects
None yet
3 participants