Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support an Option to Disable sys.path patching #5226

Open
hmc-cs-mdrissi opened this issue Oct 28, 2021 · 14 comments · Fixed by #6405
Open

Support an Option to Disable sys.path patching #5226

hmc-cs-mdrissi opened this issue Oct 28, 2021 · 14 comments · Fixed by #6405
Labels
Enhancement ✨ Improvement to a component Maintenance Discussion or action around maintaining pylint or the dev workflow namespace-package Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning

Comments

@hmc-cs-mdrissi
Copy link
Contributor

hmc-cs-mdrissi commented Oct 28, 2021

Current problem

I want an argument/config option to disable sys.path patching. Pylint currently modifies sys.path based on __init__.py in folders of the linted files. sys.path patching is messy to get right and current logic for it here has a bug for implicit namespace packages that has been open for years. As a side effect a simple package structure of,

foo/
--> A.py

bar/
--> foo/
------> A.py

where bar is a namespace package breaks if you try to lint both foo.A and bar.foo.A together.

While making implicit namespace packages work with sys.path patching is probably possible, sys.path patching means files are linted in a way different than actual execution of those imports in interpreter. You can have a folder pass because of patching, but running script would fail from that location if uninstalled.

A related issue is importing files in two separate packages sharing a name like in this issue is incompatible with current sys.path patching. Any solution for that issue will involve either not patching sys.path or patching more selectively.

Desired solution

Support a config argument/command line argument to disable patching. The current patch logic is only done in two places here and here. An option to disable it entirely would be a small patch to pylint. I'd be happy to make the patch if this change would be supported.

Additional context

No response

@hmc-cs-mdrissi hmc-cs-mdrissi added Enhancement ✨ Improvement to a component Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Oct 28, 2021
@Pierre-Sassoulas Pierre-Sassoulas added Maintenance Discussion or action around maintaining pylint or the dev workflow and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Oct 29, 2021
@Pierre-Sassoulas
Copy link
Member

I think this should not be an option that users can modify, and pylint should get it right for namespace package or any package without user intervention.

@hmc-cs-mdrissi
Copy link
Contributor Author

hmc-cs-mdrissi commented Oct 30, 2021

The main reason I'm avoidant is I don't see a good way to detect namespace packages correctly. setuptools handles namespace packages by asking users to give the name of each one in setup.py/setup.cfg as mentioned here. If setuptools doesn't support automatic detection of namespace packages I'm unsure you can without fairly complex logic. pytest is another major tool that has same bug in normal usage and breaks with namespace packages. They have a workaround in pytest (alternative import configuration option), but it imposes weird restriction (test modules can't import other test modules) and requires user configuration. The weird restriction is inconvenient for test modules and completely unusable for checking library directly like pylint. mypy also requires a config setting for enabling namespace package. I'm unaware of any dynamic tool that works with namespace packages without config setting.

We could do a similar thing of users specify namespace package names in config.

The sys.path solution will be 20ish lines of code (mostly adding a new config option). Based on other libraries difficulties dealing with namespace package, I don't think a solution without config setting is possible without being fairly complex. The only library I know that works mostly fine on namespace packages without config setting is pyright which is a javascript library and reimplements all python import logic itself anyway. pyright has also had bugs related to namespace packages in recent times, but that was mainly a bug in type stubs peps did not fully consider how stub packages work with namespace packages. pyright implementation also avoids sys.path patching entirely because it reimplements import system.

I think main choices are disable_sys_path or namespace_packages config setting. The latter would ask users to give the name of each namespace_package they have. Both solutions I can implement. I prefer former setting as it's simpler code wise and easier to tell is correct but the latter should work in most cases too. If we require no config changes at all I think bug will just stay open for years. The "correct" solution without config changes would to re-implement python import system. That includes some weird cases like explicit namespace packages using pkgutils (or maybe you ignore those cases and let them fail).

@Pierre-Sassoulas
Copy link
Member

Ok, that make sense. Maybe we could also check if setup.py or setup.cfg contains find_namespace_package or not, but that would not work all the time.

@cdce8p
Copy link
Member

cdce8p commented Oct 30, 2021

Just a small warning: I've worked on the methods you mentioned above. Getting it right in all previous cases wasn't easy, especially testing it. We had to deal with multiple regressions before we got it to where it is now.

I would suggest to keep any changes as minimal as possible. Ideally with no impact for the average user.

@hmc-cs-mdrissi
Copy link
Contributor Author

I've opened a small pr for disable_sys_path approach. It ended being ~10 lines of functional changes + adding config option. The option defaults to false so for average user it should have no impact.

@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 2.14.0, 2.15.0 May 9, 2022
@i386x
Copy link

i386x commented Jul 5, 2022

I think this is a great suggestion. The logic behind _patch_sys_path is tricky. Today, I run into issue while linting namespace package containing module with the same name as a system module. In a greater detail, I have the following directory structure:

.
├── setup.cfg
├── setup.py
├── src
│   └── myutils
│       └── yaml
│           ├── __init__.py
│           ├── utils.py
│           └── version.py
├── tests
│   └── unit
│       ├── __init__.py
│       └── test_utils.py
└── tox.ini

src/myutils/yaml/utils.py contain lines:

import yaml


class MyConstructor(yaml.constructor.SafeConstructor):
    pass

I run pylint via tox, my environment specification for running pylint in tox.ini looks like this:

[testenv:pylint]
basepython = python3
skipsdist = True
skip_install = True
setenv =
    PYTHONPATH=src
deps =
    pylint
commands =
    pylint setup.py src/myutils/yaml tests/unit

tox -e pylint gives me an error:

E1101: Module 'yaml' has no 'constructor' member (no-member)

which is produced when pylint passes "src/myutils/yaml" to _patch_sys_patch and as a consequence Python interpreter sees src/myutils/yaml as a module named yaml which override the system module yaml since src/myutils/path is prepended to sys.path.

My question is: Do we really need sys.path patching? Is there any obstacles in specifying additional module locations only via PYTHONPATH and disabling sys.path patching entirely? Seeing that #6405 has been reverted, guessing whether the package is a namespace package or not is not trivial.

@Pierre-Sassoulas Pierre-Sassoulas added Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning namespace-package labels Jul 6, 2022
@jacobtylerwalls
Copy link
Member

Thanks for adding extra details @i386x.

Today, I run into issue while linting namespace package containing module with the same name as a system module.

Does #7114 help? You might also test with and without the main branch of astroid which has changes for namespace pacakges. We would be very grateful for as much community testing as possible.

@i386x
Copy link

i386x commented Jul 6, 2022

@jacobtylerwalls Thanks for the tips. Your PR #7114 works when running pylint as pylint src, but with pylint src/myutils/yaml it gave the same results. However I suppose that pylint src should be the right way of pylint invocation for name space packages.

@i386x
Copy link

i386x commented Jul 29, 2022

@jacobtylerwalls Recently I add some code that prints variables' values on top of your branch and noticed that pylint setup.yml src tests/unit actually skips src since expand_modules does not include src to its results. Actually, file_from_modpath(['src'], ['.', ...]) returns None, which forces the for-loop to continue with the next element.

I am afraid that we need to choose a different approach to solve the problem.

@i386x
Copy link

i386x commented Jul 30, 2022

@jacobtylerwalls Update: I returned back to upstream pylint and the problems vanished. I just realized that namespace packages work perfectly with pylint, at least for my use case, and also I run pylint wrong the whole time. What is needed is simply to

$ PYTHONPATH=src pylint setup.py src test/unit

and the problems with name-clashes are no longer here. I inspected the diffs between pylint 2.14.4 and 2.14.5 and astroid 2.11.6 and 2.11.7 and I observed no changes in module-finding and module-importing code. On the other hand, I just realized my mistake after adding print() commands on several places at pylint's source code and investigating the output. Sorry for the noise.

@DanielNoord
Copy link
Collaborator

@jacobtylerwalls I have lost track of this issue. Should we consider this fixed? Or do we still need to allow disabling sys.path patching?

@jacobtylerwalls
Copy link
Member

I think it's a meta issue. It's saying #3944 is unlikely to be solved, so let's just create an option for users. Let's leave it open until after we get feedback on 2.15. If 2.15 goes smoothly and #3944 is the main pain point left, we can reevaluate if this makes sense.

@DanielNoord
Copy link
Collaborator

@hmc-cs-mdrissi While working on #7339 I completely forgot to check what the original issue was here. Turns out it is actually a duplicate of the original issue 😅

While we want to keep this issue around to keep discussing whether we should support this option, if you want you could test out #7357 which should fix the issue you describe in the first paragraph 😄

@hmc-cs-mdrissi
Copy link
Contributor Author

hmc-cs-mdrissi commented Dec 8, 2022

edit: I've made a toy repo that you can use to test with a structue similar to my issue. https://github.com/hmc-cs-mdrissi/tricky_namespace_package_lint. Just run setup.sh and then run pylint lib1 lib2 --disable=R,C,W and you will get import related errors even though code does work and pyright happily checks it with no errors.

@DanielNoord Very late, but just saw the ping. I ran your pr's forked pylint on my whole repo and got many import errors so doesn't work for my use case. Tons of error messages like,

************* Module training.script.skan.skan_attributer_dataset_generator
barista/training/script/skan/skan_attributer_dataset_generator.py:8:0: E0401: Unable to import 'training.common.datetime_constants' (import-error)
barista/training/script/skan/skan_attributer_dataset_generator.py:9:0: E0401: Unable to import 'training.common.util' (import-error)
barista/training/script/skan/skan_attributer_dataset_generator.py:10:0: E0401: Unable to import 'training.script.skan.query_templates' (import-error)
barista/training/script/skan/skan_attributer_dataset_generator.py:17:0: E0401: Unable to import 'training.script.skan.settings' (import-error)
barista/training/script/skan/skan_attributer_dataset_generator.py:18:0: E0401: Unable to import 'training.script.skan.utils' (import-error)

The repo structure is roughly

lib1/training/__init__.py
lib1/training/foo.py
lib1/other/__init__.py
lib2/training/__init__.py
lib2/training/foo.py
lib2/evaluation/__init__.py

where lib1/training and lib1/other are python packages, but lib1 itself is not a package, while lib2 is a namespace python package and I still see a lot of confusion between lib1/training vs lib2/training. The weird structure comes from lib1 and lib2 originally coming from different repos and merging into one repo, but this is valid python structuring and interpreter works normally with them. You should be able to simulate the error by adding any toy lines in foo.py files that have imports like lib2.training (for lib2/training side) or just training. (for lib1 side).

I still lean disabling path patching is simplest fix. I think "real" fix is possible as pyright/mypy somehow gets it right but pylint is in good company as pytest has similar path patching that causes me same issue. Trying to patch path and get namespace packages correct just feels too tricky. mypy still requires a setting explicit-package-bases to get it correct. pyright is only tool I know that introspects imports and gets it fully correct with no configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component Maintenance Discussion or action around maintaining pylint or the dev workflow namespace-package Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning
Projects
None yet
6 participants