-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redo Autoimport #464
Redo Autoimport #464
Conversation
|
||
def generate_modules_cache( | ||
self, | ||
modules_to_find: List[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename modules_to_find
back to modules
. This is part of the public API of the AutoImport class, so unfortunately it can't be changed.
Also, we need to keep the task_handle argument, as it's used by editor integrations to display progress report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Done
- I added a placeholder, but I'll figure out how to add a full one later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 2, I basically suffer from a simple issue: a job set relies on knowing the number of jobs ahead of time, while my code doesn't. So I just set the initial count to 0 and manually incremented the count as it discovered modules and it looks mostly fine. Ideally, there'd be a better way (and shared interfaces for NullJobSets/TaskHandles and their non-Null counterparts). However, I'm not too familiar on how to test this or anything.
self.underlined = underlined | ||
self.names = project.data_files.read_data("globalnames") | ||
if self.names is None: | ||
self.names = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How difficult would it be to retain self.names
here as a property that generates the names from the database?
python-mode (which integrates rope into vim) uses AutoImport.names
to decide when to regenerate cache. If it's empty, then it calls generate_modules_cache()
.
I'm rather iffy with how it does that, I'd rather they should update it when they next update their builtin rope, but unfortunately it currently has to do that since there's no public API detect the need to regenerate cache. But the fact that one integration does this means that there may be other integrations that does something similar as well.
The alternative to removing .names
is that at the very least python-mode will need to be updated to no longer rely on this attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It maintains a separate database of indexed packages, which it uses to determine what needs to be updated currently in around 0.008 sec. In the future I want to add modified times to this. That way generating the cache works with existing caches (although the resource side does probably not work as well)/
I can make a function that checks the length of modules that need to be updated, but then we're doubling that computation across each of them.
if package in banned or (package.startswith("_") and not underlined): | ||
return # Builtins is redundant since you don't have to import it. | ||
try: | ||
module = import_module(str(package)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think compiled-only packages probably should not be indexed by default. import
-ing a compiled python module involves executing executable code from the imported module's, which means it may include running potentially harmful or malicious code.
Unless we can statically analyze the compiled modules somehow, this should not be default behavior.
At the very least, I'd suggest removing this code from get_names()
for now or restricting it to standard library modules and re-think on how to handle compiled modules as a separate ticket/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I limited importing compiled modules to Project, Standard, and Builtin modules which isn't 100% accurate. I can remove Project or Standard, but builtin has to remain for modules like os and sys.
I agree that not importing modules is a good idea, but we do need it unless we want to bundle a python decompiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove Project as well. Just because it's in your own project directory does not necessarily mean that it's trusted code. People may be checking out untrusted codebase so they can review whether or not it is malicious.
In the long run, I think we might make this a config option. If people want to enable this option in a codebase that they know they can trust and understand the implications and risks, that's on them. But the defaults should be as conservatively safe as possible.
Keeping Standard library and Builtin modules, I think are fine. They are a much less likely avenue for these kind of attacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It doesn't really make sense to keep Project since it'd require the user to have a compiled .so module in their local project without the python code. In the future, we could add a whitelist or something for these edge cases.
Thanks for this feedback. I incorporated the ideas of using namedtuples and generators, which made the code much cleaner. I'll look into CREATE INDEX later. |
rope/contrib/autoimport/parse.py
Outdated
"""Get all the names from a given file using ast.""" | ||
with open(module, mode="rb") as file: | ||
try: | ||
root_node = ast.parse(file.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're already using pathlib
, we might as well use module.read_bytes()
here instead of open()
-ing explicitly. read_bytes()
would not require context manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if package in banned or (package.startswith("_") and not underlined): | ||
return # Builtins is redundant since you don't have to import it. | ||
try: | ||
module = import_module(str(package)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove Project as well. Just because it's in your own project directory does not necessarily mean that it's trusted code. People may be checking out untrusted codebase so they can review whether or not it is malicious.
In the long run, I think we might make this a config option. If people want to enable this option in a codebase that they know they can trust and understand the implications and risks, that's on them. But the defaults should be as conservatively safe as possible.
Keeping Standard library and Builtin modules, I think are fine. They are a much less likely avenue for these kind of attacks.
return Source.PROJECT | ||
if package.as_posix().__contains__("site-packages"): | ||
return Source.SITE_PACKAGE | ||
if package.as_posix().startswith(sys.prefix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reasons for using package.as_posix()
instead of str(package)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Changed it to str
I've been test-driving this branch for the last couple weeks at work and I think it's been pretty great, really great work @bageljrkhanofemus. I did found a number of issues with the database file being locked out when using memory=False, though I hadn't ruled out whether this might just be an issue with my local setup, my hacked-together patches to python-mode, or if it's something to do in the autoimport code itself. In the interest of merging this into the codebase while allowing easier transitions for IDE/editor integrators, I am going to merge this branch to master but keep the pickle-based implementation as the default for now; instead the new sqlite-based implementation is provided as an alternate experimental implementation to make it easier to start testing and fixing any issues in IDE/editor integrations. Once editor plugin authors have the chance to catch up with testing their integrations and in making any necessary fixes, the intent is to switch the default implementation to the sqlite implementation. Once this branch is merged, future changes should be made in a separate PR, which will also make them easier to review as well. @all-contributors add @bageljrkhanofemus for code contribution. |
I've put up a pull request to add @bageljrkhanofemus! 🎉 |
Thank you for all your help!
|
Description
Rewrite of the Autoimport code.
The goal is to create an autoimport module capable of being used in pylsp.
Upgrades:
Sidegrades (subjective):
Downgrades:
a. There's less control of how to handle underlined imports
b. some changes have to be made regarding how modules are named: "module" wouldn't work but "project.module" will unless added to the path.Fixedc. submodules taken out of public API (minor change). Resulting test removed
d. internal modules must use the resource system. Resulting test removed.
Issues with the resource system
These largely existed before
TODO (will mark as ready when done):