Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-infer]: allow ignoring unowned imports #18094

Conversation

AlexTereshenkov
Copy link
Member

@AlexTereshenkov AlexTereshenkov commented Jan 26, 2023

This PR adds a new option to the [python-infer] section.

This is to let users provide unowned imports that should be ignored.

If there are any unowned import statements and adding the # pants: no-infer-dep
to the lines of the import is impractical, you can instead provide a list of imports
that Pants should ignore. You can declare a specific import or a path to a package
if you would like any of the package imports to be ignored.

For example, you could ignore all the following imports of the code

import src.generated.app
from src.generated.app import load
from src.generated.app import start
from src.generated.client import connect

by setting ignored-unowned-imports=["src.generated.app", "src.generated.client.connect"].


By default, this has no impact on existing users of Pants as there are no imports to be ignored, so behavior will stay the same. Only if a user has provided a list of imports that will be ignored, then it will impact the dependency reference and what's reported as unowned imports.


This should be helpful for the following use cases:

  • when there is a single problematic import that is known to cause problems with inference but it's being imported in a hundred of places and adding # pants: no-infer-dep is not feasible; before this PR, the current implementation forces the users who don't know (or don't need to care) about Pants to add some comments, potentially extending any existing comments they may have set on the import statements to satisfy other tools (e.g. pylint)
  • when there are lots of import statements that consume generated Python code (a package or a module); e.g. a file is not under version control (and is Git ignored), is generated locally in the dev environment and in CI (using ad hoc tooling, not Pants) and Pants is not able to see it during dependency inference. Again, having dozens of # pants: no-infer-dep comments is less appealing when one can set it just once in the configuration file.

Copy link
Contributor

@lilatomic lilatomic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, for whatever that's worth. I think this will also be useful for serverless stuff where some modules are provided by the runtime.

Do you think we should hook this into the debug information? The dependency inference script just drops ignored imports (instead of flagging them as ignored). But also, there's a strong locality of the pragma and the import, so it would be clearer why Pants wasn't complaining about not finding the import. Although, I'm not sure that the debug goal is widely used, and we could probably split this addition into a separate MR.

@@ -206,6 +206,33 @@ def _collect_imports_info(
)


def _filter_unowned_imports(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name should include why we're filtering, maybe _remove_ignored_imports?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, a lot better, done. :)


run_dep_inference("warning")
assert len(caplog.records) == 1
assert (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth extracting a helper for asserting when owners are not found? There are only 2 uses, but it would be clearer to see something like assert_owners_not_found_error(target="src/python/cheesey.py", not_found=[...])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, great idea, done :)

@@ -166,3 +166,27 @@ class PythonInferSubsystem(Subsystem):
"""
),
)

ignore_unowned_imports = StrListOption(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bikeshed, feel free to disregard:
this name reads to me like a boolean flag, doing what unowned_dependency_behavior would do. I can't really find a good alternative, though. Maybe ignored_unowned_imports, ignorable_unowned_imports, or maybe just unowned_imports? or maybe more like the pragma with non_inferable_dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's actually rather helpful! I like ignored_unowned_imports a lot and have changed the option name. Thank you!

@AlexTereshenkov AlexTereshenkov force-pushed the backend/python-infer-allow-ignoring-unowned-imports branch from fd69af6 to dc0076d Compare January 28, 2023 15:42
@kaos
Copy link
Member

kaos commented Jan 28, 2023

I see you've tweaked the namings already, but I have yet another angle.

As this spins off the fact that using # pants: no-infer-dep for a particular module name everywhere it is imported is impractical, I would like the option to reflect that; something like:

[python-infer]
no_infer_dep = [
  "src.generated.app",
  "src.generated.client.connect"
]

Or the slightly longer never_infer_dependency or global_no_infer_dep etc.. as long as it has close resemblance to the equivalent option of the inline code comment no-infer-dep, as it does the same, only globally rather than per import.

@AlexTereshenkov
Copy link
Member Author

AlexTereshenkov commented Jan 29, 2023

I see you've tweaked the namings already, but I have yet another angle.

As this spins off the fact that using # pants: no-infer-dep for a particular module name everywhere it is imported is impractical, I would like the option to reflect that; something like:

[python-infer]
no_infer_dep = [
  "src.generated.app",
  "src.generated.client.connect"
]

Or the slightly longer never_infer_dependency or global_no_infer_dep etc.. as long as it has close resemblance to the equivalent option of the inline code comment no-infer-dep, as it does the same, only globally rather than per import.

Thank you @kaos very much for spending some time thinking about it. I see what you mean, this makes perfect sense. I guess I was relating to the unowned_dependency_behavior option which mentions unowned so I was trying to relate to that -- letting users to provide a list that is known to not be owned.

However, I think we have to generalize as we can't be sure about the reason why one would want to ignore an import - is it because they know it's not going to be owned (e.g. a generated module at runtime) or is it because it has an owner, but they just don't want to make a module depend on that another module they infer?

So the # pants: no-infer-dep serves both purposes and is generic in the eyes of Pants. I don't mind changing the option name to something else as it would still do what I (and likely other users) need, if you think that no_infer_deps would be a better option. I'd trust you on that one!

Do you think a plural form i.e. no_infer_deps would be a suitable one as we provide a list of dependencies to ignore (in contrast to no_infer_dep which is assigned to an individual import)?

@kaos
Copy link
Member

kaos commented Jan 30, 2023

Let's see if we can get some more opinions on the matter :)

@thejcannon
Copy link
Member

IIUC no-infer-dep unconditionally tells Pants to ignore the line. This code has Pants trying to infer it, but not erroring if it can't.

So either the name no_infer_dep would be a misnomer, or the behavior would have to change.

Overall the implementation looks good, we just should decide on naming/behavior.

@kaos
Copy link
Member

kaos commented Jan 30, 2023

IIUC no-infer-dep unconditionally tells Pants to ignore the line. This code has Pants trying to infer it, but not erroring if it can't.

So either the name no_infer_dep would be a misnomer, or the behavior would have to change.

Overall the implementation looks good, we just should decide on naming/behavior.

Good catch Josh on the nuance. I'm not sure I agree it would be a misnomer though. We don't say "do not try to infer a dep", we're saying "do not infer a dep". Whether Pants goes looking for a matching dep or not is beside the point. I see it merely as a missed opportunity to optimize in case there's a "no_infer_dep" in effect for an import being followed up.

@AlexTereshenkov
Copy link
Member Author

@lilatomic sorry, what do you mean by debug goal? I don't think there's one?

Although, I'm not sure that the debug goal is widely used, and we could probably split this addition into a separate MR.

Or do you refer to having Pants running with -ldebug flag and showing logger.debug(ignored_unowned_imports)?

@AlexTereshenkov
Copy link
Member Author

AlexTereshenkov commented Feb 1, 2023

@kaos @thejcannon thanks for the brainstorming efforts! My understanding is:

  • with # pants: no-infer-dep we ask Pants to ignore individual import lines, so it will ignore those early
  • with the ignored_unowned_imports option we tell Pants "I know that these are the imports that are unowned, so please do not yell at me that no one owns them" :) I don't want to bring this in early, e.g. here as I think

a) there's a performance hit checking for every import whether it's in the list of ignored paths (and we want to make sure that foo.bar.baz is ignored if user asked to ignore foo.bar, so this would require additional string massage)
b) we may want to treat imports that are set to be ignored with a pragma comment and imports that are set to be ignored because they are unowned differently
c) most likely users will have this list of ignored_unowned_imports empty, so I'd like to defer filtering only until the last moment (like it's done now)

So I think what I have now makes sense, however, happy to be told otherwise :) also happy to change the name from ignored_unowned_imports to anything else really :)

@kaos
Copy link
Member

kaos commented Feb 1, 2023

It comes down to matter of taste I think, how you associate and think about this option. as presented and makes sense to me, it relates closely to the no-infer-dep logic (even when applied late as is done here, which makes total sense imo). But I'm fine with going with the current name too if that turns out to be the preferred name for it :)

@jriddy
Copy link
Contributor

jriddy commented Feb 2, 2023

The current name makes sense to me, but this literally came up in the context of us trying to enable unowned_depedency_behavior = error on our codebase. So of course it does. But @thejcannon 's point/nuance stands I think as well. I think the only reason we're on this particular naming impasse is because @AlexTereshenkov described the feature in terms of "no-infer-dep."

@jriddy jriddy merged commit 5ace6f1 into pantsbuild:main Feb 2, 2023
@AlexTereshenkov AlexTereshenkov deleted the backend/python-infer-allow-ignoring-unowned-imports branch February 2, 2023 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants