Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to infer assets from strings for Python #14049

Merged
merged 26 commits into from
Mar 14, 2022

Conversation

thejcannon
Copy link
Member

@thejcannon thejcannon commented Jan 3, 2022

High level

Although less structured, it shouldn't be difficult to infer assets from strings and filter the noise to only what is valid.

The use-case should be evident, however here's a few intentional design decisions:

  • Ability to infer both resources (assumed to be relative to the module for now) and files (relative to the build root)
  • Scope the possible strings to a subset of what really is possible. On Unix, basically anything is a valid filename, but that could be needless explosive.
  • Use strings (as opposed to just parsing pkgutil.get_data) because some people use custom resource types. E.g. Resource("data/foo.txt"). We can add pkgutil.get_data support later.

Example:

Given:

<root>/
├─ configs/
│  ├─ prod.txt (a file)
├─ src/
│  ├─ app1/
│  │  ├─ data/
│  │  │  ├─ db.json (a resource)
│  │  ├─ app.py

(With all the right BUILD file plumbing)
Then src/app1/app.py containing code like open("configs/prod.txt") as envvars or Database("data/db.json") would have the relevant file/resource dependency inferred by Pants! 🎉

@thejcannon
Copy link
Member Author

Can someone re-kick the tests?

@stuhood
Copy link
Member

stuhood commented Jan 5, 2022

Can someone re-kick the tests?

Will need to rebase I think: the fix for this landed on main as 5c10169.

@thejcannon
Copy link
Member Author

I suppose rebasing is an easy re-kick. 🙌

@thejcannon thejcannon force-pushed the infer-resources branch 2 times, most recently from f124689 to 84e1791 Compare January 7, 2022 02:06
Comment on lines -4 to -11
python_sources(
overrides={
"parse_python_imports.py": {
# This Python script is loaded as a resource, see parse_python_imports.py for more info.
"dependencies": ["./scripts:import_parser"]
}
}
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might ask where this has gone. The answer: it's being inferred 🪄

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Haven't worked through it all in details, though.. :)

@thejcannon thejcannon force-pushed the infer-resources branch 2 times, most recently from 8701b2c to 8bdeb7a Compare January 24, 2022 20:27
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! Iirc, this would not handle this: importlib.resources.read_text("path.to.module", "foo.txt"), right? Even if not, this is still great.

@thejcannon
Copy link
Member Author

Neat! Iirc, this would not handle this: importlib.resources.read_text("path.to.module", "foo.txt"), right? Even if not, this is still great.

It does not. I didn't want to risk adding too many things to this PR, and import inference doesn't support importlib.import_module(). So I figured we can add support for both of these later.

@Eric-Arellano
Copy link
Contributor

So I figured we can add support for both of these later.

Makes sense. Only thing we need to be careful about is how to handle backwards-compatibility. If we change the behavior of this new option to become more powerful / infer more than before, that is somewhat a breaking change. I think it's acceptable, but we should probably hedge in the help message by saying

Note that we hope to continue improving the inference of resources/files, such as eventually inferring importlib.resources statements. These improvements will happen without the normal deprecation policy.

@thejcannon
Copy link
Member Author

So I figured we can add support for both of these later.

Makes sense. Only thing we need to be careful about is how to handle backwards-compatibility. If we change the behavior of this new option to become more powerful / infer more than before, that is somewhat a breaking change. I think it's acceptable, but we should probably hedge in the help message by saying

Note that we hope to continue improving the inference of resources/files, such as eventually inferring importlib.resources statements. These improvements will happen without the normal deprecation policy.

That's likely to be true of imports as well, right? With importlib.import_module?

@Eric-Arellano
Copy link
Contributor

That's likely to be true of imports as well, right? With importlib.import_module?

Yes. I think we weren't considering how imports could be improved because we didn't consider importlib.import_module.

I'm pretty sure it doesn't break our deprecation policy, only a possible gotcha. Maybe the second sentence in my suggested language isn't necessary, but probably good to have the first one.

@thejcannon
Copy link
Member Author

Actually this had made me realize, we might want a union rule for this, and split out the implementation into it's own rule. That also means maybe splitting the resource parsing out of the helper file into its own file.
FWIW this method is called infer_python_dependencies_via_imports, which is incorrect for this use case.

Yeah OK I think this is likely the route to take. I'll most likely post a new PR for that since the changes will be somewhat drastic. 🤞 hopefully there isn't too much duplication between this and import parsing.

@thejcannon thejcannon closed this Jan 27, 2022
@Eric-Arellano
Copy link
Contributor

I don't think I agree with splitting out the parsing into two separate Processes — there is overhead to launching more Processes. Notably, they will both have the exact same input_digest, so there's no caching benefit to this.

Instead, it's important that the parser do the minimum work necessary. If you don't care about resources, then don't try to parse them. It looks like you already made that optimization.

we might want a union rule for this

That sounds like it might be more complex. What would be the benefit?

@thejcannon
Copy link
Member Author

It's more about simplification and separation of concerns. The infer_python_dependencies_via_imports (which needs to be renamed, oops) already does a lot for python imports, and now we're adding more into it.

Perhaps splitting it into helpers (not necessarily needing #14238 but that'd be even nicer) would qualm some doubts.

For implementations sake, you might be right that bundling them together is still simpler.

@thejcannon thejcannon reopened this Jan 27, 2022
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
@thejcannon
Copy link
Member Author

OK I did another once-over on the PR. Should be good to re-review. Let me know if it needs to be split.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Eric-Arellano
Eric-Arellano previously approved these changes Mar 7, 2022
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@Eric-Arellano Eric-Arellano requested a review from stuhood March 7, 2022 21:34
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ability to infer both resources (assumed to be relative to the module, unless pkgutil.get_data is used) and files

Is the PR description still up-to-date? I'm not following the code for this part.

Generally, it seems that this only infers in the format path/to/f.txt, not path.to.my_resource, right?

@Eric-Arellano Eric-Arellano dismissed their stale review March 7, 2022 21:47

module mapping as a rule

@thejcannon
Copy link
Member Author

Ability to infer both resources (assumed to be relative to the module, unless pkgutil.get_data is used) and files

Is the PR description still up-to-date? I'm not following the code for this part.

Generally, it seems that this only infers in the format path/to/f.txt, not path.to.my_resource, right?

Fixed the wording

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@@ -31,21 +32,37 @@ class ParsedPythonImports(FrozenDict[str, ParsedPythonImportInfo]):
"""All the discovered imports from a Python source file mapped to the relevant info."""


class ParsedPythonAssets(DeduplicatedCollection[str]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ParsedPythonAssetPaths? I continue to confuse myself with what this is, I think because of the old setup of it being a tuple.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. If/when we support pkgutil.get_data we'll be changing this name 😅 but for now that works!

@@ -89,6 +103,24 @@ class PythonInferSubsystem(Subsystem):
"treated as a potential dependency if this option is set to 2 but not if set to 3."
),
)
assets = BoolOption(
"--assets",
default=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you didn't have to consider our deprecation policy, should this default to True? We don't have string imports default to True because of false positives. Not sure what the right behavior is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. OK. I can get very behind making True the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but note that we do need to consider our deprecation policy, I think. Possibly. Unclear to me. I was only asking what the aspiration is.

In the interest of landing this for 2.10, how about using False for now, having a conversation in Slack or dedicated PR on Monday, and then possibly changing it in time for 2.10? That'd be good anyways to have two entries in changelog:

# new feature
* Add `[python-infer].assets` to infer resources and files from strings

# user-api changes
* Enable `[python-infer].assets` by default

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
@thejcannon
Copy link
Member Author

Updated with:

  • Asset inference on by default
  • Reverse mapping of assets by path in a rule
  • Added file inference. That was the intent from the start, but for whatever reason I didn't test it (and the code wasn't good)

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for iterating on this! Looks good once the default is reverted to not break our deprecation policy.

This reverts commit d588e39.

[ci skip-rust]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@thejcannon thejcannon merged commit b73328f into pantsbuild:main Mar 14, 2022
@thejcannon thejcannon deleted the infer-resources branch March 14, 2022 18:43
@benjyw
Copy link
Contributor

benjyw commented Mar 14, 2022

Woohoo!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants