Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve type inference for @rules #17947

Merged
merged 13 commits into from
Feb 3, 2023
Merged

Conversation

kaos
Copy link
Member

@kaos kaos commented Jan 8, 2023

caveat: This does add a lot of extra overhead during rule parsing affecting the Pants scheduler startup time (~12s)--which we need to address. Either by caching the rule-parsing step, or refactoring over to Rust, or something else..
Edit: This was based on a time .. hunch but turned out to be something else, unrelated to this change.. see benchmarks at the end of this description.

This opens up for calling rule helpers more dynamically so we can have rule-logic closer to the classes and not spread out in all the rules, and also Gets without having to spell out the input types as long as the type of the input arg is known (which it does more now than before).

I think the latter can help reducing the number of imports we need in all rule files as well.

As an example, imagine this simple case using current praxis:

    if python_infer_subsystem.local_disambiguation:
        source_root = await Get(
            SourceRoot, SourceRootRequest, SourceRootRequest.for_address(request.field_set.address)
        )
        locality = source_root.path

Address could hide the rule logic as an implementation details instead, so we could do:

    if python_infer_subsystem.local_disambiguation:
        locality = await request.field_set.address.get_source_root()

and the implementation in get_source_root() something like:

    # No need for the `@rule_helper` decorator for this to work..
    async def get_source_root(self) -> str:
        # Note here that we **don't** need to provide the type of the request as the second arg,
        # but can use the value returned by `.for_address()` directly, as the type will be inferred now.
        source_root = await Get(
            SourceRoot, SourceRootRequest.for_address(self)
        )
        return source_root.path

Doesn't maybe look like much for this example, but I can see this opens up a lot of flexibility and I think it'll lower the learning curve for new users as well.

Benchmarks:

# On `main`
╰─❯ hyperfine --warmup=2 --runs=10 './pants version'
Benchmark 1: ./pants version
  Time (mean ± σ):      1.284 s ±  0.025 s    [User: 0.477 s, System: 0.210 s]
  Range (min … max):    1.255 s …  1.322 s    10 runs
╰─❯ hyperfine --warmup=2 --runs=10 './pants --no-pantsd version'
Benchmark 1: ./pants --no-pantsd version
  Time (mean ± σ):      6.467 s ±  0.291 s    [User: 3.730 s, System: 0.713 s]
  Range (min … max):    6.220 s …  7.079 s    10 runs

# On this PR
╰─❯ hyperfine --warmup=2 --runs=10 './pants version'
Benchmark 1: ./pants version
  Time (mean ± σ):      1.286 s ±  0.023 s    [User: 0.483 s, System: 0.210 s]
  Range (min … max):    1.261 s …  1.328 s    10 runs
╰─❯ hyperfine --warmup=2 --runs=10 './pants --no-pantsd version'
Benchmark 1: ./pants --no-pantsd version
  Time (mean ± σ):      6.348 s ±  0.330 s    [User: 3.750 s, System: 0.698 s]
  Range (min … max):    6.052 s …  7.089 s    10 runs

Side by side values:

Branch main PR
./pants version 1.284 s ± 0.025 s 1.286 s ± 0.023 s
./pants --no-pantsd version 6.467 s ± 0.291 s 6.348 s ± 0.330 s
-------- -------- ------
Check all async and awaitable returning methods
./pants version 1.317 s ± 0.034 s
./pants --no-pantsd version 7.365 s ± 0.855 s

So this seems to have negligible effect on performance, so far (except when parsing more methods not limiting to those decorated with @rule_helper).

Edit:

With the changes making @rule_helper obsolete degrades slightly:

╰─❯ hyperfine --warmup=2 --runs=10 './pants version'
Benchmark 1: ./pants version
  Time (mean ± σ):      1.317 s ±  0.034 s    [User: 0.497 s, System: 0.212 s]
  Range (min … max):    1.283 s …  1.389 s    10 runs
╰─❯ hyperfine --warmup=2 --runs=10 './pants --no-pantsd version'
Benchmark 1: ./pants --no-pantsd version
  Time (mean ± σ):      7.365 s ±  0.855 s    [User: 4.011 s, System: 0.771 s]
  Range (min … max):    6.339 s …  8.997 s    10 runs

@kaos
Copy link
Member Author

kaos commented Jan 8, 2023

If this was fast enough, I don't even think we'd need the @rule_helper annotation.. ;)

Edit: which I have as POC in this PR now.. :)

@kaos kaos marked this pull request as ready for review January 8, 2023 17:49
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very, very cool! And I think that it would probably resolve #16763 in a more general manner.

The idea behind #16763 was to allow for stronger @union-centric APIs, including allowing Fields to have computable defaults.

@benjyw and @thejcannon have the most experience with the AST APIs, so I'll defer to them.

@stuhood stuhood requested a review from benjyw January 18, 2023 17:53
@thejcannon
Copy link
Member

I'm not groking whats going on from the PR description. Is it still up-to-date?

@kaos
Copy link
Member Author

kaos commented Jan 19, 2023

I'm not groking whats going on from the PR description. Is it still up-to-date?

I'm terrible at making up examples. Look at the test case, it shows some of the nifty-ness in this.

Edit: I've now updated the PR description.

The out-of-date part in the PR description is that we could make away with the rule helper, and instead have just:

    async def _get_source_root(self) -> SourceRoot:
        return await Get(
            SourceRoot, SourceRootRequest.for_address(self)
    )

But it's more than just getting rid of rule_helper decorators. Consider this contrived example:

@dataclass
class MyData:
  def get_request(self) -> SourceRootRequest:
    ...

def utility() -> MyData:
  return MyData(...)

@rule 
async def my_rule() -> ...:
  my_data = utility() # or where ever my_data comes from..
  source_root = await Get(SourceRoot, my_data.get_request())
  ...

the example doesn't do anything sensible, but consider the possibilities that opens up with this syntax :)

@thejcannon
Copy link
Member

So this allows anything to be a rule_helper (in a way) and also makes it so we don't need the "middle" type in a 3-arg Get?

@benjyw
Copy link
Contributor

benjyw commented Jan 20, 2023

The alleged performance hit would have been a non-starter, so I'm glad that's not actually real...

@kaos
Copy link
Member Author

kaos commented Jan 29, 2023

bumping this.. :)

Copy link
Member

@thejcannon thejcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look too closely at the code. If it works it works IMO.

If/when perf for rule-graph collection is a problem, we can always shift the code into Rust and use RustPython or something similar.

Are we deprecating rule_helper in this PR or another one?

src/python/pants/engine/internals/rule_visitor.py Outdated Show resolved Hide resolved
@kaos
Copy link
Member Author

kaos commented Jan 30, 2023

Are we deprecating rule_helper in this PR or another one?

I've not done that in this PR. I think we can use a dedicated PR for the deprecation, so it has it's own changelog entry.

@kaos
Copy link
Member Author

kaos commented Jan 30, 2023

@benjy do you have any other concerns, or are we good to go here..?

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My one thought is that if we allow unannotated async functions, what is to prevent someone trying to await one in the normal asyncio way. The annotation was a flag for the reader that this is something special and not a regular asyncio thing.

src/python/pants/engine/internals/rule_visitor.py Outdated Show resolved Hide resolved
src/python/pants/util/backport.py Show resolved Hide resolved
@kaos
Copy link
Member Author

kaos commented Feb 3, 2023

My one thought is that if we allow unannotated async functions, what is to prevent someone trying to await one in the normal asyncio way. The annotation was a flag for the reader that this is something special and not a regular asyncio thing.

I'm not sure the rule annotations did much to prevent this either to begin with. A bigger question for me is how you'd end up in a world where you have Pants rules along with code using asyncio (or other async library) in the first place?

It is a gotcha, but not a big one (as far as I can guess), and it will bite you right off not sneak up as some subtle issue down the line.

If we want to safe guard and have hand rails etc, there's things we could do to make it a safer environment to operate in--like adding lists of modules/namespaces to consider when following calls from rules in order to shield off unwanted parts of a code base. As an example, perhaps not a particular good one at that. 🤷🏽

@benjyw
Copy link
Contributor

benjyw commented Feb 3, 2023

Yeah, I'm not super concerned, it's just weird to see a "naked" async function in the Pants codebase...

@kaos kaos enabled auto-merge (squash) February 3, 2023 19:06
@kaos kaos merged commit b587181 into pantsbuild:main Feb 3, 2023
@kaos kaos deleted the kaos/rules-on-steroids branch February 3, 2023 20:34
kaos added a commit that referenced this pull request Feb 23, 2023
Following #17947 the `rule_helper` decorator is no longer needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants