-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix some dependencies-like fields not showing up with project introspection #10923
Fix some dependencies-like fields not showing up with project introspection #10923
Conversation
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, but I have concerns about the term "special cased", which I try to describe in comments -- I think "indirect" might be more appropriate.
More importantly (this part is blocking, unless I'm missing something), since it seems that most users will not want to retrieve these dependencies, it seems unnecessary to add this complexity into the normal DependenciesRequest
class (especially considering whether we might want to add more types of "special" dependencies later). Rather, since the use cases for these two different types of dependencies requests do not seem to overlap, I would think we'd want to consider breaking it out into a separate dataclass.
Let me know what you think about those two ideas.
# `files` and `packages` fields, then we possibly include those too. We don't want to always | ||
# include those dependencies because they should often be excluded from the result due to | ||
# being handled elsewhere in the calling code. | ||
special_cased: Tuple[Address, ...] = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this would be incredibly difficult to move to a separate rule for DependenciesWithIndirectDeps
, which would append the special_cased
to the result as is done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we added a new rule, we would probably want its result type to be Addresses
to avoid having to invent more wrapper dataclasse. But, that will result in graph ambiguity to use await Get(Addresses, Foo)
in this rule, unless we added the new dataclass.
I don't think there's strong enough of motivation yet to factor this out and pay the cost of Yet Another Dataclass and rule. Note that we don't really expect call sites to resolve all their special deps in one single call, without their normal deps included. Instead, we expect them to resolve each distinct field separately, because each field has its own magical handling. archive
is a good example of this:
pants/src/python/pants/core/target_types.py
Lines 295 to 304 in ea542ac
package_targets, files_targets = await MultiGet( | |
Get( | |
Targets, | |
UnparsedAddressInputs(field_set.packages.value or (), owning_address=field_set.address), | |
), | |
Get( | |
Targets, | |
UnparsedAddressInputs(field_set.files.value or (), owning_address=field_set.address), | |
), | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because each field has its own magical handling
I'm not sure I understand. If this is the case, why aren't we separating the calls for Dependencies
and SpecialCasedDependencies
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there are very select places that we want all dependencies, both normal and special cased. Specifically, project introspection needs all dependencies.
It is simpler in those places to have one call await Get(Addresses, DependenciesRequest)
than a call for normal deps + call for all the special cased. In particular, this is apparent with TransitiveTargets. With this current design, we don't need to implement a separate transitive targets rule that duplicates the normal rule. It sounds like you tried implementing new distinct dataclasses yesterday and found this to be the case?
My above comment was trying to explain that in almost all other contexts, you do not want all dependencies, both normal and magical, to be in the same result. You want to separate them out. I was pointing to how that it possible with that design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like you tried implementing new distinct dataclasses yesterday and found this to be the case?
It would have been extremely trivial if #8542 had been merged, which would have allowed providing a @union
base as an @rule
argument. We could have then made a TransitiveTargetsRequestBase
union, and it would have generated a version of that rule for each type. I have asked you and others to take a look at that PR many times to no avail. Would you consider looking at it again if I updated it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at #8542 and comment there
if request.include_special_cased_deps: | ||
wrapped_tgt = await Get(WrappedTarget, Address, request.field.address) | ||
# Unlike normal, we don't use `tgt.get()` because there may be >1 subclass of | ||
# SpecialCasedDependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we link to #10888 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, could we add a TODO pointing to #10924?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any TODO we need to do here, tbh. This is idiomatic Python and idiomatic usage of the Target API. We get the functionality we need in 4 lines of code, without needing to invent any new abstractions.
The comment is solely to draw attention to why we don't use tgt.get()
like normal.
Is there something you think we could improve with these 4 lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just confusing to me why we would iterate over the target's fields again here instead of making that part of the original DependenciesRequest
. I think it was a good decision to make sure each Field
has an Address
, but it feels like that's useful for when we discover new things we need to introspect about the original target. I'm not sure it's really a problem, but it feels like that might be part of why the graph cycle is happening here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really following the argument here. At this point, we had only discovered that there is a Dependencies
field (or subclass). We need to check if there are any other special cased dependencies fields that are in play, and the way to do that is to iterate over all the fields to see if they subclass this marker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, we had only discovered that there is a Dependencies field (or subclass).
What if we included the special-cased dependencies fields in the original DependenciesRequest
?
@@ -1667,6 +1655,27 @@ def __iter__(self) -> Iterator[Address]: | |||
return iter(self.dependencies) | |||
|
|||
|
|||
class SpecialCasedDependencies(AsyncStringSequenceField): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would definitely prefer something like IndirectDependencies
for this class. I don't feel like SpecialCased
is very clear -- it also describes what we do with the field, as opposed to what it is, which I think is not what we want to be doing (please feel free to correct me if I'm wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel like SpecialCased is very clear
I argue this is a feature, not a bug. There is very little unifying about this parent class, outside of a) these are like the normal dependencies field, and b) you are going to do something magical with the values, such that you didn't think it was appropriate to use the simpler, normal Dependencies
field.
We don't know what the magic might be. It could be running the spiritual equivalent of ./pants package
, or it could be that you want to express this solely works with files
targets, etc.
My concern with IndirectDependencies
is that it might be trying to generalize more than we can do in this situation. These are direct dependencies—rather than indirect—only that they have been split out from the normal Dependencies
field for some reason the plugin author decided to do.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know what the magic might be. It could be running the spiritual equivalent of ./pants package, or it could be that you want to express this solely works with files targets, etc.
I said above that I would expect it to be better to describe what it is rather than what we do with it in the name, since as you describe here there are many different things we might want to do with it. I'm concerned that naming things based on how they're used leads to God Classes. I believe every other type of class in target.py
follows the "what it is" naming scheme.
These are direct dependencies—rather than indirect
Ok, I think I was mostly trying to go off of this description from the OP:
or example, with
relocated_files()
, it's important that the original files targets are not included when we're hydrating sources for the transitive closure.
Could we consider a name that more accurately represents this distinction? The reason I thought about IndirectDependenciesRequest
was because it chose to include the original files()
target as well. Perhaps IndirectDependencies
isn't appropriate for the field name itself -- maybe WrappedDependencies
?
@@ -612,9 +612,13 @@ class TransitiveTargetsRequest: | |||
""" | |||
|
|||
roots: Tuple[Address, ...] | |||
include_special_cased_deps: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't especially like how we have to have this include_special_cased_deps
field wherever we want to propagate the behavior. It feels harder to search for and remove later than introducing a new dataclass which performs the desired behavior via @rule
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, forget the above -- I tried to implement a new dataclass and it was too hard.
Could we make this an Enum
instance instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the Enum
be? There are only two variants: include or exclude.
While I generally like Enums for modeling (especially over strings), they add new complexity. Unless we believe there are going to be additional states beyond this binary in the future, an enum seems like unnecessary complexity imo, such as adding yet another class you need to import from target.py
, which is already gargantuan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the Enum be? There are only two variants: include or exclude.
You said in this comment #10923 (comment):
There is very little unifying about this parent class, outside of a) these are like the normal dependencies field, and b) you are going to do something magical with the values
That seems to me to imply we are going to want to do many other things with this field in the future. An enum allows for that. boolean fields are often a code smell: see https://softwareengineering.stackexchange.com/a/148058.
such as adding yet another class you need to import from target.py, which is already gargantuan.
I think this is necessary complexity, because each element is individually cacheable. When I responded to a user in the slack yesterday who mentioned that it was a little off-putting to import so many things by mentioning how the point is to make a 100% cacheable and performant build tool, they seemed to accept that very easily.
I think if this is a concern, we could consider methods to bring elements into scope automatically, perhaps by linking classes to other classes. #10924 generalizes the inject_for
/infer_from
pattern with a ConvertibleFrom
mixin, which makes a lot of the small machinery fall away, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean fields are often a code smell:
Sure, but premature abstraction is also a code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that giving each choice of an option a name is "premature abstraction" and I'm confused as to why you've decided this is the case given that it was decided a whole separate issue needed to be created to figure out how to handle this problem of the kind of dependencies to retrieve. It's clearly not a trivial distinction. I don't care enough to continue this discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tried to implement the solution I suggested with the separate dataclass, and it was too hard. I would like to make the include_special_cased_deps
field into an Enum
if possible, but otherwise this looks great!
Thanks for the deep dive Danny, and for acking this. I think simple is better than complex here (and in most cases...) |
Problem
Closes #10888.
We now have several fields that act like dependencies, but are not quite it. In most code, we don't want the dependencies showing up with
TransitiveTargets
andDependenciesRequest
; for example, withrelocated_files()
, it's important that the original files targets are not included when we're hydrating sources for the transitive closure.But with project introspection, we do need these dependencies-like fields to be used. This is important for
--changed-since --changed-dependees
to work properly.Solution
Add a new
SpecialCasedDependencies
superclass. This is not a normal field, e.g. not settingalias
. It's closer in spirit to a "Field template", e.g.StringField
andBoolField
.Both
DependenciesRequest
andTransitiveTargetsRequest
get new boolean flags to determine if they should include any subclasses ofSpecialCasedDependencies
or not. Then, we teach the project introspection goals to use the new behavior.