Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Target type abstract to allow overriding by different concrete implementations #2402

Merged
merged 18 commits into from
Apr 3, 2023

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Apr 2, 2023

This PR makes all of the T.{apply, input, source, sources, persistent} functions return the same Target type, so that you can easily override one with another. T.{command, worker} are still distinct types.

Motivation

This allows us to override one with another, e.g. override a T.sources with a T.apply if we want to replace source files with computed sources.

trait Foo extends Module{
  def thing = T.source(millSourcePath / "foo.txt")
}
trait Bar extends Foo{
  def thing = T{ 
    os.write(T.dest / "foo.txt", "hello")
    PathRef(T.dest / "foo.txt") 
  }
}

Currently, the workaround is to make T.source do the computation, as follows:

trait Foo extends Module{
  def thing = T.source(millSourcePath / "foo.txt")
}
trait Bar extends Foo{
  def thing = T.source{ 
    os.write(T.dest / "foo.txt", "hello")
    PathRef(T.dest / "foo.txt") 
  }
}

But the status quo is wasteful: it would begin hashing foo.txt at the start of every evaluation, it will watch foo.txt for changes, etc.. Even though we know it could never change since it's a generated file. This is because we are not allowed to change the method type of Source to T[PathRef] during the override, and thus we have to preserve all the Source characteristics even though we know they no longer apply

With this PR, we can simply replace the T.sources with a T{...}, and Mill makes use of that information to avoid hashing/watching the PathRef unnecessarily

Implementation

  1. NamedTask and NamedTaskImpl were merged

  2. Target's logic was mostly hoisted into NamedTask, leaving Target an empty marker trait

  3. TargetImpl is unchanged

  4. Input, Source and Sources have been renamed InputImpl, SourceImpl, and SourcesImplt

  5. All the functions that used to return Source/Sources/Persistent/etc. now return the same type Target, meaning that we can easily override one with the other.

  6. I added stubs in mill/define/package.scala to make existing type annotations : Sources, : Input, etc. continue to work as type aliases to Target[T], and our large codebase and test suite required relatively few changes

  7. Command/T.command and Worker/T.worker continue to return their specific type Command[T]/Worker[T], since they are not sub-types of Target[T].

The Task type hierarchy is considerably flatter and simpler:

Before:

  • Task
    • Task.Sequence, Task.TraverseCtx, Task.Mapped, Task.Zipped, T.task, etc.
    • NamedTask
      • NamedTaskImpl
        • Command
        • Worker
        • Input
          • Source
          • Sources
    • Target
      • TargetImpl (also inherits from NamedTaskImpl)
        • Persistent

After:

  • Task
    • Task.Sequence, Task.TraverseCtx, Task.Mapped, Task.Zipped, T.task, etc.
    • NamedTask
      • Command
      • Worker
      • Target
        • TargetImpl
          • PersistentImpl
        • InputImpl
          • SourceImpl
          • SourcesImpl

Testing

Added some tests to TaskTests.scala to demonstrate and validate the behavior when people override with different types. This was previously a compile error

I had to update the error message for the wrong number of param lists, from T{...} definitions must have 0 parameter lists to Target definitions must have 0 parameter lists, since we no longer know which target sub-type a method returns based on its signature

Notes

  • Despite the overhaul of the type hierarchy, this should mostly be a transparent change. Hopefully it would be minimally source-incompatible with existing code, so even though there's a huge bin-compat breakage here, people upgrading shouldn't be too affected.

@lihaoyi lihaoyi marked this pull request as draft April 2, 2023 09:27
@lihaoyi lihaoyi changed the title [WIP] Make Target type abstract to allow overriding by different concrete implementations Make Target type abstract to allow overriding by different concrete implementations Apr 2, 2023
@lihaoyi lihaoyi requested review from lefou and lolgab April 2, 2023 12:24
@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 2, 2023

The correct-number-of-parameter-lists checks no longer work, since they used to inspect the return types of the methods, which they can no longer do since they all have the same return type. I'll see if I can find a different way to implement them

@lihaoyi lihaoyi marked this pull request as ready for review April 2, 2023 12:29
@lefou
Copy link
Member

lefou commented Apr 2, 2023

Does this affect also T.command? If not, there should never be an argument list allowed in all variants of Target, as only Tasks and Commands should accept parameter lists.

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great change. In my early Mill days, it took me a while to understand, that all these T-macros return different types and that these are not compatible. In the mental model, they are all interchangeable, and this PR is just correcting it. Great!

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 2, 2023

@lefou currently, this PR does make Command a sub-type of Target, since it used to be a NamedTask. I adjusted the error-reporting code to make it work now that Command and Target are no longer disjoint types, so that's fine I think.

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 2, 2023

@lefou maybe two questions I would like to discuss, which concern exactly what we want a Target (as an abstract class) to really mean:

Should Command be a sub-type of Target?

  1. If we define Target as "Task with name/segments", then Command should be a sub-type since it has a name
  2. If we define Target as "Task that is uniquely addressed by name/segments", then Command should not be a sub-type since it requires not only a name/segments but also arguments to fully define what it will run

Should Workers be Targets?

They can be uniquely addressed by a name/segments, but they cannot be run from the CLI (I think?), and do not return any JSON-serializable output (since their whole thing is about keeping stuff in memory for performance and not serializing it)


I think regardless of what definition we choose I can probably make the tests pass, so this is more from a philosophical perspective rather than from any practical constraints. What do we think makes the most sense, to us and to other people?

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 2, 2023

I guess if we define Target as "things that are mostly interchangeable and can over-ride each other", then that rules out Command, since they generally take parameters while other things don't. The incompatible signatures would prohibit over-riding a T{...} or T.input with a T.command no matter what we think.

What about Workers? Should we be able to override a T.worker with a T{...}, or vice versa? My gut feeling is that we shouldn't allow that, but it's not totally clear what the actual reason is.

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 2, 2023

Another issue with the naming is that it is confusing to have "Target" both refer a concrete implementation as well as an abstract class that other implementations also use.

We could rename the abstract class from Target to Task, rename the current Task to something like BaseTask, and then leave Target as the name of the concrete implementation. Like what I did for Input and Source and Sources, I can leave an alias type Target[T] = Task[T] for source compatibility. Then we would have separate names for the interface Task vs the concrete implementation Target, which may be less confusing in future

@lefou
Copy link
Member

lefou commented Apr 2, 2023

@lihaoyi as you said, Commands are not interchangeable with Targets, so they should not extend Target. They are almost everywhere special handled, in the router, in the evaluator, exactly for the fact that they also depend on their parameters. Commands are some kind of target factories, so to say.

Workers are some utility to work with state or processes and mostly an implementation detail. They aren't cacheable and can't be invoked from CLI, so they should also not inherit Target.

About the names. I think Task should stay Task, they are anonymous and don't need to produce JSON-serializable values. Target should also stay, they produce JSON-able output. It's IMHO ok from a user perspective, that there are more specialized variants of a Target (input, persistent, ...). If we call them consistently, e.g. "input target" or "persistent target", it's probably ok.

Commands are mostly for use from the CLI, but can also be useful as an API, but only if their parameters accept Tasks, otherwise users see misleading error messages. This caused a lot of issues for our users, especially, when they tried to override a command. They currently need to produce JSON-able output, but we don't use that property in any way beside the fact, that the user or external tools can easily inspect the output.

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 2, 2023

Hmm ok. If we do wish to make Commands and Workers not inherit from Target, then I'll need to resurrect the NamedTask trait to put them under, just so they have a place they can branch off from the others in the type hierarchy. Should be easy to do, and if the NamedTask isn't really going to be very user-visible it probably won't add much confusion

@lihaoyi
Copy link
Member Author

lihaoyi commented Apr 3, 2023

I managed to merge a bunch more stuff together, so now the type hierarchy is somewhat flatter and simpler:

Before:

  • Task
    • Task.Sequence, Task.TraverseCtx, Task.Mapped, Task.Zipped, T.task, etc.
    • NamedTask
      • NamedTaskImpl
        • Command
        • Worker
        • Input
          • Source
          • Sources
    • Target
      • TargetImpl (also inherits from NamedTaskImpl)
        • Persistent

After:

  • Task
    • Task.Sequence, Task.TraverseCtx, Task.Mapped, Task.Zipped, T.task, etc.
    • NamedTask
      • Command
      • Worker
      • Target
        • TargetImpl
          • PersistentImpl
        • InputImpl
          • SourceImpl
          • SourcesImpl

Target is now an empty marker-trait, with all the necessary logic merged into NamedTask. There's probably more cleanups we can do, but this is probably a reasonably good state for now.

with Target[Int] {
val ctx = ctx0.withSegments(ctx0.segments ++ Seq(ctx0.segment))
val readWrite = upickle.default.readwriter[Int]
class TestTarget(taskInputs: Seq[Task[Int]], val pure: Boolean)(implicit ctx0: mill.define.Ctx)
Copy link
Member Author

@lihaoyi lihaoyi Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TestTarget was always a bit weird; unlike normal TargetImpls, it didn't wrap a Task[T], and instead implemented the Task[T] behavior itself, to keep the task graph simple for testing purposes.

Now that we've simplified the Task hierarchy, we need to be a bit more explicit about implementing the task behavior ourselves, hence all these getter/setter forwarders

@@ -40,22 +40,74 @@ abstract class Task[+T] extends Task.Ops[T] with Applyable[Task, T] {
def self: Task[T] = this
}

object Task {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just moved up from the bottom of the file, so it is next to the companion trait

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Great change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants