Scripts are run unnecessarily depending on how turbo is configured #937
Comments
I can see the issue here, and I think I'm maybe 99% convinced the behavior should change. I am a little worried about anyone using "synthetic" tasks to trigger upstream packages to do something, but I think we have other ways of handling that (
I just ran into this as well and found it surprising, and it is causing issues for my use case. I'm currently migrating a very large monorepo over to Turborepo and trying to update our scripts one package at a time, and my pipeline contains a top-level script. For now, I need to pass a really long filter enumerating every package that has been converted to avoid this issue.
I think this discussion may also be related to #1106. I also recently ran into this issue. The section of the documentation that mentions that missing tasks are gracefully ignored led me to assume missing task dependencies would also be ignored. See https://turborepo.org/docs/core-concepts/pipelines#tasks-that-are-in-the-pipeline-but-not-in-some-packagejson. Since ignoring missing task dependencies would be a breaking change, it may make sense to introduce this as a new CLI flag or config option rather than overwrite the default behavior. CLI Option turbo.json Option I’d be open to a more terse option name. Either way, I’d be happy to look into contributing a PR if that helps move this issue along.
Joining the discussion here from a similar issue that was marked as a duplicate. If I understand correctly, the current behavior can be described as:
and these two behaviors repeat themselves all the way down the pipeline dependency graph until no more potential dependencies are found. This issue appears to be discussing the second bullet point, i.e. Although similar, #1135 was more related to the first bullet point, i.e. what does "gracefully ignoring" a task mean with respect to the I'm not completely convinced these are the same issue, but maybe there are some implementation details that would result in the decision here addressing both?
@finn-orsini you're right, they are technically distinct but related. I think we're going to move forward with a PR to prune packages that don't have the requested task, which will have the result of causing the output from However, given that the above is technically a breaking change (workaround: define a no-op task), it may take a little while to land. In the meantime, I'll put up a PR to include the non-existent tasks in the
Would it be possible to have the expected behaviour under a flag, i.e.:
I'd really like to see an intermediary step too. I was actually forced to rename some of my pipeline tasks because of this behavior, which is far from ideal.
I will look into putting this behind a flag.
I actually think this behavior makes a lot of sense. At its core, Turborepo seems to be a task orchestration tool with comprehensive support for caching to avoid unnecessary task execution. I see a pipeline as a declaration of task dependencies and caching more than a specific list of I am currently working on migrating from Nx to Turborepo in a multi-language monorepo. There are cases where I recognize that explicit While I understand that my opinion seems to be in the minority, I think there's an ideological question here about what we mean when we declare a task dependency. Approaching the problem from that angle, a better solution could be found in the configuration of the pipelines themselves. My first thought was that we could add a new pipeline configuration option to declare dependency execution intent, but this doesn't feel expressive enough. My proposal would be to use a token to tell Turborepo what we mean when we say that a task depends on another. If we want to maintain backward compatibility, a token like
@ObliviousHarmony from what you describe it sounds like your
I thought about this too @mantljosh, but it falls apart with transitive dependencies. Even if my
Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-JS packages to not have a build step, but still require build steps from their dependencies. Given that's a scenario that we want to support, I don't think we can proceed with this change as a default. Perhaps we could augment the
@gsoltis Augmenting the
@ObliviousHarmony It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a Although this is absolutely a breaking change, you could accomplish this by adding a FWIW, I think the proposal to add
I don't think either behavior is objectively right. The difference of opinion here doesn't really matter; I already admitted I am probably in the minority here 😃
Honestly, I would just like to settle on a solution that doesn't require either workflow to remember to pass an argument to do the right thing. These are radically different behaviors and seem like something better handled at the configuration level. What did you think of my proposal?
@ObliviousHarmony The problem isn't solely that the dependencies don't have the task to run, so I'm not sure that your token proposal would actually solve the problem described in the original bug. To fix that one, we need to filter the initial set of projects down to ones that actually have the task being run. I see your point about transitive dependencies, though, and wonder if we'd end up needing a combination of both solutions. FWIW, that
Ah @dcherman, I came here from the open pull request, and it seems I misunderstood the original bug. No, my proposal wouldn't solve that. I 100% agree that running the I'd suggest an approach of applying that filtering to the nodes selected for entry into the graph and then having transitive dependencies execute tasks with the original behavior. A token probably doesn't make much sense here; I can't imagine any cases where you would want to run a As an aside, I've added basically nothing to this issue's discussion 🗡️
One concern I have with using the configuration approach is the ability to override it. Currently, tasks in
This is an annoying problem because, as others have mentioned, both approaches have benefits and downsides. I think there may be a safe incremental step to take, where we only do the "ignoring" based on the top level of the execution graph. What I mean is that if I run
This would break my use case: I have tests that run from source; however, I want the cache of those tests to miss if one of their dependencies changes, regardless of whether that dependency itself has tests. However, I would say that Really what I want to express is that I want to use the input hash of
Just to add another resulting problem to this (talking to @anthonyshew in Discord about this right now): this will also lead to turbo reporting a circular dependency and therefore executing nothing, even though, given the tasks and dependencies that actually exist, there is no circular dependency and the graph would execute just fine. For an example, see the reproduction repository at https://github.com/LionC/turbo-circular-dependency-example, where, considering only the tasks present in the workspaces, the graph should be acyclic, but turbo instead reports a circular dependency.
I think this issue is causing us a lot of unnecessary rebuilds. Sure, you can say the jobs are cacheable, but keep in mind that they won't be cached if they depend on tasks that are being "virtually" re-run, even though those tasks don't exist. So this problem adds a lot of imaginary dependencies and causes a lot of unexpected rebuilds. Some use cases that I have which are relevant here: I have a build script that needs another tool to be built in order to run (if the script is defined for that package) and which depends on some tasks from its dependencies. Currently this means every package will seem to require that tool to be built, and those tasks from their dependencies to be built as well, unless I specifically define a default version of the task with no I have a generic A couple solution ideas: When defining a task, have a way to write a task spec that only applies to packages which define the task. This would be equivalent to writing out all the Have an option on a task to mark it as "virtual", which means it will never try to run the actual script; it is just used for dependency purposes.
I came here to request the original behavior, but I totally agree with the existing default after reading this conversation (thanks all!). Specifically, this comment.
Following this discussion, I chose to use This does remove some nice things on re-run (namely: omitting bundle stats), but I often have HTML visualizations for bundle stats I can open anyway.
For me, this is very annoying because I have e2e tests, and I want to run the e2e suite for BE in one workflow while ignoring everything related to the frontend e2e tests. I don't need to build any frontend dependencies, nor do I want a Next.js build to run, which takes forever compared to libraries that are mostly instant or cached.
I think this issue is a bit complicated because several topics are being discussed, but to me this is a great summary of my problem, which, as @michaelkplai outlines (along with a great solution), is not made clear in the docs. It doesn't have to be the default behavior, but adding a --ignore-missing-task-deps config option and clarifying in the docs that missing task dependencies aren't ignored would be a huge improvement. Edit: I was saying the discussion within the issue had gotten complicated due to parallel discussions like how
I'm unsure what's complex here. If a task is absent for a package, then you don't need to run the tasks it depends on. Simple as that. In other words: For me it's so obvious that I don't understand what there is to discuss. People could opt into the current weird behavior via a flag, but the default should be the one I described above.
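The rule proposed just above (if a task is absent in a package, skip the tasks it depends on) can be sketched in TypeScript to show the trade-off raised earlier in the thread for pure-JS packages. Everything here is hypothetical and not Turborepo's internals: the workspace (app depends on lib, a pure-JS package with no build script, which depends on core), the pipeline (build depends on ^build), and the executedTasks helper with its skipMissing switch.

```typescript
// Hypothetical workspace: app -> lib -> core, where lib is pure JS with no build script.
type Pkg = { scripts: Set<string>; deps: string[] };

const packages: Record<string, Pkg> = {
  app: { scripts: new Set(["build"]), deps: ["lib"] },
  lib: { scripts: new Set<string>(), deps: ["core"] },
  core: { scripts: new Set(["build"]), deps: [] },
};

// Assumed pipeline: "^build" means "the build task of every dependency".
const pipeline: Record<string, string[]> = { build: ["^build"] };

// Returns the tasks whose scripts would actually execute, dependencies first.
function executedTasks(entryPkg: string, task: string, skipMissing: boolean): string[] {
  const seen = new Set<string>();
  const ran: string[] = [];

  const visit = (pkg: string, t: string): void => {
    const id = `${pkg}#${t}`;
    if (seen.has(id)) return; // each task node is expanded once
    seen.add(id);

    const hasScript = packages[pkg].scripts.has(t);
    // Proposed rule: a missing task contributes nothing, not even its dependencies.
    if (!hasScript && skipMissing) return;

    for (const dep of pipeline[t] ?? []) {
      if (dep.startsWith("^")) {
        for (const upstream of packages[pkg].deps) visit(upstream, dep.slice(1));
      } else {
        visit(pkg, dep);
      }
    }
    // Current behavior: a missing task is a no-op transit node whose dependencies still run.
    if (hasScript) ran.push(id);
  };

  visit(entryPkg, task);
  return ran;
}

console.log(executedTasks("app", "build", false));
console.log(executedTasks("app", "build", true));
```

Under these assumptions, the first call yields core#build followed by app#build, while the second yields only app#build, so app would be built against an unbuilt core. That is the pure-JS scenario that led the maintainers to rule out making this the default outright.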
1000x thumbs up to what @RIP21 is saying here. Super simple.
I do agree this should probably be the default. I would also like tasks that are always "virtual": they not only don't have to exist in the package, but won't be run even if they do. And I'd like tasks that keep behaving the way they do now.
We are currently working around this issue using workspace configurations. If a task doesn't exist in a workspace, we override that task's dependsOn with an empty list in the workspace's turbo.json:
{
  "extends": ["//"],
  "pipeline": {
    "missing-task": {
      "dependsOn": []
    }
  }
}
We would still prefer a more permanent solution in the form of a global option to ignore missing tasks.
Hey all, the discussion here makes it clear that we need a more powerful way to express how these "Transit Nodes" behave. Both behaviors are defensible, but to be clear: we will not be changing the default behavior before a 2.0 release, since it's a fundamentally breaking change to Task Graph construction. We most likely need a more expressive
Just adding to this thread, as I spent some time this morning figuring out why tasks were running unexpectedly, and it was due to this issue. I agree with this comment, #937 (comment). The most obvious behaviour is that if a task does not exist for a workspace, its dependencies are not run. I understand that the current functionality is required for "synthetic" tasks; however, my initial impression of that use case (as documented here) was that it felt a little hacky and not very obvious. I think a more explicit way of configuring these "synthetic" tasks, combined with changing the default handling of dependencies of non-existent tasks, would make Turborepo more intuitive overall.
I would be willing to work on this if I had some direction from the maintainers on what solution they would be willing to accept. Maybe a couple of boolean fields?
To get the current behavior, both could be absent or false. It could be possible to say that in the future I do feel like there should be a better name for these fields, but I'm struggling a bit to find one.
What version of Turborepo are you using?
1.1.9
What package manager are you using / does the bug impact?
Yarn v1
What operating system are you using?
Mac
Describe the Bug
When running a command like turbo run test, I would expect it to build the graph of tasks to run based on packages that actually define a test task. Instead, it seems to assume that all packages contain that task, which results in needlessly running all tasks that would be dependencies.
In the case of the reproducer below, only the bar package contains a test script, with a dependency on foo. The build task in baz was executed even though its output is not required for executing the requested tasks.
Expected Behavior
Only packages that actually contain the tasks to run should be considered when building the task graph.
To Reproduce
yarn install
yarn reproduce
Since only bar has a test script, I expected to see the build tasks from foo and bar run, followed by the test task from bar. The build task of baz runs even though baz has no test task and is not a dependency of either of the other packages.
Here's an image of the graph that's generated from running turbo run test --graph:
Note that the graph contains a test node for both foo and baz, even though neither package defines a test task in its package.json.
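To make the reported behavior concrete, here is a minimal TypeScript sketch of the reproducer's task graph. The repository's turbo.json is not shown in the issue, so the pipeline is an assumption (build depends on ^build, i.e. build in each dependency; test depends on build in the same package), and the packages map and the taskGraph/executed helpers are hypothetical, not Turborepo's internals. With pruneEntry set to false, every package gets a test entry node, as described above; with it set to true, only packages that define the script become entry points, which corresponds to the "filter the initial set of projects" fix discussed in the thread.

```typescript
// Hypothetical model of the reproducer; not Turborepo's internal data structures.
type Pkg = { scripts: Set<string>; deps: string[] };

// Assumed pipeline: "^build" = build in every dependency; "build" = build in the same package.
const pipeline: Record<string, string[]> = {
  build: ["^build"],
  test: ["build"],
};

// foo and baz only define build; bar defines build and test and depends on foo.
const packages: Record<string, Pkg> = {
  foo: { scripts: new Set(["build"]), deps: [] },
  bar: { scripts: new Set(["build", "test"]), deps: ["foo"] },
  baz: { scripts: new Set(["build"]), deps: [] },
};

// Builds the set of task nodes reachable from the entry task.
function taskGraph(entry: string, pruneEntry: boolean): Set<string> {
  const nodes = new Set<string>();

  const visit = (pkg: string, task: string): void => {
    const id = `${pkg}#${task}`;
    if (nodes.has(id)) return;
    nodes.add(id);
    for (const dep of pipeline[task] ?? []) {
      if (dep.startsWith("^")) {
        for (const upstream of packages[pkg].deps) visit(upstream, dep.slice(1));
      } else {
        visit(pkg, dep);
      }
    }
  };

  for (const name of Object.keys(packages)) {
    // pruneEntry = false: every package gets an entry node (the behavior reported here).
    // pruneEntry = true: only packages that define the script become entry points.
    if (!pruneEntry || packages[name].scripts.has(entry)) visit(name, entry);
  }
  return nodes;
}

// Only nodes backed by a real script execute; the rest are no-op transit nodes.
function executed(nodes: Set<string>): string[] {
  return Array.from(nodes)
    .filter((id) => {
      const [pkg, task] = id.split("#");
      return packages[pkg].scripts.has(task);
    })
    .sort();
}

console.log(executed(taskGraph("test", false)));
console.log(executed(taskGraph("test", true)));
```

Under these assumptions, the first call yields bar#build, bar#test, baz#build and foo#build, including the unexpected baz build, while the second yields only bar#build, bar#test and foo#build, the expected result. Transitive nodes are still expanded as before; only the choice of entry nodes changes.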