Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scripts are run unnecessarily depending on how turbo is configured #937

Open
dcherman opened this issue Mar 24, 2022 · 50 comments
Open

Scripts are run unnecessarily depending on how turbo is configured #937

dcherman opened this issue Mar 24, 2022 · 50 comments
Assignees
Labels
area: ergonomics Issues and features impacting the developer experience of using turbo needs: champion

Comments

@dcherman
Copy link

dcherman commented Mar 24, 2022

What version of Turborepo are you using?

1.1.9

What package manager are you using / does the bug impact?

Yarn v1

What operating system are you using?

Mac

Describe the Bug

When running a command like turbo run test, I would expect it to build the graph of tasks to run based on packages that actually define a test task. Instead, it seems like it assumes that all packages contain that task which results in needlessly running all tasks that would be dependencies.

In the case of the reproducer below, only the bar package contains a test script with a dependency on foo. The build task in baz was executed despite the output of it not being required for executing the requested tasks.

Expected Behavior

Only packages that actually contain the tasks to run should be considered when building the task graph.

To Reproduce

  1. Clone https://github.com/dcherman/turborepo-bad-script-reproducer
  2. yarn install
  3. yarn reproduce

Since only bar has a test script, I expected to see the build task from foo and bar run followed by the test task from bar. The build task of baz runs despite it having no test task and it not being a dependency of either of the other packages.

Here's an image of the graph that's generated from running turbo run test --graph

image

Note that the both the foo and baz tasks contain a node for the test task despite not actually defining a test task in package.json.

@gsoltis
Copy link
Contributor

gsoltis commented Mar 24, 2022

I can see the issue here, and I think I'm maybe 99% convinced the behavior should change. I am a little worried about anyone using "synthetic" tasks to trigger upstream packages to do something, but I think we have other ways of handling that (--include-dependencies / --filter=<pkg>... if it is desired behavior.

@mantljosh
Copy link

I just ran into this as well and found it surprising and it is causing issues for my use case:

I'm currently migrating a very large monorepo over to use turborepo, and trying to update our scripts one package at a time, and my pipeline contains a top level script turbo:dev - but I cannot run turbo run turbo:dev without filters because it will attempt to run all the dependency scripts on all of my packages (even ones that don't yet have a turbo:dev) which ultimately fails because those scripts are misconfigured.

For now I need to pass a really long filter enumerating out ever package that has been converted to avoid this issue

@michaelkplai
Copy link

I think this discussion may also be related #1106.

I also recently ran into this issue. The section of the documentation that mentions that missing tasks are gracefully ignored led me to assume missing task dependencies would also be ignored. See https://turborepo.org/docs/core-concepts/pipelines#tasks-that-are-in-the-pipeline-but-not-in-some-packagejson.

Since ignoring missing task dependencies would be a breaking change it may make sense to introduce this as a new cli flag or config option rather than overwrite the default behavior.

CLI Option turbo run build --ignore-missing-task-deps

turbo.json Option ”ignoreMissingTaskDeps”: true.

I’d be open to a more terse option name. However, --if-present could be a little confusing since the npm implementation is more about suppressing errors for missing scripts and has no relation to dependencies. See https://docs.npmjs.com/cli/v8/commands/npm-run-script#if-present.

I’d be happy to look into contributing a PR if that helps move this issue along.

@finn-orsini
Copy link

finn-orsini commented May 11, 2022

Joining the discussion here from a similar issue that was marked as duplicate.

If I understand correctly, the current behavior can be described as:

  • Turbo will attempt to run tasks that do not exist, and "gracefully ignore" them.
  • Turbo will attempt to run dependencies of tasks that do not exist.

and these two behaviors repeat themselves all the way down the pipeline dependency graph until no more potential dependencies are found.

This issue appears to be discussing the second bullet point, i.e. should Turbo try to run dependencies of tasks that do not exist?

Although similar, #1135 was more related to the first bullet point, i.e. what does "gracefully ignoring" a task mean with respect to the --dry-run output? Currently the missing tasks are filtered from the overall task list if they don't exist, but they still appear as dependencies of existing tasks.

I'm not completely convinced these are the same issue - but maybe there are some implementation details that would result in the decision here addressing both?

@gsoltis
Copy link
Contributor

gsoltis commented May 11, 2022

@finn-orsini you're right, they are technically distinct but related.

I think we're going to move forward with a PR to prune packages that don't have the requested task, which will have the result of causing the output from --dry-run match what is actually executed.

However, given that the above is technically a breaking change (workaround: define a no-op task), it may take a little while to land. In the meantime, I'll put up a PR to include the non-existent tasks in the --dry-run output, so that whichever way the above change goes, at least --dry-run will be more informative.

kodiakhq bot pushed a commit that referenced this issue May 11, 2022
Related to #937 and #1135

When executing a dry run, don't short-circuit on tasks with a non-existent command. Instead, set the command to `<NONEXISTENT>` and include it in the display.
@zanona
Copy link

zanona commented May 12, 2022

Would it be possible to have the expected behaviour under a flag i.e:--if-present and in the next major default to it instead?

@TxHawks
Copy link

TxHawks commented May 12, 2022

I'd really like to see an intermediary step too. I was actually forced to rename some of my pipline tasks because of this behavior, which is far from ideal

@gsoltis
Copy link
Contributor

gsoltis commented May 12, 2022

I will look into putting this behind a flag.

@ObliviousHarmony
Copy link
Contributor

I actually think this behavior makes a lot of sense. At its core, Turborepo seems to be a task orchestration tool with comprehensive support for caching to avoid unnecessary task execution. I see a pipeline as a declaration of task dependencies and caching more than a specific list of package.json scripts to call. I think it's counterintuitive that Turborepo would decide what I meant when I said I wanted to execute a dependent task first. Besides, isn't the point of the cache that this shouldn't matter after the first execution?

I am currently working on migrating from Nx to Turborepo in a multi-language monorepo. There are cases where "build" has no meaning to a project, but unless the dependencies are built, it will not work. Considering that these are also paired with "test" pipelines that depend on "build", it seems intuitive that it would build the dependencies all the way down.

I recognize that explicit "{package}#{task}" pipelines can resolve this but it seems unnecessary to add a bunch of copy/pasted pipelines to tell Turborepo I want it to do something I feel like I already told it to do. Having to use --filter calls also feels like an unnecessary workaround. It's weird to have to run turbo run build --filter={package}... on every package to get it to actually build dependencies when I already made "build" dependent on "^build". While the discussion here includes a proposal for an option to exhibit the current behavior, having to pass this to every turbo call in our repository is excessive.

While I understand that my opinion seems to be the minority, I think there's an ideological question here about what we mean when we declare a task dependency. Approaching the problem from that angle, a better solution could be found in the configuration of the pipelines themselves. My first thought was that we could add a new pipeline configuration option to declare dependency execution intent but this doesn't feel expressive enough.

My proposal would be to use a token to tell Turborepo what we mean when we say that a task depends on another. If we want to maintain backward compatibility, a token like ^= could say that we want to stop traversing down the graph when we meet a dependency that does not have a task. If we are fine with breaking compatibility, a token like ^~ could say that we're doing a fuzzy dependency match and want to execute a task all the way down. I like this solution because you can mix and match how a specific dependent task should behave.

@mantljosh
Copy link

mantljosh commented May 17, 2022

@ObliviousHarmony from what you describe it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run (regardless of whether or not the package itself needs to build).

@ObliviousHarmony
Copy link
Contributor

it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run

I thought about this too @mantljosh, but, it falls apart with transitive dependencies. Even if my "test" task builds the direct dependencies of my package, when one of those also has no "build" but has dependencies that do, we reach the same conclusion. In my case, there are packages pulled into the main app which are not built themselves but have JS dependencies that are.

@gsoltis
Copy link
Contributor

gsoltis commented May 17, 2022

Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies. Given that's a scenario that we want to support, I don't think we can proceed with this change as a default.

Perhaps we could augment the --filter syntax to include matching on having a particular task? So you could run turbo run test --filter=#has-task or something like that. Or perhaps this could be configured per-task in turbo.json?

@ObliviousHarmony
Copy link
Contributor

ObliviousHarmony commented May 17, 2022

@gsoltis Augmenting the --filter syntax likely wouldn't be sufficient since you would want to be able to run turbo run test --filter={package} and have it exhibit the desired behavior too (unless you're suggesting --filter={package}#has-task?) What do you think of my token proposal above?

@dcherman
Copy link
Author

dcherman commented May 17, 2022

@ObliviousHarmony It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.

Although this is absolutely a breaking change, you could accomplish this by adding a build task to your project that does nothing but exit 0. At that point, your project actually has a build task and it's no longer surprising that it'd be included in the pipeline.

FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.

@ObliviousHarmony
Copy link
Contributor

It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.

I don't think either behavior is objectively right. The difference of opinion here doesn't really matter, I admitted already I am probably in the minority here 😃

FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.

Honestly, I would just like to settle on a solution that doesn't require either workflow to remember to pass an argument to do the right thing. These are radically different behaviors and seem like something better handled at the configuration level.

What did you think of my proposal?

@dcherman
Copy link
Author

@ObliviousHarmony The problem isn't solely that the dependencies don't have the task to run, so I'm not sure that your token proposal would actually solve the problem described in the original bug. To fix that one, we need to filter the initial set of projects down to ones that actually have the task being run. I see your point about transitive dependencies though and wonder if we'd end up needing a combination of both solutions.

FWIW, that if-present flag could be supported in config pretty easily if that's desirable - just move the flag to an ifPresent boolean property in the config file. This could follow the same precedence rules as other options so that environment variables, config, and flags are all supported.

@ObliviousHarmony
Copy link
Contributor

ObliviousHarmony commented May 19, 2022

so I'm not sure that your token proposal would actually solve the problem described in the original bug

Ah @dcherman, I came here from the open pull request and so I misunderstood the original bug it seems. No, my proposal wouldn't solve that.

I 100% agree that running the "build" task on projects that don't have a "test" task doesn't make any sense. There is still a case for running a "build" task on a project with no script but that seems like the edge case. Either a configuration option or the filter augmentation proposed by @gsoltis would work just fine. One benefit of the filtering approach is that it creates a base for the hash syntax to add modifiers to filtering.

I'd suggest an approach of applying that filtering to the nodes selected for entry into the graph and then having transitive dependencies execute tasks with the original behavior. A token probably doesn't make much sense here, I can't imagine any cases where you would want to run a "test" command but not build the dependencies of that task. The option/filter/whatever can cover the entry "build" edge case just fine.

As an aside, I've added basically nothing to this issue's discussion 🗡️

@gsoltis
Copy link
Contributor

gsoltis commented May 19, 2022

One concern I have with using the configuration approach is the ability to override it. Currently, tasks in turbo.json map 1:1 to the name of the script being run, so it is not feasible to, for instance, have two copies of "test", with and without the option to include targets that don't have a "test" script. This means that regardless of whether or not we augment the config file, we must have a flag to control behavior per-execution.

@mantljosh
Copy link

This is an annoying problem because like other's have mentioned, both approaches have benefits and downsides.

I think there may be a safe incremental step to take, where we only do the "ignoring" based off the top level of the execution graph. What I mean is that if I run turbo run test, it should immediately filter out any packages that do not have a test script and ignore them (and their dependencies) entirely.

@georeith
Copy link

This would break my use case:

I have tests that run from source, however I want the cache of those tests to miss if one of their dependencies changes regardless of whether that package itself has tests.

However I would say that ^tests is not necessarily the right thing here either as I don't care if my topological dependencies test configuration changes.

Really what I want to express is that I want to use the input hash of build but I don't actually want to run build. There's no way of currently doing this I believe?

@LionC
Copy link
Contributor

LionC commented Jan 9, 2023

Just to add another resulting problem to this (talking to @anthonyshew in Discord about this right now):

This will also lead to turbo reporting circular dependencies and thus not executing anything, even though with the actual tasks and dependencies given, there is no circular dependency and it would be executable just fine.

For an example, see this reproduction repository: https://github.com/LionC/turbo-circular-dependency-example

Here, with all tasks that are present in the workspaces, the graph should look like this

image

But instead it reports a circular dependency

 ERROR  run failed: error preparing engine: Invalid task dependency graph:
cyclic dependency detected:
	schema#build,___ROOT___#codegen,___ROOT___#build

@dobesv
Copy link
Contributor

dobesv commented Feb 9, 2023

I think this issue is causing us a lot of unnecessary rebuilds. Sure you can say the jobs are cacheable but keep in mind that they won't be cached if they depend on tasks that are being "virtually" re-run, even though those tasks don't exist. So this problem adds a lot of a imaginary dependencies and causes a lot of unexpected rebuilds.

Some use cases that I have which are relevant here:

I have a build script that needs another tool to be built in order to run - if the script is defined for that package - and which depends on some tasks from its dependencies. Currently this just means every package will seem to require that tool to be built and those tasks from their dependencies to be built as well, unless I specificallly define a default version of the task with no dependsOn and then write a package-specific version of the task for every package that actually uses the tool. e.g. the tool is generate-json-schema-from-types and the script build:json-schema. I only want that task to depend on generate-json-schema-from-types and ^build:types if it exists for a package.

I have a generic build-all script in each package called build, which runs the specific scripts for that package. Turbo also can run the specific tasks for that package. It would be nice if I could have a sort of "virtual" task in turbo which generates dependencies but does NOT run the task.

A couple solution ideas:

When defining a task, have a way to write a task spec that only applies to packages which define the task. This would be equivalent to writing out all the @xxx/yyy#build:json-schema jobs individually for those packages, but leaves the "default" version with blank dependsOn, inputs, outputs, etc.. So for example the selection syntax could be something like ":has-script(build:json-schema)#build:json-schema": { "dependsOn": ["^build:types", ...] } or maybe simpler "#build:json-schema" : { "dependsOn": ["^build:types", ...] } Other packages would have no definition for that task, so no dependsOn and no synthetic dependencies.

Have an option on a task to mark it as "virtual" which means it will never try to run the actual script, it just is used for dependency purposes.

@connorjs
Copy link

connorjs commented Mar 2, 2023

I came here to request the original behavior, but I totally agree with the existing default after reading this conversation (thanks all!). Specifically this comment.

In my opinion, as a default, doing extra work is probably preferable to not doing enough work, especially if the extra work is easily cache-able.


Following this discussion, I chose to use "outputMode": "new-only" in my "build" task to simplify the output of the “target” tasks (examples: test, start, preview), because the "build" tasks in other packages will often hit the cache.

This does remove some nice things on re-run (namely: omitting bundle stats), but I often have HTML visualizations for bundle stats I can open anyway.

@RIP21
Copy link

RIP21 commented Apr 24, 2023

For me, this is very annoying as I have e2e tests. And I want to run the e2e suite for BE in one workflow ignoring everything related to frontend e2e tests. So I don't need to build any frontend dependencies nor do I want a Next.js build to be built which takes forever compared to libraries that are instant/cached mostly.
Also, since the environment is not prepared for Next.js to build it breaks the pipeline.
But since test:e2e (non-existent in Next.js package) depends on build and build^ it still gets unexpectedly triggered for no good reason.
I'll have to ignore it as well with ! to avoid that one explicitly, I guess. But I'm quite surprised it's still discussed here in length when it's an expected/default behavior for all build/monorepo tools I used (Maven, Gradle, Rush.js, Lerna (I guess, used it long time ago))

@tannerbaum
Copy link

tannerbaum commented Aug 17, 2023

So I don't need to build any frontend dependencies nor do I want a Next.js build to be built ... But since test:e2e (non-existent in Next.js package) depends on build and build^ it still gets unexpectedly triggered for no good reason... But I'm quite surprised it's still discussed here in length when it's an expected/default behavior for all build/monorepo tools I used (Maven, Gradle, Rush.js, Lerna (I guess, used it long time ago))

I think this issue is a bit complicated because several topics are being discussed, but to me this is a great summary of my problem, which as @michaelkplai outlines and provides a great solution is not clear in the docs.

It doesn't have to be the default behavior, but adding a --ignore-missing-task-deps config and clarifying that they aren't ignored in the docs would be a huge improvement.

Edit: I was saying the discussion within the issue had gotten complicated due to parallel discussions like how dry-mode handles these tasks.

@RIP21
Copy link

RIP21 commented Aug 17, 2023

Unsure what's complex here. But if task is absent for the package, then you don't need to run its dependent tasks. Simple as that.

In other words:
If test:e2e task is dependent on own build and dependencies build, then you run own build and build of the dependencies ONLY if test:e2e task is present for the package. If it's not, then you don't need to run its build and upstream build tasks as artifacts they generate won't be needed/won't be used.

For me it's so obvious I don't understand what to discuss here. People may opt into current weird behavior via flag, but default should be the one I described above.

@franzwilhelm
Copy link

1000x thumbs up to what @RIP21 is saying here. Super simple

@dobesv
Copy link
Contributor

dobesv commented Aug 17, 2023

if task is absent for the package, then you don't need to run its dependent tasks. Simple as that.

I do agree this should probably be the default.

I also want tasks that were always "virtual" and not only don't have to exist on the package, but won't be run even if they do.

And tasks that behave the way they do now.

@michaelkplai
Copy link

We are currently working around this issue using workspace configurations.

If a task doesn't exist in a workspace we overwrite the dependsOn scripts.

{
  "extends": ["//"],
  "pipeline": {
    "missing-task": {
      "dependsOn": []
    }
  }
}

Would still prefer a more permanent solution in the form of an global option to ignore missing tasks.

@mehulkar mehulkar added area: ergonomics Issues and features impacting the developer experience of using turbo needs: team input Filter for core team meetings owned-by: turborepo labels Oct 20, 2023
@mehulkar mehulkar removed the needs: team input Filter for core team meetings label Dec 4, 2023
@mehulkar
Copy link
Contributor

mehulkar commented Dec 6, 2023

Hey all, it's clear from the discussion here that we need a more powerful way to express how these "Transit Nodes" behave.
In the meantime, I've added some docs about how it works today and why.

It's clear that both behaviors are defensible, but to be clear: we will not be changing the default behavior before a 2.0 release, since it's a very fundamentally breaking change to the Task Graph construction.

We most likely need a more expressive dependsOn configuration (as opposed to expanding --filter or --ignore-missing-task-deps as suggested earlier in this issue). My reasoning is that in most use cases a pipeline has a "correct" definition of what task dependencies should be, rather than a pipeline that needs to run both ways (although you may well define one task entrypoint one way, and another task entrypoint another way).

@mehulkar mehulkar added the story label Dec 6, 2023
@closesimple-wl

This comment was marked as resolved.

@mehulkar

This comment was marked as resolved.

@mehulkar mehulkar added the needs: investigation The bug / idea needs more investigation label Jan 23, 2024
@rjohnsonbade
Copy link

Just adding to this thread as I spent some time this morning figuring out why tasks were running unexpectedly, and it was due to this issue.

I agree with this comment, #937 (comment).

The most obvious behaviour is that if a task does not exist for a workspace, then the dependencies will not be run.
I understand this is a breaking change, but strongly believe that should be the default behaviour.

I understand that this functionality is required for "synthetic" tasks, however my initial impression of this use case (as documented here), was that it felt a little hacky, and not very obvious.

I think a more explicit way of configuring these "synthetic tasks"; and changing the default functionality for dependencies of non-existent tasks, would make turborepo more intuitive overall.

@mehulkar mehulkar added needs: champion and removed needs: investigation The bug / idea needs more investigation labels Jan 31, 2024
@anthonyshew anthonyshew removed the story label Feb 22, 2024
@dobesv
Copy link
Contributor

dobesv commented May 2, 2024

I would be willing to work on this if I had some direction from the maintainers what solution they are willing to accept.

Maybe a couple boolean fields?

  • "requiresScript": true - if the script doesn't exist in the package, don't process any dependencies (treat as {dependsOn: [], inputs: [], outputs: [].
  • "runScript": false - if false, don't run the script, even if it exists. Defaults to true

To get the current behavior, both could be absent or false. It could be possible to say that in the future requiresScript will be true by default and put a warning if no value is provided, so people who want to keep the current behavior should set "requiresScript": false.

I do feel like there should be a better name for these fields, but struggling a bit to find one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ergonomics Issues and features impacting the developer experience of using turbo needs: champion
Projects
None yet
Development

Successfully merging a pull request may close this issue.