diff --git a/content/docs/command-reference/dag.md b/content/docs/command-reference/dag.md
index 8c663f788f..5519861ab4 100644
--- a/content/docs/command-reference/dag.md
+++ b/content/docs/command-reference/dag.md
@@ -25,7 +25,7 @@ the `dvc.yaml` files found in the project. Provide a `target` stage
name to show the pipeline up to that point.
[directed acyclic graph]:
- /doc/user-guide/data-pipelines/defining-pipelines#directed-acyclic-graph-dag
+ /doc/user-guide/pipelines/defining-pipelines#directed-acyclic-graph-dag
### Paginating the output
diff --git a/content/docs/command-reference/exp/index.md b/content/docs/command-reference/exp/index.md
index c733a2d77e..77ddf44685 100644
--- a/content/docs/command-reference/exp/index.md
+++ b/content/docs/command-reference/exp/index.md
@@ -49,8 +49,11 @@ science/ machine learning experiments.
📖 See [Experiment Management](/doc/user-guide/experiment-management) for more
info.
-> ⚠️ Note that DVC assumes that experiments are deterministic (see **Avoiding
-> unexpected behavior** in `dvc stage add`).
+> ⚠️ Note that DVC assumes that experiments are deterministic (see [Avoiding
+> unexpected behavior]).
+
+[avoiding unexpected behavior]:
+ /doc/user-guide/project-structure/dvcyaml-files#avoiding-unexpected-behavior
## Options
diff --git a/content/docs/command-reference/exp/init.md b/content/docs/command-reference/exp/init.md
index 5794a9b50e..dd1d45e183 100644
--- a/content/docs/command-reference/exp/init.md
+++ b/content/docs/command-reference/exp/init.md
@@ -97,7 +97,7 @@ See the [Pipelines guide] for more on that topic.
/doc/user-guide/project-structure/dvcyaml-files#stage-commands
[checkpoints]: /doc/user-guide/experiment-management/checkpoints
[dvc experiments]: /doc/user-guide/experiment-management/experiments-overview
-[pipelines guide]: /doc/user-guide/data-pipelines/defining-pipelines
+[pipelines guide]: /doc/user-guide/pipelines/defining-pipelines
## Options
diff --git a/content/docs/command-reference/move.md b/content/docs/command-reference/move.md
index 2fe4e00d46..49c6fe6276 100644
--- a/content/docs/command-reference/move.md
+++ b/content/docs/command-reference/move.md
@@ -93,7 +93,7 @@ Often the output of a stage is a dependency in another stage, creating a
[dependency graph]. In this case, you may want to also update the `path` in the
`deps` field of `dvc.yaml`.
-[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines
+[dependency graph]: /doc/user-guide/pipelines/defining-pipelines
diff --git a/content/docs/command-reference/params/index.md b/content/docs/command-reference/params/index.md
index b86eb66c0c..81e90b413c 100644
--- a/content/docs/command-reference/params/index.md
+++ b/content/docs/command-reference/params/index.md
@@ -75,7 +75,7 @@ is outdated upon `dvc repro` (or `dvc status`).
[hyperparameters]:
/doc/user-guide/experiment-management/running-experiments#tuning-hyperparameters
[use the same params file]:
- /doc/user-guide/data-pipelines/defining-pipelines#parameter-dependencies
+ /doc/user-guide/pipelines/defining-pipelines#parameter-dependencies
[more details]: /doc/user-guide/project-structure/dvcyaml-files#parameters
[templating]: /doc/user-guide/project-structure/dvcyaml-files#templating
[stage commands]: /doc/user-guide/project-structure/dvcyaml-files#stage-commands
diff --git a/content/docs/command-reference/repro.md b/content/docs/command-reference/repro.md
index ba4e0b9efa..647f1194bc 100644
--- a/content/docs/command-reference/repro.md
+++ b/content/docs/command-reference/repro.md
@@ -68,7 +68,7 @@ It stores all the data files, intermediate or final results in the
hash values of changed dependencies and outputs in the `dvc.lock` and `.dvc`
files.
-[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines
+[dependency graph]: /doc/user-guide/pipelines/defining-pipelines
[always changed]: /doc/command-reference/status#local-workspace-status
### Parallel stage execution
@@ -160,10 +160,8 @@ up-to-date and only execute the final stage.
option, as all possible targets are already included.
- `--no-run-cache` - execute stage command(s) even if they have already been run
- with the same dependencies and outputs (see the
- [details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful
- for example if the stage command/s is/are non-deterministic
- ([not recommended](/doc/user-guide/data-pipelines/defining-pipelines#avoiding-unexpected-behavior)).
+ with the same dependencies and outputs (see the [details]). Useful, for
+ example, if the stage commands are non-deterministic ([not recommended]).
- `--force-downstream` - in cases like `... -> A (changed) -> B -> C` it will
reproduce `A` first and then `B`, even if `B` was previously executed with the
@@ -185,11 +183,8 @@ up-to-date and only execute the final stage.
corresponding pipelines, including the target stages themselves. This option
has no effect if `targets` are not provided.
-- `--pull` - attempts to download outputs of stages found in the
- [run-cache](/doc/user-guide/project-structure/internal-files#run-cache) during
- reproduction. Uses the
- [default remote storage](/doc/command-reference/remote/default). See also
- `dvc pull`
+- `--pull` - attempts to download outputs of stages found in the [run-cache]
+ during reproduction. Uses the [default remote storage]. See also `dvc pull`.
- `-h`, `--help` - prints the usage/help message, and exit.
@@ -200,6 +195,12 @@ up-to-date and only execute the final stage.
- `-v`, `--verbose` - displays detailed tracing information.
+[details]: /doc/user-guide/project-structure/internal-files#run-cache
+[not recommended]:
+ /doc/user-guide/project-structure/dvcyaml-files#avoiding-unexpected-behavior
+[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache
+[default remote storage]: /doc/command-reference/remote/default
+
## Examples
> To get hands-on experience with data science and machine learning pipelines,
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index 8006801088..1328c24484 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -107,7 +107,7 @@ Relevant notes:
[manual process](/doc/command-reference/move#renaming-stage-outputs) to update
`dvc.yaml` and the project's cache accordingly.
-[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines
+[dependency graph]: /doc/user-guide/pipelines/defining-pipelines
### For displaying and comparing data science experiments
@@ -216,10 +216,8 @@ data science experiments.
asking for confirmation.
- `--no-run-cache` - execute the stage command(s) even if they have already been
- run with the same dependencies and outputs (see the
- [details](/doc/user-guide/project-structure/internal-files#run-cache)). Useful
- for example if the stage command/s is/are non-deterministic
- ([not recommended](/doc/user-guide/data-pipelines/defining-pipelines#avoiding-unexpected-behavior)).
+ run with the same dependencies and outputs (see the [details]). Useful, for
+ example, if the stage commands are non-deterministic ([not recommended]).
- `--no-commit` - do not store the outputs of this execution in the cache
(`dvc.yaml` and `dvc.lock` are still created or updated); useful to avoid
@@ -231,7 +229,7 @@ data science experiments.
when reproducing the pipeline.
- `--external` - allow writing outputs outside of the DVC repository. See
- [Managing External Data](/doc/user-guide/managing-external-data).
+ [Managing External Data].
- `--desc ` - user description of the stage (optional). This doesn't
affect any DVC operations.
@@ -243,6 +241,11 @@ data science experiments.
- `-v`, `--verbose` - displays detailed tracing information.
+[details]: /doc/user-guide/project-structure/internal-files#run-cache
+[not recommended]:
+ /doc/user-guide/project-structure/dvcyaml-files#avoiding-unexpected-behavior
+[managing external data]: /doc/user-guide/managing-external-data
+
## Examples
Let's create a stage (that counts the number of lines in a `test.txt` file):
diff --git a/content/docs/command-reference/stage/add.md b/content/docs/command-reference/stage/add.md
index dcfd76b781..797364f05f 100644
--- a/content/docs/command-reference/stage/add.md
+++ b/content/docs/command-reference/stage/add.md
@@ -46,7 +46,7 @@ graph] and execute them.
See the guide on [defining pipeline stages] for more details.
[defining pipeline stages]:
- /doc/user-guide/data-pipelines/defining-pipelines#pipelines
+ /doc/user-guide/pipelines/defining-pipelines#pipelines
@@ -111,7 +111,7 @@ Relevant notes:
[manual process](/doc/command-reference/move#renaming-stage-outputs) to update
`dvc.yaml` and the project's cache accordingly.
-[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines
+[dependency graph]: /doc/user-guide/pipelines/defining-pipelines
### For displaying and comparing data science experiments
diff --git a/content/docs/command-reference/stage/index.md b/content/docs/command-reference/stage/index.md
index 1bb7c73939..dd6598cf27 100644
--- a/content/docs/command-reference/stage/index.md
+++ b/content/docs/command-reference/stage/index.md
@@ -26,4 +26,4 @@ organize data science projects, or build detailed machine learning pipelines.
examine `dvc.yaml` files manually.
Learn more about
-[defining stages](/doc/user-guide/data-pipelines/defining-pipelines#stages).
+[defining stages](/doc/user-guide/pipelines/defining-pipelines#stages).
diff --git a/content/docs/start/data-management/pipelines.md b/content/docs/start/data-management/pipelines.md
index d61177df46..e5259d0660 100644
--- a/content/docs/start/data-management/pipelines.md
+++ b/content/docs/start/data-management/pipelines.md
@@ -171,7 +171,7 @@ $ dvc stage add -n featurize \
The `dvc.yaml` file is updated automatically and should include two stages now.
-[dag]: /doc/user-guide/data-pipelines/defining-pipelines
+[dag]: /doc/user-guide/pipelines/defining-pipelines
diff --git a/content/docs/user-guide/basic-concepts/pipeline.md b/content/docs/user-guide/basic-concepts/pipeline.md
index 8c58710ed5..bede9879f9 100644
--- a/content/docs/user-guide/basic-concepts/pipeline.md
+++ b/content/docs/user-guide/basic-concepts/pipeline.md
@@ -6,6 +6,5 @@ tooltip: >-
YAML format ([`dvc.yaml`](/doc/user-guide/project-structure/dvcyaml-files)).
This guarantees DVC can reproduce them consistently. DVC also helps automate
their execution and caches their results. See [Defining
- Pipelines](/doc/user-guide/data-pipelines/defining-pipelines) for more
- details.
+ Pipelines](/doc/user-guide/pipelines/defining-pipelines) for more details.
---
diff --git a/content/docs/user-guide/basic-concepts/stage.md b/content/docs/user-guide/basic-concepts/stage.md
index 1a5292a367..bf429be5b9 100644
--- a/content/docs/user-guide/basic-concepts/stage.md
+++ b/content/docs/user-guide/basic-concepts/stage.md
@@ -6,5 +6,5 @@ tooltip: >-
some milestone as part of your project's workflow. For example, `python
train.py` may generate a machine learning model. DVC stages include data
input(s) and resulting output(s), if any. [Learn
- more](/doc/user-guide/data-pipelines/defining-pipelines#stages).
+ more](/doc/user-guide/pipelines/defining-pipelines#stages).
---
diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md
index 998f1b36bc..c3dd9873c4 100644
--- a/content/docs/user-guide/experiment-management/running-experiments.md
+++ b/content/docs/user-guide/experiment-management/running-experiments.md
@@ -44,7 +44,7 @@ once.
> 📖 `dvc exp run` is an experiment-specific alternative to `dvc repro`.
[reproduction targets]: /doc/command-reference/repro#options
-[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines
+[dependency graph]: /doc/user-guide/pipelines/defining-pipelines
## Tuning (hyper)parameters
diff --git a/content/docs/user-guide/pipelines/index.md b/content/docs/user-guide/pipelines/index.md
index 5a9f96a823..5cfea8de38 100644
--- a/content/docs/user-guide/pipelines/index.md
+++ b/content/docs/user-guide/pipelines/index.md
@@ -16,4 +16,4 @@ consistent to reproduce.
See [Get Started: Data Pipelines](/doc/start/data-management/pipelines) for a
hands-on introduction to this topic.
-[define]: /doc/user-guide/data-pipelines/defining-pipelines
+[define]: /doc/user-guide/pipelines/defining-pipelines
diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md
index 9e766ac0e5..6b2f1bbb43 100644
--- a/content/docs/user-guide/project-structure/dvcyaml-files.md
+++ b/content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -94,6 +94,29 @@ parametrize `cmd` strings.
+
+
+### 💡 Avoiding unexpected behavior
+
+We don't want to tell anyone how to write their code or what programs to use!
+However, please be aware that in order to prevent unexpected results when DVC
+reproduces pipeline stages, the underlying code should ideally follow these
+rules:
+
+- Read/write exclusively from/to the specified dependencies and
+ outputs (including parameters files, metrics, and plots).
+- Completely rewrite outputs. Do not append or edit.
+- Stop reading and writing files when the `command` exits.
+
+Also, if your pipeline reproducibility goals include consistent output data, its
+code should be
+[deterministic](https://en.wikipedia.org/wiki/Deterministic_algorithm) (producing
+the same output for a given input): avoid code that increases
+[entropy](https://en.wikipedia.org/wiki/Software_entropy) (e.g. random numbers,
+time functions, hardware dependencies).
+
+
+
### Parameters
Parameters are simple key/value pairs consumed by the `command`
diff --git a/content/docs/user-guide/project-structure/internal-files.md b/content/docs/user-guide/project-structure/internal-files.md
index f8b705ada2..f3f2776a83 100644
--- a/content/docs/user-guide/project-structure/internal-files.md
+++ b/content/docs/user-guide/project-structure/internal-files.md
@@ -168,4 +168,4 @@ run-cache to remote storage for sharing and/or as a back up.
> [Avoiding unexpected behavior]).
[avoiding unexpected behavior]:
- /doc/user-guide/data-pipelines/defining-pipelines#avoiding-unexpected-behavior
+ /doc/user-guide/project-structure/dvcyaml-files#avoiding-unexpected-behavior
diff --git a/content/docs/user-guide/related-technologies.md b/content/docs/user-guide/related-technologies.md
index b6d0967004..c633e12d12 100644
--- a/content/docs/user-guide/related-technologies.md
+++ b/content/docs/user-guide/related-technologies.md
@@ -78,7 +78,7 @@ _Luigi_, etc.
- See also our sister project, [CML](https://cml.dev/), that helps fill some of
these gaps.
-[dependency graphs]: /doc/user-guide/data-pipelines/defining-pipelines
+[dependency graphs]: /doc/user-guide/pipelines/defining-pipelines
## Experiment management software
@@ -133,4 +133,4 @@ _Luigi_, etc.
> technical details (Linux).
[directed acyclic graph]:
- /doc/user-guide/data-pipelines/defining-pipelines#directed-acyclic-graph-dag
+ /doc/user-guide/pipelines/defining-pipelines#directed-acyclic-graph-dag
diff --git a/content/docs/user-guide/what-is-dvc.md b/content/docs/user-guide/what-is-dvc.md
index e695443e7a..d005c12168 100644
--- a/content/docs/user-guide/what-is-dvc.md
+++ b/content/docs/user-guide/what-is-dvc.md
@@ -51,7 +51,7 @@ can version experiments, manage large datasets, and make projects reproducible.
[free]: https://github.com/iterative/dvc/blob/master/LICENSE
[vs code extension]: /doc/vs-code-extension
[command line]: /doc/command-reference
-[pipelines]: /doc/user-guide/data-pipelines
+[pipelines]: /doc/user-guide/pipelines
## DVC does not replace Git!