diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md
index d862c71b22..0257bd31cc 100644
--- a/content/docs/command-reference/exp/run.md
+++ b/content/docs/command-reference/exp/run.md
@@ -23,9 +23,12 @@
 Provides a way to execute and track experiments in your project without
 polluting it with unnecessary commits, branches, directories, etc.
 
-> `dvc exp run` has the same behavior as `dvc repro` when it comes to `targets`
-> and stage execution (restores the dependency graph, etc.). See the command
-> [options](#options) for more on the differences.
+`dvc exp run` has the same general behavior as `dvc repro` when it comes to
+`targets` and stage execution (restores the dependency graph, etc.).
+
+> This includes committing any changed data dependencies to the
+> DVC cache when preparing the experiment, which can take some
+> time. See the [Options](#options) section for the differences.
 
 Use the `--set-param` (`-S`) option as a shortcut to change parameter values
 [on-the-fly] before running the experiment.
diff --git a/content/docs/command-reference/gc.md b/content/docs/command-reference/gc.md
index 269ff1d486..d3a6e5a1d7 100644
--- a/content/docs/command-reference/gc.md
+++ b/content/docs/command-reference/gc.md
@@ -1,7 +1,6 @@
 # gc
 
-Remove unused files and directories from cache or
-[remote storage](/doc/command-reference/remote).
+Remove unused files and directories from cache or [remote storage].
 
 ## Synopsis
 
@@ -14,8 +13,8 @@ usage: dvc gc [-h] [-q | -v] [-w] [-a] [-T] [--all-commits]
 ## Description
 
 This command can delete (garbage collect) data files or directories that exist
-in the cache but are no longer needed. With `--cloud`, it also removes data in
-[remote storage](/doc/command-reference/remote).
+in the cache but are no longer needed. With `--cloud`, it also
+[removes data in remote storage](#removing-data-in-remote-storage).
 
 To avoid accidentally deleting data, `dvc gc` doesn't do anything unless one or
 a combination of scope options are provided (`--workspace`, `--all-branches`,
@@ -26,27 +25,27 @@ details.
 The data kept is determined by reading the DVC files in the set of commits of
 the given scope.
 
-> Note that `dvc gc` tries to fetch any missing
-> [`.dir` files](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
-> from [remote storage](/doc/command-reference/remote) to the local
-> cache, in order to determine which files should exist inside
-> cached directories. These files may be missing if the cache directory was
-> previously garbage collected, or in a newly cloned copy of the repo, etc.
+> Note that `dvc gc` tries to fetch missing [`.dir` files] from remote storage
+> to local cache in order to determine which files should exist inside cached
+> directories. These files may be missing if the cache was previously garbage
+> collected, in a newly cloned copy of the repo, etc.
 
-Unless the `--cloud` option is used, `dvc gc` does not remove data files from
-any remote. This means that any files collected from the local cache can be
-restored using `dvc fetch`, as long as they have previously been uploaded with
+Unless the `--cloud` option is used, any files collected from the cache can be
+restored using `dvc fetch`, as long as they have been previously uploaded with
 `dvc push`.
 
+[`.dir` files]:
+  /doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
+
 ### Removing data in remote storage
 
-If the `--cloud` option is provided, this command deletes unused data from the
+If the `--cloud` (`-c`) flag is used, this command deletes unused data from the
 [default remote storage](/doc/command-reference/remote/default) **in addition**
 to deleting it from the local DVC cache. To specify a DVC remote to delete from,
-use `--remote` as well.
+use the `--remote` (`-r`) option.
 
-> ⚠️ This is dangerous -- cloud/remote data deletion is irreversible unless
-> there is another DVC remote or a manual backup.
+> ⚠️ Danger: cloud deletion is irreversible unless there is another DVC remote
+> or a manual backup with the same data.
 
 ## Options
 
@@ -74,14 +73,14 @@ use `--remote` as well.
   that is never referenced from the workspace or from any Git commit can still
   be stored in the project's cache).
 
-  > \* Not including [DVC experiments](
+  > \* Not including [DVC experiments]
 
 [dvc experiments]: /doc/user-guide/experiment-management#experiments
 
 - `--all-experiments` keep cached objects referenced in all [DVC experiments],
   as well as in the workspace (implying `-w`). This preserves the project's
   [experimental](/doc/user-guide/experiment-management) data (including
-  checkpoints).
+  checkpoints). See also `dvc exp gc`.
 
 - `-p `, `--projects ` - if a single remote or a single
   [cache is shared](/doc/user-guide/how-to/share-a-dvc-cache) among different
diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md
index 276a0f97f1..262a0b509f 100644
--- a/content/docs/user-guide/experiment-management/running-experiments.md
+++ b/content/docs/user-guide/experiment-management/running-experiments.md
@@ -11,8 +11,8 @@ details.
 ## Pipelines files
 
 DVC relies on `dvc.yaml` files that contain the commands to run the
-experiment(s). These files codify _pipelines_ that specify the
-stages of experiment workflows (code, dependencies,
+experiment(s). These files codify _pipelines_ that specify one or more
+stages of the experiment workflow (code, dependencies,
 outputs, etc.).
 
 > 📖 See [Get Started: Data Pipelines](/doc/start/data-pipelines) for an intro
@@ -20,8 +20,8 @@ experiment(s). These files codify _pipelines_ that specify the
 
 ### Running the pipeline(s)
 
-You can run the pipeline using `dvc exp run`. It uses `./dvc.yaml` (in the
-current directory) by default:
+You can run the experimental pipeline using `dvc exp run`. It uses `./dvc.yaml`
+(in the current directory) by default.
 
 ```dvc
 $ dvc exp run
@@ -29,7 +29,11 @@ $ dvc exp run
 Reproduced experiment(s): exp-44136
 ```
 
-DVC keeps track of the [dependency graph] among stages. It only runs the ones
+> ⚠️ Note that any changed dependencies are committed to the DVC cache when
+> preparing the experiment, which can take some time. `dvc exp gc` can clean up
+> unnecessary ones.
+
+DVC observes the [dependency graph] between stages, so it only runs the ones
 with changed dependencies or outputs missing from the cache. You can limit this
 to certain [reproduction targets] or even single stages (`--single-item` flag).
 