diff --git a/content/docs/install/linux.md b/content/docs/install/linux.md
index fe30dcf466..1dcce419e8 100644
--- a/content/docs/install/linux.md
+++ b/content/docs/install/linux.md
@@ -29,10 +29,11 @@ Note that Python 3.8+ is needed to get the latest version of DVC.
 $ pip install dvc
 ```

-Depending on the type of the [remote storage](/doc/command-reference/remote) you
-plan to use, you might need to install optional dependencies: `[s3]`,
-`[gdrive]`, `[gs]`, `[azure]`, `[ssh]`, `[hdfs]`, `[webdav]`, `[oss]`. Use
-`[all]` to include them all.
+Depending on the type of the [remote storage] you plan to use, you might need to
+install optional dependencies: `[s3]`, `[gdrive]`, `[gs]`, `[azure]`, `[ssh]`,
+`[hdfs]`, `[webdav]`, `[oss]`. Use `[all]` to include them all.
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
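For example, the extras syntax above looks like this in practice (quotes keep
the shell from expanding the brackets):

```cli
$ pip install "dvc[s3]"   # support for Amazon S3 remotes
$ pip install "dvc[all]"  # all optional remote dependencies
```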
@@ -65,9 +66,9 @@ $ conda install -c conda-forge mamba # installs much faster than conda
 $ mamba install -c conda-forge dvc
 ```

-Depending on the type of the [remote storage](/doc/command-reference/remote) you
-plan to use, you might need to install optional dependencies: `dvc-s3`,
-`dvc-azure`, `dvc-gdrive`, `dvc-gs`, `dvc-oss`, `dvc-ssh`.
+Depending on the type of the [remote storage] you plan to use, you might need to
+install optional dependencies: `dvc-s3`, `dvc-azure`, `dvc-gdrive`, `dvc-gs`,
+`dvc-oss`, `dvc-ssh`.
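With conda/mamba the optional dependencies are standalone packages rather than
extras, e.g.:

```cli
$ mamba install -c conda-forge dvc-s3
```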
diff --git a/content/docs/install/macos.md b/content/docs/install/macos.md
index d7dcf72fa6..0bed444662 100644
--- a/content/docs/install/macos.md
+++ b/content/docs/install/macos.md
@@ -59,10 +59,11 @@ Note that Python 3.8+ is needed to get the latest version of DVC.
 $ pip install dvc
 ```

-Depending on the type of the [remote storage](/doc/command-reference/remote) you
-plan to use, you might need to install optional dependencies: `[s3]`,
-`[gdrive]`, `[gs]`, `[azure]`, `[ssh]`, `[hdfs]`, `[webdav]`, `[oss]`. Use
-`[all]` to include them all.
+Depending on the type of the [remote storage] you plan to use, you might need to
+install optional dependencies: `[s3]`, `[gdrive]`, `[gs]`, `[azure]`, `[ssh]`,
+`[hdfs]`, `[webdav]`, `[oss]`. Use `[all]` to include them all.
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
@@ -90,9 +91,9 @@ $ conda install -c conda-forge mamba # installs much faster than conda
 $ mamba install -c conda-forge dvc
 ```

-Depending on the type of the [remote storage](/doc/command-reference/remote) you
-plan to use, you might need to install optional dependencies: `dvc-s3`,
-`dvc-azure`, `dvc-gdrive`, `dvc-gs`, `dvc-oss`, `dvc-ssh`.
+Depending on the type of the [remote storage] you plan to use, you might need to
+install optional dependencies: `dvc-s3`, `dvc-azure`, `dvc-gdrive`, `dvc-gs`,
+`dvc-oss`, `dvc-ssh`.
diff --git a/content/docs/install/windows.md b/content/docs/install/windows.md
index 03c0ed4796..c2bbd0748c 100644
--- a/content/docs/install/windows.md
+++ b/content/docs/install/windows.md
@@ -42,9 +42,11 @@ $ conda install -c conda-forge mamba # installs much faster than conda
 $ mamba install -c conda-forge dvc
 ```

-Depending on the type of the [remote storage](/doc/command-reference/remote) you
-plan to use, you might need to install optional dependencies: `dvc-s3`,
-`dvc-azure`, `dvc-gdrive`, `dvc-gs`, `dvc-oss`, `dvc-ssh`.
+Depending on the type of the [remote storage] you plan to use, you might need to
+install optional dependencies: `dvc-s3`, `dvc-azure`, `dvc-gdrive`, `dvc-gs`,
+`dvc-oss`, `dvc-ssh`.
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
@@ -81,9 +83,9 @@ Note that Python 3.8+ is needed to get the latest version of DVC.
 $ pip install dvc
 ```

-Depending on the type of the [remote storage](/doc/command-reference/remote) you
-plan to use, you might need to install optional dependencies: `[s3]`, `[azure]`,
-`[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Use `[all]` to include them all.
+Depending on the type of the [remote storage] you plan to use, you might need to
+install optional dependencies: `[s3]`, `[azure]`, `[gdrive]`, `[gs]`, `[oss]`,
+`[ssh]`. Use `[all]` to include them all.
diff --git a/content/docs/start/data-management/data-and-model-access.md b/content/docs/start/data-management/data-and-model-access.md
index 05ab67cc15..2509b20391 100644
--- a/content/docs/start/data-management/data-and-model-access.md
+++ b/content/docs/start/data-management/data-and-model-access.md
@@ -22,10 +22,12 @@ a specific version of a model? Or reuse datasets across different projects?

 These questions tend to come up when you browse the files that DVC saves to
-remote storage (e.g.
+[remote storage] (e.g.
 `s3://dvc-public/remote/get-started/fb/89904ef053f04d64eafcc3d70db673` 😱
 instead of the original file name such as `model.pkl` or `data.xml`).

+[remote storage]: /doc/user-guide/data-management/remote-storage
+
 Remember those `.dvc` files `dvc add` generates? Those files (and `dvc.lock`,
@@ -86,10 +88,15 @@ bring in changes from the data source later using `dvc update`.

 ### 💡 Expand to see what happens under the hood.

-> Note that the
-> [dataset registry](https://github.com/iterative/dataset-registry) repository
-> doesn't actually contain a `get-started/data.xml` file. Like `dvc get`,
-> `dvc import` downloads from [remote storage](/doc/command-reference/remote).
+
+
+The [dataset registry] repository doesn't actually contain a
+`get-started/data.xml` file. Like `dvc get`, `dvc import` downloads from [remote
+storage].
+
+[dataset registry]: https://github.com/iterative/dataset-registry
+
+

 `.dvc` files created by `dvc import` have special fields, such as the data
 source `repo` and `path` (under `deps`):
diff --git a/content/docs/start/data-management/data-versioning.md b/content/docs/start/data-management/data-versioning.md
index 5867f962f5..687b97496e 100644
--- a/content/docs/start/data-management/data-versioning.md
+++ b/content/docs/start/data-management/data-versioning.md
@@ -95,9 +95,11 @@ outs:

 ## Storing and sharing

 You can upload DVC-tracked data or model files with `dvc push`, so they're
-safely stored [remotely](/doc/command-reference/remote). This also means they
-can be retrieved on other environments later with `dvc pull`. First, we need to
-set up a remote storage location:
+safely stored [remotely]. This also means they can be retrieved on other
+environments later with `dvc pull`. First, we need to set up a remote storage
+location:
+
+[remotely]: /doc/user-guide/data-management/remote-storage

 ```cli
 $ dvc remote add -d storage s3://mybucket/dvcstore
@@ -105,9 +107,16 @@ $ git add .dvc/config
 $ git commit -m "Configure remote storage"
 ```

-> DVC supports many remote storage types, including Amazon S3, SSH, Google
-> Drive, Azure Blob Storage, and HDFS. See `dvc remote add` for more details and
-> examples.
+
+
+DVC supports many [remote storage types], including Amazon S3, SSH, Google
+Drive, Azure Blob Storage, and HDFS. See `dvc remote add` for more details and
+examples.
+
+[remote storage types]:
+  /doc/user-guide/data-management/remote-storage#supported-storage-types
+
+
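The `dvc remote add` example above uses S3, but any supported storage type
works the same way; an SSH variant might look like this (the host and path are
illustrative):

```cli
$ dvc remote add -d storage ssh://user@example.com/path/to/dvcstore
$ dvc push
```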
diff --git a/content/docs/studio/user-guide/projects-and-experiments/configure-a-project.md b/content/docs/studio/user-guide/projects-and-experiments/configure-a-project.md
index 503e671d58..e630804483 100644
--- a/content/docs/studio/user-guide/projects-and-experiments/configure-a-project.md
+++ b/content/docs/studio/user-guide/projects-and-experiments/configure-a-project.md
@@ -54,10 +54,11 @@ you want to visualize in Iterative Studio.
 ### Data remotes (cloud/remote storage)

 The metrics and parameters that you want to include in the project may also be
-present in a [data remote](/doc/command-reference/remote#description) (cloud
-storage or another location outside the Git repo). If you want to include such
-data in your projects, then you will have to grant Iterative Studio access to
-the data remote.
+present in a [data remote] (cloud storage or another location outside the Git
+repo). If you want to include such data in your projects, then you will have to
+grant Iterative Studio access to the data remote.
+
+[data remote]: /doc/user-guide/data-management/remote-storage

 ## Configuring project settings

@@ -82,9 +83,8 @@ which you are trying to connect.

 ### Data remotes / cloud storage credentials

-If you need to provide credentials for
-[DVC data remotes](/doc/command-reference/remote#description), you will need to
-do it after your project has been created. First, create your project without
+If you need to provide credentials for a [data remote], you will need to do it
+after your project has been created. First, create your project without
 specifying the data remotes. Once your project is created, open its settings.
 Open the `Data remotes / cloud storage credentials` section. The data remotes
 that are used in your DVC repo will be listed.
@@ -93,8 +93,8 @@ that are used in your DVC repo will be listed.

 Now, click on `Add new credentials`. In the form that opens up, select the
 provider (Amazon S3, GCP, etc.). For details on what types of remote storage
-(protocols) are supported, refer to the DVC documentation on
-[supported storage types](/doc/command-reference/remote/add#supported-storage-types).
+(protocols) are supported, refer to the DVC documentation on [supported storage
+types].

 Depending on the provider, you will be asked for more details such as the
 credentials name, username, password etc. Note that for each supported storage
@@ -103,17 +103,19 @@ type, the required details may be different.

 ![](https://static.iterative.ai/img/studio/s3_remote_settings_v2.png)

 You will also have to ensure that the credentials you enter have the required
-permissions on the cloud / remote storage. In the DVC documentation on
-[supported storage types](/doc/command-reference/remote/add#supported-storage-types),
-expand the section for the storage type you want to add. There, you will find
-the details of the permissions that you need to grant to the account
-(credentials) that you are configuring on Iterative Studio.
+permissions on the cloud / remote storage. Refer to the [DVC Remote config
+parameters] for more details about this.

 Note that Iterative Studio uses the credentials only to read plots/metrics
 files if they are not saved into Git. It does not access any other data in your
 remote storage. And you do not need to provide the credentials if any DVC data
 remote is not used in your Git repository.

+[supported storage types]:
+  /doc/user-guide/data-management/remote-storage#supported-storage-types
+[dvc remote config parameters]:
+  /doc/command-reference/remote/modify#available-parameters-per-storage-type
+
 ### Mandatory columns

 ##### (Tracking scope)
diff --git a/content/docs/studio/user-guide/projects-and-experiments/live-metrics-and-plots.md b/content/docs/studio/user-guide/projects-and-experiments/live-metrics-and-plots.md
index e698a7a948..328a576b3d 100644
--- a/content/docs/studio/user-guide/projects-and-experiments/live-metrics-and-plots.md
+++ b/content/docs/studio/user-guide/projects-and-experiments/live-metrics-and-plots.md
@@ -47,12 +47,11 @@ job:
    example below).

    ```yaml
-   ...
-   steps:
-     - name: Train model
-       env:
-         STUDIO_TOKEN: ${{ secrets.STUDIO_TOKEN }}
-   ...
+   ---
+   steps:
+     - name: Train model
+       env:
+         STUDIO_TOKEN: ${{ secrets.STUDIO_TOKEN }}
    ```

 2. `STUDIO_REPO_URL`: If you are running the experiment locally, you do not
diff --git a/content/docs/use-cases/ci-cd-for-machine-learning.md b/content/docs/use-cases/ci-cd-for-machine-learning.md
index 278c716f99..e598bc32e2 100644
--- a/content/docs/use-cases/ci-cd-for-machine-learning.md
+++ b/content/docs/use-cases/ci-cd-for-machine-learning.md
@@ -52,11 +52,12 @@ configuration. Here are a few feature highlights:
 **Models, Data, and Metrics as Code**: DVC removes the need to create versioning
 databases, use special file/folder structures, or write bespoke interfacing
 code. Instead, DVC stores meta-information in Git ("codifying" data and ML
-models) while pushing the actual data content to
-[cloud storage](/doc/command-reference/remote). DVC also provides metrics-driven
-navigation in Git repositories --
-[tabulating and plotting](/doc/start/data-management/metrics-parameters-plots)
-model metrics changes across commits.
+models) while pushing the actual data content to [cloud storage]. DVC also
+provides metrics-driven navigation in Git repositories -- [tabulating and
+plotting] model metrics changes across commits.
+
+[cloud storage]: /doc/user-guide/data-management/remote-storage
+[tabulating and plotting]: /doc/start/data-management/metrics-parameters-plots

 **Low friction**: Our sister project CML provides
 [lightweight machine resource orchestration](https://cml.dev/doc/self-hosted-runners)
diff --git a/content/docs/use-cases/data-registry/index.md b/content/docs/use-cases/data-registry/index.md
index ae6d737f9d..0dfe986576 100644
--- a/content/docs/use-cases/data-registry/index.md
+++ b/content/docs/use-cases/data-registry/index.md
@@ -33,7 +33,7 @@ cloud storage. Advantages:

 [ci/cd for your data and models lifecycle]:
   /doc/use-cases/ci-cd-for-machine-learning
-[remote storage]: /doc/command-reference/remote
+[remote storage]: /doc/user-guide/data-management/remote-storage

 👩‍💻 Intrigued? Try our [registry tutorial] to learn how DVC looks and feels
 firsthand.
diff --git a/content/docs/use-cases/model-registry.md b/content/docs/use-cases/model-registry.md
index 352d4c7c32..46851c92b8 100644
--- a/content/docs/use-cases/model-registry.md
+++ b/content/docs/use-cases/model-registry.md
@@ -58,7 +58,7 @@ with software engineering methods such as continuous integration (CI/CD), which
 can sync with the state of the artifacts in your registry.

 [modeling process]: /doc/start/data-management/data-pipelines
-[remote storage]: /doc/command-reference/remote
+[remote storage]: /doc/user-guide/data-management/remote-storage
 [sharing]: /doc/start/data-management/data-and-model-access
 [via cml]: https://cml.dev/doc/cml-with-dvc
 [gitops]: https://www.gitops.tech/
diff --git a/content/docs/use-cases/versioning-data-and-models/index.md b/content/docs/use-cases/versioning-data-and-models/index.md
index f4a3a18f0d..a6467bc8fd 100644
--- a/content/docs/use-cases/versioning-data-and-models/index.md
+++ b/content/docs/use-cases/versioning-data-and-models/index.md
@@ -55,17 +55,21 @@ Benefits of our approach include:
   editing these in source code.

 - **Efficient data management**: Use a familiar and cost-effective storage
-  solution for your data and models (e.g. SFTP, S3, HDFS,
-  [etc.](/doc/command-reference/remote/add#supported-storage-types)) — free from
-  Git hosting
-  [constraints](https://docs.github.com/en/free-pro-team@latest/github/managing-large-files/what-is-my-disk-quota).
-  DVC [optimizes](/doc/user-guide/data-management/large-dataset-optimization)
-  storing and transferring large files.
+  solution for your data and models (e.g. SFTP, S3, HDFS, [etc.]) — free from
+  Git hosting [constraints]. DVC [optimizes] storing and transferring large
+  files.
+
+  [etc.]: /doc/user-guide/data-management/remote-storage#supported-storage-types
+  [constraints]:
+    https://docs.github.com/en/free-pro-team@latest/github/managing-large-files/what-is-my-disk-quota
+  [optimizes]: /doc/user-guide/data-management/large-dataset-optimization

 - **Collaboration**: Easily distribute your project development and share its
-  data [internally](/doc/user-guide/how-to/share-a-dvc-cache) and
-  [remotely](/doc/command-reference/remote), or
-  [reuse](/doc/start/data-management/data-and-model-access) it in other places.
+  data [internally] and [remotely], or [reuse] it in other places.
+
+  [remotely]: /doc/user-guide/data-management/remote-storage
+  [internally]: /doc/user-guide/how-to/share-a-dvc-cache
+  [reuse]: /doc/start/data-management/data-and-model-access

 - **Data compliance**: Review data modification attempts as Git
   [pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/).
diff --git a/content/docs/use-cases/versioning-data-and-models/tutorial.md b/content/docs/use-cases/versioning-data-and-models/tutorial.md
index 32100fcbe8..ce8e6ca872 100644
--- a/content/docs/use-cases/versioning-data-and-models/tutorial.md
+++ b/content/docs/use-cases/versioning-data-and-models/tutorial.md
@@ -86,12 +86,18 @@ $ unzip -q data.zip
 $ rm -f data.zip
 ```

-> `dvc get` can download any file or directory tracked in a DVC
-> repository (and [stored remotely](/doc/command-reference/remote)). It's
-> like `wget`, but for DVC or Git repos. In this case we use our
-> [dataset registry](https://github.com/iterative/dataset-registry) repo as the
-> data source (refer to [Data Registry](/doc/use-cases/data-registry) for more
-> info.)
+
+
+`dvc get` can download any file or directory tracked in a DVC
+repository (and stored [remotely]). It's like `wget`, but for DVC or Git
+repos. In this case we use our [dataset registry] repo as the data source (refer
+to [Data Registry] for more info.)
+
+[remotely]: /doc/user-guide/data-management/remote-storage
+[dataset registry]: https://github.com/iterative/dataset-registry
+[data registry]: /doc/use-cases/data-registry
+
+

 This command downloads and extracts our raw dataset, consisting of 1000 labeled
 images for training and 800 labeled images for validation. In total, it's a 43
diff --git a/content/docs/user-guide/data-management/importing-external-data.md b/content/docs/user-guide/data-management/importing-external-data.md
index ef0ce358a2..37bf226da3 100644
--- a/content/docs/user-guide/data-management/importing-external-data.md
+++ b/content/docs/user-guide/data-management/importing-external-data.md
@@ -29,8 +29,13 @@ types/protocols:
 - HTTP
 - Local files and directories outside the workspace

-> Note that [remote storage](/doc/command-reference/remote) is a different
-> feature.
+
+
+[Remote storage] is a different feature.
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
+
+

 ## Examples

@@ -151,8 +156,8 @@ be managed independently. This is useful if the connection requires
 authentication, if multiple dependencies (or stages) reuse the same location,
 or if the URL is likely to change in the future.

-[DVC remotes](/doc/command-reference/remote) can do just this. You may use
-`dvc remote add` to define them, and then use a special URL with format
+[DVC remotes][remote storage] can do just this. You may use `dvc remote add` to
+define them, and then use a special URL with format
 `remote://{remote_name}/{path}` (remote alias) to define the external
 dependency.
diff --git a/content/docs/user-guide/data-management/managing-external-data.md b/content/docs/user-guide/data-management/managing-external-data.md
index 5b25b89fc4..f33918e7d4 100644
--- a/content/docs/user-guide/data-management/managing-external-data.md
+++ b/content/docs/user-guide/data-management/managing-external-data.md
@@ -8,7 +8,7 @@

 [to-cache]: /doc/command-reference/add#example-transfer-to-an-external-cache
 [to-remote]: /doc/command-reference/add#example-transfer-to-remote-storage
-[remote storage]: /doc/command-reference/remote
+[remote storage]: /doc/user-guide/data-management/remote-storage

 There are cases when data is so large, or its processing is organized in such a
 way, that it's impossible to handle it in the local machine disk. For example
diff --git a/content/docs/user-guide/data-management/remote-storage.md b/content/docs/user-guide/data-management/remote-storage.md
index 02f3150af2..a4457ff5e6 100644
--- a/content/docs/user-guide/data-management/remote-storage.md
+++ b/content/docs/user-guide/data-management/remote-storage.md
@@ -13,7 +13,7 @@

 DVC remotes are similar to [Git remotes], but for cached data.

-This is somehow like GitHub or GitLab providing hosting for source code
+This is somewhat like GitHub or GitLab providing hosting for source code
 repositories. However, DVC does not provide or recommend a specific storage
 service. Instead, it adopts a bring-your-own-platform approach, supporting a
 wide variety of [storage types](#supported-storage-types).
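A minimal sketch of the remote-alias pattern described in the
importing-external-data hunk (the remote name and path are illustrative):

```cli
$ dvc remote add example-remote ssh://user@example.com/srv/data
$ dvc import-url remote://example-remote/file.csv
```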
diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md
index a4d964d2fb..edbaf27cad 100644
--- a/content/docs/user-guide/experiment-management/sharing-experiments.md
+++ b/content/docs/user-guide/experiment-management/sharing-experiments.md
@@ -26,8 +26,9 @@ $ git push origin my-branch

 If you only need to share code and metadata like parameters and metrics, then
 pushing to Git is often enough. However, you may also have data,
 models, etc. that are tracked and cached by DVC. If you need to share these
-files, you can push them to [remote storage](/doc/command-reference/remote)
-(e.g. Amazon S3 or Google Drive).
+files, you can push them to [remote storage] (e.g. Amazon S3 or Google Drive).
+
+[remote storage]: /doc/user-guide/data-management/remote-storage

 ```cli
 $ dvc push
diff --git a/content/docs/user-guide/how-to/setup-google-drive-remote.md b/content/docs/user-guide/how-to/setup-google-drive-remote.md
index f0f4569d58..13f827250a 100644
--- a/content/docs/user-guide/how-to/setup-google-drive-remote.md
+++ b/content/docs/user-guide/how-to/setup-google-drive-remote.md
@@ -7,9 +7,9 @@ description: >-

 # How to Setup a Google Drive DVC Remote

-In this guide we explain the existing ways to setup Google Drive
-[remote storage](/doc/command-reference/remote) for your DVC
-projects, along with the different benefits each one brings.
+In this guide we explain the existing ways to setup Google Drive [remote
+storage] for your DVC projects, along with the different benefits
+each one brings.

 DVC uses the Google Drive API to synchronize your DVC project data with this
 type of remote storage, so it's subject to certain usage limits and quotas,
@@ -22,6 +22,8 @@ Having your own GC project, it's also possible to
 [use a service account](#using-service-accounts) for automating tasks that need
 to establish GDrive remote connections (e.g. CI/CD).

+[remote storage]: /doc/user-guide/data-management/remote-storage
+
 ## Quick start

 To start using a Google Drive remote, you only need to add it with a
@@ -181,14 +183,13 @@ automation is needed (e.g. CI/CD) we recommend

-On the first usage of a GDrive [remote](/doc/command-reference/remote), for
-example when trying to `dvc push` tracked data for the first time, DVC will
-prompt you to visit a special Google authentication web page. There you'll need
-to sign into a Google account with the needed access to the GDrive
-[URL](#url-format) in question. The [auth process] will ask you to grant DVC the
-necessary permissions, and produce a verification code needed for DVC to
-complete the connection. On success, the necessary credentials will be cached
-globally, for example in
+On the first usage of a GDrive remote, for example when trying to `dvc push`
+tracked data for the first time, DVC will prompt you to visit a special Google
+authentication web page. There you'll need to sign into a Google account with
+the needed access to the GDrive [URL](#url-format) in question. The [auth
+process] will ask you to grant DVC the necessary permissions, and produce a
+verification code needed for DVC to complete the connection. On success, the
+necessary credentials will be cached globally, for example in
 `~/Library/Caches/pydrive2fs/{gdrive_client_id}/default.json` for macOS
 ([see `gdrive_user_credentials_file`]), and used automatically next time DVC
 needs them.
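For reference, the GDrive quick start mentioned above amounts to the following
(the folder ID is a placeholder):

```cli
$ dvc remote add -d myremote gdrive://<folder-id>
$ dvc push  # first use triggers the interactive auth flow described here
```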
diff --git a/content/docs/user-guide/index.md b/content/docs/user-guide/index.md
index 0f0b8a0de9..418d44381c 100644
--- a/content/docs/user-guide/index.md
+++ b/content/docs/user-guide/index.md
@@ -118,18 +118,22 @@ repo (if one is being used, which is not required).

 ### Git-LFS (Large File Storage)

 - DVC does not require special servers like Git-LFS demands. Any cloud storage
-  like S3, Google Cloud Storage, or even an SSH server can be used as a
-  [remote storage](/doc/command-reference/remote). No additional databases,
-  servers, or infrastructure are required.
+  like S3, Google Cloud Storage, or even an SSH server can be used as a [remote
+  storage]. No additional databases, servers, or infrastructure are required.

 - DVC does not add any hooks to the Git repo by default (although they are
-  [available](/doc/command-reference/install)).
+  [available]).

 - Git-LFS was not made with data science in mind, so it doesn't provide related
   features (e.g. [ML pipelines], [metrics](/doc/command-reference/metrics),
   etc.).

-- GitHub (most common Git hosting service) has a limit of 2 GB per repository.
+- GitHub (a common Git hosting service) has a limit of 2 GB per repository.
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
+[available]: /doc/command-reference/install
+[ml pipelines]: /doc/command-reference/dag
+[metrics]: /doc/command-reference/metrics
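The optional Git hooks referenced by the `[available]` link are enabled with a
single command:

```cli
$ dvc install
```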
@@ -144,9 +148,9 @@ repo (if one is being used, which is not required).
   workflow for machine learning and reproducible experiments. When a DVC or
   Git-annex repository is cloned via `git clone`, data files won't be copied to
   the local machine, as file contents are stored in separate
-  [remotes](/doc/command-reference/remote). With DVC however, `.dvc` files,
-  which provide the reproducible workflow, are always included in the Git
-  repository. Hence, they can be executed locally with minimal effort.
+  [remotes][remote storage]. With DVC however, `.dvc` files, which provide the
+  reproducible workflow, are always included in the Git repository. Hence, they
+  can be executed locally with minimal effort.

 - DVC optimizes file hash calculation.
diff --git a/content/docs/user-guide/privacy.md b/content/docs/user-guide/privacy.md
index 89393d4e4e..7aeefaaa90 100644
--- a/content/docs/user-guide/privacy.md
+++ b/content/docs/user-guide/privacy.md
@@ -69,7 +69,7 @@ caution when using Google Drive DVC remotes on shared machines.**
 By default, OAuth tokens are cached in a global location (e.g.
 `~/.cache/pydrive2fs` on Linux, see [details]).

-[details]: https://dvc.org/doc/command-reference/remote/modify#google-drive
+[details]: /doc/command-reference/remote/modify#google-drive

 ## Usage in other packages or applications
diff --git a/content/docs/user-guide/project-structure/internal-files.md b/content/docs/user-guide/project-structure/internal-files.md
index bd06d88b36..c99a32b077 100644
--- a/content/docs/user-guide/project-structure/internal-files.md
+++ b/content/docs/user-guide/project-structure/internal-files.md
@@ -141,12 +141,15 @@ That's how DVC knows that the other two cached files belong in the directory.

 `dvc exp run` and `dvc repro` by default populate and reutilize a log of stages
 that have been run in the project. It is found in the `runs/` directory inside
-the cache (or [remote storage](/doc/command-reference/remote)).
+the cache (or [remote storage]).

 Runs are identified as combinations of exact dependency contents
-(or [parameter](/doc/command-reference/params) values), and the literal
-command(s) to execute. These combinations are represented by special hashes that
-translate to the file paths inside the run-cache dir:
+(or [parameter] values), and the literal command(s) to execute. These
+combinations are represented by special hashes that translate to the file paths
+inside the run-cache dir:
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
+[parameter]: /doc/command-reference/params

 ```cli
 $ tree .dvc/cache/runs
diff --git a/content/docs/user-guide/troubleshooting.md b/content/docs/user-guide/troubleshooting.md
index bbcd26c997..559d362336 100644
--- a/content/docs/user-guide/troubleshooting.md
+++ b/content/docs/user-guide/troubleshooting.md
@@ -12,9 +12,10 @@ custom anchor link is used. Just add {#custom-anchor} after each title:

 Users may encounter errors when running `dvc pull` and `dvc fetch`, like
 `WARNING: Cache 'xxxx' not found.` or
 `ERROR: failed to pull data from the cloud`. The most common cause is changes
-pushed to Git without the corresponding data being uploaded to the
-[DVC remote](/doc/command-reference/remote). Make sure to `dvc push` from the
-original project, and try again.
+pushed to Git without the corresponding data being uploaded to the [DVC remote].
+Make sure to `dvc push` from the original project, and try again.
+
+[dvc remote]: /doc/user-guide/data-management/remote-storage

 ## Too many open files error {#many-files}
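The troubleshooting fix above amounts to re-running the upload from the machine
that has the data, then retrying the download:

```cli
$ dvc push  # from the original project
$ dvc pull  # retry in the failing copy
```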