diff --git a/content/docs/command-reference/config.md b/content/docs/command-reference/config.md index c38fecfbfd..9658d2df6e 100644 --- a/content/docs/command-reference/config.md +++ b/content/docs/command-reference/config.md @@ -63,7 +63,7 @@ multiple projects or users, respectively. > Note that the `--show-origin` flag can show you where a given config option > `value` is currently stored. -## Command options (flags) +## Command options/flags - `-u`, `--unset` - remove the specified config option `name` from a config file. Don't provide a `value` argument when employing this flag. diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 57cf9d281b..b0ea975ed3 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -11,12 +11,11 @@ etc.), and download it to the local project, or make a copy in ```usage usage: dvc import-url [-h] [-q | -v] [--file ] - [--to-remote] [-r ] - [--no-exec | --no-download] - [-j ] [--version-aware] - [--desc ] [--type ] - [--label ] [--meta key=value] - url [out] + [--to-remote] [-r ] + [--no-exec | --no-download] [-j ] + [--desc ] [--type ] [--label ] + [--meta key=value] [--version-aware] + url [out] positional arguments: url (See supported URLs in the description.) @@ -109,15 +108,22 @@ DVC supports several types of external locations (protocols): [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) is necessary to track if the specified URL changed. -DVC also supports capturing -[cloud versioning](/doc/user-guide/data-management/cloud-versioning) information -when importing data from certain cloud storage providers. When the -`--version-aware` option is provided or when the `url` argument includes a -supported cloud versioning ID, DVC will import the specified version of the -given data. 
When using versioned storage, DVC will always -[pull](/doc/command-reference/pull) the versioned data from its original source -location. Versioned data will also not be [pushed](/doc/command-reference/push) -to remote storage. +DVC also supports capturing [cloud versioning] information from certain cloud +storage providers. When the `--version-aware` option is provided or when the +`url` argument includes a supported cloud versioning ID, DVC will import the +specified version. + +[cloud versioning]: /doc/user-guide/data-management/cloud-versioning + + + +When using versioned storage, DVC will always [pull] the versioned data from +source, and will not [push] an additional version to remote storage. + +[pull]: /doc/command-reference/pull +[push]: /doc/command-reference/push + + | Type | Description | Versioned `url` format example | | ------- | ---------------------------- | ------------------------------------------------------ | @@ -200,6 +206,11 @@ produces a regular stage in `dvc.yaml`. - `--meta key=value` - custom metadata to add to the data. +- `--version-aware` - capture [cloud versioning] information (supported for + certain cloud storage providers). By default, DVC will automatically do so + only if the `url` contains a valid cloud versioning ID. Otherwise, with this + flag, DVC will import the latest version of the file. + - `-h`, `--help` - prints the usage/help message, and exit. - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index 5028c898f4..1c9e1a9ebf 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -1,11 +1,15 @@ # remote add -Register a new [DVC remote](/doc/user-guide/data-management/remote-storage). +Add a new `dvc remote` to the project configuration.
-Depending on your storage type, you may also need `dvc remote modify` to provide -credentials and/or configure other remote parameters. +You may also need `dvc remote modify` to provide credentials and/or configure +other remote parameters. See [Remote storage configuration] for more +information. + +[remote storage configuration]: + /doc/user-guide/data-management/remote-storage#configuration @@ -23,41 +27,68 @@ positional arguments: ## Description -This command creates a `remote` section in the DVC project's -[config file](/doc/command-reference/config) and optionally assigns a _default -remote_ in the `core` section, if the `--default` option is used (recommended -for the first remote): +Registers an [additional storage] location to save data files (besides the +cache) and optionally sets it as the `--default` remote. DVC +remotes can point to a cloud storage service, an SSH server, network-attached +storage, or even a directory in the local file system. + +[additional storage]: /doc/user-guide/data-management/remote-storage + + + +A [default remote] is expected by `dvc push`, `dvc pull`, `dvc status`, +`dvc gc`, and `dvc fetch` unless their `--remote` option is used. + +[default remote]: /doc/command-reference/remote/default + + + +The remote `name` (required) is used to identify the remote and must be unique. +DVC will determine the [type of remote](#supported-storage-types) based on the +provided `url` (also required), a URL or path for the location. + + + +The storage type determines which config parameters you can access via +`dvc remote modify`. Note that the `url` itself can be modified. + + + +This command creates a [`remote`] section in the project's [config file] +(`.dvc/config`). 
The `--default` (`-d`) flag uses the [`core`] config section: + +```cli +$ dvc remote add -d temp /tmp/dvcstore +``` ```ini -['remote "myremote"'] +# .dvc/config +['remote "temp"'] url = /tmp/dvcstore [core] - remote = myremote + remote = temp ``` -> 💡 Default remotes are expected by commands that accept a `-r`/`--remote` -> option (`dvc pull`, `dvc push`, `dvc status`, `dvc gc`, `dvc fetch`) when that -> option is omitted. +[config file]: /doc/command-reference/config +[`remote`]: /doc/command-reference/config#remote +[`core`]: /doc/command-reference/config#core -`name` and `url` are required. The `name` is used to identify the remote and -must be unique for the project. + -`url` specifies a location to store your data. It can represent a cloud storage -service, an SSH server, network-attached storage, or even a directory in the -local file system (see all the supported remote storage types in the examples -below). +If you [installed DVC] via `pip` and plan to use cloud services as remote +storage, you might need to install these optional dependencies: `[s3]`, +`[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Use `[all]` to include them +all. For example: -DVC will determine the [type of remote](#supported-storage-types) based on the -`url` provided. This may affect which parameters you can access later via -`dvc remote modify` (note that the `url` itself can be modified). +```cli +$ pip install "dvc[s3]" +``` + +[installed dvc]: /doc/install -> If you installed DVC via `pip` and plan to use cloud services as remote -> storage, you might need to install these optional dependencies: `[s3]`, -> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to -> include them all. The command should look like this: `pip install "dvc[s3]"`. -> (This example installs `boto3` library along with DVC to support S3 storage.) + -## Options +## Command options/flags - `--system` - save remote configuration to the system config file (e.g.
`/etc/xdg/dvc/config`) instead of `.dvc/config`. diff --git a/content/docs/command-reference/remote/default.md b/content/docs/command-reference/remote/default.md index 7f390a5ec6..9cee66b42c 100644 --- a/content/docs/command-reference/remote/default.md +++ b/content/docs/command-reference/remote/default.md @@ -1,7 +1,6 @@ # remote default -Set/unset the default -[remote storage](/doc/user-guide/data-management/remote-storage). +Set/unset the default `dvc remote`. ## Synopsis diff --git a/content/docs/command-reference/remote/index.md b/content/docs/command-reference/remote/index.md index 8e0dd78aaf..61ed9230b0 100644 --- a/content/docs/command-reference/remote/index.md +++ b/content/docs/command-reference/remote/index.md @@ -26,49 +26,25 @@ positional arguments: ## Description -What is data remote? +DVC remotes are distributed storage locations for your data sets and ML models +(similar to Git remotes, but for cached assets). This optional +feature is typically used to share or back up copies of all or some of your +data. Several types are supported: Amazon S3, Google Drive, SSH, HTTP, local +file systems, [among others]. -The same way as GitHub provides storage hosting for Git repositories, DVC -remotes provide a location to store and share data and models. You can pull data -assets created by colleagues from DVC remotes without spending time and -resources to build or process them locally. Remote storage can also save space -on your local environment – DVC can [fetch](/doc/command-reference/fetch) into -the cache directory only the data you need for a specific -branch/commit. +[among others]: + /doc/user-guide/data-management/remote-storage#supported-storage-types -Using DVC with remote storage is optional. DVC commands use the local cache -(usually in dir `.dvc/cache`) as data storage by default. This enables the main -DVC usage scenarios out of the box. 
+ -DVC supports several types of remote storage: local file system, SSH, Amazon S3, -Google Cloud Storage, HTTP, HDFS, among others. Refer to `dvc remote add` for -more details. - - - -If you installed DVC via `pip` and plan to use cloud services as remote storage, -you might need to install these optional dependencies: `[s3]`, `[azure]`, -`[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to include them -all. The command should look like this: `pip install "dvc[s3]"`. (This example -installs `boto3` library along with DVC to support S3 storage.) - - - -### Managing remote storage - - - -For an intro on DVC remote usage see [Storing and sharing data]. - -[storing and sharing data]: - /doc/start/data-management/data-versioning#storing-and-sharing +Learn more about [remote storage]. -`dvc remote` subcommands read or modify DVC [config files], where DVC remotes -are set up. Alternatively, `dvc config` can be used, or the config files can be -edited manually. +`dvc remote` subcommands read or modify DVC [config files] (`.dvc/config` by +default). Alternatively, the config files can be edited manually. +[remote storage]: /doc/user-guide/data-management/remote-storage [config files]: /doc/command-reference/config ## Options @@ -88,8 +64,8 @@ edited manually. While the term may seem contradictory, it doesn't have to be. The "local" part refers to the type of location where the storage is: another directory in the -same file system. "Remote" is how we call storage for DVC projects. -It's essentially a local backup for data tracked by DVC. +same file system. "Remote" is what we call storage for DVC +projects. It's essentially a local backup for data tracked by DVC.
diff --git a/content/docs/command-reference/remote/list.md b/content/docs/command-reference/remote/list.md index cb478cdfb9..56be0a920a 100644 --- a/content/docs/command-reference/remote/list.md +++ b/content/docs/command-reference/remote/list.md @@ -1,7 +1,6 @@ # remote list -List all available -[DVC remotes](/doc/user-guide/data-management/remote-storage). +List all `dvc remote` names and locations. ## Synopsis @@ -12,9 +11,11 @@ usage: dvc remote list [-h] [--global | --system | --project | --local] ## Description -Reads DVC configuration files and prints the list of available remotes, -including names and URLs. Remotes are read from the system, global, project, and -local config files (in that order). +Reads [DVC configuration] and prints the list of available remotes, including +their names and URLs/paths. Remotes are read from the system, global, project, +and local config files (in that order). + +[dvc configuration]: /doc/command-reference/config#remote ## Options diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 07ac6ff50c..ce6db3182c 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -1,12 +1,14 @@ # remote modify -Configure a [DVC remote](/doc/user-guide/data-management/remote-storage). +Configure an existing `dvc remote`. -This command is commonly needed after `dvc remote add` or `dvc remote default` -to set up credentials or for other customizations specific to the -[storage type](#available-parameters-per-storage-type). +This command is commonly needed after `dvc remote add` to set up credentials or +other customizations. See [Remote storage configuration] for more information. + +[remote storage configuration]: + /doc/user-guide/data-management/remote-storage#configuration @@ -25,16 +27,43 @@ positional arguments: ## Description -Remote `name` and `option` name are required. 
Config option names are specific -to the remote type. See `dvc remote add` and -[Available parameters](#available-parameters-per-storage-type) below for a list -of remote storage types. +The DVC remote's `name` and a valid `option` to modify are required. Remote +options or [config parameters](#available-parameters-per-storage-type) are +specific to the storage type and typically require a `value` as well. + +This command updates a [`remote`] section in the [config file] (`.dvc/config`): + +```cli +$ dvc remote modify temp url /mnt/c/tmp/dvcstore +``` + +```git +# .dvc/config +['remote "temp"'] +- url = /tmp/dvcstore ++ url = /mnt/c/tmp/dvcstore +``` + + + +If you [installed DVC] via `pip` and plan to use cloud services as remote +storage, you might need to install these optional dependencies: `[s3]`, +`[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Use `[all]` to include them +all. For example: + +```cli +$ pip install "dvc[s3]" +``` + +[installed dvc]: /doc/install + + -This command modifies a `remote` section in the project's -[config file](/doc/command-reference/config). Alternatively, `dvc config` or -manual editing could be used to change the configuration. +[config file]: /doc/command-reference/config +[`remote`]: /doc/command-reference/config#remote +[`core`]: /doc/command-reference/config#core -## Command options (flags) +## Command options/flags - `-u`, `--unset` - remove the configuration `option` from a config file. Don't provide a `value` argument when employing this flag. diff --git a/content/docs/command-reference/remote/remove.md b/content/docs/command-reference/remote/remove.md index aaddb4c0a4..500a892ef7 100644 --- a/content/docs/command-reference/remote/remove.md +++ b/content/docs/command-reference/remote/remove.md @@ -1,6 +1,6 @@ # remote remove -Remove a [DVC remote](/doc/user-guide/data-management/remote-storage). +Remove a `dvc remote`. 
diff --git a/content/docs/command-reference/remote/rename.md b/content/docs/command-reference/remote/rename.md index 4c470a185d..7aee86050e 100644 --- a/content/docs/command-reference/remote/rename.md +++ b/content/docs/command-reference/remote/rename.md @@ -1,6 +1,6 @@ # remote rename -Rename a [DVC remote](/doc/user-guide/data-management/remote-storage). +Rename a `dvc remote`. diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md index 77976cdcbe..09177278cd 100644 --- a/content/docs/command-reference/update.md +++ b/content/docs/command-reference/update.md @@ -1,8 +1,9 @@ # update Update files or directories imported from external DVC repositories -or [URLs](/doc/command-reference/import-url#description), and the corresponding -import `.dvc` files. +or [URLs], and the corresponding import `.dvc` files. + +[urls]: /doc/command-reference/import-url ## Synopsis @@ -40,18 +41,17 @@ $ dvc update --rev master ## Options -- `--rev ` - commit hash, branch or tag name, etc. (any - [Git revision](https://git-scm.com/docs/revisions)) of the repository to - update the file or directory from. The latest commit in `master` (tip of the - default branch) is used by default when this option is not specified. +- `--rev ` - commit hash, branch or tag name, etc. (any [Git revision]) + of the repository to update the file or directory from. The latest commit in + `master` (tip of the default branch) is used by default. + + For data obtained with `dvc import-url --version-aware`, this option can be + used to specify an object version ID. By default, the current version from + cloud storage will be used. - > Note that this changes the `rev` field in the import stage, fixing it to the - > revision. + Changes the `rev` field in the import `.dvc` files. - For stages created with `dvc import-url` and a - [cloud-versioned URL](/doc/command-reference/import-url#--version-aware), - `--rev` can be used to specify a object version ID to use. 
By default, the - import will be updated to the current version from cloud storage. + [git revision]: https://git-scm.com/docs/revisions - `-R`, `--recursive` - determines the files to update by searching each target directory and its subdirectories for import `.dvc` files to inspect. If there @@ -60,13 +60,11 @@ $ dvc update --rev master - `--no-download` - Update data checksums in the `.dvc` file (`md5`, `etag`, or `checksum` fields) without actually downloading the latest data. See `dvc import-url --no-download` or `dvc import --no-download` for more context. - Cannot be combined with `--to-remote`. + Cannot be used with `--to-remote`. - `--to-remote` - update a `.dvc` file created with `dvc import-url` and - [transfer](/doc/command-reference/import-url#example-transfer-to-remote-storage) - the data directly to remote storage (the default one unless one is specified - with -r) without saving it locally. Use - [dvc pull](https://dvc.org/doc/command-reference/pull) to get the data + [transfer] the data directly to remote storage (the default one unless one is + specified with `-r`) without saving it locally. Use `dvc pull` to get the data locally. - `-r `, `--remote ` - name of the @@ -84,6 +82,8 @@ $ dvc update --rev master - `-v`, `--verbose` - displays detailed tracing information. +[transfer]: /doc/command-reference/import-url#example-transfer-to-remote-storage + ## Example Let's first import a data artifact from our diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index a1b60a2c6c..8278ea3b3d 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -82,6 +82,6 @@ your DVC repository. ## Importing versioned data -DVC supports importing cloud versioned data from supported storage providers.
-Refer to the documentation for `dvc import-url` and `dvc update` for more +DVC supports importing cloud-versioned data from supported storage providers. +Refer to `dvc import-url` (`--version-aware`) and `dvc update --rev` for more information. diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md index 998fcdde16..9b712e5c20 100644 --- a/content/docs/user-guide/project-structure/dvc-files.md +++ b/content/docs/user-guide/project-structure/dvc-files.md @@ -62,39 +62,45 @@ Comments can be entered using the `# comment` format. The following subfields may be present under `outs` entries: -| Field | Description | -| ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `path` | (Required) Path to the file or directory (relative to `wdir` which defaults to the file's location) | -| `md5`
`etag`
`checksum` | Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) for HTTP, S3, or Azure [external outputs](/doc/user-guide/data-management/managing-external-data); and a special _checksum_ for HDFS and WebHDFS. | -| `version_id` | Version ID native to the cloud provider. Used to track each file in the cloud if [cloud versioning](/doc/user-guide/data-management/cloud-versioning) is enabled. | -| `size` | Size of the file or directory (sum of all files) | -| `nfiles` | If this output is a directory, the number of files inside (recursive). | -| `isexec` | Whether this is an executable file. DVC preserves execute permissions upon `dvc checkout` and `dvc pull`. This has no effect on directories, or in general on Windows. | -| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | -| `remote` | Name of the remote to use for pushing/fetching | -| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts) | -| `desc` | User description for this output (supported in metrics and plots too). This doesn't affect any DVC operations. | -| `type` | User-assigned type of the data. | -| `labels` | User-assigned labels to add to the data. | -| `meta` | Custom metadata about the data. | -| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). 
| +| Field | Description | +| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `path` | (Required) Path to the file or directory (relative to `wdir` which defaults to the file's location) | +| `md5`
`etag`
`checksum` | Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); [ETag] for HTTP, S3, or Azure [external outputs]; and a special _checksum_ for HDFS and WebHDFS. | +| `version_id` | Version ID native to the cloud provider. Used to track each file in the cloud if [cloud versioning] is enabled. | +| `size` | Size of the file or directory (sum of all files) | +| `nfiles` | If this output is a directory, the number of files inside (recursive). | +| `isexec` | Whether this is an executable file. DVC preserves execute permissions upon `dvc checkout` and `dvc pull`. This has no effect on directories, or in general on Windows. | +| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | +| `remote` | Name of the remote to use for pushing/fetching | +| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts) | +| `desc` | User description for this output (supported in metrics and plots too). This doesn't affect any DVC operations. | +| `type` | User-assigned type of the data. | +| `labels` | User-assigned labels to add to the data. | +| `meta` | Custom metadata about the data. | +| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). 
| + +[etag]: https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation +[external outputs]: /doc/user-guide/data-management/managing-external-data +[cloud versioning]: /doc/user-guide/data-management/cloud-versioning ## Dependency entries The following subfields may be present under `deps` entries: -| Field | Description | -| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `path` | (Required) Path to the dependency (relative to `wdir`, which defaults to the file's location) | -| `md5`
`etag`
`checksum` | Only in external dependencies created with `dvc import-url`: Hash value of the imported file or directory. MD5 is used for local paths and SSH; [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) for HTTP, S3, GCS, and Azure; and a special _checksum_ for HDFS and WebHDFS. | -| `size` | Size of the file or directory (sum of all files). | -| `nfiles` | If this dependency is a directory, the number of files inside (recursive). | -| `repo` | Only in external dependencies created with `dvc import`: It can contain `url`, `rev`, and `rev_lock` (detailed below). | +| Field | Description | +| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `path` | (Required) Path to the dependency (relative to `wdir`, which defaults to the file's location) | +| `md5`
`etag`
`checksum` | Only in external dependencies created with `dvc import-url`: Hash value of the imported file or directory. MD5 is used for local paths and SSH; [ETag] for HTTP, S3, GCS, and Azure; and a special _checksum_ for HDFS and WebHDFS. | +| `size` | Size of the file or directory (sum of all files). | +| `nfiles` | If this dependency is a directory, the number of files inside (recursive). | +| `repo` | Only in external dependencies created with `dvc import`: It can contain `url`, `rev`, and `rev_lock` (detailed below). | ### Dependency `repo` subfields: -| Field | Description | -| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `url` | URL of Git repository with source DVC project | -| `rev` | Only when `dvc import --rev` is used: Specific commit hash, branch or tag name, etc. (a [Git revision](https://git-scm.com/docs/revisions)) used to import the dependency from. | -| `rev_lock` | Git commit hash of the external DVC repository at the time of importing or updating the dependency (with `dvc update`) | +| Field | Description | +| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------- | +| `url` | URL of Git repository with source DVC project | +| `rev` | Only when `dvc import --rev` is used: Specific commit hash, branch or tag name, etc. (a [Git revision]) used to import the dependency from. | +| `rev_lock` | Git commit hash of the external DVC repository at the time of importing or updating the dependency (with `dvc update`) | + +[git revision]: https://git-scm.com/docs/revisions
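
A minimal Python sketch of how the remote lookup described for `dvc remote list` could behave: config files are read in system, global, project, and local order, and later files are assumed to override earlier ones. It parses in-memory strings shaped like the `.dvc/config` example from `remote add` with the standard-library `configparser`. The helpers `remote_name`, `list_remotes`, and `default_remote` are hypothetical; this is an illustration under those assumptions, not DVC's implementation.

```python
import configparser
from typing import Dict, Optional


def remote_name(section: str) -> Optional[str]:
    """Return "temp" for a section written as ['remote "temp"'], else None."""
    # configparser keeps the outer single quotes as part of the section name.
    inner = section.strip("'")
    prefix = 'remote "'
    if inner.startswith(prefix) and inner.endswith('"'):
        return inner[len(prefix):-1]
    return None


def _parse(text: str) -> configparser.ConfigParser:
    # interpolation=None so "%" in URLs is not treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def list_remotes(*config_texts: str) -> Dict[str, str]:
    """Map remote names to URLs/paths.

    Assumption: configs are passed from lowest to highest precedence
    (system, global, project, local), so later files override earlier ones.
    """
    remotes: Dict[str, str] = {}
    for text in config_texts:
        parser = _parse(text)
        for section in parser.sections():
            name = remote_name(section)
            if name is not None and parser.has_option(section, "url"):
                remotes[name] = parser.get(section, "url")
    return remotes


def default_remote(*config_texts: str) -> Optional[str]:
    """Return [core] remote from the highest-precedence config that sets it."""
    default = None
    for text in config_texts:
        parser = _parse(text)
        if parser.has_option("core", "remote"):
            default = parser.get("core", "remote")
    return default


# Project config shaped like the result of `dvc remote add -d temp /tmp/dvcstore`.
project = """
['remote "temp"']
    url = /tmp/dvcstore
[core]
    remote = temp
"""

# Hypothetical local config overriding the same remote's URL.
local = """
['remote "temp"']
    url = /mnt/c/tmp/dvcstore
"""

print(list_remotes(project, local))  # {'temp': '/mnt/c/tmp/dvcstore'}
print(default_remote(project, local))  # temp
```

Passing the configs as ordered arguments keeps the precedence rule in one place: whichever file is read last wins for a given key.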