diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 6719f202c7..58564f7276 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -1,8 +1,8 @@
# import-url
Download a file or directory from a supported URL (for example `s3://`,
-`ssh://`, and other protocols) into the workspace, and track
-changes in the remote data source. Creates a `.dvc` file.
+`ssh://`, and other protocols) into the workspace, and track it (an
+import `.dvc` file is created).
> See `dvc import` to download and track data/model files or directories from
> other DVC repositories (e.g. hosted on GitHub).
@@ -21,39 +21,45 @@ positional arguments:
## Description
-In some cases it's convenient to add a data file or directory from a remote
+In some cases it's convenient to add a data file or directory from an external
location into the workspace, such that it can be updated later, if/when the
external data source changes. Example scenarios:
- A remote system may produce occasional data files that are used in other
projects.
- A batch process running regularly updates a data file to import.
-- A shared dataset on a remote storage that is managed and updated outside DVC.
+- A shared dataset on cloud storage that is managed and updated outside DVC.
> Note that `dvc get-url` corresponds to the first step this command performs
> (just download the file or directory).
-The `dvc import-url` command helps the user create such an external data
-dependency without having to manually copying files from the supported remote
-locations (listed below), which may require installing a different tool for each
-type.
+`dvc import-url` helps you create such an external data dependency, without
+having to manually copy files from the supported locations (listed below), which
+may require installing a different tool for each type.
-The `url` argument specifies the external location of the data to be imported,
-while `out` can be used to specify the directory and/or file name desired for
-the downloaded data. If an existing directory is specified, the file or
-directory will be placed inside.
+The `url` argument specifies the external location of the data to be imported.
+The imported data is cached, and linked (or copied) to the current
+working directory with its original file name e.g. `data.txt` (or to a location
+provided with `out`).
+
+An _import `.dvc` file_ is created in the same location e.g. `data.txt.dvc` –
+similar to using `dvc add` after downloading the data. This makes it possible to
+update the import later, if the data source has changed (see `dvc update`).
+
+> Note that the imported data can be [pushed](/doc/command-reference/push) to
+> remote storage normally.
`.dvc` files support references to data in an external location, see
[External Dependencies](/doc/user-guide/external-dependencies). In such an
-import `.dvc` file, the `deps` field stores the remote URL, and the `outs` field
-contains the corresponding local path in the workspace. It records
-enough metadata about the imported data to enable DVC efficiently determining
-whether the local copy is out of date.
+import `.dvc` file, the `deps` field stores the external URL, and the `outs`
+field contains the corresponding local path in the workspace. It
+records enough metadata about the imported data for DVC to efficiently
+determine whether the local copy is out of date.
Note that `dvc repro` doesn't check or update import `.dvc` files; use
`dvc update` to bring the import up to date from the data source.
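
As a rough illustration of how recorded metadata can reveal a stale local copy, the Python sketch below compares a file's current MD5 digest with a previously recorded one. It assumes a local file source (the case used in the change-detection example later on this page). `file_md5` and `is_out_of_date` are hypothetical helper names, and DVC's own logic may differ.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        # Read fixed-size chunks so large files are not loaded into memory.
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_out_of_date(source_path, recorded_md5):
    """Return True if the source no longer matches the recorded hash.

    recorded_md5 stands in for the hash stored in the import .dvc file
    (an assumption made for this sketch).
    """
    return file_md5(source_path) != recorded_md5
```

Any edit to the source file changes its digest, so the comparison flips to `True` until the recorded value is refreshed.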
-DVC supports several types of (local or) remote locations (protocols):
+DVC supports several types of external locations (protocols):
| Type | Description | `url` format example |
| --------- | ---------------------------- | --------------------------------------------- |
@@ -82,8 +88,7 @@ DVC supports several types of (local or) remote locations (protocols):
- In case of HTTP,
[ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) is
- necessary to track if the specified remote file (URL) changed to download it
- again.
+ necessary to track if the specified URL changed.
- `remote://myremote/path/to/file` notation just means that a DVC
[remote](/doc/command-reference/remote) `myremote` is defined and when DVC is
@@ -110,12 +115,8 @@ $ dvc run -n download_data \
wget https://data.dvc.org/get-started/data.xml -O data.xml
```
-`dvc import-url` generates an _import stage_ `.dvc` file and `dvc run` a regular
-stage (in `dvc.yaml`).
-
-⚠️ DVC won't push or pull imported data to/from
-[remote storage](/doc/command-reference/remote), it will rely on it's original
-source.
+`dvc import-url` generates an _import `.dvc` file_ and `dvc run` a regular stage
+(in `dvc.yaml`).
## Options
@@ -163,7 +164,7 @@ $ git checkout 3-config-remote
-## Example: Tracking a remote file
+## Example: Tracking a file from the web
An advanced alternative to the intro of the
[Versioning Basics](/doc/tutorials/get-started/data-versioning) part of the _Get
@@ -195,18 +196,18 @@ Let's take a look at the changes to the `data.xml.dvc`:
The `etag` field in the `.dvc` file contains the
[ETag](https://en.wikipedia.org/wiki/HTTP_ETag) recorded from the HTTP request.
-If the remote file changes, its ETag will be different. This metadata allows DVC
-to determine whether it's necessary to download it again.
+If the imported file changes online, its ETag will be different. This metadata
+allows DVC to determine whether it's necessary to download it again.
> See `.dvc` files for more details on the format above.
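
The ETag comparison described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not DVC's implementation: the helper names `fetch_etag` and `needs_download` are hypothetical, and treating a missing ETag as a change is a choice made here for the example.

```python
import urllib.request


def fetch_etag(url):
    """Send a HEAD request and return the ETag header, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("ETag")


def needs_download(recorded_etag, current_etag):
    """Decide whether to download again, based only on the two ETag values."""
    if current_etag is None:
        # Assumption for this sketch: with no ETag to compare, assume a change.
        return True
    return recorded_etag != current_etag
```

In use, `recorded_etag` would be the value stored in the `etag` field, and `current_etag` the result of `fetch_etag(url)` for the same URL.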
You may want to get out of and remove the `example-get-started/` directory after
trying this example (especially if trying out the following one).
-## Example: Detecting remote file changes
+## Example: Detecting external file changes
-What if that remote file is updated regularly? The project goals might include
-regenerating some results based on the updated data source.
+What if an imported file is updated regularly at its source? The project goals
+might include regenerating some results based on the updated data source.
[Pipeline](/doc/command-reference/dag) reproduction can be triggered based on a
changed external dependency.
@@ -214,9 +215,9 @@ Let's use the [Get Started](/doc/tutorials/get-started) project again,
simulating an updated external data source. (Remember to prepare the
workspace, as explained in [Examples](#examples))
-To illustrate this scenario, let's use a local file system directory (external
-to the workspace) to simulate a remote data source location. (In real life, the
-data file will probably be on a remote server.) Run these commands:
+To illustrate this scenario, let's use a local file system directory external to
+the workspace (in real life, the data file could be on a remote server instead).
+Run these commands:
```dvc
$ mkdir /tmp/dvc-import-url-example
@@ -319,15 +320,15 @@ Data and pipelines are up to date.
In the data store directory, edit `data.xml`. It doesn't matter what you change,
as long as it remains a valid XML file, because any change will result in a
-different dependency file hash (`md5`) in the import stage `.dvc` file. Once we
-do so, we can run `dvc update` to make sure the import is up to date:
+different dependency file hash (`md5`) in the import `.dvc` file. Once we do so,
+we can run `dvc update` to make sure the import is up to date:
```dvc
$ dvc update data.xml.dvc
Importing '.../tmp/dvc-import-url-example/data.xml' -> 'data/data.xml'
```
-DVC notices the "external" data source has changed, and updates the import stage
+DVC notices the external data source has changed, and updates the `.dvc` file
(reproduces it). In this case it's also necessary to run `dvc repro` so that the
remaining pipeline results are also regenerated:
diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md
index 7d687ff3f0..525cbd9a6f 100644
--- a/content/docs/command-reference/import.md
+++ b/content/docs/command-reference/import.md
@@ -1,9 +1,7 @@
# import
-Download a file or directory tracked by DVC or by Git into the
-workspace. It also creates a `.dvc` file with information about the
-data source, which can later be used to [update](/doc/command-reference/update)
-the import.
+Download a file or directory tracked by another DVC or Git repository into the
+workspace, and track it (an import `.dvc` file is created).
> See also our `dvc.api.open()` Python API function.
@@ -25,9 +23,9 @@ positional arguments:
Provides an easy way to reuse files or directories tracked in any DVC
repository (e.g. datasets, intermediate results, ML models) or Git
repository (e.g. code, small image/other files). `dvc import` downloads the
-target file or directory (found at `path` in `url`) into the workspace and
-tracks it in the project. This makes it possible to update the import later, if
-it has changed in its data source (see `dvc update`).
+target file or directory (found at `path` in `url`), and tracks it in the local
+project. This makes it possible to update the import later, if the data source
+has changed (see `dvc update`).
> Note that `dvc get` corresponds to the first step this command performs (just
> download the data).
@@ -35,6 +33,15 @@ it has changed in its data source (see `dvc update`).
> See `dvc list` for a way to browse repository contents to find files or
> directories to import.
+The imported data is cached, and linked (or copied) to the current
+working directory with its original file name e.g. `data.txt` (or to a location
+provided with `--out`). An _import `.dvc` file_ is created in the same location
+e.g. `data.txt.dvc` – similar to using `dvc add` after downloading the data.
+
+⚠️ DVC won't push or pull data imported from other DVC repos to/from
+[remote storage](/doc/command-reference/remote). It will rely on its original
+source.
+
The `url` argument specifies the address of the DVC or Git repository containing
the data source. Both HTTP and SSH protocols are supported (e.g.
`[user@]server:project.git`). `url` can also be a local file system path
@@ -46,33 +53,22 @@ tracked by either Git or DVC (including paths inside tracked directories). Note
that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the
repo.
-⚠️ DVC repos should have a default [DVC remote](/doc/command-reference/remote)
-containing the target actual for this command to work. The only exception is for
-local repos, where DVC will try to copy the data from its cache
-first.
+⚠️ Source DVC repos should have a default
+[DVC remote](/doc/command-reference/remote) containing the target data for this
+command to work. The only exception is for local repos, where DVC will try to
+copy the data from its cache first.
> See `dvc import-url` to download and track data from other supported locations
> such as S3, SSH, HTTP, etc.
-After running this command successfully, the imported data is placed in the
-current working directory (unless `-o` is used) with its original file name e.g.
-`data.txt`. An _import stage_ (`.dvc` file) is also created in the same
-location, extending the name of the imported data e.g. `data.txt.dvc` – similar
-to having used `dvc run` to generate the data as a stage output.
-
`.dvc` files support references to data in an external DVC repository (hosted on
-a Git server). In such a `.dvc` file, the `deps` field specifies the remote
-`url` and data `path`, and the `outs` field contains the corresponding local
-path in the workspace. It records enough metadata about the
-imported data to enable DVC efficiently determining whether the local copy is
-out of date.
-
-⚠️ DVC won't push or pull imported data to/from
-[remote storage](/doc/command-reference/remote), it will rely on it's original
-source.
+a Git server). In such a `.dvc` file, the `deps` field specifies the `url` and
+data `path`, and the `outs` field contains the corresponding local path in the
+workspace. It records enough metadata about the imported data for DVC
+to efficiently determine whether the local copy is out of date.
To actually [version the data](/doc/tutorials/get-started/data-versioning),
-`git add` (and `git commit`) the import stage.
+`git add` (and `git commit`) the import `.dvc` file.
Note that `dvc repro` doesn't check or update import `.dvc` files (see
`dvc freeze`); use `dvc update` to bring the import up to date from the data
@@ -98,8 +94,8 @@ repo at `url`) are not supported.
download the file or directory from. The latest commit in `master` (tip of the
default branch) is used by default when this option is not specified.
- > Note that this adds a `rev` field in the import stage that fixes it to the
- > revision. This can impact the behavior of `dvc update` (see the
+ > Note that this adds a `rev` field in the import `.dvc` file that fixes it to
+ > the revision. This can impact the behavior of `dvc update` (see the
> [Importing and updating fixed revisions](#example-importing-and-updating-fixed-revisions)
> example below).
@@ -140,8 +136,8 @@ Importing 'data/data.xml (git@github.com:iterative/example-get-started)'
```
In contrast with `dvc get`, this command doesn't just download the data file,
-but it also creates an import stage (`.dvc` file) with a link to the data source
-(as explained in the description above). (This `.dvc` file can later be used to
+but it also creates an import `.dvc` file with a link to the data source (as
+explained in the description above). (This `.dvc` file can later be used to
[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`:
```yaml
@@ -176,8 +172,8 @@ Importing
-> 'cats-dogs'
```
-When using this option, the import stage (`.dvc` file) will also have a `rev`
-subfield under `repo`:
+When using this option, the import `.dvc` file will also have a `rev` subfield
+under `repo`:
```yaml
deps:
@@ -192,14 +188,14 @@ If `rev` is a Git branch or tag (where the underlying commit changes), the data
source may have updates at a later time. To bring it up to date if so (and
update `rev_lock` in the `.dvc` file), simply use `dvc update .dvc`. If
`rev` is a specific commit hash (does not change), `dvc update` without options
-will not have an effect on the import stage. You may force-update it to a
+will not have an effect on the import `.dvc` file. You may force-update it to a
different commit with `dvc update --rev`:
```dvc
$ dvc update --rev cats-dogs-v2
```
-> In the above example, the value for `rev` in the new import stage will be
+> In the above example, the value for `rev` in the new `.dvc` file will be
> `master` (a branch) so it will be able to update normally going forward.
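
The difference between a moving revision (branch or tag) and a fixed commit hash can be sketched in Python as follows. This is a conceptual model only: `plan_update` and the `resolve_rev` callable are hypothetical names introduced for illustration, and the sketch does not reflect how `dvc update` is actually implemented.

```python
def plan_update(rev, rev_lock, resolve_rev):
    """Return the commit to lock to, or None if an update changes nothing.

    rev: branch, tag, or commit hash recorded in the import .dvc file
    rev_lock: commit hash the import currently points to
    resolve_rev: hypothetical callable mapping a revision to a commit hash
    """
    latest = resolve_rev(rev)
    # A commit hash always resolves to itself, so nothing changes;
    # a branch or tag may resolve to a newer commit than rev_lock.
    return latest if latest != rev_lock else None
```

With a branch such as `master`, the resolved commit can move ahead of `rev_lock`, so the function returns a new commit. With a fixed hash it returns `None`, matching the case where `dvc update` without options has no effect.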
## Example: Data registry
@@ -230,7 +226,7 @@ $ dvc import git@github.com:iterative/dataset-registry.git \
`dvc import` provides a better way to incorporate data files tracked in external
DVC repositories because it saves the connection between the
current project and the source repo. This means that enough information is
-recorded in an import stage (`.dvc` file) in order to
+recorded in an import `.dvc` file in order to
[reproduce](/doc/command-reference/repro) downloading of this same data version
in the future, where and when needed. This is achieved with the `repo` field,
for example (matching the import command above):
@@ -265,8 +261,8 @@ Importing ...
> Note that Git-tracked files can be imported from DVC repos as well.
-The file is imported, and along with it, an import stage (`.dvc` file) is
-created. Check `it-standards.csv.dvc`:
+The file is imported, and along with it, an import `.dvc` file is created. Check
+`it-standards.csv.dvc`:
```yaml
deps:
diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md
index 0f93345120..95ecce7bc4 100644
--- a/content/docs/command-reference/remote/modify.md
+++ b/content/docs/command-reference/remote/modify.md
@@ -799,7 +799,7 @@ by HDFS. Read more about by expanding the WebHDFS section in
> written to a Git-ignored config file.
> Note that `user/password` and `token` authentication are incompatible. You
-> should authenticate against yout WebDAV remote by either `user/password` or
+> should authenticate against your WebDAV remote by either `user/password` or
> `token`.
- `ask_password` - ask each time for the password to use for `user/password`