ref: improve add/import* to-cache/remote info and examples #2302

Merged: 32 commits, Mar 19, 2021
Changes from 7 commits
Commits
3f1217e
guide: update Ext Data guide link to add to-cache/remote examples
jorgeorpinel Mar 13, 2021
bab95a9
ref: config options copy edits
jorgeorpinel Mar 14, 2021
69cbbb6
ref: destroy copy edit
jorgeorpinel Mar 14, 2021
a1e609e
Merge branch 'master' into jorge +
jorgeorpinel Mar 15, 2021
cd599c8
ref: fix mac config file locs
jorgeorpinel Mar 15, 2021
eb4af97
ref: small update to plots * --open
jorgeorpinel Mar 15, 2021
043db23
ref: clarify and correct info on add/import to-cache/remote strategies
jorgeorpinel Mar 15, 2021
fa82c89
ref: import-url vs import in terms of remote sync
jorgeorpinel Mar 15, 2021
5729f49
ref: roll back changes unrelated to get/import from this PR
jorgeorpinel Mar 15, 2021
c393212
ref: remove wrong info about import* to-cache
jorgeorpinel Mar 15, 2021
b020b4e
Update content/docs/command-reference/add.md
jorgeorpinel Mar 15, 2021
86813ad
Restyled by prettier
restyled-commits Mar 15, 2021
725aa92
Merge pull request #2303 from iterative/restyled/jorge
jorgeorpinel Mar 15, 2021
f2350ab
ref: import + push/pull notes
jorgeorpinel Mar 15, 2021
14b62cc
ref: simplify add -o
jorgeorpinel Mar 15, 2021
d63b07f
ref: update add --to-remote desc
jorgeorpinel Mar 15, 2021
94010d7
ref: simplify add -o example intro
jorgeorpinel Mar 15, 2021
562b63c
ref: mention soft/hard links in add -o example
jorgeorpinel Mar 15, 2021
f694719
ref: external data cop edits
jorgeorpinel Mar 15, 2021
d5e793e
ref: avoid term "transfer" for -o/-to-remote (1)
jorgeorpinel Mar 15, 2021
0166b1f
ref: relink to add/import -o/-to-remote examples including
jorgeorpinel Mar 16, 2021
696fa53
ref: updated add/import to-cache/remote example titles
jorgeorpinel Mar 16, 2021
15097f1
ref: a couple more copy edits to add -o/-to-remote
jorgeorpinel Mar 16, 2021
31394de
ref: update --to-remote copy edits
jorgeorpinel Mar 16, 2021
0f2a2b1
ref: roll back changes not related to #2302
jorgeorpinel Mar 16, 2021
3d10596
Merge branch 'master' into jorge
jorgeorpinel Mar 17, 2021
1082c85
ref: clarfy --out option
jorgeorpinel Mar 17, 2021
b943df5
ref: rename add -o/-to-remote examples
jorgeorpinel Mar 17, 2021
b25da5c
ref: other copy edits to add -o/-to-remote
jorgeorpinel Mar 17, 2021
aa171d4
ref: no hard links for add -o + ext cache
jorgeorpinel Mar 17, 2021
61f8806
ref: more edits to add/import-url to-cache/remote
jorgeorpinel Mar 17, 2021
d5f284a
Merge branch 'master' into jorge
jorgeorpinel Mar 18, 2021
90 changes: 40 additions & 50 deletions content/docs/command-reference/add.md
@@ -153,10 +153,12 @@ not.
> Additionally, this typically requires an external cache setup (see link
> above).

- `-o <path>`, `--out <path>` - destination `path` to make a local target copy,
or to [transfer](#example-transfer-to-cache) an external target into the cache
(and link to workspace). Note that this can be combined with `--to-remote` to
avoid storing the data locally, while still adding it to the project.
- `-o <path>`, `--out <path>` - destination `path` inside the workspace to link
(or copy) a data target, which will now be tracked by DVC. Note that combining
this with an
[external cache transfer](#example-transfer-to-an-external-cache), or with the
`--to-remote` option, lets you avoid storing an external target locally,
while still adding it to the project.

- `--to-remote` - import an external target, but don't move it into the
Member

import is not the right term here

Member

I would describe it as when specified `dvc add` can accept path to an external data (e.g. S3 url, or path on a mounted volume, etc) and it copies data into remote storage w/o caching data into project's cache and w/o bringing it into the workspace. Besides that, it behaves in the same way - it creates a regular `.dvc` file, which can be used with `dvc pull` to get data into workspace when and where it's needed. This options expects (?) `-o` to be provided. It is useful to "bootstrap" the project, when data is too large for the disk the workspace is located on. See example ...

Contributor Author
@jorgeorpinel, Mar 16, 2021

Good catch! Updated along those lines in d63b07f.

Besides that, it behaves in the same way...

This seems like too many details for an option desc. It's already explained in the main command Description and I think it's assumed that everything still applies with every option except if otherwise noted.

This options expects (?) -o

Not required; it uses `./<basename>` by default (from `dvc add --to-remote /ext/path/basename`).

useful to "bootstrap" ...

Similarly, this idea is covered in the example (linked from here). Trying to keep the option desc. succinct.
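
The default mentioned in the reply above can be illustrated with a minimal sketch. It assumes the default output name is simply the last path component of the target; `default_out` is a hypothetical helper written for this illustration and is not part of DVC.

```python
import posixpath
from urllib.parse import urlparse


def default_out(target: str) -> str:
    """Return the basename of a URL or path target (hypothetical helper, not DVC code)."""
    # urlparse keeps a plain path in .path, so URLs and file system paths both work.
    path = urlparse(target).path
    # Drop a trailing slash so directory targets still yield a name.
    return posixpath.basename(path.rstrip("/"))


print(default_out("/ext/path/basename"))
print(default_out("https://data.dvc.org/get-started/data.xml"))
```

Under that assumption, passing `-o` only matters when the local name should differ from the source's basename.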

workspace, nor cache it. [Transfer it](#example-transfer-to-remote-storage) it
@@ -336,28 +338,21 @@ $ tree .dvc/cache
Only the hash values of the `dir/` directory (with `.dir` file extension) and
`file2` have been cached.

## Example: Transfer to the cache
## Example: Transfer to an external cache

When you have a large dataset in an external location, you may want to add it to
the <abbr>project</abbr> without having to copy it into the workspace. Maybe
your local disk doesn't have enough space, but you have setup an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
that could handle it.
Sometimes you may want to add a large dataset currently found in an external
location, so it becomes local to the project. However, your local file system
may not have enough space to download it — which is needed to add data in DVC,
right? Not necessarily!

The `--out` option lets you add external paths in a way that they are
The `--out` option lets you add external data in a way that it's
<abbr>cached</abbr> first, and then
[linked](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
to a given path inside the <abbr>workspace<abbr>. Let's initialize an example
DVC project to try this:

```dvc
$ mkdir example # workspace
$ cd example
$ git init
$ dvc init
```
to a given path inside the <abbr>workspace</abbr>. Combined with an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
setup, this lets you avoid using your local file system completely.

Now we can add a `data.xml` file via HTTP for example, putting it a local path
For example, we can add a `data.xml` file via HTTP, outputting it to a local path
in our project:

```dvc
@@ -368,9 +363,10 @@ data.xml data.xml.dvc
```

The resulting `.dvc` file will save the provided local `path` as if the data was
already in the workspace, while the `md5` hash points to the copy of the data
that has now been transferred to the <abbr>cache</abbr>. Let's check the
contents of `data.xml.dvc` in this case:
always in the workspace, while the `md5` hash points to the copy of the data
that has now been transferred to the <abbr>cache</abbr> (which again, we assume
it's already setup in some storage drive that can handle it). Let's check the
contents of `data.xml.dvc`:

```yaml
outs:
@@ -384,43 +380,37 @@ outs:

## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to track it
as if it was in your project, but without downloading it locally (for now). The
`--to-remote` option lets you do so, while storing a copy
[remotely](/doc/command-reference/remote) so it can be
[pulled](/doc/command-reference/plots) later. Let's initialize a DVC project,
and setup a remote:
Similarly to the previous scenario, you may sometimes want to track a large
dataset found externally into a regular <abbr>project</abbr> (with a local
<abbr>cache</abbr>). Can it be done without downloading the data locally (for
now)? Yes!

```dvc
$ mkdir example # workspace
$ cd example
$ git init
$ dvc init
$ mkdir /tmp/dvc-storage
$ dvc remote add myremote /tmp/dvc-storage
```
The `--to-remote` option lets you transfer a copy of the target data to
[remote storage](/doc/command-reference/remote), while creating a `.dvc` file
locally so it can be [pulled](/doc/command-reference/plots) later. This is a way
to "bootstrap" your project in your local machine, to be reproduced on the right
environment later (e.g. a GPU cloud server or a CI/CD system).

Now let's add the `data.xml` to our remote storage from the given remote
location.
Let's setup a simple remote and transfer a `data.xml` file from the web into it
via DVC:

```dvc
$ mkdir /tmp/dvc-storage
$ dvc remote add myremote /tmp/dvc-storage
$ dvc add https://data.dvc.org/get-started/data.xml -o data.xml \
--to-remote -r myremote
...
```

The only difference that dataset is transferred straight to remote, so DVC won't
control the remote location you gave but rather continue managing your remote
storage where the data is now on. The operation will still be resulted with an
`.dvc` file:

```dvc
$ ls
data.xml.dvc
```

Whenever anyone wants to actually download the added data (for example from a
system that can handle it), they can use `dvc pull` as usual:
> Note that this can be combined with `--out` to specify a local destination
> `path` (written to the `.dvc` file).

DVC won't control the original data source after this, but rather continue
managing your remote storage, where the data is now found. Whenever anyone wants
to actually download the added data (from a system that can handle it), they can
use `dvc pull` as usual:

```dvc
$ dvc pull data.xml.dvc -r tmp_remote
8 changes: 4 additions & 4 deletions content/docs/command-reference/config.md
@@ -45,10 +45,10 @@ multiple projects and users, respectively:
}
</style>

| Flag | Priority | Mac location | Linux location (typical\*) | Windows location |
| ---------- | -------- | -------------------------------------- | -------------------------- | --------------------------------------------------------- |
| `--global` | 3 | `$HOME/Library/Preferences/dvc/config` | `$HOME/.config/dvc/config` | `%LocalAppData%\iterative\dvc\config` |
| `--system` | 4 | `/Library/Preferences/dvc/config` | `/etc/xdg/dvc/config` | `%AllUsersProfile%\Application Data\iterative\dvc\config` |
| Flag | Priority | Mac location | Linux location (typical\*) | Windows location |
| ---------- | -------- | ----------------------------------------------- | -------------------------- | --------------------------------------------------------- |
| `--global` | 3 | `$HOME/Library/Application\ Support/dvc/config` | `$HOME/.config/dvc/config` | `%LocalAppData%\iterative\dvc\config` |
| `--system` | 4 | `/Library/Application\ Support/dvc/config` | `/etc/xdg/dvc/config` | `%AllUsersProfile%\Application Data\iterative\dvc\config` |

> \* For Linux, the global `dvc/config` may be found in `$XDG_CONFIG_HOME`, and
> the system-wide one in `$XDG_CONFIG_DIRS[0]`, if those env vars are defined.
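
As a rough illustration of the `--global` row after this change, the lookup could be sketched as follows. This is a sketch under stated assumptions, not DVC's implementation: `global_config_path` is a hypothetical helper, the environment is passed in as a plain dictionary, and only the locations listed in the updated table and the Linux note are covered.

```python
import os
import sys


def global_config_path(platform: str = sys.platform, env=None) -> str:
    """Sketch of the `--global` config location per platform (hypothetical helper)."""
    env = os.environ if env is None else env
    home = env.get("HOME", "~")
    if platform == "darwin":
        # Updated Mac location from the table (Application Support, not Preferences).
        return f"{home}/Library/Application Support/dvc/config"
    if platform.startswith("win"):
        # Falls back to the literal placeholder when the variable is not set.
        local = env.get("LocalAppData", "%LocalAppData%")
        return f"{local}\\iterative\\dvc\\config"
    # Linux: prefer $XDG_CONFIG_HOME when defined, else the typical ~/.config.
    base = env.get("XDG_CONFIG_HOME") or f"{home}/.config"
    return f"{base}/dvc/config"


print(global_config_path("linux", {"HOME": "/home/me"}))
```

Passing `platform` and `env` explicitly keeps the function easy to check on any machine; the `--system` row would follow the same pattern with its own base directories.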
6 changes: 3 additions & 3 deletions content/docs/command-reference/get.md
@@ -24,13 +24,13 @@ repository (e.g. source code, small image/other files). `dvc get` copies the
target file or directory (found at `path` in `url`) to the current working
directory. (Analogous to `wget`, but for repos.)

> See `dvc list` for a way to browse repository contents to find files or
> directories to download.

> Note that unlike `dvc import`, this command does not track the downloaded
> files (does not create a `.dvc` file). For that reason, it doesn't require an
> existing DVC project to run in.

> See `dvc list` for a way to browse repository contents to find files or
Member

not related to this PR?

Contributor Author

I did some general copy edits to the descriptions of all the commands related here (add, get, import, etc.) as I reviewed the -o/-to-remote stuff.

Member

it increases surfaces, please please let's try to avoid it as much as we can

Contributor Author
@jorgeorpinel, Mar 16, 2021

Sure, will extract ⌛

I think this is an occasionally recurring misunderstanding because I see the changes as related when I make them, as docs have lots of conceptual interconnections... Also I don't always have the capacity to remember small details for later (they would otherwise be lost, though maybe that's better?) and it's not always feasible to stash inner file chunks either. So it's tricky, sorry if it happens now and then 😢

Contributor Author

Ok, rolled back.

> directories to download.

The `url` argument specifies the address of the DVC or Git repository containing
the data source. Both HTTP and SSH protocols are supported (e.g.
`[user@]server:project.git`). `url` can also be a local file system path
77 changes: 36 additions & 41 deletions content/docs/command-reference/import-url.md
@@ -23,9 +23,8 @@ positional arguments:
## Description

In some cases it's convenient to add a data file or directory from an external
location into the workspace (or to
[remote storage](/doc/command-reference/remote)), such that it can be updated
later, if/when the external data source changes. Example scenarios:
location into the project, such that it can be updated later if/when the
external data source changes. Example scenarios:

- A remote system may produce occasional data files that are used in other
projects.
@@ -37,25 +36,27 @@ later, if/when the external data source changes. Example scenarios:

`dvc import-url` helps you create such an external data dependency, without
having to manually copy files from the supported locations (listed below), which
may require installing a different tool for each type.

When you don't want to store the target data in your local system, you can still
create an import `.dvc` file while transferring a file or directory directly to
remote storage, by using the `--to-remote` option. See the
[Transfer to remote storage](#example-transfer-to-remote-storage) example for
more details.
would require installing/using a different tool for each type.

The `url` argument specifies the external location of the data to be imported.
The imported data is <abbr>cached</abbr>, and linked (or copied) to the current
working directory with its original file name e.g. `data.txt` (or to a location
provided with `out`).
working directory with its original file name e.g. `data.txt`, or to a location
provided with `out`.

An _import `.dvc` file_ is created in the same location e.g. `data.txt.dvc` –
similar to using `dvc add` after downloading the data. This makes it possible to
update the import later, if the data source has changed (see `dvc update`).
similar to using `dvc add` after downloading the data. It saves the information
about the data source, so the import can be updated later if the data source has
changed (see `dvc update`).

💡 Using an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
or the `--to-remote` option lets you
[transfer](#example-transfer-to-remote-storage) an import without using the
local file system.

> Note that the imported data can be [pushed](/doc/command-reference/push) to
> remote storage normally.
> Note that imported data can be [pushed](/doc/command-reference/push) and
> [pulled](/doc/command-reference/pull) to/from
> [remote storage](/doc/command-reference/remote) normally.

`.dvc` files support references to data in an external location, see
[External Dependencies](/doc/user-guide/external-dependencies). In such an
@@ -64,8 +65,9 @@ field contains the corresponding local path in the <abbr>workspace</abbr>. It
records enough metadata about the imported data to enable DVC efficiently
determining whether the local copy is out of date.

Note that `dvc repro` doesn't check or update import `.dvc` files, use
`dvc update` to bring the import up to date from the data source.
Note that `dvc repro` doesn't check or update import `.dvc` files by default
(see `dvc freeze`), use `dvc update` to bring the import up to date from the
data source.

DVC supports several types of external locations (protocols):

@@ -360,40 +362,33 @@ Running stage 'prepare' with command:

## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to import it
to your project without downloading it to the local file system (for using it
later/elsewhere). The `--to-remote` option let you skip the download, while
storing the imported data [remotely](/doc/command-reference/remote). Let's
initialize a DVC project, and setup a remote:
Normally, `dvc import-url` downloads the target data (to the <abbr>cache</abbr>)
in order to link and track it locally. But what if there's not enough disk space
for the download?

```dvc
$ mkdir example # workspace
$ cd example
$ git init
$ dvc init
$ mkdir /tmp/dvc-storage
$ dvc remote add myremote /tmp/dvc-storage
```
One option is to setup an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
in a location that can handle the data. Another is to use the `--to-remote`
Contributor Author

I wonder if we need a full to-cache example in import-url (like in https://dvc.org/doc/command-reference/add#example-transfer-to-the-cache).

Contributor
@isidentical, Mar 15, 2021

import-url doesn't have the to-cache ability, but rather work on its own (like you just import an URL with it and it puts it to your cache, but not in a chunked way).

Contributor Author

Ah, good point. But then maybe we should explain about the chunking (transferring) a bit more... I'll review again ⌛

Contributor Author

But with --to-cache the chunked transfer does happen, right @isidentical ? Just double checking. Already updated in c393212, PTAL.

option so the target data is transferred to
[remote storage](/doc/command-reference/remote), while also tracked via an
import `.dvc` file in the project.

Now let's create an import `.dvc` file without downloading the target data,
transferring it directly to remote storage instead:
Let's setup a simple remote and create an import `.dvc` file without downloading
the target data, transferring it directly to the remote:

```
$ mkdir /tmp/dvc-storage
$ dvc remote add myremote /tmp/dvc-storage
$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
--to-remote -r myremote
...
```

The only change in our local <abbr>workspace</abbr> is a newly created import
`.dvc` file:

```dvc
$ ls
data.xml.dvc
```

Whenever anyone wants to actually download the imported data (for example from a
system that can handle it), they can use `dvc pull` as usual:
The only change in our local <abbr>workspace</abbr> is the tiny `.dvc` file
that was created. Whenever anyone wants to actually download the imported data
(into a system that can handle it), they can use `dvc pull` as usual:

```
$ dvc pull data.xml.dvc -r tmp_remote
23 changes: 14 additions & 9 deletions content/docs/command-reference/import.md
@@ -27,20 +27,25 @@ target file or directory (found at `path` in `url`), and tracks it in the local
project. This makes it possible to update the import later, if the data source
has changed (see `dvc update`).

> Note that `dvc get` corresponds to the first step this command performs (just
> download the data).

> See `dvc list` for a way to browse repository contents to find files or
> directories to import.

> Note that `dvc get` corresponds to the first step this command performs (just
> download the data).

The imported data is <abbr>cached</abbr>, and linked (or copied) to the current
working directory with its original file name e.g. `data.txt` (or to a location
provided with `--out`). An _import `.dvc` file_ is created in the same location
e.g. `data.txt.dvc` – similar to using `dvc add` after downloading the data.

(ℹ️) DVC won't push or pull imported data to/from
[remote storage](/doc/command-reference/remote), it will rely on it's original
source.
Contributor Author

This was wrong right? import-url states the opposite.

Contributor

import and import-url have different behavior, the original docs for each command were correct

Contributor Author
@jorgeorpinel, Mar 15, 2021

Interestinggggg 🤔 OK will roll back, thanks! ✔️

💡 Using an
[external cache](/doc/use-cases/shared-development-server#configure-the-external-shared-cache)
lets you transfer an import there (and link it in the <abbr>workspace</abbr>),
without using the local file system.

> Note that imported data can be [pushed](/doc/command-reference/push) and
> [pulled](/doc/command-reference/pull) to/from
> [remote storage](/doc/command-reference/remote) normally.

The `url` argument specifies the address of the DVC or Git repository containing
the data source. Both HTTP and SSH protocols are supported (e.g.
@@ -70,9 +75,9 @@ enable DVC efficiently determining whether the local copy is out of date.
To actually [version the data](/doc/tutorials/get-started/data-versioning),
`git add` (and `git commit`) the import `.dvc` file.

Note that `dvc repro` doesn't check or update import `.dvc` files (see
`dvc freeze`), use `dvc update` to bring the import up to date from the data
source.
Note that `dvc repro` doesn't check or update import `.dvc` files by default
(see `dvc freeze`), use `dvc update` to bring the import up to date from the
data source.

Also note that chained imports (importing data that was imported into the source
repo at `url`) are not supported.
2 changes: 1 addition & 1 deletion content/docs/command-reference/plots/diff.md
@@ -91,7 +91,7 @@ all the current plots, without comparisons.
[Vega specification](https://vega.github.io/vega/docs/specification/) file
instead of HTML. See `dvc plots` for more info.

- `--open` - opens the generated plot directly in the browser.
- `--open` - opens the generated plot in the browser automatically.

- `--no-header` - lets DVC know that CSV or TSV `--targets` do not have a
header. A 0-based numeric index can be used to identify each column instead of
2 changes: 1 addition & 1 deletion content/docs/command-reference/plots/show.md
@@ -65,7 +65,7 @@ please see `dvc plots`.
[Vega specification](https://vega.github.io/vega/docs/specification/) file
instead of HTML. See `dvc plots` for more info.

- `--open` - opens the generated plot directly in the browser.
- `--open` - opens the generated plot in the browser automatically.

- `--no-header` - lets DVC know that CSV or TSV `targets` do not have a header.
A 0-based numeric index can be used to identify each column instead of names.
13 changes: 7 additions & 6 deletions content/docs/user-guide/managing-external-data.md
@@ -1,13 +1,14 @@
# External Outputs

> ⚠️ This is an advanced feature for very specific situations and not
> recommended except if there's absolutely no other alternative. In most cases
> alternatives like the
> [to-cache](/doc/command-reference/add#example-transfer-to-the-cache) or
> [to-remote](/doc/command-reference/add#example-transfer-to-remote-storage)
> strategies of `dvc add` and `dvc import-url` are more convenient. **Note**
> that external outputs are not pushed or pulled from/to
> recommended except if there's absolutely no other alternative. Note that
> external outputs are not pushed or pulled from/to
> [remote storage](/doc/command-reference/remote).
>
> In most cases the
> [to-cache](/doc/command-reference/add#example-transfer-to-an-external-cache)
> or [to-remote](/doc/command-reference/add#example-transfer-to-remote-storage)
> strategies of `dvc add` and `dvc import-url` are more convenient.
Member

don't like "are more convenient", but I guess we'll address it later. It's not about convenience to my mind. It's about different workflows around data. And we need to explain to people that in certain cases (unless you indeed write/read directly from/to remote location from code) you don't need this. If you are looking for a way to add data to DVC in regular way (remote storage, `dvc pull`, `dvc push`, whatever - find a way to define it) there are better options

Contributor Author

Agreed. Changing to "better" for now.


There are cases when data is so large, or its processing is organized in such a
way, that its impossible to handle it in the local machine disk. For example