get: add example on downloading normal git files #821

Merged: 2 commits, Dec 9, 2019
83 changes: 47 additions & 36 deletions static/docs/command-reference/get.md
@@ -1,28 +1,32 @@
# get

Download or copy file or directory from the
[remote storage](/doc/command-reference/remote) of any <abbr>DVC project</abbr>
in a Git repository (e.g. hosted on GitHub) into the current working directory.
Obtain a file or directory from any <abbr>DVC project</abbr> or Git repository
(e.g. hosted on GitHub) into the current working directory.

> Unlike `dvc import`, this command does not track the downloaded data files
> (does not create a DVC-file).
> Unlike `dvc import`, this command does not track the obtained files (does not
> create a DVC-file).

## Synopsis

```usage
usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path

Download/copy files or directories from DVC repository.
Documentation: <https://man.dvc.org/get>

positional arguments:
url URL of Git repository with DVC project to download from.
path Path to data within DVC repository.
url URL of Git repository with DVC project to download
from.
path Path to a file or directory within a DVC repository.
```

## Description

Provides an easy way to download datasets, intermediate results, ML models, or
other files and directories (any <abbr>data artifact</abbr>) tracked in another
<abbr>DVC repository</abbr>, by downloading them into the current working
directory. (It works like `wget`, but for DVC repositories.)
Provides an easy way to obtain files or directories tracked in any <abbr>DVC
repository</abbr>, both by Git (e.g. source code) and DVC (e.g. datasets, ML
models). The file or directory at `path` is copied to the current working
directory. (For remote URLs, it works like downloading with `wget`, but with
support for DVC <abbr>data artifacts</abbr>.)

Note that this command doesn't require an existing DVC project to run in. It's a
single-purpose command that can be used out of the box after installing DVC.
@@ -32,31 +36,28 @@ external <abbr>project</abbr>. Both HTTP and SSH protocols are supported for
online repositories (e.g. `[user@]server:project.git`). `url` can also be a
local file system path to an "offline" repository.
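
As a rough sketch of the three `url` forms described above, the hypothetical
helper below simply forwards a repository location and a path to `dvc get`. The
function name `get_from` and the example locations in the comments are
placeholders, not part of DVC.

```shell
# Hypothetical wrapper around `dvc get`; only the url argument changes
# between online (HTTP or SSH) and local ("offline") repositories.
get_from() {
  # $1: repository location, $2: path within that repository
  dvc get "$1" "$2"
}

# Possible calls (all locations are placeholders):
#   get_from https://github.com/iterative/example-get-started model.pkl
#   get_from git@github.com:iterative/example-get-started.git model.pkl
#   get_from /path/to/local/repo model.pkl
```

In each case the obtained file lands in the current working directory under its
original name, since `-o` is not used.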

The `path` argument of this command is used to specify the location of the data
to be downloaded within the source project. It should point to a data file or
directory tracked by that project – specified in one of the
[DVC-files](/doc/user-guide/dvc-file-format) of the repository at `url`. (You
will not find these files directly in the source Git repository.) The source
project should have a default [DVC remote](/doc/command-reference/remote)
configured, containing them.)
The `path` argument of this command is used to specify the location of the file
or directory within the source project. If the file is tracked by DVC (specified
in a [DVC-file](/doc/user-guide/dvc-file-format)), the source project must have
a default [DVC remote](/doc/command-reference/remote) configured.

> See `dvc get-url` to download data from other supported URLs.
> See `dvc get-url` to obtain data from other supported URLs.

After running this command successfully, the data found in the `url` `path` is
created in the current working directory, with its original file name.

## Options

- `-o`, `--out` - specify a path (directory and/or file name) to the desired
location to place the downloaded data in. The default value (when this option
location to place the obtained file in. The default value (when this option
isn't used) is the current working directory (`.`) and original file name. If
an existing directory is specified, then the output will be placed inside of
it.

- `--rev` - specific
[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
(such as a branch name, a tag, or a commit hash) of the DVC repository to
download the data from. The tip of the default branch is used by default when
obtain the file from. The tip of the default branch is used by default when
this option is not specified.

- `-h`, `--help` - prints the usage/help message and exits.
@@ -66,12 +67,12 @@ created in the current working directory, with its original file name.

- `-v`, `--verbose` - displays detailed tracing information.
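
A minimal sketch combining `--rev` and `-o`, assuming the example repository and
the `9-bigrams-model` revision referenced elsewhere on this page. The helper
name `get_model_at` and the `REPO_URL` variable are illustrative only.

```shell
# Assumed source project: the "get started" example repository.
REPO_URL="https://github.com/iterative/example-get-started"

# Hypothetical helper: obtain model.pkl as it exists at a given revision.
get_model_at() {
  # $1: Git revision (branch name, tag, or commit hash)
  # $2: target path for the obtained file
  dvc get --rev "$1" -o "$2" "$REPO_URL" model.pkl
}

# Possible call:
#   get_model_at 9-bigrams-model model.bigrams.pkl
```

Omitting `--rev` would fall back to the tip of the default branch, as described
in the option list above.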

## Examples
## Example: Retrieve a model from a DVC remote

> Note that `dvc get` can be used from anywhere in the file system, as long as
> DVC is [installed](/doc/install).

We can use `dvc get` to download the resulting model file from our
We can use `dvc get` to obtain the resulting model file from our
[get started example repo](https://github.com/iterative/example-get-started), a
<abbr>DVC project</abbr> external to the current working directory. The desired
<abbr>output</abbr> file would be located in the root of the external project
@@ -95,26 +96,36 @@ is found, that specifies `model.pkl` in its outputs (`outs`). DVC then
its
[config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)).

> A recommended use for downloading binary files from DVC repositories, as done
> in this example, is to place a ML model inside a wrapper application that
> serves as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load)
> pipeline or as an HTTP/RESTful API (web service) that provides predictions
> upon request. This can be automated leveraging DVC with
> A recommended use for obtaining binary files from DVC repositories, as done in
> this example, is to place a ML model inside a wrapper application that serves
> as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) pipeline
> or as an HTTP/RESTful API (web service) that provides predictions upon
> request. This can be automated leveraging DVC with
> [CI/CD](https://en.wikipedia.org/wiki/CI/CD) tools.

The same example applies to raw or intermediate <abbr>data artifacts</abbr> as
well, of course, for cases where we want to download those files or directories
well, of course, for cases where we want to obtain those files or directories
and perform some analysis on them.

## Example: Retrieve a file from a Git repository

We can also use `dvc get` to retrieve any file or directory that exists in a Git
repository.

```dvc
$ dvc get https://github.com/schacon/cowsay install.sh
$ ls
install.sh
```

## Example: Compare different versions of data or model

`dvc get` has the `--rev` option, to specify which version of the repository to
download a <abbr>data artifact</abbr> from. It also has the `--out` option to
specify the file or directory path and file name for the download. Combining
these two options allows us to do something we can't achieve with the regular
`git checkout` + `dvc checkout` process – see for example the
[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
Started_ section.
obtain a <abbr>data artifact</abbr> from. It also has the `--out` option to
specify the target path. Combining these two options allows us to do something
we can't achieve with the regular `git checkout` + `dvc checkout` process – see
for example the [Get Older Data Version](/doc/get-started/older-versions)
chapter of our _Get Started_ section.

Let's use the
[get started example repo](https://github.com/iterative/example-get-started)
@@ -148,7 +159,7 @@ get the most recent one, we use a similar command, but with
`-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
(since it's the latest version anyway). In fact, in this case using `dvc pull`
with the corresponding [DVC-files](/doc/user-guide/dvc-file-format) should
suffice, downloading the file as just `model.pkl`. We can then rename it to make
suffice, obtaining the file as just `model.pkl`. We can then rename it to make
its version explicit:

```dvc