Regular doc updates (early Dec) #846

Merged
merged 21 commits into from
Dec 12, 2019
Changes from 20 commits
Commits
c853e79
wrap /api/comments file doc paragraph
jorgeorpinel Dec 5, 2019
b68c001
Merge branch 'master' into jorgeorpinel
jorgeorpinel Dec 9, 2019
7a5c344
user-guide: improve doc contrib guide instructions for #843
jorgeorpinel Dec 9, 2019
d9ab97f
user-guide: change note about GDrive access token
jorgeorpinel Dec 9, 2019
63a4ac9
remote: use consistent order and terminology for remote types
jorgeorpinel Dec 9, 2019
b799248
term: improve note about "local remote"
jorgeorpinel Dec 9, 2019
884968d
cmd ref: link to settings header in remote modify text
jorgeorpinel Dec 9, 2019
2be074f
Merge branch 'master' into jorgeorpinel
jorgeorpinel Dec 10, 2019
a9a9bda
use-cases: address pending feedback from iterative/dvc.org/pull/821
jorgeorpinel Dec 10, 2019
735cae1
cmd ref: clarify around term "download" in get and import
jorgeorpinel Dec 10, 2019
3f1524b
SEO: expand list of remotes in main landing page meta info
jorgeorpinel Dec 10, 2019
c68f2fa
term: remove some bold notes in quotes, add some emojis
jorgeorpinel Dec 10, 2019
15cf7c5
term: more note and emojis reviews
jorgeorpinel Dec 10, 2019
66d1e52
term: finish reviewing bold notes in docs, adds some more emojis
jorgeorpinel Dec 10, 2019
2570a18
rewrap server.js comment
jorgeorpinel Dec 10, 2019
75dab08
Merge branch 'master' into jorgeorpinel
jorgeorpinel Dec 11, 2019
9955833
back ticks for `dvc` and H for "GitHub"
jorgeorpinel Dec 11, 2019
dc6a03e
cmd ref: remove outdated note about pyarrow in remote add
jorgeorpinel Dec 11, 2019
408073c
cmd ref: revise yellow ! notes
jorgeorpinel Dec 11, 2019
4455fb0
cmd ref: addressed misc. feedback for PR #846
jorgeorpinel Dec 11, 2019
53f840c
revert .github/PULL_REQUEST_TEMPLATE.md auto formatting
jorgeorpinel Dec 12, 2019
14 changes: 10 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,7 +1,13 @@
Disregard the recommendations below if you use **Edit on Github** button to improve the docs in place.
Disregard the recommendations below if you use **Edit on GitHub** button to
improve the docs in place.

❗ Please read the guidelines in the [Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs) list if you make any substantial changes to the documentation or JS engine.
❗ Please read the guidelines in the
[Contributing to the Documentation](https://dvc.org/doc/user-guide/contributing/docs)
list if you make any substantial changes to the documentation or JS engine.

🐛 Please make sure to mention `Fix #issue` (if applicable) in the description of the PR. This enables GitHub to link the PR to the corresponding bug and close it automatically when PR is merged.
🐛 Please make sure to mention `Fix #issue` (if applicable) in the description
of the PR. This enables GitHub to link the PR to the corresponding bug and close
it automatically when PR is merged.

Thank you for the contribution - we'll try to review and merge it as soon as possible. 🙏
Thank you for the contribution - we'll try to review and merge it as soon as
possible. 🙏
12 changes: 5 additions & 7 deletions pages/api/comments.js
@@ -1,12 +1,10 @@
/*
* This API endpoint is used by https://blog.dvc.org
* to get comments count for the post, it gets
* discuss.dvc.org topic url as a param and returns
* comments count or error.
* This API endpoint is used by our blog to get comments count for the post, it
* gets discuss.dvc.org topic URL as a param and returns comments count or
* error.
*
* It made this way to configure CORS, reduce user's payload
* and to add potential ability to cache comments count
* in the future.
* It is made this way to configure CORS, reduce user's payload and to add
* potential ability to cache comments count in the future.
*/

import Cors from 'micro-cors'
7 changes: 4 additions & 3 deletions pages/features.js
@@ -53,9 +53,10 @@ export default function FeaturesPage() {
</Icon>
<Name>Storage agnostic</Name>
<Description>
Use S3, Azure, Google Drive, GCP, SSH, SFTP, Aliyun OSS rsync or
any network-attached storage to store data. The list of supported
protocols is constantly expanding.
Use Amazon S3, Microsoft Azure Blob Storage, Google Drive, Google
Cloud Storage, Aliyun OSS, SSH/SFTP, HDFS, HTTP, network-attached
storage, or rsync to store data. The list of supported remote
storage is constantly expanding.
</Description>
</Feature>
<Feature>
9 changes: 4 additions & 5 deletions server.js
@@ -1,10 +1,9 @@
/* eslint-env node */

// This file doesn't go through babel or webpack transformation.
// Make sure the syntax and sources this file requires are compatible with the
// current node version you are running.
// See https://github.com/zeit/next.js/issues/1245 for discussions on Universal
// Webpack or universal Babel.
// This file doesn't go through babel or webpack transformation. Make sure the
// syntax and sources this file requires are compatible with the current Node.js
// version you are running. (See https://github.com/zeit/next.js/issues/1245 for
// discussions on universal Webpack vs universal Babel.)

const { createServer } = require('http')
const { parse } = require('url')
5 changes: 3 additions & 2 deletions src/Diagram/index.js
@@ -41,8 +41,9 @@ const ColumnOne = () => (
<Description fullWidth>
<p>
Version control machine learning models, data sets and intermediate
files. DVC connects them with code and uses S3, Azure, Google Drive,
GCP, SSH, Aliyun OSS or to store file contents.
files. DVC connects them with code, and uses Amazon S3, Microsoft Azure
Blob Storage, Google Drive, Google Cloud Storage, Aliyun OSS, SSH/SFTP,
HDFS, HTTP, network-attached storage, or rsync to store file contents.
</p>
<p>
Full code and data provenance help track the complete evolution of every
24 changes: 11 additions & 13 deletions static/docs/command-reference/config.md
@@ -124,10 +124,10 @@ for more details.)
effective of those two. DVC avoids `symlink` and `hardlink` types by default
to protect user from accidental cache and repository corruption.

> **Note!** If you manually set `cache.type` to `hardlink` or `symlink`, **you
> will corrupt the cache** if you modify tracked data files in the workspace.
> See the `cache.protected` config option above and corresponding
> `dvc unprotect` command to modify files safely.
⚠️ If you manually set `cache.type` to `hardlink` or `symlink`, **you will
corrupt the cache** if you modify tracked data files in the workspace. See the
`cache.protected` config option above and corresponding `dvc unprotect`
command to modify files safely.
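
The warning above follows from how hard links behave at the file system level. The sketch below is a generic Python illustration of that behavior, not DVC's own code: the file names `cache_copy` and `workspace_copy` are hypothetical stand-ins for a cache entry and a tracked workspace file.

```python
import os
import tempfile
from typing import Tuple


def hardlink_shares_content() -> Tuple[str, str]:
    """Show that writing through a hard link also changes the original file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names: one plays the cache entry, one the workspace file.
        cached = os.path.join(tmp, "cache_copy")
        workspace = os.path.join(tmp, "workspace_copy")

        with open(cached, "w") as f:
            f.write("original data\n")

        # Both names now point at the same underlying file content.
        os.link(cached, workspace)

        # Editing the workspace name in place...
        with open(workspace, "a") as f:
            f.write("edited in workspace\n")

        # ...also changes what the cache name returns.
        with open(cached) as f:
            cached_text = f.read()
        with open(workspace) as f:
            workspace_text = f.read()
        return cached_text, workspace_text


if __name__ == "__main__":
    cached_text, workspace_text = hardlink_shares_content()
    print(cached_text == workspace_text)
```

Because both names share the same content, an in-place edit cannot leave the cached version intact, which is presumably why the docs point to `dvc unprotect` before modifying such files.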

There are pros and cons to different link types. Refer to
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
@@ -164,7 +164,7 @@ for more details.)
- `cache.hdfs` - name of an
[HDFS remote to use as external cache](/doc/user-guide/managing-external-data#hdfs).

- `cache.azure` - name of an Azure remote to use as
- `cache.azure` - name of a Microsoft Azure Blob Storage remote to use as
[external cache](/doc/user-guide/managing-external-data).

### state
@@ -185,20 +185,18 @@ more about the state file (database) that is used for optimization.
so that when it needs to cleanup the database it could sort them by the
timestamp and remove the oldest ones. Default quota is set to 50(percent).

## Example: Core config options

Set the `dvc` log level to `debug`:
## Example: Set the debug level

```dvc
$ dvc config core.loglevel debug
```

Add an S3 remote and set it as the <abbr>project</abbr> default:
## Example: Add an S3 remote

> 💡 Before adding an S3 remote, be sure to
> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

> **Note!** Before adding a new remote be sure to login into AWS services and
> follow instructions at
> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)
> to create your bucket.
This also sets the remote as the <abbr>project</abbr> default:

```dvc
$ dvc remote add myremote s3://bucket/path
9 changes: 5 additions & 4 deletions static/docs/command-reference/get-url.md
@@ -45,10 +45,11 @@ DVC supports several types of (local or) remote locations (protocols):
| `hdfs` | HDFS | `hdfs://[email protected]/path/to/data.csv` |
| `http` | HTTP to file | `https://example.com/path/to/data.csv` |

> Depending on the remote locations type you plan to download data from you
> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`,
> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all)
> when [installing DVC](/doc/install) with `pip`.
> If you installed DVC via `pip` and plan to use cloud services as remote
> storage, you might need to install these optional dependencies: `[s3]`,
> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to
> include them all. The command should look like this: `pip install "dvc[s3]"`.
> (This example installs `boto3` library along with DVC to support S3 storage.)
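
As a rough illustration of the note above, the following Python sketch picks the `pip` command for a given remote URL. The helper name `pip_install_command` and the scheme-to-extra mapping are assumptions made for this example; only the extras names themselves come from the note.

```python
from urllib.parse import urlparse

# Extras listed in the note above. Mapping each URL scheme to the extra of the
# same name is an assumption of this sketch, not something the docs state.
EXTRAS_BY_SCHEME = {
    "s3": "s3",
    "azure": "azure",
    "gdrive": "gdrive",
    "gs": "gs",
    "oss": "oss",
    "ssh": "ssh",
}


def pip_install_command(remote_url: str) -> str:
    """Return a pip command that installs DVC with the extra for this URL."""
    scheme = urlparse(remote_url).scheme
    extra = EXTRAS_BY_SCHEME.get(scheme)
    if extra is None:
        # No listed extra for this scheme (e.g. HTTP or a local path).
        return "pip install dvc"
    return f'pip install "dvc[{extra}]"'


if __name__ == "__main__":
    print(pip_install_command("s3://bucket/path"))
```

The quotes around `dvc[s3]` are kept in the generated command because some shells treat square brackets as glob characters.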

Another way to understand the `dvc get-url` command is as a tool for downloading
data files.
71 changes: 36 additions & 35 deletions static/docs/command-reference/get.md
@@ -1,10 +1,10 @@
# get

Obtain a file or directory from any <abbr>DVC project</abbr> or Git repository
Download a file or directory from any <abbr>DVC project</abbr> or Git repository
(e.g. hosted on GitHub) into the current working directory.

> Unlike `dvc import`, this command does not track the obtained files (does not
> create a DVC-file).
> Unlike `dvc import`, this command does not track the downloaded files (does
> not create a DVC-file).

## Synopsis

@@ -15,14 +15,13 @@ Download/copy files or directories from DVC repository.
Documentation: <https://man.dvc.org/get>

positional arguments:
url URL of Git repository with DVC project to download
from.
path Path to a file or directory within a DVC repository.
url URL of Git repository with DVC project to download from.
path Path to a file or directory within a DVC repository.
```

## Description

Provides an easy way to obtain files or directories tracked in any <abbr>DVC
Provides an easy way to download files or directories tracked in any <abbr>DVC
repository</abbr>, both by Git (e.g. source code) and DVC (e.g. datasets, ML
models). The file or directory in path is copied to the current working
directory. (For remote URLs, it works like downloading with wget, but supporting
@@ -34,30 +33,35 @@ single-purpose command that can be used out of the box after installing DVC.
The `url` argument specifies the address of the Git repository containing the
external <abbr>project</abbr>. Both HTTP and SSH protocols are supported for
online repositories (e.g. `[user@]server:project.git`). `url` can also be a
local file system path to an "offline" repository.
local file system path to an "offline" repository (in this case, instead of
downloading, DVC may copy the target data from the external source project or
its cache).

The `path` argument of this command is used to specify the location of the file
or directory within the source project. If the file is a
[DVC-file](/doc/user-guide/dvc-file-format) the source project must have a
default [DVC remote](/doc/command-reference/remote) configured.
The `path` argument of this command is used to specify the location, within the
source repository at `url`, of the target(s) to be downloaded. It can point to
any file or directory in the source project, including all files tracked by Git.
Note that data tracked by DVC should be specified in one of the
[DVC-files](/doc/user-guide/dvc-file-format) of the source repository. (In this
case, a default [DVC remote](/doc/command-reference/remote) needs to be
configured in the project, containing the actual data.)

> See `dvc get-url` to obtain data from other supported URLs.
> See `dvc get-url` to download data from other supported URLs.

After running this command successfully, the data found in the `url` `path` is
created in the current working directory, with its original file name.

## Options

- `-o`, `--out` - specify a path (directory and/or file name) to the desired
location to place the obtained file in. The default value (when this option
location to place the downloaded file in. The default value (when this option
isn't used) is the current working directory (`.`) and original file name. If
an existing directory is specified, then the output will be placed inside of
it.

- `--rev` - specific
[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
(such as a branch name, a tag, or a commit hash) of the DVC repository to
obtain the file from. The tip of the default branch is used by default when
download the file from. The tip of the default branch is used by default when
this option is not specified.

- `-h`, `--help` - prints the usage/help message, and exit.
@@ -67,18 +71,14 @@ created in the current working directory, with its original file name.

- `-v`, `--verbose` - displays detailed tracing information.
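
For scripted use, the `--out` and `--rev` options described above can be assembled programmatically. This is a minimal sketch, assuming a hypothetical helper named `dvc_get_args`; it only builds the argument list and does not call DVC itself.

```python
from typing import List, Optional


def dvc_get_args(
    url: str,
    path: str,
    rev: Optional[str] = None,
    out: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a `dvc get` call (hypothetical helper)."""
    args = ["dvc", "get", url, path]
    if out is not None:
        # -o / --out: where to place the downloaded file.
        args += ["-o", out]
    if rev is not None:
        # --rev: branch name, tag, or commit hash to download from.
        args += ["--rev", rev]
    return args


if __name__ == "__main__":
    print(" ".join(dvc_get_args(
        "https://github.com/iterative/example-get-started",
        "model.pkl",
        rev="9-bigrams-model",
        out="model.bigrams.pkl",
    )))
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where DVC is installed.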

## Example: Retrieve a model from a DVC remote
## Example: Get a DVC-tracked model file

> Note that `dvc get` can be used from anywhere in the file system, as long as
> DVC is [installed](/doc/install).

We can use `dvc get` to obtain the resulting model file from our
We can use `dvc get` to download the resulting model file from our
[get started example repo](https://github.com/iterative/example-get-started), a
<abbr>DVC project</abbr> external to the current working directory. The desired
<abbr>output</abbr> file would be located in the root of the external project
(if the
[`train.dvc` stage](https://github.com/iterative/example-get-started/blob/master/train.dvc)
was reproduced) and named `model.pkl`.
<abbr>DVC project</abbr> hosted on GitHub:

```dvc
$ dvc get https://github.com/iterative/example-get-started model.pkl
@@ -96,18 +96,18 @@ is found, that specifies `model.pkl` in its outputs (`outs`). DVC then
its
[config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)).

> A recommended use for obtaining binary files from DVC repositories, as done in
> this example, is to place a ML model inside a wrapper application that serves
> as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) pipeline
> or as an HTTP/RESTful API (web service) that provides predictions upon
> request. This can be automated leveraging DVC with
> A recommended use for downloading binary files from DVC repositories, as done
> in this example, is to place a ML model inside a wrapper application that
> serves as an [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load)
> pipeline or as an HTTP/RESTful API (web service) that provides predictions
> upon request. This can be automated leveraging DVC with
> [CI/CD](https://en.wikipedia.org/wiki/CI/CD) tools.

The same example applies to raw or intermediate <abbr>data artifacts</abbr> as
well, of course, for cases where we want to obtain those files or directories
well, of course, for cases where we want to download those files or directories
and perform some analysis on them.

## Examples: Retrieve a file from a git repository
## Examples: Get a Git-tracked model file

We can also use `dvc get` to retrieve any file or directory that exists in a git
repository.
@@ -121,11 +121,12 @@ install.sh
## Example: Compare different versions of data or model

`dvc get` has the `--rev` option, to specify which version of the repository to
obtain a <abbr>data artifact</abbr> from. It also has the `--out` option to
specify the target path. Combining these two options allows us to do something
we can't achieve with the regular `git checkout` + `dvc checkout` process – see
for example the [Get Older Data Version](/doc/get-started/older-versions)
chapter of our _Get Started_ section.
download a <abbr>data artifact</abbr> from. It also has the `--out` option to
specify the location to place the artifact within the workspace. Combining these
two options allows us to do something we can't achieve with the regular
`git checkout` + `dvc checkout` process – see for example the
[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
Started_ section.

Let's use the
[get started example repo](https://github.com/iterative/example-get-started)
@@ -159,7 +160,7 @@ get the most recent one, we use a similar command, but with
`-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
(since it's the latest version anyway). In fact, in this case using `dvc pull`
with the corresponding [DVC-files](/doc/user-guide/dvc-file-format) should
suffice, obtaining the file as just `model.pkl`. We can then rename it to make
suffice, downloading the file as just `model.pkl`. We can then rename it to make
its version explicit:

```dvc
9 changes: 5 additions & 4 deletions static/docs/command-reference/import-url.md
@@ -58,10 +58,11 @@ DVC supports several types of (local or) remote locations (protocols):
| `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` |
| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` |

> Depending on the remote locations type you plan to download data from you
> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`,
> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all)
> when [installing DVC](/doc/install) with `pip`.
> If you installed DVC via `pip` and plan to use cloud services as remote
> storage, you might need to install these optional dependencies: `[s3]`,
> `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to
> include them all. The command should look like this: `pip install "dvc[s3]"`.
> (This example installs `boto3` library along with DVC to support S3 storage.)

<!-- Separate MD quote: -->

4 changes: 3 additions & 1 deletion static/docs/command-reference/import.md
@@ -30,7 +30,9 @@ the data source changes. (See `dvc update`.)
The `url` argument specifies the address of the Git repository containing the
source <abbr>project</abbr>. Both HTTP and SSH protocols are supported for
online repositories (e.g. `[user@]server:project.git`). `url` can also be a
local file system path to an "offline" repository.
local file system path to an "offline" repository (in this case, instead of
downloading, DVC may copy the target data from the external source project or
its cache).

The `path` argument of this command is used to specify the location of the data
to be downloaded within the source project. It should point to a data file or