Merge pull request #330 from kurianbenoy/master
[WIP] Fix #313
shcheklein authored May 14, 2019
2 parents 798f306 + f247b58 commit ab74826
Showing 17 changed files with 251 additions and 250 deletions.
1 change: 1 addition & 0 deletions src/Documentation/Markdown/Markdown.js
@@ -62,6 +62,7 @@ const CodeBlock = ({ value, language }) => {
const dvcStyle = Object.assign({}, docco)
dvcStyle['hljs-comment'] = { color: '#999' }
dvcStyle['hljs-meta'] = { color: '#333', fontSize: '14px' }
dvcStyle['hljs']['padding'] = '0.5em 0.5em 0.5em 2em'
return (
<SyntaxHighlighter language={language} style={dvcStyle}>
{value}
16 changes: 8 additions & 8 deletions static/docs/get-started/add-files.md
@@ -16,24 +16,24 @@ link as`(Chrome) or `Save object as`(Firefox).
</details>

```dvc
$ mkdir data
$ wget https://dvc.org/s3/get-started/data.xml -O data/data.xml
$ mkdir data
$ wget https://dvc.org/s3/get-started/data.xml -O data/data.xml
```

To take a file (or a directory) under DVC control just run `dvc add`, it accepts
any **file** or a **directory**:

```dvc
$ dvc add data/data.xml
$ dvc add data/data.xml
```

DVC stores information about your data file in a special `.dvc` file, that has a
human-readable [description](/doc/user-guide/dvc-file-format) and can be
committed to Git to track versions of your file:

```dvc
$ git add data/.gitignore data/data.xml.dvc
$ git commit -m "add source data to DVC"
$ git add data/.gitignore data/data.xml.dvc
$ git commit -m "add source data to DVC"
```

<details>
@@ -44,9 +44,9 @@ You can see that actual data file has been moved to the `.dvc/cache` directory
(usually hardlink or reflink is created, so no physical copying is happening).

```dvc
$ ls -R .dvc/cache
.dvc/cache/a3:
04afb96060aad90176268345e10355
$ ls -R .dvc/cache
.dvc/cache/a3:
04afb96060aad90176268345e10355
```

where `a304afb96060aad90176268345e10355` is an MD5 hash of the `data.xml` file.
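
The cache layout in the listing above (a two-character directory followed by the remaining 30 hex characters) can be sketched in a few lines of Python. This is only an illustration of the naming scheme visible in the `ls -R` output, not DVC's own code: the helper names `cache_path` and `file_md5` are hypothetical, and DVC may preprocess some files before hashing, so a plain byte-level MD5 is not guaranteed to match the value DVC records.

```python
import hashlib
from pathlib import Path


def cache_path(md5_hex: str, cache_dir: str = ".dvc/cache") -> Path:
    """Map an MD5 hex digest to the path layout shown in the listing:
    the first two characters name a directory, the rest name the file.
    (Hypothetical helper, not part of DVC's API.)"""
    return Path(cache_dir) / md5_hex[:2] / md5_hex[2:]


def file_md5(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file's raw bytes, reading it in chunks.
    Assumption: no preprocessing of the content before hashing."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# The hash quoted in the text maps onto the directory and file name
# shown by `ls -R .dvc/cache`.
print(cache_path("a304afb96060aad90176268345e10355").as_posix())
```

Splitting the digest this way keeps any single cache directory from accumulating too many entries.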
20 changes: 10 additions & 10 deletions static/docs/get-started/compare-experiments.md
@@ -10,9 +10,9 @@ Let's run evaluate for the latest `bigram` experiment we created in one of the
previous steps. It mostly takes just running the `dvc repro`:

```dvc
$ git checkout master
$ dvc checkout
$ dvc repro evaluate.dvc
$ git checkout master
$ dvc checkout
$ dvc repro evaluate.dvc
```

`git checkout master` and `dvc checkout` commands ensure that we have the latest
@@ -21,19 +21,19 @@ experiment code and data respectively. And `dvc repro`, as we discussed in the
commands to build the model and measure its performance.

```dvc
$ git commit -a -m "evaluate bigram model"
$ git tag -a "bigram-experiment" -m "bigrams"
$ git commit -a -m "evaluate bigram model"
$ git tag -a "bigram-experiment" -m "bigrams"
```
Now, we can use `-T` option of the `dvc metrics show` command to see the
difference between the `baseline` and `bigrams` experiments:

```dvc
$ dvc metrics show -T
$ dvc metrics show -T
baseline-experiment:
auc.metric: 0.588765
bigram-experiment:
auc.metric: 0.620421
baseline-experiment:
auc.metric: 0.588765
bigram-experiment:
auc.metric: 0.620421
```

DVC provides built-in support to track and navigate `JSON`, `TSV` or `CSV`
6 changes: 3 additions & 3 deletions static/docs/get-started/configure.md
@@ -23,8 +23,8 @@ project/repository itself.
</details>

```dvc
$ dvc remote add -d myremote /tmp/dvc-storage
$ git commit .dvc/config -m "initialize DVC local remote"
$ dvc remote add -d myremote /tmp/dvc-storage
$ git commit .dvc/config -m "initialize DVC local remote"
```
> We only use a local remote in this guide for simplicity's sake in following
> these basic steps as you are learning to use DVC. We realize that for most
@@ -53,7 +53,7 @@ for all remotes.
For example, to setup an S3 remote we would use something like:

```dvc
$ dvc remote add -d s3remote s3://mybucket/myproject
$ dvc remote add -d s3remote s3://mybucket/myproject
```
> This command is only shown for informational purposes. No need to actually run
> it in order to continue with this guide.
66 changes: 33 additions & 33 deletions static/docs/get-started/connect-code-and-data.md
@@ -12,9 +12,9 @@ to get the sample code:
> On Windows just use your browser to download the archive instead.
```dvc
$ wget https://dvc.org/s3/get-started/code.zip
$ unzip code.zip
$ rm -f code.zip
$ wget https://dvc.org/s3/get-started/code.zip
$ unzip code.zip
$ rm -f code.zip
```

You'll also need to install its dependencies: Python packages like `pandas` and
@@ -27,34 +27,34 @@ You'll also need to install its dependencies: Python packages like `pandas` and
After downloading the sample code, your project structure should look like this:

```dvc
$ tree
.
├── data
│   ├── data.xml
│   └── data.xml.dvc
├── requirements.txt
└── src
    ├── evaluate.py
    ├── featurization.py
    ├── prepare.py
    └── train.py
$ tree
.
├── data
│   ├── data.xml
│   └── data.xml.dvc
├── requirements.txt
└── src
    ├── evaluate.py
    ├── featurization.py
    ├── prepare.py
    └── train.py
```

We **strongly** recommend using `virtualenv` or a similar tool to isolate your
environment:

```dvc
$ virtualenv .env
$ echo ".env/" >> .gitignore
$ source .env/bin/activate
$ virtualenv .env
$ echo ".env/" >> .gitignore
$ source .env/bin/activate
```

Now, we are ready to install dependencies to run the code:

```dvc
$ pip install -U -r requirements.txt
$ git add .
$ git commit -m "add code"
$ pip install -U -r requirements.txt
$ git add .
$ git commit -m "add code"
```

</details>
@@ -64,10 +64,10 @@ command transforms it into a reproducible **stage** for the ML **pipeline**
(describes in the next chapter).

```dvc
$ dvc run -f prepare.dvc \
-d src/prepare.py -d data/data.xml \
-o data/prepared \
python src/prepare.py data/data.xml
$ dvc run -f prepare.dvc \
-d src/prepare.py -d data/data.xml \
-o data/prepared \
python src/prepare.py data/data.xml
```

`dvc run` generates the `prepare.dvc` file. It has the same
@@ -86,18 +86,18 @@ This is how the result should look like now:
```diff
.
├── data
│   ├── data.xml
│   ├── data.xml.dvc
+ │   └── prepared
+ │       ├── test.tsv
+ │       └── train.tsv
│ ├── data.xml
│ ├── data.xml.dvc
+ │ └── prepared
+ │ ├── test.tsv
+ │ └── train.tsv
+ ├── prepare.dvc
├── requirements.txt
└── src
    ├── evaluate.py
    ├── featurization.py
    ├── prepare.py
    └── train.py
├── evaluate.py
├── featurization.py
├── prepare.py
└── train.py
```

This is how `prepare.dvc` looks like internally: