Commit

diff: update docs according to the new patch (#953)
* diff: update docs according to the new patch

* restore PR template

* address diff command review

* address diff command review

* minor change to diff descritption, address PR comment

Co-authored-by: Ivan Shcheklein <[email protected]>
Mr. Outis and shcheklein authored Feb 24, 2020
1 parent 569e53d commit 1e680ef
Showing 1 changed file with 115 additions and 112 deletions.
227 changes: 115 additions & 112 deletions public/static/docs/command-reference/diff.md
@@ -1,41 +1,53 @@
# diff

Show changes between commits in the <abbr>DVC repository</abbr>, or between a
commit and the <abbr>workspace</abbr>. The comparison can be narrowed down to
specific target files/directories tracked by DVC.
Show added, modified, or deleted DVC-tracked files and directories between
commits in the <abbr>DVC repository</abbr>, or between a commit and the
workspace.

## Synopsis

```usage
usage: dvc diff [-h] [-q | -v] [-t TARGET] a_ref [b_ref]
usage: dvc diff [-h] [-q | -v]
[--show-json] [--show-hash]
[a_rev] [b_rev]
positional arguments:
a_rev Old Git commit to compare (defaults to HEAD)
b_rev New Git commit to compare (defaults to the
current workspace)
a_rev Old Git commit to compare (defaults to HEAD)
b_rev New Git commit to compare (defaults to the current workspace)
```

## Description

Given two commit hashes, branch or tag names, etc.
([references](https://git-scm.com/docs/revisions)) `a_ref` and `b_ref`, this
command shows a comparative summary of basic statistics related to files tracked
by DVC: how many files were deleted/changed, and the file size differences.
Prints a list of files and directories added, modified, or deleted in a Git commit
`b_rev` as compared to another Git commit `a_rev`. Both `a_rev` and `b_rev`
accept any [Git revision](https://git-scm.com/docs/gitrevisions) - branch or tag
name, Git commit hash, etc.

> Note that `dvc diff` does not show the line-to-line comparisons like
> `git diff` or [GNU `diff`](https://www.gnu.org/software/diffutils/) can. This
> is because the data data tracked by DVC comes in many formats such as
> structured text, binary blobs, etc. For an example on how to create
> line-to-line text file comparison, refer to
> [issue #770](https://github.com/iterative/dvc/issues/770#issuecomment-512693256).
It defaults to comparing the current workspace and the last commit (`HEAD`), if
arguments `a_rev` and `b_rev` are not specified.

Options `--show-json` and `--show-hash` can be used to modify format and details
of the output produced. See the [Options](#options) and [Examples](#examples)
sections below for more details.

`dvc diff` does not have an effect when the repository is not tracked by Git,
for example when `dvc init` was used with the `--no-scm` option.

> Note that the current `dvc diff` implementation does not show the line-to-line
> comparison among the files in each revision, like `git diff` or
> [GNU `diff`](https://www.gnu.org/software/diffutils/) can. This is because the
> data tracked by DVC can come in many possible formats, e.g. structured
> text or binary blobs. For an example on how to create a line-to-line text
> file comparison, refer to this
> [comment](https://github.com/iterative/dvc/issues/770#issuecomment-512693256).
## Options

- `-t TARGET`, `--target TARGET` - path to a data file or directory to limit
diff for.
- `--show-json` - generate output in JSON format. Usually needed to integrate
DVC into scripts.

- `--show-hash` - print file and directory hash values along with their path.
Useful for debug purposes.

- `-h`, `--help` - print the usage/help message and exit.
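
Since `--show-json` is aimed at scripting, a wrapper along the following lines may be a useful starting point. This is a minimal sketch, not part of DVC: `build_diff_command` and `run_diff` are hypothetical helper names, and the code assumes that `dvc` is on the `PATH` and that `dvc diff --show-json` writes a single JSON document to standard output. Treating empty output as "no changes" is a defensive assumption, not documented behavior.

```python
import json
import subprocess
from typing import Callable, List, Optional


def build_diff_command(a_rev: Optional[str] = None,
                       b_rev: Optional[str] = None,
                       show_hash: bool = False) -> List[str]:
    """Assemble the argument list for `dvc diff --show-json` (hypothetical helper)."""
    if b_rev is not None and a_rev is None:
        # b_rev is the second positional argument, so it cannot be given alone.
        raise ValueError("b_rev requires a_rev")
    cmd = ["dvc", "diff", "--show-json"]
    if show_hash:
        cmd.append("--show-hash")
    cmd.extend(rev for rev in (a_rev, b_rev) if rev is not None)
    return cmd


def _run(cmd: List[str]) -> str:
    """Run the command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def run_diff(a_rev: Optional[str] = None,
             b_rev: Optional[str] = None,
             show_hash: bool = False,
             runner: Callable[[List[str]], str] = _run) -> dict:
    """Return the parsed JSON output; `runner` can be swapped out for testing."""
    out = runner(build_diff_command(a_rev, b_rev, show_hash))
    # Assumption: empty output is treated as an empty result.
    return json.loads(out) if out.strip() else {}
```

Inside a Git-tracked DVC project, `run_diff("baseline-experiment", "bigrams-experiment", show_hash=True)` would run the same comparison as the JSON example in the Examples section. Passing a custom `runner` keeps the parsing logic testable without a DVC installation.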

@@ -46,148 +58,139 @@ for example when `dvc init` was used with the `--no-scm` option.

## Examples

For these examples we can use the chapters in our
[Get Started](/doc/get-started) section, up to
[Add Files](/doc/get-started/add-files).
For these examples we can use the [Get Started](/doc/get-started) project.

<details>

### Click and expand to setup example
### Click and expand to setup the project to run examples

Start by cloning our example repo if you don't already have it. Then move into
the repo and checkout the
[3-add-file](https://github.com/iterative/example-get-started/releases/tag/3-add-file)
tag, corresponding to the [Add Files](/doc/get-started/add-files) _Get Started_
chapter:
Start by cloning our example repo if you don't already have it:

```dvc
$ git clone https://github.com/iterative/example-get-started
$ cd example-get-started
$ git checkout 3-add-file
```

Download the precomputed data using:
Download data using:

```dvc
$ dvc pull
$ dvc fetch -T
Preparing to download data from 'https://remote.dvc.org/get-started'
...
```

</details>
The `-T` flag passed to `dvc fetch` makes sure we have all the data files
related to all existing tags in the repo. You may see the available tags of our
example repo [here](https://github.com/iterative/example-get-started/tags).

## Example: Previous commit in the same branch
</details>

The minimal `dvc diff`, run without arguments, defaults to comparing DVC-tacked
files between `HEAD` (current Git commit) and the current <abbr>workspace</abbr>
(uncommitted changes, if any).
## Example: Checking workspace changes

To see the difference between the very previous commit of the project and the
workspace, we can use `HEAD^` as `a_ref`:
The minimal `dvc diff`, run without arguments, defaults to comparing DVC-tracked
files between `HEAD` (last Git commit) and the current <abbr>workspace</abbr>
(uncommitted changes, if any):

```dvc
$ dvc diff HEAD^
dvc diff from df613bc to ed10968
diff for 'data/data.xml'
+data/data.xml with md5 a304afb96060aad90176268345e10355
added file with size 37.9 MB
$ dvc diff
```

## Example: Specific targets across Git commits

We can base this example in the [Metrics](/doc/get-started/metrics) and
[Compare Experiments](/doc/get-started/compare-experiments) chapters of our _Get
Started_ section, that describe different experiments to produce the `model.pkl`
file. Our example repository has the `bigrams-experiment` and
`baseline-experiment`
[tags](https://github.com/iterative/example-get-started/tags) respectively to
reference these experiments.
## Example: Comparing workspace with arbitrary commits

<details>

### Click and expand to setup example
### Click and expand to setup the example

Having followed the previous example's setup, move into the
`example-get-started/` directory. Then make sure that you have the latest code
and data with the following commands.
Let's check out the
[3-add-file](https://github.com/iterative/example-get-started/releases/tag/3-add-file)
tag, corresponding to the [Add Files](/doc/get-started/add-files) _Get Started_
chapter, right after we added the `data.xml` file with DVC:

```dvc
$ git checkout master
$ dvc fetch -T
$ git checkout 3-add-file
$ dvc pull
```

The `-T` flag passed to `dvc fetch` makes sure we have all the data files
related to all existing tags in the repo. You take a look at the
[available tags](https://github.com/iterative/example-get-started/tags) of our
example repo.

</details>

To see the difference in `model.pkl` among these tags, we can run the following
command.
To see the difference between the previous commit of the project and the
workspace, we can use `HEAD^` as `a_rev`:

```dvc
$ dvc diff -t model.pkl baseline-experiment bigrams-experiment
dvc diff from bc1722d to 8c1169d
$ dvc diff HEAD^
Added:
data/data.xml
diff for 'model.pkl'
-model.pkl with md5 a664896
+model.pkl with md5 3863d0e
...
files summary: 1 added, 0 deleted, 0 modified
```

The output from this command confirms that there's a difference in the
`model.pkl` file between the 2 Git commits (tags `baseline-experiment` and
`bigrams-experiment`) we indicated.
## Example: Comparing tags or branches

### What about directories?
<details>

Unlike Git, DVC features controlling entire directories without having to add
each individual file. See `dvc add` without `--recursive` for example. `dvc run`
can track entire directories (when these are specified as command dependencies
or <abbr>outputs</abbr>).
### Click and expand to setup the example

We can use `dvc diff` to check for changes in a directory by specifying the
directory as the target (with option `-t`). Note that we skip the `b_ref`
argument this time, that defaults to `HEAD`.
Our example repository has the `baseline-experiment` and `bigrams-experiment`
[tags](https://github.com/iterative/example-get-started/tags), which
reference two different modeling experiments.

Having followed the example's setup, move into the `example-get-started/`
directory. Then make sure that you have the latest code and data with the
following commands:

```dvc
$ dvc diff -t data/features baseline-experiment
dvc diff from bc1722d to 8c1169d
$ git checkout master
$ dvc checkout
```

diff for 'data/features'
-data/features with md5 3338d2c.dir
+data/features with md5 42c7025.dir
</details>

0 files not changed, 0 files modified, 0 files added,
0 files deleted, size was increased by 2.9 MB
```dvc
$ dvc diff baseline-experiment bigrams-experiment
Modified:
auc.metric
data/features/
data/features/test.pkl
data/features/train.pkl
model.pkl
files summary: 0 added, 0 deleted, 4 modified
```

## Example: Confirming that a target has not changed
The output from this command confirms that there's a difference in 4 files
between the tags `baseline-experiment` and `bigrams-experiment`.

Let's use our example repo once again, that has several
[available tags](https://github.com/iterative/example-get-started/tags) for
conveniency. The `5-preparation` tag corresponds to the
[Connect Code and Data](/doc/get-started/connect-code-and-data) chapter of our
_Get Started_ section, where the `dvc run` command is used to create a
`prepare.dvc` stage file. This DVC-file tracks the `data/prepared` directory
<abbr>output</abbr>.
## Example: Using different output formats

```dvc
$ dvc diff -t data/prepared 5-preparation
dvc diff from 3deeec1 to 8c1169d
diff for 'data/prepared'
-data/prepared with md5 6836f79.dir
+data/prepared with md5 6836f79.dir
Let's use the same command as above, but with JSON output and including hash
values:

2 files not changed, 0 files modified, 0 files added,
0 files deleted, size was not changed
```dvc
$ dvc diff --show-json --show-hash \
baseline-experiment bigrams-experiment
```

The command above checks whether there have been any changes to the
`data/prepared` directory after the `5-preparation` tag (since the `b_ref` is
`HEAD` by default). The output tells us that there have been no changes to that
directory (or to any other file).
It outputs:

```json
{
"added": [],
"deleted": [],
"modified": [
...{
"path": "data/features/",
"hash": {
"old": "3338d2c21bdb521cda0ba4add89e1cb0.dir",
"new": "42c7025fc0edeb174069280d17add2d4.dir"
}
},
...{
"path": "model.pkl",
"hash": {
"old": "43630cce66a2432dcecddc9dd006d0a7",
"new": "662eb7f64216d9c2c1088d0a5e2c6951"
}
}
]
}
```
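
A script consuming this output could summarize it as follows. This is a minimal sketch under stated assumptions: `summarize` and `changed_hashes` are hypothetical helpers, the sample payload contains only the two entries shown above (the elided entries are left out), and the `hash` key is assumed to be present only when `--show-hash` is used.

```python
import json

# Sample payload limited to the two entries visible in the output above.
SAMPLE = """
{
  "added": [],
  "deleted": [],
  "modified": [
    {
      "path": "data/features/",
      "hash": {
        "old": "3338d2c21bdb521cda0ba4add89e1cb0.dir",
        "new": "42c7025fc0edeb174069280d17add2d4.dir"
      }
    },
    {
      "path": "model.pkl",
      "hash": {
        "old": "43630cce66a2432dcecddc9dd006d0a7",
        "new": "662eb7f64216d9c2c1088d0a5e2c6951"
      }
    }
  ]
}
"""


def summarize(payload: dict) -> dict:
    """Count the entries in each change category (hypothetical helper)."""
    return {key: len(payload.get(key, []))
            for key in ("added", "deleted", "modified")}


def changed_hashes(payload: dict) -> dict:
    """Map each modified path to its (old, new) hash pair, when hashes are present."""
    return {
        entry["path"]: (entry["hash"]["old"], entry["hash"]["new"])
        for entry in payload.get("modified", [])
        if "hash" in entry  # assumed absent without --show-hash
    }


payload = json.loads(SAMPLE)
print(summarize(payload))  # {'added': 0, 'deleted': 0, 'modified': 2}
for path, (old, new) in changed_hashes(payload).items():
    print(f"{path}: {old} -> {new}")
```

The counts here reflect only the trimmed sample, so they differ from the `files summary` line of the full text output.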
