Commit

doc: DVC does not replace (or include) Git
jorgeorpinel committed Sep 29, 2022
1 parent 22f66fa commit 4b1eff0
Showing 4 changed files with 36 additions and 22 deletions.
13 changes: 11 additions & 2 deletions content/docs/install/index.md
@@ -1,7 +1,16 @@
# Installation

-> Please double check that you don't already have DVC (for example running
-> `which dvc`) before trying to install it.
+<admon>
+
+DVC does not replace or include Git. You must have `git` in your system to
+enable important features such as [data versioning] and [quick experimentation]
+(recommended).
+
+[data versioning]: /doc/use-cases/versioning-data-and-models
+[quick experimentation]:
+/doc/user-guide/experiment-management/experiments-overview
+
+</admon>

- [Install on macOS](/doc/install/macos)
- [Install on Windows](/doc/install/windows)
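Note on this change: the new admonition says `git` must be present on the system, and the removed note suggested checking for an existing DVC with `which dvc`. A minimal sketch of both checks in Python, using only the standard library; the helper name `find_tools` is hypothetical and not part of DVC:

```python
import shutil


def find_tools(names=("git", "dvc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    # shutil.which mirrors the shell's `which`: it searches PATH for an executable.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```

A `None` entry for `git` would mean the Git-dependent features mentioned in the admonition are unavailable; a non-`None` entry for `dvc` would mean DVC is already installed.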
8 changes: 4 additions & 4 deletions content/docs/start/data-management/data-versioning.md
@@ -278,10 +278,10 @@ $ git commit data/data.xml.dvc -m "Revert dataset updates"

</details>

-Yes, DVC is technically not even a version control system! `.dvc` file contents
-define data file versions. Git itself provides the version control. DVC in turn
-creates these `.dvc` files, updates them, and synchronizes DVC-tracked data in
-the <abbr>workspace</abbr> efficiently to match them.
+Yes, DVC is technically not a version control system! Git itself provides that
+layer. DVC in turn manipulates `.dvc` files, whose contents define the data file
+versions. DVC also synchronizes DVC-tracked data in the <abbr>workspace</abbr>
+efficiently to match them.

## Large datasets versioning

14 changes: 9 additions & 5 deletions content/docs/user-guide/project-structure/index.md
@@ -1,9 +1,9 @@
# Project Structure

-Using `dvc init` in your <abbr>workspace</abbr> will start a <abbr>DVC
+Using `dvc init` in your <abbr>workspace</abbr> will initialize a <abbr>DVC
project</abbr>, including the internal `.dvc/` directory. From there on, you
-will create and manage different DVC files and populate the <abbr>cache</abbr>
-as you use DVC and work on your data science experiments.
+will create and manage different DVC metafiles (below), and populate the
+<abbr>cache</abbr> with data artifacts as you work on your ML experiments.

- `dvc.yaml` files define stages that form the pipeline(s) of a project. All
stage-based features such as `dvc params`, `dvc metrics`, and `dvc plots` are
@@ -19,5 +19,9 @@ as you use DVC and work on your data science experiments.
[configuration](/doc/command-reference/config) file(s), default local cache
location, and other utilities that DVC needs to operate.

-These metafiles should be versioned with Git (in Git-enabled
-<abbr>repositories</abbr>).
+<admon type="info">
+
+These metafiles are typically versioned with Git, as DVC does not replace its
+distributed version control features, but rather extends on them.
+
+</admon>
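
Note on this change: the hunks above name the metafiles a DVC project contains: the internal `.dvc/` directory, `dvc.yaml` files, and `.dvc` files. As a rough illustration, the following Python sketch lists them for a given directory so they can be reviewed before committing them with Git. The function `find_metafiles` and its return keys are hypothetical; this is not how DVC itself discovers these files.

```python
from pathlib import Path


def find_metafiles(project_root):
    """Collect the DVC metafiles named in the docs under project_root."""
    root = Path(project_root)
    return {
        # The internal directory created by `dvc init`.
        "has_dvc_dir": (root / ".dvc").is_dir(),
        # Pipeline definitions; a project may contain several.
        "dvc_yaml": sorted(p.relative_to(root).as_posix() for p in root.rglob("dvc.yaml")),
        # `.dvc` files; is_file() excludes the `.dvc/` directory itself,
        # whose name also matches the `*.dvc` pattern.
        "dot_dvc_files": sorted(
            p.relative_to(root).as_posix() for p in root.rglob("*.dvc") if p.is_file()
        ),
    }


if __name__ == "__main__":
    print(find_metafiles("."))
```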
23 changes: 12 additions & 11 deletions content/docs/user-guide/what-is-dvc.md
@@ -43,14 +43,15 @@

## DVC does not replace Git!

-DVC files such as `dvc.yaml` and `.dvc` files serve as placeholders to track
-large data files and directories for versioning (among other
-[purposes](/doc/user-guide/project-structure)). These metafiles change along
-with your data, and you can use Git to place them under
-[version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
-as a proxy to the actual data versions, which are stored in the <abbr>DVC
-cache</abbr> (outside of Git). This does not replace features of Git.
-
-DVC does, however, provide several commands similar to Git such as `dvc init`,
-`dvc add`, `dvc checkout`, or `dvc push`, which interact with the underlying Git
-repo (if one is being used, which is not required).
+[DVC metafiles] change along with your data, and you can use Git to place them
+under distributed [version control] as a proxy to the actual data versions,
+which are stored in the <abbr>DVC cache</abbr> (outside of Git). DVC does not
+replace features of Git, but rather extends on them for ML-specific needs.
+
+DVC does provide several commands similar to those in `git`, such as `dvc init`,
+`dvc add`, `dvc checkout`, and `dvc push`. DVC operations interact with the
+underlying Git repo (if one is being used, which is not required).
+
+[dvc metafiles]: (/doc/user-guide/project-structure)
+[version control]:
+https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
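
Note on this change: the revised paragraph describes metafiles as a Git-versioned proxy for data kept in the cache. A minimal Python sketch of that idea follows. It is a simplified illustration, not DVC's implementation: the `track` function, the `.pointer` extension, the pointer file contents, and the cache layout are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def track(data_file, cache_dir):
    """Copy data_file into a content-addressed cache and write a small pointer file."""
    data_file, cache_dir = Path(data_file), Path(cache_dir)
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    # Store the data under a name derived from its content hash (hypothetical layout).
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, cached)
    # The pointer is the small text file one would commit to Git instead of the data.
    pointer = data_file.with_name(data_file.name + ".pointer")
    pointer.write_text(f"md5: {digest}\npath: {data_file.name}\n")
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.xml"
        data.write_text("<rows>1</rows>")
        print(track(data, Path(tmp) / "cache").read_text())
        data.write_text("<rows>2</rows>")  # the data changes, so the pointer changes too
        print(track(data, Path(tmp) / "cache").read_text())
```

Each call to `track` after the data changes yields a different pointer file, while the cache keeps one copy per version. Committing only the pointer to Git is what makes it a proxy for the data version.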
