diff --git a/content/docs/install/index.md b/content/docs/install/index.md
index 29d8ceca42..241cf45caf 100644
--- a/content/docs/install/index.md
+++ b/content/docs/install/index.md
@@ -1,7 +1,16 @@
 # Installation
-> Please double check that you don't already have DVC (for example running
-> `which dvc`) before trying to install it.
+
+
+DVC does not replace or include Git. You must have `git` in your system to
+enable important features such as [data versioning] and [quick experimentation]
+(recommended).
+
+[data versioning]: /doc/use-cases/versioning-data-and-models
+[quick experimentation]:
+  /doc/user-guide/experiment-management/experiments-overview
+
+
+
 - [Install on macOS](/doc/install/macos)
 - [Install on Windows](/doc/install/windows)
diff --git a/content/docs/start/data-management/data-versioning.md b/content/docs/start/data-management/data-versioning.md
index bd73bc756b..3766c167ea 100644
--- a/content/docs/start/data-management/data-versioning.md
+++ b/content/docs/start/data-management/data-versioning.md
@@ -278,10 +278,10 @@ $ git commit data/data.xml.dvc -m "Revert dataset updates"
-Yes, DVC is technically not even a version control system! `.dvc` file contents
-define data file versions. Git itself provides the version control. DVC in turn
-creates these `.dvc` files, updates them, and synchronizes DVC-tracked data in
-the workspace efficiently to match them.
+Yes, DVC is technically not a version control system! Git itself provides that
+layer. DVC in turn manipulates `.dvc` files, whose contents define the data file
+versions. DVC also synchronizes DVC-tracked data in the workspace
+efficiently to match them.
 ## Large datasets versioning
diff --git a/content/docs/user-guide/project-structure/index.md b/content/docs/user-guide/project-structure/index.md
index 4bf4b428bf..ae6682bd82 100644
--- a/content/docs/user-guide/project-structure/index.md
+++ b/content/docs/user-guide/project-structure/index.md
@@ -1,9 +1,9 @@
 # Project Structure
-Using `dvc init` in your workspace will start a DVC
+Using `dvc init` in your workspace will initialize a DVC
 project, including the internal `.dvc/` directory. From there on, you
-will create and manage different DVC files and populate the cache
-as you use DVC and work on your data science experiments.
+will create and manage different DVC metafiles (below), and populate the
+cache with data artifacts as you work on your ML experiments.
 - `dvc.yaml` files define stages that form the pipeline(s) of a project. All
   stage-based features such as `dvc params`, `dvc metrics`, and `dvc plots` are
@@ -19,5 +19,9 @@ as you use DVC and work on your data science experiments.
   [configuration](/doc/command-reference/config) file(s), default local cache
   location, and other utilities that DVC needs to operate.
-These metafiles should be versioned with Git (in Git-enabled
-repositories).
+
+
+These metafiles are typically versioned with Git, as DVC does not replace its
+distributed version control features, but rather extends on them.
+
+
diff --git a/content/docs/user-guide/what-is-dvc.md b/content/docs/user-guide/what-is-dvc.md
index d07351a830..db057a8b51 100644
--- a/content/docs/user-guide/what-is-dvc.md
+++ b/content/docs/user-guide/what-is-dvc.md
@@ -43,14 +43,15 @@
 ## DVC does not replace Git!
-DVC files such as `dvc.yaml` and `.dvc` files serve as placeholders to track
-large data files and directories for versioning (among other
-[purposes](/doc/user-guide/project-structure)).
These metafiles change along
-with your data, and you can use Git to place them under
-[version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
-as a proxy to the actual data versions, which are stored in the DVC
-cache (outside of Git). This does not replace features of Git.
-
-DVC does, however, provide several commands similar to Git such as `dvc init`,
-`dvc add`, `dvc checkout`, or `dvc push`, which interact with the underlying Git
-repo (if one is being used, which is not required).
+[DVC metafiles] change along with your data, and you can use Git to place them
+under distributed [version control] as a proxy to the actual data versions,
+which are stored in the DVC cache (outside of Git). DVC does not
+replace features of Git, but rather extends on them for ML-specific needs.
+
+DVC does provide several commands similar to those in `git`, such as `dvc init`,
+`dvc add`, `dvc checkout`, and `dvc push`. DVC operations interact with the
+underlying Git repo (if one is being used, which is not required).
+
+[dvc metafiles]: /doc/user-guide/project-structure
+[version control]:
+  https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control