diff --git a/content/docs/use-cases/index.md b/content/docs/use-cases/index.md
index 971ad2e99f..bc316c479b 100644
--- a/content/docs/use-cases/index.md
+++ b/content/docs/use-cases/index.md
@@ -1,18 +1,17 @@
# Use Cases
-We provide short articles on common ML workflow or data management scenarios
-that DVC can help with or improve. Our use cases are not written to be run
-end-to-end like tutorials. For more general, hands-on experience with DVC,
-please see our [Get Started](/doc/tutorials/get-started) instead.
+We provide short articles on common ML workflow and data science use cases that
+DVC can help with or improve. Our use cases are not written to be run end-to-end
+like tutorials. For more general, hands-on experience with DVC, please see
+[Get Started](/doc/tutorials/get-started) instead.
## Why DVC?
Even with all the success we've seen today in machine learning (ML), especially
-with deep learning and its applications in business, the data science community
-still lacks good practices for organizing their projects and collaborating
-effectively. This is a critical challenge: while ML algorithms and methods are
-no longer tribal knowledge, they are still difficult to implement, reuse, and
-manage.
+with deep learning and its applications in business, data scientists still lack
+best practices for organizing their projects and collaborating effectively. This
+is a critical challenge: while ML algorithms and methods are no longer tribal
+knowledge, they are still difficult to implement, reuse, and manage.
## Basic uses of DVC
@@ -20,10 +19,11 @@ If you store and process data files or datasets to produce other data or machine
learning models, and you want to
- capture and save data artifacts the same way you capture code;
-- track and switch between different versions of data or models easily;
-- understand how data or models were built in the first place;
-- be able to compare models and metrics to each other;
-- bring software engineering best practices to your data science team
+- track, control, and switch between different versions of data or models
+ easily;
+- understand how data or ML models were built in the first place;
+- compare machine learning models and metrics to each other;
+- bring software engineering best practices and tools to your data science team
DVC is for you!
diff --git a/content/docs/use-cases/versioning-data-and-model-files/index.md b/content/docs/use-cases/versioning-data-and-model-files/index.md
index be28edd905..448bdeab55 100644
--- a/content/docs/use-cases/versioning-data-and-model-files/index.md
+++ b/content/docs/use-cases/versioning-data-and-model-files/index.md
@@ -11,8 +11,8 @@ pull requests, etc.)
To actually store the data, DVC uses a built-in cache, and supports
synchronizing it with various types of
-[remote storage](/doc/command-reference/remote). This allows storing and sharing
-data easily, and alongside code.
+[remote storage](/doc/command-reference/remote). This allows for easy data and
+model versioning, storage, and sharing — right alongside code.
![](/img/model-versioning-diagram.png) _Code and data flows in DVC_
@@ -30,9 +30,9 @@ on-premises storage (e.g. SSH, NAS) as well as any major cloud storage provider
## DVC is not Git!
DVC metafiles such as `dvc.yaml` and `.dvc` files serve as placeholders to track
-data files and directories (among other purposes). They point to specific data
-contents in the cache, providing the ability to store multiple data
-versions out-of-the-box.
+data files and directories for versioning (among other purposes). They point to
+specific data contents in the cache, providing the ability to store
+multiple data versions out of the box.
Full-fledged
[version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
@@ -46,7 +46,7 @@ several other novel features (see [Get Started](/doc/start/) for a primer.)
Let's say you have an empty DVC repository and put a dataset of
images in the `images/` directory. You can start tracking it with `dvc add`.
-This generate a `.dvc` file, which can be committed to Git in order to save the
+This generates a `.dvc` file, which can be committed to Git in order to save the
project's version:
```dvc
@@ -116,7 +116,8 @@ M model.pkl
```
However, we can check out certain parts only, for example if we want to keep the
-latest source code and model but rewind to the previous dataset only:
+latest source code and model versions, but rewind to the previous version of the
+dataset:
```dvc
$ git checkout v1.0 images.dvc
@@ -125,5 +126,5 @@ M images
```
DVC [optimizes](/doc/user-guide/large-dataset-optimization) this operation by
-avoiding copying files each time, so checking out data is quick even if you have
-large data files.
+avoiding copying files each time, so checking out data is quick even if you are
+versioning large data files.
diff --git a/content/docs/use-cases/versioning-data-and-model-files/tutorial.md b/content/docs/use-cases/versioning-data-and-model-files/tutorial.md
index ad8c5a628e..20d6766ca3 100644
--- a/content/docs/use-cases/versioning-data-and-model-files/tutorial.md
+++ b/content/docs/use-cases/versioning-data-and-model-files/tutorial.md
@@ -1,8 +1,8 @@
-# Tutorial: Versioning
+# Tutorial: Data & Model Versioning
The goal of this example is to give you some hands-on experience with a basic
-machine learning version control scenario: working with multiple versions of
-datasets and ML models using DVC commands. We'll work with a
+machine learning version control scenario: managing multiple datasets and ML
+model versions using DVC commands. We'll work with a
[tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html)
that [François Chollet](https://twitter.com/fchollet) put together to show how
to build a powerful image classifier using a pretty small dataset.
@@ -237,9 +237,9 @@ $ git commit -m "Second model, trained with 2000 images"
$ git tag -a "v2.0" -m "model v2.0, 2000 images"
```
-That's it! We have tracked a second dataset, model, and metrics versioned DVC,
-and the DVC-files that point to them committed with Git. Let's now look at how
-DVC can help us go back to the previous version if we need to.
+That's it! We've tracked a second version of the dataset, model, and metrics in
+DVC and committed the DVC-files that point to them with Git. Let's now look at
+how DVC can help us go back to the previous version if we need to.
## Switching between workspace versions
@@ -338,15 +338,15 @@ changed. For example, when we added new images to build the second version of
our model, that was a dependency change. It also updates outputs and puts them
into the cache.
-To make things a little simpler: if `dvc add` and `dvc checkout` provide a basic
-mechanism to version control large data files or models, `dvc run` and
-`dvc repro` provide a build system for ML models, which is similar to
+To make things a little simpler: `dvc add` and `dvc checkout` provide a basic
+mechanism for model and large dataset versioning. `dvc run` and `dvc repro`
+provide a build system for machine learning models, which is similar to
[Make](https://www.gnu.org/software/make/) in software build automation.
## What's next?
-In this example, our focus was on giving you hands-on experience with versioning
-ML models and datasets. We specifically looked at the `dvc add` and
+In this example, our focus was on giving you hands-on experience with dataset
+and ML model versioning. We specifically looked at the `dvc add` and
`dvc checkout` commands. We'd also like to outline some topics and ideas you
might be interested in trying next to learn more about DVC and how it makes
managing ML projects simpler.
diff --git a/content/docs/user-guide/what-is-dvc.md b/content/docs/user-guide/what-is-dvc.md
index 18f86f0acb..ab7e2c2753 100644
--- a/content/docs/user-guide/what-is-dvc.md
+++ b/content/docs/user-guide/what-is-dvc.md
@@ -1,6 +1,6 @@
# What Is DVC?
-**Data Version Control** is a new type of data versioning, workflow and
+**Data Version Control** is a new type of data versioning, workflow, and
experiment management software that builds upon [Git](https://git-scm.com/)
(although it can work stand-alone). DVC reduces the gap between established
engineering tool sets and data science needs, allowing users to take advantage
@@ -10,7 +10,8 @@ of new [features](#core-features) while reusing existing skills and intuition.
Data science experiment sharing and collaboration can be done through a regular
Git flow (commits, branching, pull requests, etc.), the same way it works for
-software engineers.
+software engineers. Using Git and DVC, data science and machine learning teams
+can version experiments, manage large datasets, and make projects reproducible.
## Core Features
@@ -22,7 +23,7 @@ software engineers.
[versioning](/doc/use-cases/versioning-data-and-model-files) capabilities.
- **Data versioning** is enabled by replacing large files, dataset directories,
- ML models, etc. with small
+ machine learning models, etc. with small
[metafiles](/doc/user-guide/dvc-files-and-directories) (easy to handle with
Git). These placeholders point to the original data, which is decoupled from
source code management.