diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 144c0266c2..fcd91959bd 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -122,7 +122,7 @@
},
{
"slug": "data-management",
- "source": false,
+ "source": "data-management/index.md",
"children": [
"large-dataset-optimization",
{
diff --git a/content/docs/user-guide/data-management/index.md b/content/docs/user-guide/data-management/index.md
new file mode 100644
index 0000000000..b610d1a1fa
--- /dev/null
+++ b/content/docs/user-guide/data-management/index.md
@@ -0,0 +1,188 @@
+# Data Management for Machine Learning
+
+
+
+Where and how to store data and ML model files is one of the first decisions
+your team will face, but traditional backup strategies do not fit the data
+science lifecycle. Large files end up scattered across multiple buckets;
+overlapping dataset versions coexist, causing data leakage and inefficient use
+of space; project evolution becomes harder to track. What was the name of the
+best model? Is it safe to delete `2020-dset_v2.zip`? Can others reproduce my
+results?
+
+![Direct access storage](/img/direct_access_storage.png) _The S3 bucket on the
+right is shared (and bloated) by several people and projects. You need to know
+the exact location of the correct files, and use cloud-specific tools (e.g. AWS
+CLI) to access them directly._
+
+To maintain control and visibility over all your data and models, DVC stores
+large files and directories for you in a structured way. It tracks them by
+logging their locations and unique descriptions in YAML files. Committing these
+to Git along with the ML source code creates reproducible project versions (no
+need for special file naming schemes to identify data or model variants). The
+project history becomes easy to review, rewind, and repeat.
+
+![DVC-cached storage](/img/dvc_managed_storage.png) _DVC writes `.dvc` files
+with YAML content next to large files. A data cache indexes them with `md5`
+checksums. Mass storage holds all unique files pushed with DVC for backup or
+sharing._
+
+## How it works
+
+
+
+Let's consider a simple ML project that looks like this:
+
+```
+training.csv
+validation.xml
+model.bin
+src/train.py
+```
+
+![]() _The first two data files are very large (multiple gigabytes). The model
+file is not as large (several megabytes) but still too large to store in Git.
+The `.py` code file (last) is safe to commit to Git (a few kilobytes)._
+
+DVC adds unique large files to a hidden cache, organized by
+content hashes (similar to an index). As the data changes, its full history can
+be preserved this way, and accidental file deletions are prevented.
+
+```cli
+.dvc/cache
+├── 0a/aa77e # training.csv
+├── 3f/db533 # validation.xml before
+├── 6a/2aa4b # validation.xml now
+├── a7/28107 # first model.bin
+ ...
+```
+
+Now that they're safely cached, DVC-tracked files in your workspace
+can be replaced with [file links], so you can continue to see and use them as
+usual. File hashes (usually MD5) are written in human-readable YAML [metafiles]
+next to the original data.
+
+```git
+ training.csv -> .dvc/cache/0a/aa77e
++ training.csv.dvc
+ validation.xml -> .dvc/cache/6a/2aa4b
++ validation.xml.dvc
+ model.bin
+ src/train.py
+```
+
+```yaml
+# validation.xml.dvc
+outs:
+- md5: 6a2aa4b # Note: actual hashes are longer
+  path: validation.xml
+```
+
+[metafiles]: /doc/user-guide/project-structure
+[file links]: /doc/user-guide/data-management/large-dataset-optimization
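+
+For example, this is roughly how a file gets tracked with `dvc add` (a minimal
+sketch; the file name comes from the sample project above, and the commit
+message is only an illustration):
+
+```cli
+$ dvc add validation.xml        # cache the data and write validation.xml.dvc
+$ git add validation.xml.dvc .gitignore
+$ git commit -m "Track validation data with DVC"
+```
+
+`dvc add` also lists the data file in `.gitignore`, so only the small metafile
+ends up in Git.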
+
+
+
+Data tracked by DVC can be stored in more than one location. You get a project
+cache by default, but it's possible to synchronize all or part of it with
+[remote storage]. The same content-addressable file structure is used remotely
+unless you enable [cloud versioning], which keeps a directory structure in your
+cloud buckets similar to that of the local project.
+
+[remote storage]: /doc/user-guide/data-management/remote-storage
+[cloud versioning]: /doc/user-guide/data-management/cloud-versioning
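+
+A minimal sketch of synchronizing the cache with remote storage (the remote
+name `storage` and the S3 URL are placeholders, not part of this project):
+
+```cli
+$ dvc remote add -d storage s3://mybucket/dvcstore   # register a default remote
+$ dvc push                                           # upload cached data
+$ dvc pull                                           # download it in another copy of the repo
+```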
+
+
+
+To keep track of the relevant versions of data, models, etc. cached by DVC, the
+corresponding metafiles should be [versioned with Git] (or any SCM) along with
+the rest of the code. This also means that a single file name can represent
+different contents over time, keeping your project structure clean (use
+branches or tags to organize data versions instead of special file names).
+
+[versioned with git]:
+ https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
+
+```cli
+$ git checkout dev-branch
+$ dvc checkout
+$ ls
+ training.csv 2 G # old data
+ model.bin 2.7 M # old model
+ src/train.py 214 K
+
+$ git checkout latest-tag
+$ dvc checkout
+$ ls
+ training.csv 3 G # latest data
+ validation.xml 1 G
+ model.bin 3.2 M # better model
+ src/train.py 354 K
+ src/evaluate.py 175 K # more code
+```
+
+
+
+DVC replaces data assets in the project with code-like YAML [metafiles] (and
+links). Codifying data lets you treat it as a first-class citizen in any code
+repository.
+
+
+
+
diff --git a/content/docs/user-guide/data-management/storage-locations.md b/content/docs/user-guide/data-management/storage-locations.md
new file mode 100644
index 0000000000..9f780e7813
--- /dev/null
+++ b/content/docs/user-guide/data-management/storage-locations.md
@@ -0,0 +1,38 @@
+# Storage locations
+
+DVC can manage data anywhere: cloud storage, SSH servers, network resources
+(e.g. NAS), mounted drives, local file systems, etc. These locations can be put
+into three groups.
+
+![Storage locations](/img/storage-locations.png) _Local, external, and remote
+storage locations_
+
+Every DVC project starts with two locations. The
+workspace is the main project directory, containing your data,
+models, source code, etc. DVC also creates a data cache (found
+locally in `.dvc/cache` by default), which is used as fast-access storage
+for DVC operations.
+
+
+
+The cache can be moved to an external location in the file system or network,
+for example to [share it] among several projects. It could even be set up on a
+remote system (accessed over the Internet), but this is typically too slow for
+working with data regularly.
+
+
+
+[share it]: /doc/user-guide/how-to/share-a-dvc-cache
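+
+A minimal sketch of relocating the cache with `dvc cache dir` (the path below
+is just a placeholder for a shared location):
+
+```cli
+$ dvc cache dir /mnt/shared/dvc-cache   # point the project at an external cache
+$ git add .dvc/config
+$ git commit -m "Use a shared DVC cache"
+```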
+
+DVC supports additional storage locations such as cloud services (Amazon S3,
+Google Drive, Azure Blob Storage, etc.), SSH servers, and network-attached
+storage. These are called [DVC remotes], and they help you share or back up
+copies of your data assets.
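+
+A hypothetical configuration with more than one remote (the names and URLs
+below are placeholders):
+
+```cli
+$ dvc remote add -d backup ssh://user@example.com/dvc-storage
+$ dvc remote add mirror s3://mybucket/dvc-storage
+$ dvc remote list                       # show the configured remotes
+```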
+
+
+
+DVC remotes are similar to Git remotes, but for cached data.
+
+
+
+[dvc remotes]: /doc/command-reference/remote
diff --git a/static/img/direct_access_storage.png b/static/img/direct_access_storage.png
new file mode 100644
index 0000000000..75b57231ae
Binary files /dev/null and b/static/img/direct_access_storage.png differ
diff --git a/static/img/dvc_managed_storage.png b/static/img/dvc_managed_storage.png
new file mode 100644
index 0000000000..66aa85b9d1
Binary files /dev/null and b/static/img/dvc_managed_storage.png differ
diff --git a/static/img/project_versioning.png b/static/img/project_versioning.png
new file mode 100644
index 0000000000..d144c483f3
Binary files /dev/null and b/static/img/project_versioning.png differ
diff --git a/static/img/storage-locations.png b/static/img/storage-locations.png
new file mode 100644
index 0000000000..92fa9c7630
Binary files /dev/null and b/static/img/storage-locations.png differ