diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 144c0266c2..fcd91959bd 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -122,7 +122,7 @@
     },
     {
       "slug": "data-management",
-      "source": false,
+      "source": "data-management/index.md",
       "children": [
         "large-dataset-optimization",
         {
diff --git a/content/docs/user-guide/data-management/index.md b/content/docs/user-guide/data-management/index.md
new file mode 100644
index 0000000000..b610d1a1fa
--- /dev/null
+++ b/content/docs/user-guide/data-management/index.md
@@ -0,0 +1,188 @@
+# Data Management for Machine Learning

Where and how to store data and ML model files is one of the first decisions
your team will face, but traditional backup strategies do not fit the data
science lifecycle. Large files end up scattered across multiple buckets;
overlapping dataset versions coexist, causing data leakage and inefficient use
of space; and the project's evolution becomes harder to track. What was the
name of the best model? Is it safe to delete `2020-dset_v2.zip`? Can others
reproduce my results?

![Direct access storage](/img/direct_access_storage.png) _The S3 bucket on the
right is shared (and bloated) by several people and projects. You need to know
the exact location of the correct files, and use cloud-specific tools (e.g. the
AWS CLI) to access them directly._

To maintain control and visibility over all your data and models, DVC stores
large files and directories for you in a structured way. It tracks them by
logging their locations and unique descriptions in YAML files. Committing these
to Git along with the ML source code creates reproducible project versions (no
need for special file naming schemes to identify data or model variants). The
project history becomes easy to review, rewind, and repeat.

![DVC-cached storage](/img/dvc_managed_storage.png) _DVC writes `.dvc` files
with YAML content next to large files. A data cache indexes them with `md5`
checksums. Mass storage holds all unique files pushed with DVC for backup or
sharing._

## How it works

Let's consider a simple ML project that looks like this:

```
training.csv
validation.xml
model.bin
src/train.py
```

![]() _The first two data files are very large (multiple gigabytes). The model
file is not as large (several megabytes), but still too large to store
comfortably in Git. The `.py` code file (last) is safe to commit to Git (a few
kilobytes)._

DVC adds unique large files to a hidden cache, organized by
content hashes (similar to an index). As the data changes, its full history can
be preserved this way, while preventing accidental file deletions.

```cli
.dvc/cache
├── 0a/aa77e # training.csv
├── 3f/db533 # validation.xml before
├── 6a/2aa4b # validation.xml now
├── a7/28107 # first model.bin
 ...
```

Now that they're cached safely, DVC-tracked files in your workspace
can be replaced with [file links], so you continue seeing and using them as
usual. File hashes (usually MD5) are written in human-readable YAML [metafiles]
next to the original data.
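For example, the large files above would typically be tracked with `dvc add` (a
minimal sketch; the file names come from the example project, and the command
output is omitted):

```cli
$ dvc add training.csv validation.xml
$ git add *.dvc .gitignore    # commit the small metafiles, not the data itself
$ git commit -m "Track large data files with DVC"
```

Each tracked file gets a small `.dvc` metafile, while its contents move into
the cache (leaving a link behind). The workspace then looks like this: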
```git
 training.csv -> .dvc/cache/0a/aa77e
+ training.csv.dvc
 validation.xml -> .dvc/cache/6a/2aa4b
+ validation.xml.dvc
 model.bin
 src/train.py
```

```yaml
# validation.xml.dvc
md5: 6a2aa4b # Note: actual hashes are longer
path: validation.xml
```

[metafiles]: /doc/user-guide/project-structure
[file links]: /doc/user-guide/data-management/large-dataset-optimization

Data tracked by DVC can be stored in more than one location. You get a project
cache by default, but it's possible to synchronize all or part of it with
[remote storage]. The same content-addressable file structure is used remotely,
unless you enable [cloud versioning], which keeps a directory structure in your
cloud buckets similar to the one in your local project.

[remote storage]: /doc/user-guide/data-management/remote-storage
[cloud versioning]: /doc/user-guide/data-management/cloud-versioning

To keep track of the relevant versions of the data, models, etc. cached by DVC,
the corresponding metafiles should be [versioned with Git] (or any SCM) along
with the rest of the code. This also means that a single file name can
represent different contents over time, keeping your project structure clean
(use branches or tags to organize data versions instead).

[versioned with git]:
  https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

```cli
$ git checkout dev-branch
$ dvc checkout
$ ls
 training.csv 2 G # old data
 model.bin 2.7 M # old model
 src/train.py 214 K

$ git checkout latest-tag
$ dvc checkout
$ ls
 training.csv 3 G # latest data
 validation.xml 1 G
 model.bin 3.2 M # better model
 src/train.py 354 K
 src/evaluate.py 175 K # more code
```

DVC replaces data assets in the project with code-like YAML [metafiles] (and
links). Codifying data lets you treat it as a first-class citizen in any code
repository.

diff --git a/content/docs/user-guide/data-management/storage-locations.md b/content/docs/user-guide/data-management/storage-locations.md
new file mode 100644
index 0000000000..9f780e7813
--- /dev/null
+++ b/content/docs/user-guide/data-management/storage-locations.md
@@ -0,0 +1,38 @@
+# Storage locations

DVC can manage data anywhere: cloud storage, SSH servers, network resources
(e.g. NAS), mounted drives, local file systems, etc. These locations can be
grouped into three categories.

![Storage locations](/img/storage-locations.png) _Local, external, and remote
storage locations_

Every DVC project starts with two locations. The workspace is the main project
directory, containing your data, models, source code, etc. DVC also creates a
data cache (found locally in `.dvc/cache` by default), which is used as
fast-access storage for DVC operations.

The cache can be moved to an external location in the file system or network,
for example to [share it] among several projects. It could even be set up on a
remote system (accessed over the Internet), but this is typically too slow for
working with data regularly.

[share it]: /doc/user-guide/how-to/share-a-dvc-cache

DVC supports additional storage locations such as cloud services (Amazon S3,
Google Drive, Azure Blob Storage, etc.), SSH servers, and network-attached
storage. These are called [DVC remotes], and they help you share or back up
copies of your data assets.

DVC remotes are similar to Git remotes, but for cached data.
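As a rough sketch of how a remote is typically configured and used (the remote
name and S3 URL below are placeholders, not part of this guide's example
project):

```cli
$ dvc remote add -d myremote s3://mybucket/dvcstore   # register a default remote (placeholder URL)
$ dvc push                                            # upload cached data to the remote
$ dvc pull                                            # download it in another copy of the project
```

Much like `git push` and `git pull` move commits between repositories,
`dvc push` and `dvc pull` move cached data between the local cache and the
remote.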
[dvc remotes]: /doc/command-reference/remote

diff --git a/static/img/direct_access_storage.png b/static/img/direct_access_storage.png
new file mode 100644
index 0000000000..75b57231ae
Binary files /dev/null and b/static/img/direct_access_storage.png differ
diff --git a/static/img/dvc_managed_storage.png b/static/img/dvc_managed_storage.png
new file mode 100644
index 0000000000..66aa85b9d1
Binary files /dev/null and b/static/img/dvc_managed_storage.png differ
diff --git a/static/img/project_versioning.png b/static/img/project_versioning.png
new file mode 100644
index 0000000000..d144c483f3
Binary files /dev/null and b/static/img/project_versioning.png differ
diff --git a/static/img/storage-locations.png b/static/img/storage-locations.png
new file mode 100644
index 0000000000..92fa9c7630
Binary files /dev/null and b/static/img/storage-locations.png differ