diff --git a/content/docs/user-guide/data-management/index.md b/content/docs/user-guide/data-management/index.md
index a1fe2f96ab..d964885d8a 100644
--- a/content/docs/user-guide/data-management/index.md
+++ b/content/docs/user-guide/data-management/index.md
@@ -1,29 +1,55 @@
 # Data Management with DVC
-
+You work with data normally in a local workspace. DVC tracks,
+restores, and synchronizes everything with a few straightforward commands
+(similar to Git) that do not change regardless of the underlying file systems,
+transfer protocols, etc.
-Managing datasets and ML models tends to be a manual and different process for
-each team and project.
+![]() _Separating data from code_
-
+To achieve this, DVC relies on data _codification_: replacing large files and
+directories with small [metafiles] that describe the assets. Data files are
+moved to a separate cache but kept virtually (linked) in the
+workspace. This **separates your data from code** (including metafiles).
-With DVC, you manipulate the project files normally in your local workspace; DVC
-tracks, restores, and synchronizes them across locations.
+
+
+This also allows you to [version] all project files with Git, a battle-tested
+[SCM] tool.
+
+
+
+DVC operations stay the same because they work [indirectly], by going through
+the metafiles and [configuration] of your project to find out where
+and how to handle files. This is transparent to you as a user, but it's important
+to understand the mechanics in general.
+
+## Workflow and benefits
+
+
+
+...
+
+
+
+[metafiles]: /doc/user-guide/project-structure
+[indirectly]: https://en.wikipedia.org/wiki/Indirection
+[configuration]: /doc/command-reference/config
+[version]: /doc/user-guide/data-management/data-versioning
+[scm]: https://www.atlassian.com/git/tutorials/source-code-management
-## How it works
+## Storage locations
-DVC helps you manage and share arbitrarily large files anywhere: cloud storage,
-SSH servers, network resources (e.g. NAS), mounted drives, local file systems,
-etc. To do so, several storage locations can be defined.
+DVC can manage data anywhere: cloud storage, SSH servers, network resources
+(e.g. NAS), mounted drives, local file systems, etc. These locations can be
+separated into three groups.