From c31d9713cbc2ffd2d4903db8a23c902fd18393c7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 19 Nov 2019 18:59:06 -0600 Subject: [PATCH 01/28] use-cases: address smaller points from review (#795) --- static/docs/use-cases/data-registry.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index a5eead5b21..937c6e9d72 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -13,13 +13,13 @@ example, project A may use a data file to begin its data same file; Instead of [adding it](/doc/command-reference/add#example-single-file) it to both projects, B can simply import it from A. Furthermore, the version of the data file -imported to B can be an older iteration than what's currently used in A. +imported to B can be different than what's currently used in A. Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any kind of large files). This way we would -have a repository that has all the metadata and change history for the project's -data. We can see who updated what, and when; use pull requests to update data -the same way you do with code; and we don't need ad-hoc conventions to store +have a repository with all the metadata and history of changes in the project's +data. We can see who updated what, and when, use pull requests to update data +the same way you do with code, and we don't need ad-hoc conventions to store different data versions. Other projects can share the data in the registry by downloading (`dvc get`) or importing (`dvc import`) them for use in different data processes. @@ -28,9 +28,8 @@ The advantages of using a DVC **data registry** project are: - Data as code: Improve _lifecycle management_ with versioning of simple directory structures (like Git for your cloud storage), without ad-hoc - conventions. 
Leverage Git and Git hosting features such as change history, - branching, pull requests, reviews, and even continuous deployment of ML - models. + conventions. Leverage Git and Git hosting features such as commits, branching, + pull requests, reviews, and even continuous deployment of ML models. - Reusability: Reproduce and organize _feature stores_ with a simple CLI (`dvc get` and `dvc import` commands, similar to software package management systems like `pip`). @@ -49,8 +48,8 @@ The advantages of using a DVC **data registry** project are: ## Example -A dataset we use for several of our examples and tutorials is one containing -2800 images of cats and dogs. We partitioned the dataset in two for our +A dataset we use for several of our examples and tutorials contains 2800 images +of cats and dogs. We partitioned the dataset in two for our [Versioning Tutorial](/doc/tutorials/versioning), and backed up the parts on a storage server, downloading them with `wget` in our examples. This setup was then revised to download the dataset with `dvc get` instead, so we created the From 6002cba2d1e166cd1b628212382531340db6a396 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 20 Nov 2019 18:25:05 -0600 Subject: [PATCH 02/28] use-cases: reinforce hypothetical phrasing in data registry intro paragraph per https://github.com/iterative/dvc.org/issues/795#issuecomment-556114361 --- static/docs/use-cases/data-registry.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 937c6e9d72..eccaeedb15 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -10,7 +10,7 @@ different projects (similar to package management systems, but for data), DVC also includes the `dvc get`, `dvc import`, and `dvc update` commands. 
For example, project A may use a data file to begin its data [pipeline](/doc/command-reference/pipeline), but project B also requires this -same file; Instead of +same file. Instead of [adding it](/doc/command-reference/add#example-single-file) it to both projects, B can simply import it from A. Furthermore, the version of the data file imported to B can be different than what's currently used in A. @@ -18,13 +18,13 @@ imported to B can be different than what's currently used in A. Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any kind of large files). This way we would have a repository with all the metadata and history of changes in the project's -data. We can see who updated what, and when, use pull requests to update data -the same way you do with code, and we don't need ad-hoc conventions to store -different data versions. Other projects can share the data in the registry by -downloading (`dvc get`) or importing (`dvc import`) them for use in different -data processes. +data. We could see who updated what, and when, use pull requests to update data +(the same way we do with code), and avoid ad-hoc conventions to store different +data versions. This is what we call a data registry. Other projects can share +datasets in a registry by downloading (`dvc get`) or importing (`dvc import`) +them for use in different data processes. 
-The advantages of using a DVC **data registry** project are: +Advantages of using a DVC **data registry** project: - Data as code: Improve _lifecycle management_ with versioning of simple directory structures (like Git for your cloud storage), without ad-hoc From 47ebae5868f88b11b6fda55b70a7b6df48b6c9d9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 20 Nov 2019 18:50:45 -0600 Subject: [PATCH 03/28] use-cases: partitioned->split in data registry case per #795 and https://github.com/iterative/dvc.org/issues/795#issuecomment-556114361 --- static/docs/use-cases/data-registry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index eccaeedb15..adcc0a7990 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -49,7 +49,7 @@ Advantages of using a DVC **data registry** project: ## Example A dataset we use for several of our examples and tutorials contains 2800 images -of cats and dogs. We partitioned the dataset in two for our +of cats and dogs. We split the dataset in two for our [Versioning Tutorial](/doc/tutorials/versioning), and backed up the parts on a storage server, downloading them with `wget` in our examples. 
This setup was then revised to download the dataset with `dvc get` instead, so we created the From a578c15d58384a25ac85fb9e1fa6c5b6f163e521 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 20 Nov 2019 18:56:50 -0600 Subject: [PATCH 04/28] use-cases: geatly simplify mention about project inter-dependency in data reg per https://github.com/iterative/dvc.org/issues/795#issuecomment-556114361 and https://github.com/iterative/dvc.org/issues/795#issuecomment-556651871 --- static/docs/use-cases/data-registry.md | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index adcc0a7990..45fc308360 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -7,13 +7,9 @@ tracking of datasets and any other data artifacts. With the aim to enable reusability of these versioned artifacts between different projects (similar to package management systems, but for data), DVC -also includes the `dvc get`, `dvc import`, and `dvc update` commands. For -example, project A may use a data file to begin its data -[pipeline](/doc/command-reference/pipeline), but project B also requires this -same file. Instead of -[adding it](/doc/command-reference/add#example-single-file) it to both projects, -B can simply import it from A. Furthermore, the version of the data file -imported to B can be different than what's currently used in A. +also includes the `dvc get`, `dvc import`, and `dvc update` commands. This means +that a project can depend on data from an external DVC project, but +chaining several projects this way can easily become messy... Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any kind of large files). 
This way we would From d9ad1ab2fb60e26fb2fdf6f51f5a6040b335cc2f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 21 Nov 2019 19:00:11 -0600 Subject: [PATCH 05/28] use-cases: improve intro to example in data registry case --- static/docs/use-cases/data-registry.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 45fc308360..cb8a07f0f3 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -44,17 +44,17 @@ Advantages of using a DVC **data registry** project: ## Example -A dataset we use for several of our examples and tutorials contains 2800 images -of cats and dogs. We split the dataset in two for our -[Versioning Tutorial](/doc/tutorials/versioning), and backed up the parts on a -storage server, downloading them with `wget` in our examples. This setup was -then revised to download the dataset with `dvc get` instead, so we created the +A dataset we commonly use for several of our examples and tutorials contains +2800 images of cats and dogs. We split it in two for our +[Versioning Tutorial](/doc/tutorials/versioning). Originally, the parts were +backed up on a storage server, and downloaded with `wget`. This setup was then +revised to download the dataset sing `dvc get` instead, so we created the [dataset-registry](https://github.com/iterative/dataset-registry)) repository, a DVC project hosted on GitHub, to version the dataset (see its [`tutorial/ver`](https://github.com/iterative/dataset-registry/tree/master/tutorial/ver) directory). -However, there are a few problems with the way this dataset is structured. Most +However, there are a few problems with the way that dataset is structured. Most importantly, this single dataset is tracked by 2 different [DVC-files](/doc/user-guide/dvc-file-format), instead of 2 versions of the same one, which would better reflect the intentions of this dataset... 
Fortunately, From 50b772ea806d078e974b7144bc87419db0a498e1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 23 Nov 2019 00:09:24 -0600 Subject: [PATCH 06/28] use-cases: rephrase much of the data registry example to improve its logic and readability per https://github.com/iterative/dvc.org/issues/795#issuecomment-557228299 --- static/docs/use-cases/data-registry.md | 101 +++++++++++++------------ 1 file changed, 52 insertions(+), 49 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index cb8a07f0f3..def518eb38 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -45,28 +45,29 @@ Advantages of using a DVC **data registry** project: ## Example A dataset we commonly use for several of our examples and tutorials contains -2800 images of cats and dogs. We split it in two for our +2800 images of cats and dogs, which was split it in two for our [Versioning Tutorial](/doc/tutorials/versioning). Originally, the parts were -backed up on a storage server, and downloaded with `wget`. This setup was then -revised to download the dataset sing `dvc get` instead, so we created the -[dataset-registry](https://github.com/iterative/dataset-registry)) repository, a -DVC project hosted on GitHub, to version the dataset (see its +backed up on a storage server, and downloaded with +[`wget`](https://www.gnu.org/software/wget/). This was then revised in order to +download the parts with `dvc get` instead, so we created the +[dataset-registry](https://github.com/iterative/dataset-registry) +project to version the dataset (in the [`tutorial/ver`](https://github.com/iterative/dataset-registry/tree/master/tutorial/ver) directory). -However, there are a few problems with the way that dataset is structured. 
Most -importantly, this single dataset is tracked by 2 different -[DVC-files](/doc/user-guide/dvc-file-format), instead of 2 versions of the same -one, which would better reflect the intentions of this dataset... Fortunately, -we have also prepared an improved alternative in the +However, there's a few problems with the way that dataset is versioned. Most +importantly, this split dataset is tracked by 2 different +[DVC-files](/doc/user-guide/dvc-file-format) (one for each part), instead of 2 +versions of a single DVC-file. An initial version could have the first part +only, while an update would have the entire, unified dataset. Fortunately, we +have also prepared this improved alternative in the [`use-cases/`](https://github.com/iterative/dataset-registry/tree/master/use-cases) directory of the same DVC repository. -To create a -[first version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) +To create the +[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) of our dataset, we extracted the first part into the `use-cases/cats-dogs` -directory (illustrated below), and ran `dvc add use-cases/cats-dogs` to -[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory). +directory, illustrated below: ```dvc $ tree use-cases/cats-dogs --filelimit 3 @@ -80,7 +81,10 @@ use-cases/cats-dogs └── dogs [400 image files] ``` -In a local DVC project, we could have obtained this dataset at this point with +Then we ran `dvc add use-cases/cats-dogs` to +[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory). 
+ +At this point, we could have obtained this dataset in another DVC project with the following command: ```dvc @@ -90,15 +94,16 @@ $ dvc import git@github.com:iterative/dataset-registry.git \ > Note that unlike `dvc get`, which can be used from any directory, `dvc import` > always needs to run from an [initialized](/doc/command-reference/init) DVC -> project. +> project. Remember also that with both commands, the data comes from the source +> project's remote storage, not from the Git repository itself.
### Expand for actionable command (optional) The command above is meant for informational purposes only. If you actually run -it in a DVC project, although it should work, it will import the latest version -of `use-cases/cats-dogs` from `dataset-registry`. The following command would +it, although it will work, it will import the latest version of +`use-cases/cats-dogs` from `dataset-registry`. The following command would actually bring in the version in question: ```dvc @@ -112,54 +117,52 @@ See the `dvc import` command reference for more details on the `--rev`
-Importing keeps the connection between the local project and the source data -registry where we are downloading the dataset from. This is achieved by creating -a particular kind of [DVC-file](/doc/user-guide/dvc-file-format) that uses the -`repo` field (a.k.a. _import stage_). (This file can be used for versioning the -import with Git.) +Importing keeps the connection between the local project and the +data source (registry repository). This is achieved by creating a +particular kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import +stage_) that includes a `repo` field. (This file can be used staged and +committed with Git.) > For a sample DVC-file resulting from `dvc import`, refer to > [this example](/doc/command-reference/import#example-data-registry). -Back in our **dataset-registry** project, a +Back in our **dataset-registry** project, the [second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases) of our dataset was created by extracting the second part, with 1000 additional -images (500 cats, 500 dogs), into the same directory structure. Then, we simply -ran `dvc add use-cases/cats-dogs` again. +images (500 cats, 500 dogs) on top of the existing directory structure. Then, we +simply ran `dvc add use-cases/cats-dogs` again. -In our local project, all we have to do in order to obtain this latest version -of the dataset is to run: +All we would have to do in order to obtain this latest version in another +project where the first version was previously imported, is to run: ```dvc $ dvc update cats-dogs.dvc ``` -This is possible because of the connection that the import stage saved among -local and source projects, as explained earlier. -
### Expand for actionable command (optional) -As with the previous hidden note, actually trying the commands above should -produced the expected results, but not for obvious reasons. Specifically, the -initial `dvc import` command would have already obtained the latest version of -the dataset (as noted before), so this `dvc update` is unnecessary and won't -have an effect. +As with the previous hidden note, actually trying the command above will produce +the desired results, but not for obvious reasons. The initial `dvc import` +command would have already obtained the latest version of the dataset (as noted +before), so this `dvc update` is unnecessary and won't have any effect. -If you ran the `dvc import --rev cats-dogs-v1 ...` command instead, its import -stage (DVC-file) would be fixed to that Git tag (`cats-dogs-v1`). In order to -update it, do not use `dvc update`. Instead, re-import the data by using the -original import command (without `--rev`). Refer to -[this example](http://localhost:3000/doc/command-reference/import#example-fixed-revisions-re-importing) -for more information. +And if you ran the `dvc import --rev cats-dogs-v1 ...` command instead, its +import stage (DVC-file) would be +[fixed to that revision](/doc/command-reference/import#example-fixed-revisions-re-importing) +(`cats-dogs-v1` tag), so `dvc update` would also be ineffective. In order to +actually "update" it, re-import the data instead, by now running the initial +import command (the one without `--rev`): -
+```dvc +$ dvc import git@github.com:iterative/dataset-registry.git \ + use-cases/cats-dogs +``` -This downloads new and changed files in `cats-dogs/` from the source project, -and updates the metadata in the import stage DVC-file. + -As an extra detail, notice that so far our local project is working only with a -local cache. It has no need to setup a -[remotes](/doc/command-reference/remote) to [pull](/doc/command-reference/pull) -or [push](/doc/command-reference/push) this dataset. +This is possible because of the connection that the import stage saved among +local and source projects, as explained earlier. The update downloads new and +changed files in `cats-dogs/` based on the source project, and updates the +metadata in the import stage DVC-file. From 55ab757106eb8a19fe25317488fb3bbfcc97b4b9 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 24 Nov 2019 17:45:41 -0600 Subject: [PATCH 07/28] review usage of ellipses thoughout docs per https://github.com/iterative/dvc.org/pull/805#discussion_r349956273 --- static/docs/command-reference/get.md | 2 +- static/docs/command-reference/install.md | 7 +++---- static/docs/tutorials/deep/reproducibility.md | 2 +- static/docs/use-cases/data-registry.md | 2 +- 4 files changed, 6 insertions(+), 7 deletions(-) diff --git a/static/docs/command-reference/get.md b/static/docs/command-reference/get.md index 120b3c98a3..f1cbc6c6e2 100644 --- a/static/docs/command-reference/get.md +++ b/static/docs/command-reference/get.md @@ -163,7 +163,7 @@ different names, and not currently tracked by Git: $ git status ... Untracked files: - (use "git add ..." to include in what will be committed) + (use "git add ..." 
to include in what will be committed) model.bigrams.pkl model.monograms.pkl diff --git a/static/docs/command-reference/install.md b/static/docs/command-reference/install.md index cda7101d8b..ff2c9710a2 100644 --- a/static/docs/command-reference/install.md +++ b/static/docs/command-reference/install.md @@ -155,7 +155,7 @@ checkout the `6-featurization` tag: $ git checkout 6-featurization Note: checking out '6-featurization'. -You are in 'detached HEAD' state. ... +You are in 'detached HEAD' state... $ dvc status @@ -216,7 +216,7 @@ We can now repeat the command run earlier, to see the difference. $ git checkout 6-featurization Note: checking out '6-featurization'. -You are in 'detached HEAD' state. ... +You are in 'detached HEAD' state... HEAD is now at d13ba9a add featurization stage @@ -257,8 +257,7 @@ helpfully informs us the workspace is out of sync. We should therefore run the ```dvc $ dvc repro evaluate.dvc - -... much output +... To track the changes with git run: git add featurize.dvc train.dvc evaluate.dvc diff --git a/static/docs/tutorials/deep/reproducibility.md b/static/docs/tutorials/deep/reproducibility.md index 1e3ad9fcb3..25d1e7024f 100644 --- a/static/docs/tutorials/deep/reproducibility.md +++ b/static/docs/tutorials/deep/reproducibility.md @@ -34,7 +34,7 @@ $ dvc repro model.p.dvc $ dvc repro ``` -Tries to reproduce the same pipeline... But there is still nothing to reproduce. +Tries to reproduce the same pipeline, but there is still nothing to reproduce. ## Adding bigrams diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index def518eb38..52269b8745 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -9,7 +9,7 @@ With the aim to enable reusability of these versioned artifacts between different projects (similar to package management systems, but for data), DVC also includes the `dvc get`, `dvc import`, and `dvc update` commands. 
This means that a project can depend on data from an external DVC project, but -chaining several projects this way can easily become messy... +chaining several projects this way can easily become messy. Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any kind of large files). This way we would From d125437dcfe5e7ac9a6b7665a6f5423d418bba7d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 24 Nov 2019 20:02:44 -0600 Subject: [PATCH 08/28] use-cases: remove remark about imports getting messy per https://github.com/iterative/dvc.org/issues/795#issuecomment-557943717 (and https://github.com/iterative/dvc.org/pull/805#pullrequestreview-321998559) --- static/docs/use-cases/data-registry.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 52269b8745..b03433b9dc 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -8,8 +8,7 @@ tracking of datasets and any other data artifacts. With the aim to enable reusability of these versioned artifacts between different projects (similar to package management systems, but for data), DVC also includes the `dvc get`, `dvc import`, and `dvc update` commands. This means -that a project can depend on data from an external DVC project, but -chaining several projects this way can easily become messy. +that a project can depend on data from an external DVC project. Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any kind of large files). 
This way we would From 3cba8f84e46a304d16ae64b58244c9fdf204d063 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 25 Nov 2019 00:36:25 -0600 Subject: [PATCH 09/28] use-cases: further simplify intro of data registry case for #818 --- static/docs/use-cases/data-registry.md | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index b03433b9dc..e938f1d323 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -3,28 +3,29 @@ One of the main uses of DVC repositories is the [versioning of data and model files](/doc/use-cases/data-and-model-files-versioning). This is provided by commands such as `dvc add` and `dvc run`, that allow -tracking of datasets and any other data artifacts. +tracking of datasets or any other data artifacts. With the aim to enable reusability of these versioned artifacts between -different projects (similar to package management systems, but for data), DVC -also includes the `dvc get`, `dvc import`, and `dvc update` commands. This means -that a project can depend on data from an external DVC project. +different projects, DVC also includes the `dvc get`, `dvc import`, and +`dvc update` commands. This means that a project can depend on data from an +external DVC project, similar to package management systems, but +for data. + + Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any kind of large files). This way we would -have a repository with all the metadata and history of changes in the project's +have a repository with all the metadata and history of changes of the project's data. We could see who updated what, and when, use pull requests to update data -(the same way we do with code), and avoid ad-hoc conventions to store different -data versions. This is what we call a data registry. 
Other projects can share -datasets in a registry by downloading (`dvc get`) or importing (`dvc import`) -them for use in different data processes. +(the same way we do with code). This is what we call a data registry, and it +works as data management middleware between your ML project and cloud storage. Advantages of using a DVC **data registry** project: - Data as code: Improve _lifecycle management_ with versioning of simple - directory structures (like Git for your cloud storage), without ad-hoc - conventions. Leverage Git and Git hosting features such as commits, branching, - pull requests, reviews, and even continuous deployment of ML models. + directory structures (like Git on cloud storage), without ad-hoc conventions. + Leverage Git and Git hosting features such as commits, branching, pull + requests, reviews, and even continuous deployment of ML models. - Reusability: Reproduce and organize _feature stores_ with a simple CLI (`dvc get` and `dvc import` commands, similar to software package management systems like `pip`). From 131a27e83c2b9cef829698bb19bb5fd63393289a Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 25 Nov 2019 14:26:17 -0600 Subject: [PATCH 10/28] use-cases: separate example into 2 sections, expand on them to make it easier to follow the 2 parallel stories... --- static/docs/use-cases/data-registry.md | 68 ++++++++++++++------------ 1 file changed, 37 insertions(+), 31 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index e938f1d323..017bbc58a8 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -42,34 +42,31 @@ Advantages of using a DVC **data registry** project: HTTP location). Git versioning of DVC-files allows us to track and audit data changes. 
-## Example +## Building a data registry A dataset we commonly use for several of our examples and tutorials contains -2800 images of cats and dogs, which was split it in two for our -[Versioning Tutorial](/doc/tutorials/versioning). Originally, the parts were -backed up on a storage server, and downloaded with -[`wget`](https://www.gnu.org/software/wget/). This was then revised in order to -download the parts with `dvc get` instead, so we created the -[dataset-registry](https://github.com/iterative/dataset-registry) -project to version the dataset (in the -[`tutorial/ver`](https://github.com/iterative/dataset-registry/tree/master/tutorial/ver) -directory). - -However, there's a few problems with the way that dataset is versioned. Most -importantly, this split dataset is tracked by 2 different -[DVC-files](/doc/user-guide/dvc-file-format) (one for each part), instead of 2 -versions of a single DVC-file. An initial version could have the first part -only, while an update would have the entire, unified dataset. Fortunately, we -have also prepared this improved alternative in the +2800 images of cats and dogs, which was originally split it in two for our +[Versioning Tutorial](/doc/tutorials/versioning). We then properly versioned +this same dataset (without splitting) in the [`use-cases/`](https://github.com/iterative/dataset-registry/tree/master/use-cases) -directory of the same DVC repository. +directory of our +[dataset-registry](https://github.com/iterative/dataset-registry) +project (hosted on GitHub). Let's see how this was done. + +> Note that first, the **dataset-registry** repository was +> initialized with `git init` and `dvc init`, and the `tutorial/ver/` directory +> was populated with the 2 parts of the data as ZIP files, as shown in the +> Versioning tutorial above. 
To create the [initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) -of our dataset, we extracted the first part into the `use-cases/cats-dogs` -directory, illustrated below: +of our dataset, we extracted the first part (`data.zip`) into +`use-cases/cats-dogs`, and used `dvc add` to +[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory): ```dvc +$ mkdir use-cases && cd use-cases +$ unzip -q tutorial/ver/data.zip -d use-cases/cats-dogs $ tree use-cases/cats-dogs --filelimit 3 use-cases/cats-dogs └── data @@ -79,13 +76,26 @@ use-cases/cats-dogs └── validation ├── cats [400 image files] └── dogs [400 image files] +$ dvc add use-cases/cats-dogs ``` -Then we ran `dvc add use-cases/cats-dogs` to -[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory). +The +[second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases) +was created by similarly extracting the second part, with 1000 additional images +(500 cats, 500 dogs), on top of the same directory structure. Then we simply ran +`dvc add use-cases/cats-dogs` again. + +The result is a properly versioned dataset, with 2 versions of a single DVC-file +representing the entire (merged) data. This is in contrast to having one single +version of 2 separate DVC-files, one for each part of the data split (as in the +Versioning example). + +## Using a data registry -At this point, we could have obtained this dataset in another DVC project with -the following command: +Let's say at the time of creating the +[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) +of the dataset example above, we want to obtain it in another DVC project. This +could easily be done with the following command: ```dvc $ dvc import git@github.com:iterative/dataset-registry.git \ @@ -126,14 +136,10 @@ committed with Git.) 
> For a sample DVC-file resulting from `dvc import`, refer to > [this example](/doc/command-reference/import#example-data-registry). -Back in our **dataset-registry** project, the +Then, once the [second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases) -of our dataset was created by extracting the second part, with 1000 additional -images (500 cats, 500 dogs) on top of the existing directory structure. Then, we -simply ran `dvc add use-cases/cats-dogs` again. - -All we would have to do in order to obtain this latest version in another -project where the first version was previously imported, is to run: +of the dataset is created, we can easily bring the dataset up to date locally +with `dvc update`: ```dvc $ dvc update cats-dogs.dvc From a7dc46564993593ec67636ceffaba92c7898dd22 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 25 Nov 2019 17:08:05 -0600 Subject: [PATCH 11/28] use-cases: comlpete "Building a data registry" section in data-registry --- static/docs/use-cases/data-registry.md | 37 +++++++++++++++++++------- 1 file changed, 27 insertions(+), 10 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 017bbc58a8..19a9490a5d 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -46,9 +46,8 @@ Advantages of using a DVC **data registry** project: A dataset we commonly use for several of our examples and tutorials contains 2800 images of cats and dogs, which was originally split it in two for our -[Versioning Tutorial](/doc/tutorials/versioning). We then properly versioned -this same dataset (without splitting) in the -[`use-cases/`](https://github.com/iterative/dataset-registry/tree/master/use-cases) +[Versioning tutorial](/doc/tutorials/versioning). 
We then improved the +versioning of this same dataset (without splitting) in the `use-cases/` directory of our [dataset-registry](https://github.com/iterative/dataset-registry) project (hosted on GitHub). Let's see how this was done. @@ -59,10 +58,11 @@ directory of our > Versioning tutorial above. To create the -[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) -of our dataset, we extracted the first part (`data.zip`) into -`use-cases/cats-dogs`, and used `dvc add` to -[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory): +[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases), +we extracted the first part (`data.zip`) into `use-cases/cats-dogs` and used +`dvc add` to +[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory), +and committed this state with Git: ```dvc $ mkdir use-cases && cd use-cases @@ -77,13 +77,30 @@ use-cases/cats-dogs ├── cats [400 image files] └── dogs [400 image files] $ dvc add use-cases/cats-dogs + +... This creates DVC-file `use-cases/cats-dogs.dvc` + +$ git add .gitignore use-cases/cats-dogs.dvc +$ git commit -m 'Add 1800 cats and dogs images dataset.' ``` The [second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases) -was created by similarly extracting the second part, with 1000 additional images -(500 cats, 500 dogs), on top of the same directory structure. Then we simply ran -`dvc add use-cases/cats-dogs` again. +was created by extracting the remaining part of the dataset, with 1000 +additional training images (500 cats, 500 dogs), on top of the same directory +structure. Then we simply added the directory again! DVC recognizes the changes +and updates the DVC-file, which can then be committed with Git again: + +```dvc +$ dvc add use-cases/cats-dogs +$ git add use-cases/cats-dogs.dvc +$ git commit -m 'Add 1000 more cats and dogs images to dataset.' 
+``` + +> The versioned dataset was then uploaded to +> [remote storage](/doc/command-reference/remote) with `dvc push`. This is +> necessary for others being able to access the data from other projects and +> locations. The result is a properly versioned dataset, with 2 versions of a single DVC-file representing the entire (merged) data. This is in contrast to having one single From 57d4059d51fbc2f923932586e33358a512a382d2 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 26 Nov 2019 00:04:43 -0600 Subject: [PATCH 12/28] use-cases: provide high level abstract overview of the Git and DVC commands use to organize the registry for #818 --- static/docs/use-cases/data-registry.md | 78 +++++++++++++++++++------- 1 file changed, 57 insertions(+), 21 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 19a9490a5d..453a40cbfe 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -11,14 +11,14 @@ different projects, DVC also includes the `dvc get`, `dvc import`, and external DVC project, similar to package management systems, but for data. - + Keeping this in mind, we could build a DVC project dedicated to -tracking and versioning datasets (or any kind of large files). This way we would -have a repository with all the metadata and history of changes of the project's -data. We could see who updated what, and when, use pull requests to update data -(the same way we do with code). This is what we call a data registry, and it -works as data management middleware between your ML project and cloud storage. +tracking and versioning datasets (or any large data). This way we would have a +repository with all the metadata and history of changes of the project's data. +We could see who updated what, and when, use pull requests to update data (the +same way we do with code). 
This is what we call a data registry, and it works as +data management middleware between your ML project and cloud storage. Advantages of using a DVC **data registry** project: @@ -39,24 +39,59 @@ Advantages of using a DVC **data registry** project: copies on other remotes). This simplifies data management and optimizes space requirements. - Security: Registries can be setup to have read-only remote storage (e.g. an - HTTP location). Git versioning of DVC-files allows us to track and audit data - changes. + HTTP location). Git versioning of [DVC-files](/doc/user-guide/dvc-file-format) + allows us to track and audit data changes. -## Building a data registry +## Building data registries -A dataset we commonly use for several of our examples and tutorials contains -2800 images of cats and dogs, which was originally split it in two for our -[Versioning tutorial](/doc/tutorials/versioning). We then improved the -versioning of this same dataset (without splitting) in the `use-cases/` +A data registry is a kind of DVC repository, so it can be created +locally like to any other Git + DVC project. However, the registry +should be available online, so it must pushed to a Git server: + +```dvc +$ mkdir my-data-registry && cd my-data-registry +$ git init && dvc init +$ git commit -am "Initialize DVC project" +$ git remote add origin git@... # Git server URL +$ git branch -u origin/master +$ git push +``` + +What will make the online registry special, is that it will mainly contain +[DVC-files](/doc/user-guide/dvc-file-format). These will track the different +datasets we want to version. The actual data will be stored in one or more +[remote storage](/doc/command-reference/remote) locations configured in the +project. + +A good way to organize these DVC-files is in different directories that group +the data artifacts for different uses, for example `images/`, +`natural-language/`, etc. 
As an example, our +[dataset-registry](https://github.com/iterative/dataset-registry) uses a +directory for each of our website documentation sections, such as `get-started/` +and `use-cases/`. + +> We use this example registry for all of our docs, where needed, for example in +> the [Versioning](/doc/tutorials/versioning) tutorial, +> [in Get Started](/doc/get-started/add-files), and some Command Reference +> examples. + +### Adding datasets to the registry + +Imagine a training dataset with 1000 images of cats and dogs that will be used +to build an ML model. Without DVC, in order for a team to collaborate on this +project, we could just uploading it to cloud storage (e.g. Amazon S3) and +provide everyone with access. + +At some point though, we need to add another 1000 images to the dataset, but the +colleagues already have work based on the initial set. For simplicity, we keep +the dataset split into 2 directories (or compressed files) uploaded separately +to the cloud. + +We actually versioned such a dataset (without split) in the `use-cases/` directory of our [dataset-registry](https://github.com/iterative/dataset-registry) project (hosted on GitHub). Let's see how this was done. -> Note that first, the **dataset-registry** repository was -> initialized with `git init` and `dvc init`, and the `tutorial/ver/` directory -> was populated with the 2 parts of the data as ZIP files, as shown in the -> Versioning tutorial above. 
- To create the [initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases), we extracted the first part (`data.zip`) into `use-cases/cats-dogs` and used @@ -65,8 +100,8 @@ we extracted the first part (`data.zip`) into `use-cases/cats-dogs` and used and committed this state with Git: ```dvc -$ mkdir use-cases && cd use-cases -$ unzip -q tutorial/ver/data.zip -d use-cases/cats-dogs +$ mkdir use-cases +$ cp path/to/data-part-one/ use-cases/cats-dogs $ tree use-cases/cats-dogs --filelimit 3 use-cases/cats-dogs └── data @@ -89,7 +124,8 @@ The was created by extracting the remaining part of the dataset, with 1000 additional training images (500 cats, 500 dogs), on top of the same directory structure. Then we simply added the directory again! DVC recognizes the changes -and updates the DVC-file, which can then be committed with Git again: +and updates the [DVC-file](/doc/user-guide/dvc-file-format), which can then be +committed with Git again: ```dvc $ dvc add use-cases/cats-dogs From c49bc0c2d31c805279eab60b540909cc8d0a2891 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 26 Nov 2019 00:24:00 -0600 Subject: [PATCH 13/28] use-cases: simplify intro and 2nd section in data-registry --- static/docs/use-cases/data-registry.md | 51 ++++++++++++++------------ 1 file changed, 27 insertions(+), 24 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 453a40cbfe..be5a542328 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -1,24 +1,21 @@ # Data Registry One of the main uses of DVC repositories is the -[versioning of data and model files](/doc/use-cases/data-and-model-files-versioning). -This is provided by commands such as `dvc add` and `dvc run`, that allow -tracking of datasets or any other data artifacts. 
- -With the aim to enable reusability of these versioned artifacts between -different projects, DVC also includes the `dvc get`, `dvc import`, and -`dvc update` commands. This means that a project can depend on data from an -external DVC project, similar to package management systems, but -for data. +[versioning of data and model files](/doc/use-cases/data-and-model-files-versioning), +with commands such as `dvc add`. With the aim to enable reusability of these +data artifacts between different projects, DVC also provides the +`dvc get`, `dvc import`, and `dvc update` commands. This means that a project +can depend on data from an external DVC project, **similar to +package management systems, but for data**. Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any large data). This way we would have a -repository with all the metadata and history of changes of the project's data. -We could see who updated what, and when, use pull requests to update data (the -same way we do with code). This is what we call a data registry, and it works as -data management middleware between your ML project and cloud storage. +repository with all the metadata and history of changes of different datasets. +We could see who updated what, and when, and use pull requests to update data +(the same way we do with code). This is what we call a **data registry**, which +can work as data management _middleware_ between ML projects and cloud storage. Advantages of using a DVC **data registry** project: @@ -44,9 +41,9 @@ Advantages of using a DVC **data registry** project: ## Building data registries -A data registry is a kind of DVC repository, so it can be created -locally like to any other Git + DVC project. However, the registry -should be available online, so it must pushed to a Git server: +Data registries are DVC repositories, so they can be created +locally like any other Git + DVC project. 
However, registries +should be available online i.e. pushed to a Git server. For example: ```dvc $ mkdir my-data-registry && cd my-data-registry @@ -57,15 +54,16 @@ $ git branch -u origin/master $ git push ``` -What will make the online registry special, is that it will mainly contain -[DVC-files](/doc/user-guide/dvc-file-format). These will track the different -datasets we want to version. The actual data will be stored in one or more -[remote storage](/doc/command-reference/remote) locations configured in the -project. +What makes online data registries special, is that they mainly contain simple +[DVC-files](/doc/user-guide/dvc-file-format) (probably no source code or +[pipelines](/doc/command-reference/pipeline)). These [DVC-files track the +different datasets we may want to version. The actual data will be stored in one +or more [remote storage](/doc/command-reference/remote) locations configured in +the project. A good way to organize these DVC-files is in different directories that group -the data artifacts for different uses, for example `images/`, -`natural-language/`, etc. As an example, our +the data into separate uses, for example `images/`, `natural-language/`, etc. As +an example, our [dataset-registry](https://github.com/iterative/dataset-registry) uses a directory for each of our website documentation sections, such as `get-started/` and `use-cases/`. @@ -75,7 +73,12 @@ and `use-cases/`. > [in Get Started](/doc/get-started/add-files), and some Command Reference > examples. -### Adding datasets to the registry +### Adding datasets to a registry + + + + + Imagine a training dataset with 1000 images of cats and dogs that will be used to build an ML model. 
Without DVC, in order for a team to collaborate on this From 8c300a2d5ce553e052f1a53bd78b404cff82c445 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 26 Nov 2019 13:33:10 -0600 Subject: [PATCH 14/28] use-cases: fix typo in data-registry per https://github.com/iterative/dvc.org/pull/818#discussion_r350762688 --- static/docs/use-cases/data-registry.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index be5a542328..da4b388b50 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -82,8 +82,8 @@ and `use-cases/`. Imagine a training dataset with 1000 images of cats and dogs that will be used to build an ML model. Without DVC, in order for a team to collaborate on this -project, we could just uploading it to cloud storage (e.g. Amazon S3) and -provide everyone with access. +project, we could just upload it to cloud storage (e.g. Amazon S3) and provide +everyone with access. At some point though, we need to add another 1000 images to the dataset, but the colleagues already have work based on the initial set. For simplicity, we keep From 6854a8bdd8c0062c0b25d5271eea37eea92a9b42 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 27 Nov 2019 23:43:53 -0600 Subject: [PATCH 15/28] WIP: use-cases: simplofy middle sections per discussion with Ivan, by moving all the cats and dogs stuff to later Example sections --- static/docs/use-cases/data-registry.md | 46 +++++++++++--------------- 1 file changed, 20 insertions(+), 26 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index da4b388b50..7ea8cfed73 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -8,7 +8,7 @@ with commands such as `dvc add`. 
With the aim to enable reusability of these can depend on data from an external DVC project, **similar to package management systems, but for data**. - + Keeping this in mind, we could build a DVC project dedicated to tracking and versioning datasets (or any large data). This way we would have a @@ -41,25 +41,9 @@ Advantages of using a DVC **data registry** project: ## Building data registries -Data registries are DVC repositories, so they can be created -locally like any other Git + DVC project. However, registries -should be available online i.e. pushed to a Git server. For example: - -```dvc -$ mkdir my-data-registry && cd my-data-registry -$ git init && dvc init -$ git commit -am "Initialize DVC project" -$ git remote add origin git@... # Git server URL -$ git branch -u origin/master -$ git push -``` - -What makes online data registries special, is that they mainly contain simple -[DVC-files](/doc/user-guide/dvc-file-format) (probably no source code or -[pipelines](/doc/command-reference/pipeline)). These [DVC-files track the -different datasets we may want to version. The actual data will be stored in one -or more [remote storage](/doc/command-reference/remote) locations configured in -the project. +Data registries can be created locally like any other DVC +repositories with `git init` and `dvc init`, and pushed to a Git server +for sharing with others. A good way to organize these DVC-files is in different directories that group the data into separate uses, for example `images/`, `natural-language/`, etc. As @@ -69,16 +53,26 @@ directory for each of our website documentation sections, such as `get-started/` and `use-cases/`. > We use this example registry for all of our docs, where needed, for example in -> the [Versioning](/doc/tutorials/versioning) tutorial, +> the [Versioning tutorial](/doc/tutorials/versioning), > [in Get Started](/doc/get-started/add-files), and some Command Reference > examples. ### Adding datasets to a registry - - - - +... 
+ +What makes data registries special, is that they mainly contain simple +[DVC-files](/doc/user-guide/dvc-file-format) (probably no source code or +[pipelines](/doc/command-reference/pipeline)). These [DVC-files track the +different datasets we may want to version. The actual data will be stored in one +or more [remote storage](/doc/command-reference/remote) locations configured in +the project. + +## Using a data registry + +... + +## Example Imagine a training dataset with 1000 images of cats and dogs that will be used to build an ML model. Without DVC, in order for a team to collaborate on this @@ -146,7 +140,7 @@ representing the entire (merged) data. This is in contrast to having one single version of 2 separate DVC-files, one for each part of the data split (as in the Versioning example). -## Using a data registry +## Example: Consuming Let's say at the time of creating the [initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) From e2d93c755786c4b5dd0594d85863ee422b6aef7f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 28 Nov 2019 01:11:15 -0600 Subject: [PATCH 16/28] WIP: use-cases: rewrite middle section of data registry without cats-dogs, and explain how to add datasets, high-level. Pending Using a data registry section (and maybe Example) --- static/docs/use-cases/data-registry.md | 215 +++++++------------------ 1 file changed, 57 insertions(+), 158 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 7ea8cfed73..7947cbc119 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -11,7 +11,7 @@ package management systems, but for data**. Keeping this in mind, we could build a DVC project dedicated to -tracking and versioning datasets (or any large data). This way we would have a +tracking and versioning _datasets_ (or any large data). 
This way we would have a repository with all the metadata and history of changes of different datasets. We could see who updated what, and when, and use pull requests to update data (the same way we do with code). This is what we call a **data registry**, which @@ -43,182 +43,81 @@ Advantages of using a DVC **data registry** project: Data registries can be created locally like any other DVC repositories with `git init` and `dvc init`, and pushed to a Git server -for sharing with others. - -A good way to organize these DVC-files is in different directories that group -the data into separate uses, for example `images/`, `natural-language/`, etc. As -an example, our +for sharing with others. A good way to organize them is with different +directories, to group the data into separate uses, such as `images/`, +`natural-language/`, etc. For example, our [dataset-registry](https://github.com/iterative/dataset-registry) uses a -directory for each of our website documentation sections, such as `get-started/` -and `use-cases/`. +directory for each of our website documentation sections, like `get-started/`, +`use-cases/`, etc. -> We use this example registry for all of our docs, where needed, for example in +> We use **dataset-registry** for all of our docs, where needed, for example in > the [Versioning tutorial](/doc/tutorials/versioning), > [in Get Started](/doc/get-started/add-files), and some Command Reference > examples. -### Adding datasets to a registry - -... - -What makes data registries special, is that they mainly contain simple -[DVC-files](/doc/user-guide/dvc-file-format) (probably no source code or -[pipelines](/doc/command-reference/pipeline)). These [DVC-files track the -different datasets we may want to version. The actual data will be stored in one -or more [remote storage](/doc/command-reference/remote) locations configured in -the project. - -## Using a data registry - -... 
- -## Example - -Imagine a training dataset with 1000 images of cats and dogs that will be used -to build an ML model. Without DVC, in order for a team to collaborate on this -project, we could just upload it to cloud storage (e.g. Amazon S3) and provide -everyone with access. - -At some point though, we need to add another 1000 images to the dataset, but the -colleagues already have work based on the initial set. For simplicity, we keep -the dataset split into 2 directories (or compressed files) uploaded separately -to the cloud. - -We actually versioned such a dataset (without split) in the `use-cases/` -directory of our -[dataset-registry](https://github.com/iterative/dataset-registry) -project (hosted on GitHub). Let's see how this was done. - -To create the -[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases), -we extracted the first part (`data.zip`) into `use-cases/cats-dogs` and used -`dvc add` to -[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory), -and committed this state with Git: - -```dvc -$ mkdir use-cases -$ cp path/to/data-part-one/ use-cases/cats-dogs -$ tree use-cases/cats-dogs --filelimit 3 -use-cases/cats-dogs -└── data - ├── train - │   ├── cats [500 image files] - │   └── dogs [500 image files] - └── validation - ├── cats [400 image files] - └── dogs [400 image files] -$ dvc add use-cases/cats-dogs - -... This creates DVC-file `use-cases/cats-dogs.dvc` - -$ git add .gitignore use-cases/cats-dogs.dvc -$ git commit -m 'Add 1800 cats and dogs images dataset.' -``` - -The -[second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases) -was created by extracting the remaining part of the dataset, with 1000 -additional training images (500 cats, 500 dogs), on top of the same directory -structure. Then we simply added the directory again! 
DVC recognizes the changes -and updates the [DVC-file](/doc/user-guide/dvc-file-format), which can then be -committed with Git again: +Adding datasets to a registry can be as simple as placing the data file or +directory in question inside the workspace, and telling DVC to +track it, with `dvc add`. For example: ```dvc -$ dvc add use-cases/cats-dogs -$ git add use-cases/cats-dogs.dvc -$ git commit -m 'Add 1000 more cats and dogs images to dataset.' -``` - -> The versioned dataset was then uploaded to -> [remote storage](/doc/command-reference/remote) with `dvc push`. This is -> necessary for others being able to access the data from other projects and -> locations. - -The result is a properly versioned dataset, with 2 versions of a single DVC-file -representing the entire (merged) data. This is in contrast to having one single -version of 2 separate DVC-files, one for each part of the data split (as in the -Versioning example). - -## Example: Consuming - -Let's say at the time of creating the -[initial version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases) -of the dataset example above, we want to obtain it in another DVC project. This -could easily be done with the following command: - -```dvc -$ dvc import git@github.com:iterative/dataset-registry.git \ - use-cases/cats-dogs -``` - -> Note that unlike `dvc get`, which can be used from any directory, `dvc import` -> always needs to run from an [initialized](/doc/command-reference/init) DVC -> project. Remember also that with both commands, the data comes from the source -> project's remote storage, not from the Git repository itself. - -
- -### Expand for actionable command (optional) - -The command above is meant for informational purposes only. If you actually run -it, although it will work, it will import the latest version of -`use-cases/cats-dogs` from `dataset-registry`. The following command would -actually bring in the version in question: - -```dvc -$ dvc import --rev cats-dogs-v1 \ - git@github.com:iterative/dataset-registry.git \ - use-cases/cats-dogs +$ mkdir -p music/Beatles +$ cp ~/Downloads/millionsongsubset_full music/songs +$ dvc add music/songs +100% Add 1/1 [00:03<00:00, 3.58s/file] +... +$ git add music/songs.dvc music/.gitignore +$ git commit -m "Track 1.8 GB 10,000 song dataset." ``` -See the `dvc import` command reference for more details on the `--rev` -(revision) option. - -
- -Importing keeps the connection between the local project and the -data source (registry repository). This is achieved by creating a -particular kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import -stage_) that includes a `repo` field. (This file can be used staged and -committed with Git.) +> This example dataset actually exists. See +> [MillionSongSubset](http://millionsongdataset.com/pages/getting-dataset/#subset). -> For a sample DVC-file resulting from `dvc import`, refer to -> [this example](/doc/command-reference/import#example-data-registry). +As shown above, a regular Git workflow can be followed with the tiny +[DVC-files](/doc/user-guide/dvc-file-format) that substitute the actual data +(`music/songs.dvc` in the example). This enables team collaboration on data at +the same level as with source code (commit history, branching, pull requests, +reviews, etc.) -Then, once the -[second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases) -of the dataset is created, we can easily bring the dataset up to date locally -with `dvc update`: +Datasets evolve, and DVC is prepared to handle it. Just add/remove or change the +contents of the data registry, and apply updates by running `dvc add` again. +This can be iterated as many times as necessary. Example: ```dvc -$ dvc update cats-dogs.dvc +$ cp /path/to/1000/image/dir music/songs +$ dvc add music/songs +... +$ git status +Changes not staged for commit: +... + modified: music/songs.dvc +$ git commit -am "Add 1000 more songs." ``` -
+Repeating this process for several datasets will give shape to a robust +registry, which are basically repositories that mainly version a bunch of +DVC-files, as you can see in the hypotetical example below. -### Expand for actionable command (optional) - -As with the previous hidden note, actually trying the command above will produce -the desired results, but not for obvious reasons. The initial `dvc import` -command would have already obtained the latest version of the dataset (as noted -before), so this `dvc update` is unnecessary and won't have any effect. - -And if you ran the `dvc import --rev cats-dogs-v1 ...` command instead, its -import stage (DVC-file) would be -[fixed to that revision](/doc/command-reference/import#example-fixed-revisions-re-importing) -(`cats-dogs-v1` tag), so `dvc update` would also be ineffective. In order to -actually "update" it, re-import the data instead, by now running the initial -import command (the one without `--rev`): +> The actual data will be [pushed](/doc/command-reference/push) to one or more +> [remote storage](/doc/command-reference/remote) locations that need to be +> configured separately in the project. ```dvc -$ dvc import git@github.com:iterative/dataset-registry.git \ - use-cases/cats-dogs +$ tree --filelimit=100 +. +├── images +│ ├── .gitignore +│ ├── cats-dogs [2800 entries] # Listed in .gitignore +│ ├── faces [10000 entries] # Listed in .gitignore +│ ├── cats-dogs.dvc +│ └── faces.dvc +├── music +│ ├── .gitignore +│ ├── songs [11000 entries] # Listed in .gitignore +│ └── songs.dvc +├── text +... ``` -
+## Using a data registry -This is possible because of the connection that the import stage saved among -local and source projects, as explained earlier. The update downloads new and -changed files in `cats-dogs/` based on the source project, and updates the -metadata in the import stage DVC-file. +... From faeb05767e04e8e30c403652fa088fae7615621e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 29 Nov 2019 18:07:23 -0600 Subject: [PATCH 17/28] use-cases: review Construction and workflow section per private review with Ivan --- static/docs/use-cases/data-registry.md | 38 ++++++++++++-------------- 1 file changed, 17 insertions(+), 21 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 7947cbc119..90f6b4df35 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -39,22 +39,16 @@ Advantages of using a DVC **data registry** project: HTTP location). Git versioning of [DVC-files](/doc/user-guide/dvc-file-format) allows us to track and audit data changes. -## Building data registries +## Construction and workflow -Data registries can be created locally like any other DVC -repositories with `git init` and `dvc init`, and pushed to a Git server -for sharing with others. A good way to organize them is with different +Data registries can be created like any other DVC repositories with +`git init` and `dvc init`. A good way to organize them is with different directories, to group the data into separate uses, such as `images/`, `natural-language/`, etc. For example, our [dataset-registry](https://github.com/iterative/dataset-registry) uses a -directory for each of our website documentation sections, like `get-started/`, +directory for each section in our website documentation, like `get-started/`, `use-cases/`, etc. 
-> We use **dataset-registry** for all of our docs, where needed, for example in -> the [Versioning tutorial](/doc/tutorials/versioning), -> [in Get Started](/doc/get-started/add-files), and some Command Reference -> examples. - Adding datasets to a registry can be as simple as placing the data file or directory in question inside the workspace, and telling DVC to track it, with `dvc add`. For example: @@ -65,18 +59,25 @@ $ cp ~/Downloads/millionsongsubset_full music/songs $ dvc add music/songs 100% Add 1/1 [00:03<00:00, 3.58s/file] ... -$ git add music/songs.dvc music/.gitignore -$ git commit -m "Track 1.8 GB 10,000 song dataset." ``` > This example dataset actually exists. See > [MillionSongSubset](http://millionsongdataset.com/pages/getting-dataset/#subset). -As shown above, a regular Git workflow can be followed with the tiny +A regular Git workflow can be followed with the tiny [DVC-files](/doc/user-guide/dvc-file-format) that substitute the actual data -(`music/songs.dvc` in the example). This enables team collaboration on data at +(`music/songs.dvc` in this example). This enables team collaboration on data at the same level as with source code (commit history, branching, pull requests, -reviews, etc.) +reviews, etc.): + +```dvc +$ git add music/songs.dvc music/.gitignore +$ git commit -m "Track 1.8 GB 10,000 song dataset." +``` + +> The actual data is stored in the project's cache and can be +> [pushed](/doc/command-reference/push) to one or more +> [remote storage](/doc/command-reference/remote) locations. Datasets evolve, and DVC is prepared to handle it. Just add/remove or change the contents of the data registry, and apply updates by running `dvc add` again. @@ -90,17 +91,12 @@ $ git status Changes not staged for commit: ... modified: music/songs.dvc -$ git commit -am "Add 1000 more songs." 
``` Repeating this process for several datasets will give shape to a robust registry, which are basically repositories that mainly version a bunch of DVC-files, as you can see in the hypotetical example below. -> The actual data will be [pushed](/doc/command-reference/push) to one or more -> [remote storage](/doc/command-reference/remote) locations that need to be -> configured separately in the project. - ```dvc $ tree --filelimit=100 . @@ -118,6 +114,6 @@ $ tree --filelimit=100 ... ``` -## Using a data registry +## Usage ... From f4997cb020437c474cd79fedb4f6b60683a4dc28 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 1 Dec 2019 00:41:55 -0600 Subject: [PATCH 18/28] use-cases: more updates to data registry per private discussion --- static/docs/use-cases/data-registry.md | 27 +++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 90f6b4df35..c8aeaaf975 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -17,6 +17,9 @@ We could see who updated what, and when, and use pull requests to update data (the same way we do with code). This is what we call a **data registry**, which can work as data management _middleware_ between ML projects and cloud storage. +> Note that a single dedicated repository is just one possible pattern to create +> data registries with DVC. + Advantages of using a DVC **data registry** project: - Data as code: Improve _lifecycle management_ with versioning of simple @@ -39,7 +42,7 @@ Advantages of using a DVC **data registry** project: HTTP location). Git versioning of [DVC-files](/doc/user-guide/dvc-file-format) allows us to track and audit data changes. -## Construction and workflow +## Construction Data registries can be created like any other DVC repositories with `git init` and `dvc init`. 
A good way to organize them is with different @@ -57,8 +60,7 @@ track it, with `dvc add`. For example: $ mkdir -p music/Beatles $ cp ~/Downloads/millionsongsubset_full music/songs $ dvc add music/songs -100% Add 1/1 [00:03<00:00, 3.58s/file] -... +100% Add 1/1 [... ``` > This example dataset actually exists. See @@ -72,30 +74,37 @@ reviews, etc.): ```dvc $ git add music/songs.dvc music/.gitignore -$ git commit -m "Track 1.8 GB 10,000 song dataset." +$ git commit -m "Track 1.8 GB 10,000 song dataset in music/" ``` > The actual data is stored in the project's cache and can be > [pushed](/doc/command-reference/push) to one or more > [remote storage](/doc/command-reference/remote) locations. -Datasets evolve, and DVC is prepared to handle it. Just add/remove or change the -contents of the data registry, and apply updates by running `dvc add` again. -This can be iterated as many times as necessary. Example: +## Evolution + +Datasets change, and DVC is prepared to handle it. Just add/remove or change the +contents of the data registry, and apply the updates by running `dvc add` again: ```dvc $ cp /path/to/1000/image/dir music/songs $ dvc add music/songs ... +``` + +DVC then modifies the corresponding DVC-file to reflect the changes in the data, +and this will be noticed by Git: + +```dvc $ git status Changes not staged for commit: ... modified: music/songs.dvc ``` -Repeating this process for several datasets will give shape to a robust +Iterating on this process for several datasets can give shape to a robust registry, which are basically repositories that mainly version a bunch of -DVC-files, as you can see in the hypotetical example below. +DVC-files, as you can see in the hypothetical example below. 
```dvc $ tree --filelimit=100 From 707a5071f552b01497fbdb9d62c072ae0dba0840 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 2 Dec 2019 19:35:16 -0600 Subject: [PATCH 19/28] use-cases: draft of new Usage section in data registry --- static/docs/use-cases/data-registry.md | 40 +++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index c8aeaaf975..c70170a3e4 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -42,7 +42,7 @@ Advantages of using a DVC **data registry** project: HTTP location). Git versioning of [DVC-files](/doc/user-guide/dvc-file-format) allows us to track and audit data changes. -## Construction +## Building Data registries can be created like any other DVC repositories with `git init` and `dvc init`. A good way to organize them is with different @@ -81,7 +81,7 @@ $ git commit -m "Track 1.8 GB 10,000 song dataset in music/" > [pushed](/doc/command-reference/push) to one or more > [remote storage](/doc/command-reference/remote) locations. -## Evolution +## Updating Datasets change, and DVC is prepared to handle it. Just add/remove or change the contents of the data registry, and apply the updates by running `dvc add` again: @@ -123,6 +123,38 @@ $ tree --filelimit=100 ... ``` -## Usage +## Using -... +The main methods to consume data artifacts from a **data registry** +are the `dvc import` and `dvc update` commands. + +To import a dataset versioned in a repository online, we can run +something like: + +```dvc +$ dvc import git@git-server.url:path/to/repository.git \ + path/to/dataset +``` + +> Note that unlike `dvc get`, which can be used from any directory, `dvc import` +> needs to run within an [initialized](/doc/command-reference/init) DVC project. + +Importing saves the dependency of the local project towards the +data source (registry repository). 
This is achieved by creating a particular +kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import stage_). +This file can be used staged and committed with Git. + +> For a sample DVC-file resulting from `dvc import`, refer to +> [this example](/doc/command-reference/import#example-data-registry). + +Given this saved dependency, whenever the the dataset changes in the source +project (data registry), we can easily bring it up to date in our consumer +project with: + +```dvc +$ dvc update dataset.dvc +``` + +`dvc update` downloads new and changed files or removed deleted ones from +`path/to/dataset` based on the latest version of the source project, and updates +the project dependency metadata in the import stage (DVC-file). From 7954f59e011962e0b3de26e913dac4fe5b612a3d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 15:46:44 -0800 Subject: [PATCH 20/28] use-cases: add diagram to data registry --- static/docs/use-cases/data-registry.md | 2 ++ static/img/data-registry.png | Bin 0 -> 38052 bytes 2 files changed, 2 insertions(+) create mode 100644 static/img/data-registry.png diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index c70170a3e4..a00fcc50be 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -10,6 +10,8 @@ package management systems, but for data**. +![](/static/img/data-registry.png) + Keeping this in mind, we could build a DVC project dedicated to tracking and versioning _datasets_ (or any large data). This way we would have a repository with all the metadata and history of changes of different datasets. 
diff --git a/static/img/data-registry.png b/static/img/data-registry.png new file mode 100644 index 0000000000000000000000000000000000000000..e254b0175bf9c7d48f393aa81a986ce7a24b3856 GIT binary patch literal 38052 [base85-encoded PNG data omitted]
Date: Tue, 10 Dec 2019 16:37:44 -0800 Subject: [PATCH 21/28] use-cases: improve usage section (adding API section) and move diagram lower For PR #818 but also related to #463 --- static/docs/use-cases/data-registry.md | 57 +++++++++++++++++++------- 1 file changed, 42 insertions(+), 15 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index a00fcc50be..9b0663d49e 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -8,10 +8,6 @@ with commands such as `dvc add`. With the aim to enable reusability of these can depend on data from an external DVC project, **similar to package management systems, but for data**.
- - -![](/static/img/data-registry.png) - Keeping this in mind, we could build a DVC project dedicated to tracking and versioning _datasets_ (or any large data). This way we would have a repository with all the metadata and history of changes of different datasets. @@ -19,6 +15,8 @@ We could see who updated what, and when, and use pull requests to update data (the same way we do with code). This is what we call a **data registry**, which can work as data management _middleware_ between ML projects and cloud storage. +![](/static/img/data-registry.png) + > Note that a single dedicated repository is just one possible pattern to create > data registries with DVC. @@ -128,10 +126,26 @@ $ tree --filelimit=100 ## Using The main methods to consume data artifacts from a **data registry** -are the `dvc import` and `dvc update` commands. +are the `dvc import` and `dvc get` commands, as well as the `dvc.api.open()` +function. + +### Simple download (get) + +To download a dataset versioned in a DVC repository online, we can +run something like: + +```dvc +$ dvc get git@git-server.url:path/to/repository.git \ + path/to/dataset +``` + +This downloads `path/to/dataset` from the project's +[default remote](/doc/command-reference/remote/default) and places it in the +current working directory (anywhere in the file system with user write access). + +### Import workflow -To import a dataset versioned in a repository online, we can run -something like: +`dvc import` uses the same syntax as `dvc get`: ```dvc $ dvc import git@git-server.url:path/to/repository.git \ @@ -141,22 +155,35 @@ $ dvc import git@git-server.url:path/to/repository.git \ > Note that unlike `dvc get`, which can be used from any directory, `dvc import` > needs to run within an [initialized](/doc/command-reference/init) DVC project. -Importing saves the dependency of the local project towards the -data source (registry repository). 
This is achieved by creating a particular +Besides downloading, importing saves the dependency of the local project towards +the data source (registry repository). This is achieved by creating a particular kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import stage_). This file can be used staged and committed with Git. > For a sample DVC-file resulting from `dvc import`, refer to > [this example](/doc/command-reference/import#example-data-registry). -Given this saved dependency, whenever the the dataset changes in the source -project (data registry), we can easily bring it up to date in our consumer -project with: +As an addition to the import workflow, and enabled the saved dependency, we can +easily bring it up to date in our consumer project with `dvc update` whenever +the the dataset changes in the source project (data registry): ```dvc $ dvc update dataset.dvc ``` -`dvc update` downloads new and changed files or removed deleted ones from -`path/to/dataset` based on the latest version of the source project, and updates -the project dependency metadata in the import stage (DVC-file). +`dvc update` downloads new and changed files, or removes deleted ones, from +`path/to/dataset` based on the latest version of the source project. It also +updates the project dependency metadata in the import stage (DVC-file). + +### Programatic reusability of DVC data + +Our Python API, included with the `dvc` package installed with DVC, includes the +`open` function to load/stream data directly from remote DVC projects: + +```python +import dvc.api.open + +dvc.api.open('path/to/dataset', 'git@git-server.url:path/to/repository.git') +``` + +This opens `path/to/dataset` as a file descriptor. 
From 485fc49c9f15a3af5aa435454020faf588d0805f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 16:42:00 -0800 Subject: [PATCH 22/28] use-cases: add note about deployment via dvc.api.open to data registry case --- static/docs/use-cases/data-registry.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 9b0663d49e..063929a1f9 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -186,4 +186,5 @@ import dvc.api.open dvc.api.open('path/to/dataset', 'git@git-server.url:path/to/repository.git') ``` -This opens `path/to/dataset` as a file descriptor. +This opens `path/to/dataset` as a file descriptor. Such a method could be used +as a code-internal **deployment** method for ML models, for example. From 6ccc49f299edee886424c6014bc7ff2a72359dad Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 10 Dec 2019 17:58:18 -0800 Subject: [PATCH 23/28] use-cases: Some updates per private discussion with Ivan --- static/docs/use-cases/data-registry.md | 101 +++++++++++++------------ 1 file changed, 51 insertions(+), 50 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 063929a1f9..b5fc9de6f2 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -1,4 +1,4 @@ -# Data Registry +# Data Registries One of the main uses of DVC repositories is the [versioning of data and model files](/doc/use-cases/data-and-model-files-versioning), @@ -9,11 +9,12 @@ can depend on data from an external DVC project, **similar to package management systems, but for data**. Keeping this in mind, we could build a DVC project dedicated to -tracking and versioning _datasets_ (or any large data). This way we would have a -repository with all the metadata and history of changes of different datasets. 
-We could see who updated what, and when, and use pull requests to update data -(the same way we do with code). This is what we call a **data registry**, which -can work as data management _middleware_ between ML projects and cloud storage. +tracking and versioning _datasets_ (or any large data, even ML models). This way +we would have a repository with all the metadata and history of changes of +different datasets. We could see who updated what, and when, and use pull +requests to update data (the same way we do with code). This is what we call a +**data registry**, which can work as data management _middleware_ between ML +projects and cloud storage. ![](/static/img/data-registry.png) @@ -42,7 +43,7 @@ Advantages of using a DVC **data registry** project: HTTP location). Git versioning of [DVC-files](/doc/user-guide/dvc-file-format) allows us to track and audit data changes. -## Building +## Building registries Data registries can be created like any other DVC repositories with `git init` and `dvc init`. A good way to organize them is with different @@ -81,49 +82,7 @@ $ git commit -m "Track 1.8 GB 10,000 song dataset in music/" > [pushed](/doc/command-reference/push) to one or more > [remote storage](/doc/command-reference/remote) locations. -## Updating - -Datasets change, and DVC is prepared to handle it. Just add/remove or change the -contents of the data registry, and apply the updates by running `dvc add` again: - -```dvc -$ cp /path/to/1000/image/dir music/songs -$ dvc add music/songs -... -``` - -DVC then modifies the corresponding DVC-file to reflect the changes in the data, -and this will be noticed by Git: - -```dvc -$ git status -Changes not staged for commit: -... - modified: music/songs.dvc -``` - -Iterating on this process for several datasets can give shape to a robust -registry, which are basically repositories that mainly version a bunch of -DVC-files, as you can see in the hypothetical example below. - -```dvc -$ tree --filelimit=100 -. 
-├── images -│ ├── .gitignore -│ ├── cats-dogs [2800 entries] # Listed in .gitignore -│ ├── faces [10000 entries] # Listed in .gitignore -│ ├── cats-dogs.dvc -│ └── faces.dvc -├── music -│ ├── .gitignore -│ ├── songs [11000 entries] # Listed in .gitignore -│ └── songs.dvc -├── text -... -``` - -## Using +## Using registries The main methods to consume data artifacts from a **data registry** are the `dvc import` and `dvc get` commands, as well as the `dvc.api.open()` @@ -188,3 +147,45 @@ dvc.api.open('path/to/dataset', 'git@git-server.url:path/to/repository.git') This opens `path/to/dataset` as a file descriptor. Such a method could be used as a code-internal **deployment** method for ML models, for example. + +## Updating registries + +Datasets change, and DVC is prepared to handle it. Just add/remove or change the +contents of the data registry, and apply the updates by running `dvc add` again: + +```dvc +$ cp /path/to/1000/image/dir music/songs +$ dvc add music/songs +... +``` + +DVC then modifies the corresponding DVC-file to reflect the changes in the data, +and this will be noticed by Git: + +```dvc +$ git status +Changes not staged for commit: +... + modified: music/songs.dvc +``` + +Iterating on this process for several datasets can give shape to a robust +registry, which are basically repositories that mainly version a bunch of +DVC-files, as you can see in the hypothetical example below. + +```dvc +$ tree --filelimit=100 +. +├── images +│ ├── .gitignore +│ ├── cats-dogs [2800 entries] # Listed in .gitignore +│ ├── faces [10000 entries] # Listed in .gitignore +│ ├── cats-dogs.dvc +│ └── faces.dvc +├── music +│ ├── .gitignore +│ ├── songs [11000 entries] # Listed in .gitignore +│ └── songs.dvc +├── text +... 
+``` From de65290bdb6afbeac0cee561672f12166f1f6e20 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 17:41:29 -0800 Subject: [PATCH 24/28] use-cases: more feedback per private chat with Ivan --- static/docs/use-cases/data-registry.md | 31 +++++++++++++++----------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index b5fc9de6f2..5b938e055c 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -8,6 +8,8 @@ with commands such as `dvc add`. With the aim to enable reusability of these can depend on data from an external DVC project, **similar to package management systems, but for data**. +![](/static/img/data-registry.png) _Codify datasets and models with DVC_ + Keeping this in mind, we could build a DVC project dedicated to tracking and versioning _datasets_ (or any large data, even ML models). This way we would have a repository with all the metadata and history of changes of @@ -16,8 +18,6 @@ requests to update data (the same way we do with code). This is what we call a **data registry**, which can work as data management _middleware_ between ML projects and cloud storage. -![](/static/img/data-registry.png) - > Note that a single dedicated repository is just one possible pattern to create > data registries with DVC. @@ -61,7 +61,6 @@ track it, with `dvc add`. For example: $ mkdir -p music/Beatles $ cp ~/Downloads/millionsongsubset_full music/songs $ dvc add music/songs -100% Add 1/1 [... ``` > This example dataset actually exists. See @@ -86,12 +85,14 @@ $ git commit -m "Track 1.8 GB 10,000 song dataset in music/" The main methods to consume data artifacts from a **data registry** are the `dvc import` and `dvc get` commands, as well as the `dvc.api.open()` -function. +function (Python). 
### Simple download (get) -To download a dataset versioned in a DVC repository online, we can -run something like: +This is analogous to using direct download tools like +[`wget`](https://www.gnu.org/software/wget/) (HTTP), +[`aws s3 cp`](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) (S3), +etc. To get a dataset for example, we can run something like: ```dvc $ dvc get git@git-server.url:path/to/repository.git \ @@ -102,6 +103,9 @@ This downloads `path/to/dataset` from the project's [default remote](/doc/command-reference/remote/default) and places it in the current working directory (anywhere in the file system with user write access). +> Note that this command (as well as `dvc import`) has a `--revision` option to +> download specific versions of the data. + ### Import workflow `dvc import` uses the same syntax as `dvc get`: @@ -119,9 +123,6 @@ the data source (registry repository). This is achieved by creating a particular kind of [DVC-file](/doc/user-guide/dvc-file-format) (a.k.a. _import stage_). This file can be used staged and committed with Git. -> For a sample DVC-file resulting from `dvc import`, refer to -> [this example](/doc/command-reference/import#example-data-registry). - As an addition to the import workflow, and enabled the saved dependency, we can easily bring it up to date in our consumer project with `dvc update` whenever the the dataset changes in the source project (data registry): @@ -142,7 +143,12 @@ Our Python API, included with the `dvc` package installed with DVC, includes the ```python import dvc.api.open -dvc.api.open('path/to/dataset', 'git@git-server.url:path/to/repository.git') +data_path = 'path/to/dataset' +repo_url = 'git@git-server.url:path/to/repository.git' + +with dvc.api.open(data_path, repo_url) as dataset: + # process the data + # ... ``` This opens `path/to/dataset` as a file descriptor. Such a method could be used @@ -150,13 +156,12 @@ as a code-internal **deployment** method for ML models, for example. 
## Updating registries -Datasets change, and DVC is prepared to handle it. Just add/remove or change the -contents of the data registry, and apply the updates by running `dvc add` again: +Datasets evolve, and DVC is prepared to handle it. Just change the data in the +registry, and apply the updates by running `dvc add` again: ```dvc $ cp /path/to/1000/image/dir music/songs $ dvc add music/songs -... ``` DVC then modifies the corresponding DVC-file to reflect the changes in the data, From 53ea7c6ebabcf151aec466dd48077f2960b5d098 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 11 Dec 2019 17:45:09 -0800 Subject: [PATCH 25/28] use-cases: updated img subscript for data registry --- static/docs/use-cases/data-registry.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md index 5b938e055c..5533be1cd8 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registry.md @@ -8,7 +8,7 @@ with commands such as `dvc add`. With the aim to enable reusability of these can depend on data from an external DVC project, **similar to package management systems, but for data**. -![](/static/img/data-registry.png) _Codify datasets and models with DVC_ +![](/static/img/data-registry.png) _Data and models as code_ Keeping this in mind, we could build a DVC project dedicated to tracking and versioning _datasets_ (or any large data, even ML models). 
This way From 7887ca2d2cb293a21d10f25b1e40c35a16f186b5 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 12 Dec 2019 16:26:57 -0800 Subject: [PATCH 26/28] use-cases: address Alex' feedback on data registry 2nd iteration per https://github.com/iterative/dvc.org/pull/818#issuecomment-565083934 --- src/Documentation/sidebar.json | 2 +- .../{data-registry.md => data-registries.md} | 19 +++++++++---------- 2 files changed, 10 insertions(+), 11 deletions(-) rename static/docs/use-cases/{data-registry.md => data-registries.md} (94%) diff --git a/src/Documentation/sidebar.json b/src/Documentation/sidebar.json index 9dd847c917..c92333fbb3 100644 --- a/src/Documentation/sidebar.json +++ b/src/Documentation/sidebar.json @@ -109,7 +109,7 @@ "slug": "sharing-data-and-model-files" }, "shared-development-server", - "data-registry" + "data-registries" ] }, { diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registries.md similarity index 94% rename from static/docs/use-cases/data-registry.md rename to static/docs/use-cases/data-registries.md index 5533be1cd8..7d7ddef40e 100644 --- a/static/docs/use-cases/data-registry.md +++ b/static/docs/use-cases/data-registries.md @@ -4,9 +4,9 @@ One of the main uses of DVC repositories is the [versioning of data and model files](/doc/use-cases/data-and-model-files-versioning), with commands such as `dvc add`. With the aim to enable reusability of these data artifacts between different projects, DVC also provides the -`dvc get`, `dvc import`, and `dvc update` commands. This means that a project -can depend on data from an external DVC project, **similar to -package management systems, but for data**. +`dvc import` and `dvc get` commands, among others. This means that a project can +depend on data from an external DVC project, **similar to package +management systems, but for data science projects**. 
![](/static/img/data-registry.png) _Data and models as code_ @@ -45,7 +45,7 @@ Advantages of using a DVC **data registry** project: ## Building registries -Data registries can be created like any other DVC repositories with +Data registries can be created like any other DVC repository with `git init` and `dvc init`. A good way to organize them is with different directories, to group the data into separate uses, such as `images/`, `natural-language/`, etc. For example, our @@ -77,7 +77,7 @@ $ git add music/songs.dvc music/.gitignore $ git commit -m "Track 1.8 GB 10,000 song dataset in music/" ``` -> The actual data is stored in the project's cache and can be +> The actual data is stored in the project's cache and should be > [pushed](/doc/command-reference/push) to one or more > [remote storage](/doc/command-reference/remote) locations. @@ -103,7 +103,7 @@ This downloads `path/to/dataset` from the project's [default remote](/doc/command-reference/remote/default) and places it in the current working directory (anywhere in the file system with user write access). -> Note that this command (as well as `dvc import`) has a `--revision` option to +> Note that this command (as well as `dvc import`) has a `--rev` option to > download specific versions of the data. ### Import workflow @@ -143,12 +143,11 @@ Our Python API, included with the `dvc` package installed with DVC, includes the ```python import dvc.api.open -data_path = 'path/to/dataset' +model_path = 'path/to/model' repo_url = 'git@git-server.url:path/to/repository.git' -with dvc.api.open(data_path, repo_url) as dataset: - # process the data - # ... +with dvc.api.open(model_path, repo_url) as model: + # Make some predictions... ``` This opens `path/to/dataset` as a file descriptor. 
Such a method could be used From 175b75ac3aa32ba7c2b729bcd379426c55898296 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 16 Dec 2019 11:00:15 -0800 Subject: [PATCH 27/28] use-cases: addressing more feedback from Ivan private as well as in https://github.com/iterative/dvc.org/pull/818#pullrequestreview-331611829 and below --- static/docs/use-cases/data-registries.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/static/docs/use-cases/data-registries.md b/static/docs/use-cases/data-registries.md index 7d7ddef40e..7ac3af9591 100644 --- a/static/docs/use-cases/data-registries.md +++ b/static/docs/use-cases/data-registries.md @@ -84,8 +84,8 @@ $ git commit -m "Track 1.8 GB 10,000 song dataset in music/" ## Using registries The main methods to consume data artifacts from a **data registry** -are the `dvc import` and `dvc get` commands, as well as the `dvc.api.open()` -function (Python). +are the `dvc import` and `dvc get` commands, as well as the `dvc.api` Python +API. ### Simple download (get) @@ -95,8 +95,8 @@ This is analogous to using direct download tools like etc. To get a dataset for example, we can run something like: ```dvc -$ dvc get git@git-server.url:path/to/repository.git \ - path/to/dataset +$ dvc get https://github.com/example/registry \ + music/songs/ ``` This downloads `path/to/dataset` from the project's @@ -111,8 +111,8 @@ current working directory (anywhere in the file system with user write access). 
`dvc import` uses the same syntax as `dvc get`: ```dvc -$ dvc import git@git-server.url:path/to/repository.git \ - path/to/dataset +$ dvc import https://github.com/example/registry \ + images/faces/ ``` > Note that unlike `dvc get`, which can be used from any directory, `dvc import` @@ -143,11 +143,11 @@ Our Python API, included with the `dvc` package installed with DVC, includes the ```python import dvc.api.open -model_path = 'path/to/model' -repo_url = 'git@git-server.url:path/to/repository.git' +model_path = 'model.pkl' +repo_url = 'https://github.com/example/registry' -with dvc.api.open(model_path, repo_url) as model: - # Make some predictions... +with dvc.api.open(model_path, repo_url) as fd: + # Consume model file descriptor... ``` This opens `path/to/dataset` as a file descriptor. Such a method could be used From 7a395f8ff3b6aa48516c6ff5cd91d384f988c5cc Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 16 Dec 2019 11:18:38 -0800 Subject: [PATCH 28/28] use-cases: address Alex's feedback from https://github.com/iterative/dvc.org/pull/818#issuecomment-565407339 --- static/docs/use-cases/data-registries.md | 33 +++++++++++++++++------- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/static/docs/use-cases/data-registries.md b/static/docs/use-cases/data-registries.md index 7ac3af9591..a061a18e9b 100644 --- a/static/docs/use-cases/data-registries.md +++ b/static/docs/use-cases/data-registries.md @@ -77,9 +77,15 @@ $ git add music/songs.dvc music/.gitignore $ git commit -m "Track 1.8 GB 10,000 song dataset in music/" ``` -> The actual data is stored in the project's cache and should be -> [pushed](/doc/command-reference/push) to one or more -> [remote storage](/doc/command-reference/remote) locations. 
+The actual data is stored in the project's cache and should be +[pushed](/doc/command-reference/push) to one or more +[remote storage](/doc/command-reference/remote) locations, so the registry can +be accessed from other locations or by other people: + +``` +$ dvc remote add myremote s3://bucket/path +$ dvc push +``` ## Using registries @@ -99,7 +105,7 @@ $ dvc get https://github.com/example/registry \ music/songs/ ``` -This downloads `path/to/dataset` from the project's +This downloads `music/songs/` from the project's [default remote](/doc/command-reference/remote/default) and places it in the current working directory (anywhere in the file system with user write access). @@ -132,13 +138,13 @@ $ dvc update dataset.dvc ``` `dvc update` downloads new and changed files, or removes deleted ones, from -`path/to/dataset` based on the latest version of the source project. It also +`images/faces/`, based on the latest version of the source project. It also updates the project dependency metadata in the import stage (DVC-file). ### Programatic reusability of DVC data Our Python API, included with the `dvc` package installed with DVC, includes the -`open` function to load/stream data directly from remote DVC projects: +`open` function to load/stream data directly from external DVC projects: ```python import dvc.api.open @@ -147,11 +153,12 @@ model_path = 'model.pkl' repo_url = 'https://github.com/example/registry' with dvc.api.open(model_path, repo_url) as fd: - # Consume model file descriptor... + model = pickle.load(fd) + # ... Use the model! ``` -This opens `path/to/dataset` as a file descriptor. Such a method could be used -as a code-internal **deployment** method for ML models, for example. +This opens `model.pkl` as a file descriptor. The example above tries to +illustrate a hardcoded ML model **deployment** method. ## Updating registries @@ -171,6 +178,7 @@ $ git status Changes not staged for commit: ... 
modified: music/songs.dvc +$ git commit -am "Add 1,000 more songs to music/ dataset." ``` Iterating on this process for several datasets can give shape to a robust @@ -193,3 +201,10 @@ $ tree --filelimit=100 ├── text ... ``` + +And let's not forget to `dvc push` data changes to the +[remote storage](/doc/command-reference/remote), so others can obtain them! + +``` +$ dvc push +```
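
A note on the `dvc.api.open` snippet introduced in this series: as written it would probably not run, since `open` is a function rather than a submodule (so `import dvc.api.open` fails), `pickle` is never imported, and pickled data needs a binary file mode. Below is a minimal sketch of how the same idea might look. The `load_model` helper and its `opener` parameter are hypothetical names added for illustration only, and the path and repository URL are the placeholder values used in the patch, not a real registry.

```python
import pickle


def load_model(path, repo_url, opener=None):
    """Unpickle a model file streamed from a DVC repository.

    `opener` is a hypothetical hook (not part of DVC) that lets the helper
    be exercised without network access; it defaults to `dvc.api.open`.
    """
    if opener is None:
        # Imported lazily so this module can be loaded without DVC installed.
        import dvc.api
        opener = dvc.api.open
    # Pickle data is binary, so the file is opened in 'rb' mode.
    with opener(path, repo=repo_url, mode='rb') as fd:
        return pickle.load(fd)


if __name__ == '__main__':
    # Placeholder values taken from the patch; replace with a real registry.
    model = load_model('model.pkl', 'https://github.com/example/registry')
```

Passing the opener in explicitly keeps the DVC call in one place and makes the loading logic easy to check with an in-memory file.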