From 7accfe8cfe8a82ab86a2a40a326e9fcca381d04d Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Tue, 6 Dec 2022 19:19:57 +0900 Subject: [PATCH 1/7] ref: document cloud versioned remotes --- .../docs/command-reference/remote/modify.md | 41 ++++++++++ content/docs/sidebar.json | 3 +- .../data-management/cloud-versioning.md | 74 +++++++++++++++++++ 3 files changed, 117 insertions(+), 1 deletion(-) create mode 100644 content/docs/user-guide/data-management/cloud-versioning.md diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 6c4f081a32..5579c0c5d5 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -346,6 +346,19 @@ $ dvc push For more on the supported env vars, please see the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables) +- `version_aware` - Use + [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) + cloud versioning features for this S3 remote. This requires that + [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) + be enabled on the specified S3 bucket. + +- `worktree` - Use + [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) + cloud versioning features for this S3 remote. This requires that + [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) + be enabled on the specified S3 bucket. When both `version_aware` and + `worktree` are set, `worktree` takes precedence. +
@@ -548,6 +561,19 @@ can propagate from an Azure configuration file (typically managed with `container_name`. The default directory where it will be searched for is `~/.azure` but this can be customized with the `AZURE_CONFIG_DIR` env var. +- `version_aware` - Use + [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) + cloud versioning features for this Azure remote. This requires that + [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) + be enabled on the specified Azure storage account and container. + +- `worktree` - Use + [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) + cloud versioning features for this Azure remote. This requires that + [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) + be enabled on the specified Azure storage account and container. When both + `version_aware` and `worktree` are set, `worktree` takes precedence. +
@@ -722,6 +748,21 @@ set: $ export GOOGLE_APPLICATION_CREDENTIALS='.../project-XXX.json' ``` +- `version_aware` - Use + [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) + cloud versioning features for this Google Cloud Storage remote. This requires + that + [Object versioning](https://cloud.google.com/storage/docs/object-versioning) + be enabled on the specified bucket. + +- `worktree` - Use + [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) + cloud versioning features for this Google Cloud Storage remote. This requires + that + [Object versioning](https://cloud.google.com/storage/docs/object-versioning) + be enabled on the specified bucket. When both `version_aware` and `worktree` + are set, `worktree` takes precedence. +
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index f8c9b18c2e..e4aab73d5c 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -126,7 +126,8 @@ "children": [ "large-dataset-optimization", "importing-external-data", - "managing-external-data" + "managing-external-data", + "cloud-versioning" ] }, { diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md new file mode 100644 index 0000000000..f2531c27dc --- /dev/null +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -0,0 +1,74 @@ +# Cloud Versioning + +`dvc remote` storage normally uses +[content-addressible storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) +to organize versioned data. Different versions of files are stored in the remote +according to hash of their data content instead of according to their original +filenames and directory location. This allows DVC to optimize certain remote +storage lookup and data sync operations, and provides data de-duplication at the +file level. However, this comes with the drawback of losing human-readable +filenames without the use of the DVC CLI (`dvc get --show-url`) or API +(`dvc.api.get_url()`). + +DVC supports the use of cloud object versioning for cases where users prefer to +retain their original filenames and directory hierarchy in remote storage, in +exchange for losing the de-duplication and performance benefits of +content-addressible storage. When cloud versioning is enabled, DVC will store +files in the remote according to their original directory location and +filenames. Different versions of a file will then be stored as separate versions +of the corresponding object in cloud storage. + +⚠️ Note that not all DVC functionality is supported when using cloud versioned +remotes. 
+ +## Supported storage providers + +Cloud versioning features are only available for certain storage providers. +Currently, it is supported on the following `dvc remote` types: + +- Amazon S3 (requires + [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) + enabled buckets) +- Microsoft Azure Blob Storage (requires + [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) + enabled storage accounts and containers) +- Google Cloud Storage (requires + [Object versioning](https://cloud.google.com/storage/docs/object-versioning) + enabled buckets) + +## Version-aware remotes + +When the `version_aware` option is enabled on a `dvc remote`: + +- `dvc push` will utilize cloud versioning when storing data in the remote. Data + will retain its original directory structure and filenames, and each version + of a file tracked by DVC will be stored as a new version of the corresponding + object in cloud storage. +- `dvc fetch` and `dvc pull` will download the corresponding version of an + object from cloud storage. + +⚠️ Note that when `version_aware` is in use, DVC does not set DELETE flags on +objects in cloud storage, and does not make any attempt to ensure that the +latest version of an object in cloud storage matches the latest version of a +file in your DVC repository. + +## Worktree remotes + +When the `worktree` option is enabled on a `dvc remote`: + +- `dvc push` will utilize cloud versioning and ensure that the "latest" version + of the remote storage is a mirror of your current local DVC repository + workspace. Data in cloud storage will retain its original directory structure + and filenames, and each version of a file tracked by DVC will be stored as a + new version of the corresponding object in cloud storage. Additionally, DVC + will set the DELETE flag on any objects which were present in cloud storage + but that do not exist in your current DVC repository workspace. 
+- `dvc fetch` and `dvc pull` will download the corresponding version of an + object from cloud storage. +- `dvc update` can be used to update a DVC-tracked file or directory in your + current workspace to match the latest version of the corresponding object(s) + from cloud storage. + +⚠️ Note that setting DELETE flags does not delete any object versions (and does +not delete any data) from cloud storage, it only means that the "latest" version +of a given object will show that the object does not exist. From e695ade5fe8427820396e44c1a7ed800a8fe68e4 Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Tue, 6 Dec 2022 19:29:22 +0900 Subject: [PATCH 2/7] ref: document worktree update --- content/docs/command-reference/update.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md index 2ca0803c54..a335822a93 100644 --- a/content/docs/command-reference/update.md +++ b/content/docs/command-reference/update.md @@ -2,7 +2,9 @@ Update files or directories imported from external DVC repositories or [URLs](/doc/command-reference/import-url#description), and the corresponding -import `.dvc` files. +import `.dvc` files, or update files or directories from a +[worktree](/doc/user-guide/data-management/cloud-versioning#worktree-remotes) +remote. ## Synopsis @@ -38,6 +40,19 @@ to update an imported artifact to a different revision. $ dvc update --rev master ``` +### Worktree update + +When using a +[worktree](/doc/user-guide/data-management/cloud-versioning#worktree-remotes) +remote, `dvc update` will update the specified target to match the latest +version of the corresponding file or directory from the remote storage. If the +"latest" version of the specified target is a deleted file or an empty +directory, `dvc update` will fail (in order to avoid potential accidental local +data loss). 
+ +⚠️ Note that the `--rev`, `--no-download` and `--to-remote` flags are not +compatible when updating from a worktree remote. + ## Options - `--rev ` - commit hash, branch or tag name, etc. (any From 72f7d11a01d5ae0c26ede51deff9f41f9661fd2e Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Mon, 26 Dec 2022 17:49:17 +0900 Subject: [PATCH 3/7] review updates --- .../docs/command-reference/remote/modify.md | 39 +++++++-- content/docs/command-reference/update.md | 15 ++-- .../data-management/cloud-versioning.md | 82 ++++++++++++------- 3 files changed, 92 insertions(+), 44 deletions(-) diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 5579c0c5d5..19976106f7 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -348,13 +348,20 @@ For more on the supported env vars, please see the - `version_aware` - Use [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) - cloud versioning features for this S3 remote. This requires that + cloud versioning features for this S3 remote. Files stored in the remote will + retain their original filenames and directory hierarchy, and different + versions of files will be stored as separate versions of the corresponding + object in the remote. This requires that [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) be enabled on the specified S3 bucket. - - `worktree` - Use [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) - cloud versioning features for this S3 remote. This requires that + cloud versioning features for this S3 remote. Files stored in the remote will + retain their original filenames and directory hierarchy, and different + versions of files will be stored as separate versions of the corresponding + object in cloud storage. 
DVC will also attempt to ensure that the current + version of objects in the remote match the latest version of files in the DVC + repository. This requires that [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) be enabled on the specified S3 bucket. When both `version_aware` and `worktree` are set, `worktree` takes precedence. @@ -563,13 +570,21 @@ can propagate from an Azure configuration file (typically managed with - `version_aware` - Use [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) - cloud versioning features for this Azure remote. This requires that + cloud versioning features for this Azure remote. Files stored in the remote + will retain their original filenames and directory hierarchy, and different + versions of files will be stored as separate versions of the corresponding + object in the remote. This requires that [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) be enabled on the specified Azure storage account and container. - `worktree` - Use [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) - cloud versioning features for this Azure remote. This requires that + cloud versioning features for this Azure remote. Files stored in the remote + will retain their original filenames and directory hierarchy, and different + versions of files will be stored as separate versions of the corresponding + object in cloud storage. DVC will also attempt to ensure that the current + version of objects in the remote match the latest version of files in the DVC + repository. This requires that [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) be enabled on the specified Azure storage account and container. When both `version_aware` and `worktree` are set, `worktree` takes precedence. 
@@ -750,15 +765,21 @@ $ export GOOGLE_APPLICATION_CREDENTIALS='.../project-XXX.json' - `version_aware` - Use [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) - cloud versioning features for this Google Cloud Storage remote. This requires - that + cloud versioning features for this Google Cloud Storage remote. Files stored + in the remote will retain their original filenames and directory hierarchy, + and different versions of files will be stored as separate versions of the + corresponding object in the remote. This requires that [Object versioning](https://cloud.google.com/storage/docs/object-versioning) be enabled on the specified bucket. - `worktree` - Use [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) - cloud versioning features for this Google Cloud Storage remote. This requires - that + cloud versioning features for this Google Cloud Storage remote. Files stored + in the remote will retain their original filenames and directory hierarchy, + and different versions of files will be stored as separate versions of the + corresponding object in cloud storage. DVC will also attempt to ensure that + the current version of objects in the remote match the latest version of files + in the DVC repository. This requires that [Object versioning](https://cloud.google.com/storage/docs/object-versioning) be enabled on the specified bucket. When both `version_aware` and `worktree` are set, `worktree` takes precedence. 
diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md index a335822a93..0d1a36f610 100644 --- a/content/docs/command-reference/update.md +++ b/content/docs/command-reference/update.md @@ -44,15 +44,18 @@ $ dvc update --rev master When using a [worktree](/doc/user-guide/data-management/cloud-versioning#worktree-remotes) -remote, `dvc update` will update the specified target to match the latest +remote, `dvc update` will update the specified target to match the current version of the corresponding file or directory from the remote storage. If the -"latest" version of the specified target is a deleted file or an empty -directory, `dvc update` will fail (in order to avoid potential accidental local -data loss). +current version of the specified target is a deleted file or an empty directory, +`dvc update` will fail. -⚠️ Note that the `--rev`, `--no-download` and `--to-remote` flags are not + + +Note that the `--rev`, `--no-download` and `--to-remote` flags are not compatible when updating from a worktree remote. + + ## Options - `--rev ` - commit hash, branch or tag name, etc. (any @@ -66,7 +69,7 @@ compatible when updating from a worktree remote. For stages created with `dvc import-url` and a [cloud-versioned URL](/doc/command-reference/import-url#--version-aware), `--rev` can be used to specify a object version ID to use. By default, the - import will be updated to the latest version from cloud storage. + import will be updated to the current version from cloud storage. - `-R`, `--recursive` - determines the files to update by searching each target directory and its subdirectories for import `.dvc` files to inspect. 
If there diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index f2531c27dc..5c51623b77 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -1,5 +1,26 @@ # Cloud Versioning +When cloud versioning is enabled, DVC will store files in the remote according +to their original directory location and filenames. Different versions of a file +will then be stored as separate versions of the corresponding object in cloud +storage. This is useful for cases where users prefer to retain their original +filenames and directory hierarchy in remote storage (instead of using DVC's +usual +[content-addressible storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) +format). + + + +Note that not all DVC functionality is supported when using cloud versioned +remotes, and using cloud versioning comes with the tradeoff of losing certain +benefits of content-addressible storage. + + + +
+ +### Expand for more details on the differences between cloud versioned and content-addressible storage + `dvc remote` storage normally uses [content-addressible storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) to organize versioned data. Different versions of files are stored in the remote @@ -10,16 +31,10 @@ file level. However, this comes with the drawback of losing human-readable filenames without the use of the DVC CLI (`dvc get --show-url`) or API (`dvc.api.get_url()`). -DVC supports the use of cloud object versioning for cases where users prefer to -retain their original filenames and directory hierarchy in remote storage, in -exchange for losing the de-duplication and performance benefits of -content-addressible storage. When cloud versioning is enabled, DVC will store -files in the remote according to their original directory location and -filenames. Different versions of a file will then be stored as separate versions -of the corresponding object in cloud storage. +When using cloud versioning, DVC does not provide de-duplication, and certain +remote storage performance optimizations will be unavailable. -⚠️ Note that not all DVC functionality is supported when using cloud versioned -remotes. +
## Supported storage providers @@ -47,28 +62,37 @@ When the `version_aware` option is enabled on a `dvc remote`: - `dvc fetch` and `dvc pull` will download the corresponding version of an object from cloud storage. -⚠️ Note that when `version_aware` is in use, DVC does not set DELETE flags on -objects in cloud storage, and does not make any attempt to ensure that the -latest version of an object in cloud storage matches the latest version of a -file in your DVC repository. + + +Note that when `version_aware` is in use, DVC does not delete current versions +or restore noncurrent versions of objects in cloud storage. So the current +version of an object in cloud storage version of a file in your DVC repository. + + ## Worktree remotes -When the `worktree` option is enabled on a `dvc remote`: +`worktree` remotes behave similarly to `version_aware` remotes, but with one key +difference. For `worktree` remotes, DVC will also attempt to ensure that the +current version of objects in cloud storage match the latest versions of files +in your DVC repository. -- `dvc push` will utilize cloud versioning and ensure that the "latest" version - of the remote storage is a mirror of your current local DVC repository - workspace. Data in cloud storage will retain its original directory structure - and filenames, and each version of a file tracked by DVC will be stored as a - new version of the corresponding object in cloud storage. Additionally, DVC - will set the DELETE flag on any objects which were present in cloud storage - but that do not exist in your current DVC repository workspace. -- `dvc fetch` and `dvc pull` will download the corresponding version of an - object from cloud storage. -- `dvc update` can be used to update a DVC-tracked file or directory in your - current workspace to match the latest version of the corresponding object(s) - from cloud storage. 
+So in addition to the command behaviors described for `version_aware` remotes, +when the `worktree` option is enabled on a `dvc remote`: + +- `dvc push` will also ensure that the current version of objects in remote + storage match the latest versions of files in your DVC repository. + Additionally, DVC will delete the current version of any objects which were + present in cloud storage but that do not exist in your current DVC repository + workspace. +- `dvc update` can be used to update a DVC-tracked file or directory in your DVC + repository to match the current version of the corresponding object(s) from + cloud storage. + + + +Note that deleting current versions in cloud storage does not delete any objects +(and does not delete) any data). It only means that the current version of a +given object will show that the object does not exist. -⚠️ Note that setting DELETE flags does not delete any object versions (and does -not delete any data) from cloud storage, it only means that the "latest" version -of a given object will show that the object does not exist. + From 1aa6109e534bc0e167b6baae47851aa9e42c31d9 Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Mon, 26 Dec 2022 17:55:35 +0900 Subject: [PATCH 4/7] add dev/experimental admon --- .../docs/user-guide/data-management/cloud-versioning.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index 5c51623b77..6e4ab98810 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -1,5 +1,14 @@ # Cloud Versioning + + +Cloud versioning features are currently under active development and should be +considered experimental. These features are subject to frequent change, and the +documentation may not always reflect changes available in the latest DVC +release. 
+ + + When cloud versioning is enabled, DVC will store files in the remote according to their original directory location and filenames. Different versions of a file will then be stored as separate versions of the corresponding object in cloud From d28a41e6944d22f2514ff4f3a3e7687d6f4643ed Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Mon, 26 Dec 2022 18:03:51 +0900 Subject: [PATCH 5/7] add links to/from import-url for cloud versioning --- content/docs/command-reference/import-url.md | 16 +++++++++------- .../data-management/cloud-versioning.md | 6 ++++++ 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 6f0d632c31..3ef4759b60 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -108,13 +108,15 @@ DVC supports several types of external locations (protocols): [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) is necessary to track if the specified URL changed. -DVC also supports capturing cloud versioning information when importing data -from certain cloud storage providers. When the `--version-aware` option is -provided or when the `url` argument includes a supported cloud versioning ID, -DVC will import the specified version of the given data. When using versioned -storage, DVC will always [pull](/doc/command-reference/pull) the versioned data -from its original source location. Versioned data will also not be -[pushed](/doc/command-reference/push) to remote storage. +DVC also supports capturing +[cloud versioning](/doc/user-guide/data-management/cloud-versioning) information +when importing data from certain cloud storage providers. When the +`--version-aware` option is provided or when the `url` argument includes a +supported cloud versioning ID, DVC will import the specified version of the +given data. 
When using versioned storage, DVC will always +[pull](/doc/command-reference/pull) the versioned data from its original source +location. Versioned data will also not be [pushed](/doc/command-reference/push) +to remote storage. | Type | Description | Versioned `url` format example | | ------- | ---------------------------- | ------------------------------------------------------ | diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index 6e4ab98810..77620a9a2f 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -105,3 +105,9 @@ Note that deleting current versions in cloud storage does not delete any objects given object will show that the object does not exist. + +## Importing versioned data + +DVC supports importing cloud versioned data from supported storage providers. +Refer to the documentation for `dvc import-url` and `dvc update` for more +information. From a6b1fded428357b76afd8df0692c146272ee0867 Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Fri, 30 Dec 2022 11:36:25 +0900 Subject: [PATCH 6/7] review fixes --- .../docs/command-reference/remote/modify.md | 55 ++++++++++++------- .../data-management/cloud-versioning.md | 8 +-- 2 files changed, 38 insertions(+), 25 deletions(-) diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 19976106f7..d96251e104 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -351,9 +351,8 @@ For more on the supported env vars, please see the cloud versioning features for this S3 remote. Files stored in the remote will retain their original filenames and directory hierarchy, and different versions of files will be stored as separate versions of the corresponding - object in the remote. 
This requires that - [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) - be enabled on the specified S3 bucket. + object in the remote. + - `worktree` - Use [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) cloud versioning features for this S3 remote. Files stored in the remote will @@ -361,10 +360,16 @@ For more on the supported env vars, please see the versions of files will be stored as separate versions of the corresponding object in cloud storage. DVC will also attempt to ensure that the current version of objects in the remote match the latest version of files in the DVC - repository. This requires that - [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) - be enabled on the specified S3 bucket. When both `version_aware` and - `worktree` are set, `worktree` takes precedence. + repository. When both `version_aware` and `worktree` are set, `worktree` takes + precedence. + + + +The `version_aware` and `worktree` options require that +[S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) +be enabled on the specified S3 bucket. + +
@@ -573,9 +578,7 @@ can propagate from an Azure configuration file (typically managed with cloud versioning features for this Azure remote. Files stored in the remote will retain their original filenames and directory hierarchy, and different versions of files will be stored as separate versions of the corresponding - object in the remote. This requires that - [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) - be enabled on the specified Azure storage account and container. + object in the remote. - `worktree` - Use [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) @@ -584,10 +587,16 @@ can propagate from an Azure configuration file (typically managed with versions of files will be stored as separate versions of the corresponding object in cloud storage. DVC will also attempt to ensure that the current version of objects in the remote match the latest version of files in the DVC - repository. This requires that - [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) - be enabled on the specified Azure storage account and container. When both - `version_aware` and `worktree` are set, `worktree` takes precedence. + repository. When both `version_aware` and `worktree` are set, `worktree` takes + precedence. + + + +The `version_aware` and `worktree` options require that +[Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) +be enabled on the specified Azure storage account and container. + + @@ -768,9 +777,7 @@ $ export GOOGLE_APPLICATION_CREDENTIALS='.../project-XXX.json' cloud versioning features for this Google Cloud Storage remote. Files stored in the remote will retain their original filenames and directory hierarchy, and different versions of files will be stored as separate versions of the - corresponding object in the remote. 
This requires that - [Object versioning](https://cloud.google.com/storage/docs/object-versioning) - be enabled on the specified bucket. + corresponding object in the remote. - `worktree` - Use [worktree](/docs/user-guide/data-management/cloud-versioning#worktree-remotes) @@ -779,10 +786,16 @@ $ export GOOGLE_APPLICATION_CREDENTIALS='.../project-XXX.json' and different versions of files will be stored as separate versions of the corresponding object in cloud storage. DVC will also attempt to ensure that the current version of objects in the remote match the latest version of files - in the DVC repository. This requires that - [Object versioning](https://cloud.google.com/storage/docs/object-versioning) - be enabled on the specified bucket. When both `version_aware` and `worktree` - are set, `worktree` takes precedence. + in the DVC repository. When both `version_aware` and `worktree` are set, + `worktree` takes precedence. + + + +The `version_aware` and `worktree` options require that +[Object versioning](https://cloud.google.com/storage/docs/object-versioning) be +enabled on the specified bucket. + + diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index 77620a9a2f..9f71fa2034 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -15,23 +15,23 @@ will then be stored as separate versions of the corresponding object in cloud storage. This is useful for cases where users prefer to retain their original filenames and directory hierarchy in remote storage (instead of using DVC's usual -[content-addressible storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) +[content-addressable storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) format). 
Note that not all DVC functionality is supported when using cloud versioned remotes, and using cloud versioning comes with the tradeoff of losing certain -benefits of content-addressible storage. +benefits of content-addressable storage.
-### Expand for more details on the differences between cloud versioned and content-addressible storage +### Expand for more details on the differences between cloud versioned and content-addressable storage `dvc remote` storage normally uses -[content-addressible storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) +[content-addressable storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory) to organize versioned data. Different versions of files are stored in the remote according to hash of their data content instead of according to their original filenames and directory location. This allows DVC to optimize certain remote From 8c148d0822c5ae1553e1c5baee8ecd22e28340ee Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Fri, 30 Dec 2022 09:08:01 -0500 Subject: [PATCH 7/7] Apply suggestions from code review --- content/docs/user-guide/data-management/cloud-versioning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index 9f71fa2034..51f8f32b57 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -75,7 +75,7 @@ When the `version_aware` option is enabled on a `dvc remote`: Note that when `version_aware` is in use, DVC does not delete current versions or restore noncurrent versions of objects in cloud storage. So the current -version of an object in cloud storage version of a file in your DVC repository. +version of an object in cloud storage may not match the version of a file in your DVC repository. @@ -101,7 +101,7 @@ when the `worktree` option is enabled on a `dvc remote`: Note that deleting current versions in cloud storage does not delete any objects -(and does not delete) any data). 
It only means that the current version of a +(and does not delete any data). It only means that the current version of a given object will show that the object does not exist.
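
Reviewer note: the tradeoff this series documents (content-addressable storage de-duplicates but loses filenames, cloud versioning keeps filenames but does not de-duplicate) may be easier to follow with a toy model. The sketch below is not DVC code and does not use any DVC API. The class names, the `push` methods, and the SHA-256 based key layout are all hypothetical, chosen only for illustration; DVC's real cache layout and hashing differ.

```python
import hashlib
from collections import defaultdict


def content_addressed_key(data: bytes) -> str:
    """Hypothetical key scheme: the object name depends only on the content hash."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:2]}/{digest[2:]}"


class ContentAddressedRemote:
    """Toy model: one stored object per distinct content, filenames are lost."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def push(self, path: str, data: bytes) -> str:
        key = content_addressed_key(data)
        self.objects[key] = data  # identical content overwrites itself (de-duplication)
        return key


class CloudVersionedRemote:
    """Toy model: one object per original path, each push appends a new version."""

    def __init__(self) -> None:
        self.versions: dict[str, list[bytes]] = defaultdict(list)

    def push(self, path: str, data: bytes) -> tuple[str, int]:
        self.versions[path].append(data)
        return path, len(self.versions[path]) - 1  # (object name, version index)


cas = ContentAddressedRemote()
versioned = CloudVersionedRemote()

# Two versions of one file, plus a second file whose content duplicates the first version.
pushes = [
    ("data/train.csv", b"version 1"),
    ("data/train.csv", b"version 2"),
    ("data/copy.csv", b"version 1"),
]
for path, data in pushes:
    cas.push(path, data)
    versioned.push(path, data)

print(sorted(cas.objects))  # hash-derived names only
print({path: len(v) for path, v in versioned.versions.items()})  # readable paths
```

In this model the content-addressed remote ends up with two objects for three pushes, because the duplicate content is stored once, while the versioned remote stores three object versions under two human-readable paths.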