From e6020bad9d3d236dcb4be5333686f207c3942d8c Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 27 Feb 2023 12:33:18 -0600 Subject: [PATCH] guide: Azure and GCP remote pages (#4284) * ref: start Remote Reference (config) * Restyled by prettier (#4265) Co-authored-by: Restyled.io * guide: move Remote Storage ref into Data Mgmt * start: links to new Remotes guide and and some typo fixes * guide: finalize S3 storage page and and remove repeated content from cmd refs (link to guide) * guide: move "local remotes" to Remotes (index page) and update admonitions and links * ref: remove S3 examples * guide: Azure remote page and start GCS * guide: finish GCS page and improvements to the other ones (S3, Azure) * guide: small link fix in GDrive how-to * guide: emphasize that remotes use regular cloud storage config * Update content/docs/user-guide/data-management/remote-storage/amazon-s3.md * guide: drop `worktree` cloud versioning from Remotes Config per https://github.com/iterative/dvc.org/pull/4264#discussion_r1102047027 * guide: move cloud versioning near the top of Remote Config per https://github.com/iterative/dvc.org/pull/4264#pullrequestreview-1287877616 * fix a link * typo * reformat all storage types (Data Mgmt/ Remote Storage) * guide: move admon about pending Remote guides up rel. 
https://github.com/iterative/dvc.org/pull/4284#pullrequestreview-1312078354 * link all remote types (instead of admon) per https://github.com/iterative/dvc.org/pull/4284#pullrequestreview-1312078354 * Restyled by prettier (#4333) Co-authored-by: Restyled.io * Update content/docs/user-guide/data-management/remote-storage/amazon-s3.md Co-authored-by: Jorge Orpinel --------- Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io Co-authored-by: Dave Berenbaum --- content/docs/command-reference/remote/add.md | 50 +--- .../docs/command-reference/remote/modify.md | 272 +----------------- content/docs/sidebar.json | 6 +- .../data-management/cloud-versioning.md | 24 +- .../remote-storage/amazon-s3.md | 17 +- .../remote-storage/azure-blob-storage.md | 215 ++++++++++++++ .../remote-storage/google-cloud-storage.md | 100 +++++++ .../data-management/remote-storage/index.md | 38 +-- .../how-to/setup-google-drive-remote.md | 4 +- 9 files changed, 382 insertions(+), 344 deletions(-) create mode 100644 content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md create mode 100644 content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index c3d5313d3f..433862130c 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -126,32 +126,16 @@ The following are the supported types of storage protocols and platforms. ### Cloud providers - [Amazon S3] (AWS) and [S3-compatible] e.g. MinIO +- Microsoft [Azure Blob Storage] +- [Google Cloud Storage] (GCP) [amazon s3]: /doc/user-guide/data-management/remote-storage/amazon-s3 [s3-compatible]: /doc/user-guide/data-management/remote-storage/amazon-s3#s3-compatible-servers-non-amazon - -
- -### Microsoft Azure Blob Storage - -```cli -$ dvc remote add -d myremote azure://mycontainer/path -$ dvc remote modify myremote account_name 'myuser' -``` - -By default, DVC authenticates using an `account_name` and its [default -credential] (if any), which uses environment variables (e.g. set by `az cli`) or -a Microsoft application. - -[default credential]: - https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential - -To use a custom authentication method, use the parameters described in -`dvc remote modify`. See some -[examples](/doc/command-reference/remote/modify#example-some-azure-authentication-methods). - -
+[azure blob storage]: + /doc/user-guide/data-management/remote-storage/azure-blob-storage +[google cloud storage]: + /doc/user-guide/data-management/remote-storage/google-cloud-storage
@@ -189,28 +173,6 @@ modified.
-### Google Cloud Storage - -> 💡 Before adding a GC Storage remote, be sure to -> [Create a storage bucket](https://cloud.google.com/storage/docs/creating-buckets). - -```cli -$ dvc remote add -d myremote gs://mybucket/path -``` - -By default, DVC expects your GCP CLI is already -[configured](https://cloud.google.com/sdk/docs/authorizing). DVC will be using -default GCP key file to access Google Cloud Storage. To override some of these -parameters, use the parameters described in `dvc remote modify`. - -> Make sure to run `gcloud auth application-default login` unless you use -> `GOOGLE_APPLICATION_CREDENTIALS` and/or service account, or other ways to -> authenticate. See details [here](https://stackoverflow.com/a/53307505/298182). - -
- -
- ### Aliyun OSS First you need to set up OSS storage on Aliyun Cloud. Then, use an S3 style URL diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 1b3dc29214..a98bce6bee 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -136,210 +136,16 @@ details in the pages linked below. ### Cloud providers - [Amazon S3] (AWS) and [S3-compatible] e.g. MinIO +- Microsoft [Azure Blob Storage] +- [Google Cloud Storage] (GCP) [amazon s3]: /doc/user-guide/data-management/remote-storage/amazon-s3 [s3-compatible]: /doc/user-guide/data-management/remote-storage/amazon-s3#s3-compatible-servers-non-amazon - -
- -### Microsoft Azure Blob Storage - -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. - -- `url` (required) - remote location, in the `azure:///` - format: - - ```cli - $ dvc remote modify myremote url azure://mycontainer/path - ``` - - Note that if the given container name isn't found in your account, DVC will - attempt to create it. - -- `account_name` - storage account name. Required for every authentication - method except `connection_string` (which already includes it). - - ```cli - $ dvc remote modify myremote account_name 'myaccount' - ``` - - - -The `version_aware` option requires that -[Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) -be enabled on the specified Azure storage account and container. - - - -- `version_aware` - Use - [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) - cloud versioning features for this Azure remote. Files stored in the remote - will retain their original filenames and directory hierarchy, and different - versions of files will be stored as separate versions of the corresponding - object in the remote. - -**Authentication** - -By default, DVC authenticates using an `account_name` and its [default -credential] (if any), which uses environment variables (e.g. set by `az cli`) or -a Microsoft application. - -[default credential]: - https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential - -
- -#### For Windows users - -When using default authentication, you may need to enable some of these -exclusion parameters depending on your setup -([details][azure-default-cred-params]): - -[azure-default-cred-params]: - https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python#parameters - -```cli -$ dvc remote modify --system myremote - exclude_environment_credential true -$ dvc remote modify --system myremote - exclude_visual_studio_code_credential true -$ dvc remote modify --system myremote - exclude_shared_token_cache_credential true -$ dvc remote modify --system myremote - exclude_managed_identity_credential true -``` - -
- -To use a custom authentication method, you can either use this command to -configure the appropriate auth params, use environment variables, or rely on an -Azure config file (in that order). More details below. - -> See some [Azure auth examples](#example-some-azure-authentication-methods). - -#### Authenticate with DVC config parameters - -The following parameters are listed in the order they are used by DVC when -attempting to authenticate with Azure: - -1. `connection_string` is used for authentication if given (`account_name` is - ignored). -2. If `tenant_id` and `client_id`, `client_secret` are given, Active Directory - (AD) [service principal] auth is performed. -3. DVC will next try to connect with `account_key` or `sas_token` (in this - order) if either are provided. -4. If `allow_anonymous_login` is set to `True`, then DVC will try to connect - [anonymously]. - -[service principal]: - https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal -[anonymously]: - https://docs.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure - -- `connection_string` - Azure Storage - [connection string](http://azure.microsoft.com/en-us/documentation/articles/storage-configure-connection-string/) - (recommended). 
- - ```cli - $ dvc remote modify --local myremote \ - connection_string 'mysecret' - ``` - -* `tenant_id` - tenant ID for AD _service principal_ authentication (requires - `client_id` and `client_secret` along with this): - - ```cli - $ dvc remote modify --local myremote tenant_id 'mytenant' - ``` - -* `client_id` - client ID for _service principal_ authentication (when - `tenant_id` is set): - - ```cli - $ dvc remote modify --local myremote client_id 'myclient' - ``` - -* `client_secret` - client Secret for _service principal_ authentication (when - `tenant_id` is set): - - ```cli - $ dvc remote modify --local myremote client_secret 'mysecret' - ``` - -* `account_key` - storage account key: - - ```cli - $ dvc remote modify --local myremote account_key 'mykey' - ``` - -* `sas_token` - shared access signature token: - - ```cli - $ dvc remote modify --local myremote sas_token 'mysecret' - ``` - -* `allow_anonymous_login` - whether to fall back to anonymous login if no other - auth params are given (besides `account_name`). This will only work with - public buckets: - - ```cli - $ dvc remote modify myremote allow_anonymous_login true - ``` - -#### Authenticate with environment variables - -Azure remotes can also authenticate via env vars (instead of -`dvc remote modify`). These are tried if none of the params above are set. 
- -For Azure connection string: - -```cli -$ export AZURE_STORAGE_CONNECTION_STRING='mysecret' -``` - -For account name and key/token auth: - -```cli -$ export AZURE_STORAGE_ACCOUNT='myaccount' -# and -$ export AZURE_STORAGE_KEY='mysecret' -# or -$ export AZURE_STORAGE_SAS_TOKEN='mysecret' -``` - -For _service principal_ auth (via certificate file): - -```cli -$ export AZURE_TENANT_ID='directory-id' -$ export AZURE_CLIENT_ID='client-id' -$ export AZURE_CLIENT_CERTIFICATE_PATH='/path/to/certificate' -``` - -For simple username/password login: - -```cli -$ export AZURE_CLIENT_ID='client-id' -$ export AZURE_USERNAME='myuser' -$ export AZURE_PASSWORD='mysecret' -``` - -> See also description here for some -> [env vars](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential) -> available. - -#### Authenticate with an Azure config file - -As a final option (if no params or env vars are set), some of the auth methods -can propagate from an Azure configuration file (typically managed with -[az config](https://docs.microsoft.com/en-us/cli/azure/config)): -`connection_string`, `account_name`, `account_key`, `sas_token` and -`container_name`. The default directory where it will be searched for is -`~/.azure` but this can be customized with the `AZURE_CONFIG_DIR` env var. - - +[azure blob storage]: + /doc/user-guide/data-management/remote-storage/azure-blob-storage +[google cloud storage]: + /doc/user-guide/data-management/remote-storage/google-cloud-storage
@@ -470,68 +276,6 @@ more information.
-### Google Cloud Storage - -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. - -- `url` - remote location, in the `gs:///` format: - - ```cli - $ dvc remote modify myremote url gs://mybucket/path - ``` - -- `projectname` - override or provide a project name to use, if a default one is - not set. - - ```cli - $ dvc remote modify myremote projectname myproject - ``` - - - -The `version_aware` option requires that -[Object versioning](https://cloud.google.com/storage/docs/object-versioning) be -enabled on the specified bucket. - - - -- `version_aware` - Use - [version-aware](/docs/user-guide/data-management/cloud-versioning#version-aware-remotes) - cloud versioning features for this Google Cloud Storage remote. Files stored - in the remote will retain their original filenames and directory hierarchy, - and different versions of files will be stored as separate versions of the - corresponding object in the remote. - -**For service accounts:** - -A service account is a Google account associated with your GCP project, and not -a specific user. Please refer to -[Using service accounts](https://cloud.google.com/iam/docs/service-accounts) for -more information. - -- `credentialpath` - path to the file that contains the - [service account key](https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account). - Make sure that the service account has read/write access (as needed) to the - file structure in the remote `url`. - - ```cli - $ dvc remote modify --local myremote \ - credentialpath '/home/.../project-XXX.json' - ``` - -Alternatively, the `GOOGLE_APPLICATION_CREDENTIALS` environment variable can be -set: - -```cli -$ export GOOGLE_APPLICATION_CREDENTIALS='.../project-XXX.json' -``` - - - -
- ### Aliyun OSS > If any values given to the parameters below contain sensitive user info, add @@ -1007,6 +751,8 @@ by HDFS. Read more about by expanding the WebHDFS section in ```
+<<<<<<< HEAD +======= ## Example: Some Azure authentication methods @@ -1046,3 +792,5 @@ $ dvc remote modify --local myremote account_name 'myaccount' $ dvc remote modify --local myremote sas_token 'mysecret' $ dvc push ``` + +> > > > > > > main diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index e544fbf7d5..66abe4d681 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -127,7 +127,11 @@ { "slug": "remote-storage", "source": "remote-storage/index.md", - "children": ["amazon-s3"] + "children": [ + "amazon-s3", + "azure-blob-storage", + "google-cloud-storage" + ] }, "cloud-versioning", "importing-external-data", diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md index 8278ea3b3d..5b79a037f7 100644 --- a/content/docs/user-guide/data-management/cloud-versioning.md +++ b/content/docs/user-guide/data-management/cloud-versioning.md @@ -50,15 +50,21 @@ remote storage performance optimizations will be unavailable. Cloud versioning features are only avaible for certain storage providers. 
Currently, it is supported on the following `dvc remote` types: -- Amazon S3 (requires - [S3 Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) - enabled buckets) -- Microsoft Azure Blob Storage (requires - [Blob versioning](https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview) - enabled storage accounts and containers) -- Google Cloud Storage (requires - [Object versioning](https://cloud.google.com/storage/docs/object-versioning) - enabled buckets) +- [Amazon S3] (requires [S3 Versioning] enabled buckets) +- Microsoft [Azure Blob Storage] (requires [Blob versioning] enabled storage + accounts and containers) +- [Google Cloud Storage] (requires [Object versioning] enabled buckets) + +[amazon s3]: /doc/user-guide/data-management/remote-storage/amazon-s3 +[s3 versioning]: + https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html +[azure blob storage]: + /doc/user-guide/data-management/remote-storage/azure-blob-storage +[blob versioning]: + https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview +[google cloud storage]: + /doc/user-guide/data-management/remote-storage/google-cloud-storage +[object versioning]: https://cloud.google.com/storage/docs/object-versioning ## Version-aware remotes diff --git a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md index 395982db00..5376c9f3f1 100644 --- a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md +++ b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md @@ -13,7 +13,7 @@ $ dvc remote add -d myremote s3:/// - `` - name of an [existing S3 bucket] - `` - optional path to a [folder key] in your bucket -Upon `dvc push` (or when needed) DVC will try to authenticate using your [AWS +Upon `dvc push` (or when needed), DVC will try to authenticate using your [AWS CLI config]. 
This reads the default AWS credentials file (if available) or [env vars](#environment-variables). @@ -65,7 +65,7 @@ change the auth method for some reason. The `dvc remote modify --local` flag is needed to write sensitive user info to a Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked -through Git. See `dvc config` for more info. +through Git. See `dvc config`. @@ -79,7 +79,7 @@ $ dvc remote modify --local myremote \ $ dvc remote modify --local myremote \ credentialpath 'path/to/credentials' # and (optional) -$ dvc remote modify --local myremote profile 'myprofile' +$ dvc remote modify myremote profile 'myprofile' ``` [aws-cli-config-files]: @@ -126,7 +126,7 @@ they're effective depends on each storage platform. [digitalocean space]: https://www.digitalocean.com/products/spaces [ibm cloud object storage]: https://www.ibm.com/cloud/object-storage -## More configuration options +## More configuration parameters @@ -134,6 +134,8 @@ See `dvc remote modify` for more command usage details. +- `url` - modify the remote location ([scroll up](#amazon-s3) for details) + - `region` - specific AWS region ```cli @@ -221,13 +223,12 @@ See `dvc remote modify` for more command usage details. ## Environment variables -Authentication and other configuration can also be set via [`boto3` env vars]. -These are tried if no config params are set in the project. -Example: +Authentication and other config can also be set via [`boto3` env vars]. These +are tried if no config params are set. 
Example: ```cli $ dvc remote add -d myremote s3://mybucket -$ export AWS_ACCESS_KEY_ID='mysecret' +$ export AWS_ACCESS_KEY_ID='myid' $ export AWS_SECRET_ACCESS_KEY='mysecret' $ dvc push ``` diff --git a/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md b/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md new file mode 100644 index 0000000000..873b3a2442 --- /dev/null +++ b/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md @@ -0,0 +1,215 @@ +# Microsoft Azure Blob Storage + + + +Start with `dvc remote add` to define the remote. Set a name and a valid [Azure +Blob Storage] URL: + +```cli +$ dvc remote add -d myremote azure:/// +``` + +- `` - name of a [blob container]. DVC will attempt to create it if + needed. +- `` - optional path to a [virtual directory] in your container + +[azure blob storage]: https://azure.microsoft.com/en-us/products/storage/blobs +[blob container]: + https://learn.microsoft.com/en-us/azure/storage/blobs/blob-containers-portal +[virtual directory]: + https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#blob-names + +To set up authentication or other configuration, set any supported config param +with `dvc remote modify`. + +## Cloud versioning + + + +Requires [Blob versioning] enabled on the storage account and container. + + + +```cli +$ dvc remote modify myremote version_aware true +``` + +`version_aware` (`true` or `false`) enables [cloud versioning] features for this +remote. This lets you explore the container files under the same structure you see +in your project directory locally. + +[blob versioning]: + https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview +[cloud versioning]: /docs/user-guide/data-management/cloud-versioning + +## Authentication + + + +This may require the **Storage Blob Data Contributor** and other [roles] on the +account. 
+ +[roles]: https://learn.microsoft.com/en-us/azure/role-based-access-control/ + + + +A storage account name (`account_name`) is always needed. DVC tries to +authenticate with its [default credential] by default. This uses environment +variables (usually set during [Azure CLI configuration]) or data from certain +Microsoft applications. + +```cli +$ dvc remote modify myremote --local account_name 'myuser' +``` + +[default credential]: + https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential +[azure cli configuration]: + https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration + + + +The `dvc remote modify --local` flag is needed to write sensitive user info to a +Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked +through Git. See `dvc config`. + + + +
+ +### Windows users: click here for more info. + +When using default authentication, you may need to enable some of these +exclusion parameters depending on your setup ([details]): + +```cli +$ dvc remote modify --system myremote \ + exclude_environment_credential true +$ dvc remote modify --system myremote \ + exclude_visual_studio_code_credential true +$ dvc remote modify --system myremote \ + exclude_shared_token_cache_credential true +$ dvc remote modify --system myremote \ + exclude_managed_identity_credential true +``` + +[details]: + https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python#parameters + +
+ +For custom authentication, you can set the following config params with +`dvc remote modify --local`, use +[environment variables](#authenticate-with-environment-variables), or an +[Azure CLI config file](#authenticate-with-an-azure-cli-config-file) (in that +order). + +### Authenticate with DVC configuration parameters + +The following params are listed in the order in which they are tried. + +- A [connection string] (`connection_string`) is used if given (recommended) + (`account_name` is ignored since it's included in the connection string). + + ```cli + $ dvc remote modify --local myremote \ + connection_string 'mysecret' + ``` + +- If `tenant_id`, `client_id`, and `client_secret` are given, Active Directory + (AD) [service principal] auth is used. + + ```cli + $ dvc remote modify --local myremote tenant_id 'mytenant' + $ dvc remote modify --local myremote client_id 'myclient' + $ dvc remote modify --local myremote client_secret 'mysecret' + ``` + +- A storage account key (`account_key`) or a shared access signature token + (`sas_token`), in this order. + + ```cli + $ dvc remote modify --local myremote account_key 'mysecret' + ``` + + ```cli + $ dvc remote modify --local myremote sas_token 'mysecret' + ``` + +- If `allow_anonymous_login` is set, then [anonymous read access] will be tried + as a last resort. An `account_name` is still needed. Only works with public + containers. + + ```cli + $ dvc remote modify myremote allow_anonymous_login true + ``` + +[connection string]: + https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string +[service principal]: + https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal +[anonymous read access]: + https://docs.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure + +### Authenticate with environment variables + +Some of [these env vars] can be used instead. 
+ +[these env vars]: + https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python + +For Azure connection string: + +```cli +$ export AZURE_STORAGE_CONNECTION_STRING='mysecret' +``` + +For account name and key/token auth: + +```cli +$ export AZURE_STORAGE_ACCOUNT='myaccount' +# and +$ export AZURE_STORAGE_KEY='mysecret' +# or +$ export AZURE_STORAGE_SAS_TOKEN='mysecret' +``` + +For _service principal_ auth (via certificate file): + +```cli +$ export AZURE_TENANT_ID='directory-id' +$ export AZURE_CLIENT_ID='client-id' +$ export AZURE_CLIENT_CERTIFICATE_PATH='/path/to/certificate' +``` + +For simple username/password login: + +```cli +$ export AZURE_CLIENT_ID='client-id' +$ export AZURE_USERNAME='myuser' +$ export AZURE_PASSWORD='mysecret' +``` + +### Authenticate with an Azure CLI config file + +If no params or env vars are set explicitly, the following values can propagate +from an [Azure CLI configuration file] (typically managed with [az config]): +`connection_string`, `account_name`, `account_key`, `sas_token` and +`container_name`. + +[azure cli configuration file]: + https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration#cli-configuration-file +[az config]: https://docs.microsoft.com/en-us/cli/azure/config + +## More configuration parameters + + + +See `dvc remote modify` for more command usage details. + + + +- `url` - modify the remote location ([scroll up](#microsoft-azure-blob-storage) + for details) diff --git a/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md b/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md new file mode 100644 index 0000000000..3d092cd1b5 --- /dev/null +++ b/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md @@ -0,0 +1,100 @@ +# Google Cloud Storage + + + +Start with `dvc remote add` to define the remote. 
Set a name and a valid [Google +Cloud Storage] URL: + +```cli +$ dvc remote add -d myremote gs:/// +``` + +- `` - name of an [existing storage bucket] +- `` - optional path to a [folder] in your bucket + +Upon `dvc push` (or when needed), DVC will try to authenticate using your +[gcloud CLI authorization]. This reads the default GCP key file. + + + +Make sure to run [gcloud auth application-default login] unless you use a +service account or other ways to authenticate ([more info]). + + + +[google cloud storage]: https://cloud.google.com/storage +[existing storage bucket]: + https://cloud.google.com/storage/docs/creating-buckets +[folder]: https://cloud.google.com/storage/docs/folders +[gcloud cli authorization]: https://cloud.google.com/sdk/docs/authorizing +[gcloud auth application-default login]: + https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login +[more info]: https://stackoverflow.com/a/53307505/298182 + +To use [custom auth](#custom-authentication) or further configure your DVC +remote, set any supported config param with `dvc remote modify`. + +## Cloud versioning + + + +Requires [Object versioning] enabled on the bucket. + + + +```cli +$ dvc remote modify myremote version_aware true +``` + +`version_aware` (`true` or `false`) enables [cloud versioning] features for this +remote. This lets you explore the bucket files under the same structure you see +in your project directory locally. 
+ +[object versioning]: https://cloud.google.com/storage/docs/object-versioning +[cloud versioning]: /docs/user-guide/data-management/cloud-versioning + +## Custom authentication + +For [service accounts] (a Google account associated with your GCP project instead +of a user), you can set the path to the file that contains a [service account +key]: + +[service accounts]: https://cloud.google.com/iam/docs/service-accounts +[service account key]: + https://cloud.google.com/iam/docs/creating-managing-service-account-keys + +```cli +$ dvc remote modify --local myremote \ + credentialpath 'path/to/project-XXX.json' +``` + + + +The `dvc remote modify --local` flag is needed to write sensitive user info to a +Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked +through Git. See `dvc config`. + + + +Alternatively, the `GOOGLE_APPLICATION_CREDENTIALS` environment variable can be +set: + +```cli +$ export GOOGLE_APPLICATION_CREDENTIALS='.../project-XXX.json' +``` + +## More configuration parameters + + + +See `dvc remote modify` for more command usage details. + + + +- `url` - modify the remote location ([scroll up](#google-cloud-storage) for + details) + +- `projectname` - override or provide a project name to use if a default one is + not set. diff --git a/content/docs/user-guide/data-management/remote-storage/index.md b/content/docs/user-guide/data-management/remote-storage/index.md index 61183be785..cb69dc2c8c 100644 --- a/content/docs/user-guide/data-management/remote-storage/index.md +++ b/content/docs/user-guide/data-management/remote-storage/index.md @@ -92,34 +92,36 @@ team. ## Supported storage types - - -Guides for each storage type are in progress. For storage types that do not link -to a specific guide, see -[`dvc remote add`](/doc/command-reference/remote/add#supported-storage-types) -and -[`dvc remote modify`](/doc/command-reference/remote/modify#supported-storage-types). 
- - - ### Cloud providers - [Amazon S3] (AWS) and [S3-compatible] e.g. MinIO -- Microsoft Azure Blob Storage -- Google Drive -- Google Cloud Storage (GCP) -- Aliyun OSS +- Microsoft [Azure Blob Storage] +- [Google Drive] +- [Google Cloud Storage] (GCP) +- [Aliyun OSS] [amazon s3]: /doc/user-guide/data-management/remote-storage/amazon-s3 [s3-compatible]: /doc/user-guide/data-management/remote-storage/amazon-s3#s3-compatible-servers-non-amazon +[azure blob storage]: + /doc/user-guide/data-management/remote-storage/azure-blob-storage +[google drive]: /doc/command-reference/remote/modify#google-drive +[google cloud storage]: + /doc/user-guide/data-management/remote-storage/google-cloud-storage +[aliyun oss]: /doc/command-reference/remote/modify#aliyun-oss ### Self-hosted / On-premises -- SSH servers; Like `scp` -- HDFS & WebHDFS -- HTTP -- WebDAV +- [SSH servers]; Like `scp` +- [HDFS] & [WebHDFS] +- [HTTP] +- [WebDAV] + +[ssh servers]: /doc/command-reference/remote/modify#ssh +[hdfs]: /doc/command-reference/remote/modify#hdfs +[webhdfs]: /doc/command-reference/remote/modify#webhdfs +[http]: /doc/command-reference/remote/modify#http +[webdav]: /doc/command-reference/remote/modify#webdav ## File systems (local remotes) diff --git a/content/docs/user-guide/how-to/setup-google-drive-remote.md b/content/docs/user-guide/how-to/setup-google-drive-remote.md index 13f827250a..58c67d0352 100644 --- a/content/docs/user-guide/how-to/setup-google-drive-remote.md +++ b/content/docs/user-guide/how-to/setup-google-drive-remote.md @@ -270,8 +270,8 @@ heavy usage, it is recommended to rely on > This requires having your own -> [GC project](/doc/user-guide/how-to/setup-google-drive-remote#using-a-custom-google-cloud-project-recommended) -> as explained above. +> [GC project](#using-a-custom-google-cloud-project-recommended) as explained +> above. 1. To [create a service account](https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account),