From 2c540fc820874081d0c802c486d6419ac489329a Mon Sep 17 00:00:00 2001
From: Max Risuhin
Date: Wed, 4 Dec 2019 00:51:27 -0800
Subject: [PATCH 1/2] gdrive support

---
 pages/features.js | 4 +-
 src/Diagram/index.js | 4 +-
 static/docs/command-reference/get-url.md | 4 +-
 static/docs/command-reference/import-url.md | 4 +-
 static/docs/command-reference/remote/add.md | 49 +++++++++++++++----
 static/docs/command-reference/remote/index.md | 6 +--
 .../docs/command-reference/remote/modify.md | 24 ++++++++-
 static/docs/get-started/configure.md | 7 +--
 static/docs/install/linux.md | 2 +-
 static/docs/install/macos.md | 2 +-
 static/docs/install/windows.md | 2 +-
 .../docs/understanding-dvc/core-features.md | 4 +-
 static/docs/understanding-dvc/how-it-works.md | 2 +-
 .../use-cases/sharing-data-and-model-files.md | 8 +--
 .../versioning-data-and-model-files.md | 3 +-
 static/docs/user-guide/contributing/core.md | 20 ++++++++
 16 files changed, 109 insertions(+), 36 deletions(-)

diff --git a/pages/features.js b/pages/features.js
index 3b7acbd94f..12d4d87b44 100644
--- a/pages/features.js
+++ b/pages/features.js
@@ -53,8 +53,8 @@ export default function FeaturesPage() {
Storage agnostic
- Use S3, Azure, GCP, SSH, SFTP, Aliyun OSS rsync or any
- network-attached storage to store data. The list of supported
+ Use S3, Azure, Google Drive, GCP, SSH, SFTP, Aliyun OSS, rsync, or
+ any network-attached storage to store data. The list of supported
protocols is constantly expanding.
diff --git a/src/Diagram/index.js b/src/Diagram/index.js
index ff25ee4e4f..b3ddbe95a8 100644
--- a/src/Diagram/index.js
+++ b/src/Diagram/index.js
@@ -36,8 +36,8 @@ const ColumnOne = () => (

Version control machine learning models, data sets and intermediate
- files. DVC connects them with code and uses S3, Azure, GCP, SSH, Aliyun
- OSS or to store file contents.
+ files. DVC connects them with code and uses S3, Azure, Google Drive,
+ GCP, SSH, or Aliyun OSS to store file contents.

Full code and data provenance help track the complete evolution of every diff --git a/static/docs/command-reference/get-url.md b/static/docs/command-reference/get-url.md index 91abdecf2d..2f6ef0a027 100644 --- a/static/docs/command-reference/get-url.md +++ b/static/docs/command-reference/get-url.md @@ -47,8 +47,8 @@ DVC supports several types of (local or) remote locations (protocols): > Depending on the remote locations type you plan to download data from you > might need to specify one of the optional dependencies: `[s3]`, `[ssh]`, -> `[gs]`, `[azure]`, and `[oss]` (or `[all]` to include them all) when -> [installing DVC](/doc/install) with `pip`. +> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all) +> when [installing DVC](/doc/install) with `pip`. Another way to understand the `dvc get-url` command is as a tool for downloading data files. diff --git a/static/docs/command-reference/import-url.md b/static/docs/command-reference/import-url.md index 611b2fab6c..3de23da252 100644 --- a/static/docs/command-reference/import-url.md +++ b/static/docs/command-reference/import-url.md @@ -60,8 +60,8 @@ DVC supports several types of (local or) remote locations (protocols): > Depending on the remote locations type you plan to download data from you > might need to specify one of the optional dependencies: `[s3]`, `[ssh]`, -> `[gs]`, `[azure]`, and `[oss]` (or `[all]` to include them all) when -> [installing DVC](/doc/install) with `pip`. +> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all) +> when [installing DVC](/doc/install) with `pip`. diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index e40571b4b4..948c20a194 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -24,19 +24,19 @@ positional arguments: ## Description `name` and `url` are required. `url` specifies a location to store your data. 
It
-can be an SSH, S3 path, Azure, Google Cloud address, Aliyun OSS, local
-directory, etc. (See all the supported remote storage types in the examples
-below.) If `url` is a local relative path, it will be resolved relative to the
-current working directory but saved **relative to the config file location**
-(see LOCAL example below). Whenever possible DVC will create a remote directory
-if it doesn't exists yet. It won't create an S3 bucket though and will rely on
-default access settings.
+can be an SSH, S3 path, Azure, Google Drive path, Google Cloud address, Aliyun
+OSS, local directory, etc. (See all the supported remote storage types in the
+examples below.) If `url` is a local relative path, it will be resolved relative
+to the current working directory but saved **relative to the config file
+location** (see LOCAL example below). Whenever possible DVC will create a remote
+directory if it doesn't exist yet. It won't create an S3 bucket though and will
+rely on default access settings.

> If you installed DVC via `pip`, depending on the remote storage type you plan
> to use you might need to install optional dependencies: `[s3]`, `[ssh]`,
-> `[gs]`, `[azure]`, and `[oss]`; or `[all]` to include them all. The command
-> should look like this: `pip install "dvc[s3]"`. This installs `boto3` library
-> along with DVC to support Amazon S3 storage.
+> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all.
+> The command should look like this: `pip install "dvc[s3]"`. This installs
+> the `boto3` library along with DVC to support Amazon S3 storage.

This command creates a section in the DVC project's
[config file](/doc/command-reference/config) and optionally assigns a default
@@ -234,6 +234,35 @@ $ dvc remote add myremote "azure://"

+### Click for Google Drive + +> Since Google Drive has tight API usage quotas, creation and configuration of +> your own `Google Project` is required: + +> 1. Log into the [Google Cloud Platform](https://console.developers.google.com) +> account associated with Google Drive you want to use as remote. +> 2. Create `New Project` or select available one. +> 3. Click `ENABLE APIS AND SERVICES` and search for `drive` to enable +> `Google Drive API` from search results. +> 4. Navigate to +> [All Credentials](https://console.developers.google.com/apis/credentials) +> page and click `Create Credentials` to select `OAuth client ID`. It might +> ask you to setup a product name on the consent screen. +> 5. Select `Other` for `Application type` and click `Create` to proceed with +> default `Name`. +> 6. `client id` and `client secret` should be showed to you. Use them for +> further DVC's configuration. + +```dvc +$ dvc remote add myremote gdrive://root/my-dvc-root +$ dvc remote modify myremote gdrive_client_id my_gdrive_client_id +$ dvc remote modify myremote gdrive_client_secret gdrive_client_secret +``` + +
+ +
+
### Click for Google Cloud Storage

```dvc
diff --git a/static/docs/command-reference/remote/index.md b/static/docs/command-reference/remote/index.md
index a69939adb0..ec1c504230 100644
--- a/static/docs/command-reference/remote/index.md
+++ b/static/docs/command-reference/remote/index.md
@@ -39,9 +39,9 @@ more details.

> If you installed DVC via `pip`, depending on the remote storage type you plan
> to use you might need to install optional dependencies: `[s3]`, `[ssh]`,
-> `[gs]`, `[azure]`, and `[oss]`; or `[all]` to include them all. The command
-> should look like this: `pip install "dvc[s3]"`. This installs `boto3` library
-> along with DVC to support S3 storage.
+> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all.
+> The command should look like this: `pip install "dvc[s3]"`. This installs
+> the `boto3` library along with DVC to support S3 storage.

Using DVC with a remote data storage is optional. By default, DVC is configured
to use a local data storage only (usually the `.dvc/cache` directory). This
diff --git a/static/docs/command-reference/remote/modify.md b/static/docs/command-reference/remote/modify.md
index b9606dfa98..30ab55426a 100644
--- a/static/docs/command-reference/remote/modify.md
+++ b/static/docs/command-reference/remote/modify.md
@@ -28,7 +28,7 @@ positional arguments:

Remote `name` and `option` name are required. Option names are remote type
specific. See below examples and a list of remote storage types: Amazon S3,
-Google Cloud, Azure, SSH, ALiyun OSS, among others.
+Google Cloud, Azure, Google Drive, SSH, Aliyun OSS, among others.

This command modifies a `remote` section in the project's
[config file](/doc/command-reference/config). Alternatively, `dvc config` or
@@ -185,6 +185,28 @@ For more information on configuring Azure Storage connection strings, visit
+### Click for Google Drive available options + +- `url` - remote location URL. + + ```dvc + $ dvc remote modify myremote url "gdrive://root/my-dvc-root" + ``` + +- `gdrive_client_id` - Google Project's OAuth 2.0 client id. + + ```dvc + $ dvc remote modify myremote gdrive_client_id my_gdrive_client_id + ``` + +- `gdrive_client_secret` - Google Project's OAuth 2.0 client secret. + + ```dvc + $ dvc remote modify myremote gdrive_client_secret gdrive_client_secret + ``` + + +
### Click for Google Cloud Storage available options

diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md
index 659ba1ef36..1d84e22092 100644
--- a/static/docs/get-started/configure.md
+++ b/static/docs/get-started/configure.md
@@ -38,15 +38,16 @@ DVC currently supports seven types of remotes:
- `s3`: Amazon Simple Storage Service
- `gs`: Google Cloud Storage
- `azure`: Azure Blob Storage
+- `gdrive`: Google Drive
- `ssh`: Secure Shell
- `hdfs`: Hadoop Distributed File System
- `http`: HTTP and HTTPS protocols

> If you installed DVC via `pip`, depending on the remote type you plan to use
> you might need to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`,
-> `[azure]`, and `[oss]`; or `[all]` to include them all. The command should
-> look like this: `pip install "dvc[s3]"`. This installs `boto3` library along
-> with DVC to support Amazon S3 storage.
+> `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all. The
+> command should look like this: `pip install "dvc[s3]"`. This installs the
+> `boto3` library along with DVC to support Amazon S3 storage.

For example, to setup an S3 remote we would use something like this (make sure
that `mybucket` exists):
diff --git a/static/docs/install/linux.md b/static/docs/install/linux.md
index 090954be79..5d779f03ba 100644
--- a/static/docs/install/linux.md
+++ b/static/docs/install/linux.md
@@ -14,7 +14,7 @@ $ pip install dvc

Depending on the type of the [remote storage](/doc/command-reference/remote) you
plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`,
-`[gs]`, `[azure]`, and `[oss]`. Use `[all]` to include them all.
+`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all.
<details>
diff --git a/static/docs/install/macos.md b/static/docs/install/macos.md index a62885b29a..428a26816a 100644 --- a/static/docs/install/macos.md +++ b/static/docs/install/macos.md @@ -37,7 +37,7 @@ $ pip install dvc Depending on the type of the [remote storage](/doc/command-reference/remote) you plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`, -`[gs]`, `[azure]`, and `[oss]`. Use `[all]` to include them all. +`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all.
diff --git a/static/docs/install/windows.md b/static/docs/install/windows.md index cf9cab99ee..47fffd24a9 100644 --- a/static/docs/install/windows.md +++ b/static/docs/install/windows.md @@ -38,7 +38,7 @@ $ pip install dvc Depending on the type of the [remote storage](/doc/command-reference/remote) you plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`, -`[gs]`, `[azure]`, and `[oss]`. Use `[all]` to include them all. +`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all.
diff --git a/static/docs/understanding-dvc/core-features.md b/static/docs/understanding-dvc/core-features.md index a87103a062..8b0fa2be09 100644 --- a/static/docs/understanding-dvc/core-features.md +++ b/static/docs/understanding-dvc/core-features.md @@ -15,5 +15,5 @@ - It's **Open-source** and **Self-serve**: DVC is free and doesn't require any additional services. -- DVC supports cloud storage (Amazon S3, Azure Blob Storage, and Google Cloud - Storage) for **data sources and pre-trained model sharing**. +- DVC supports cloud storage (Amazon S3, Azure Blob Storage, Google Drive, and + Google Cloud Storage) for **data sources and pre-trained model sharing**. diff --git a/static/docs/understanding-dvc/how-it-works.md b/static/docs/understanding-dvc/how-it-works.md index 985146e856..32a525e763 100644 --- a/static/docs/understanding-dvc/how-it-works.md +++ b/static/docs/understanding-dvc/how-it-works.md @@ -73,7 +73,7 @@ ``` - The cache of a DVC project can be shared with colleagues through Amazon S3, - Azure Blob Storage, and Google Cloud Storage, among others: + Azure Blob Storage, Google Drive, and Google Cloud Storage, among others: ```dvc $ git push diff --git a/static/docs/use-cases/sharing-data-and-model-files.md b/static/docs/use-cases/sharing-data-and-model-files.md index fc5dc395ab..c351fc3519 100644 --- a/static/docs/use-cases/sharing-data-and-model-files.md +++ b/static/docs/use-cases/sharing-data-and-model-files.md @@ -5,10 +5,10 @@ easy to consistently get all your data files and directories into any machine, along with matching source code. All you need to do is to setup [remote storage](/doc/command-reference/remote) for your DVC project, and push the data there, so others can reach it. Currently DVC -supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, SSH, -HDFS, and other remote locations, and the list is constantly growing. (For a -complete list and configuration instructions, take a look at the examples in -`dvc remote add`.) 
+supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, Google +Drive, SSH, HDFS, and other remote locations, and the list is constantly +growing. (For a complete list and configuration instructions, take a look at the +examples in `dvc remote add`.) ![](/static/img/model-sharing-digram.png) diff --git a/static/docs/use-cases/versioning-data-and-model-files.md b/static/docs/use-cases/versioning-data-and-model-files.md index 134d2e4f29..24e5be449e 100644 --- a/static/docs/use-cases/versioning-data-and-model-files.md +++ b/static/docs/use-cases/versioning-data-and-model-files.md @@ -20,7 +20,8 @@ In this basic scenario, DVC is a better replacement for `git-lfs` (see ad-hoc scripts on top of Amazon S3 (or any other cloud) used to manage ML data artifacts like raw data, models, etc. Unlike `git-lfs`, DVC doesn't require installing a dedicated server; It can be used on-premises (NAS, -SSH, for example) or with any major cloud provider (S3, Google Cloud, Azure). +SSH, for example) or with any major cloud provider (S3, Google Cloud, Azure, +Google Drive). Let's say you already have a Git repository that uses a bunch of images stored in the `images/` directory and has a `model.pkl` file – a model file deployed to diff --git a/static/docs/user-guide/contributing/core.md b/static/docs/user-guide/contributing/core.md index 9c53c6d8f8..df99138626 100644 --- a/static/docs/user-guide/contributing/core.md +++ b/static/docs/user-guide/contributing/core.md @@ -143,6 +143,7 @@ Install requirements for whatever remotes you are going to test: $ pip install -e ".[s3]" $ pip install -e ".[gs]" $ pip install -e ".[azure]" +$ pip install -e ".[gdrive]" $ pip install -e ".[ssh]" # or $ pip install -e ".[all]" @@ -250,6 +251,25 @@ $ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountN
+### Click for Google Drive testing instructions + +#### WARNING: Do not share Google Drive access token with anyone to avoid unauthorized usage of your Google Drive. + +To avoid tests flow interruption by manual login, do authorization once and +backup obtained Google Drive access token which is stored by default under +`.dvc/tmp/gdrive-user-credentials.json`. Restore `gdrive-user-credentials.json` +from backup for any new DVC repo setup to avoid manual login. + +Or add this to your env (use encryption for CI setup): + +```dvc +$ export GDRIVE_USER_CREDENTIALS_DATA='CONTENT_of_gdrive-user-credentials.json' +``` + +
+ +
+ ### Click for HDFS testing instructions Tests currently only work on Linux. First you need to set up passwordless ssh From 0bec1d58a6af37879ecc444103fad05df97e8d33 Mon Sep 17 00:00:00 2001 From: Max Risuhin Date: Thu, 5 Dec 2019 00:49:06 -0800 Subject: [PATCH 2/2] GDrive auth flow description --- static/docs/command-reference/remote/add.md | 45 ++++++++++++--------- static/docs/user-guide/contributing/core.md | 3 +- 2 files changed, 29 insertions(+), 19 deletions(-) diff --git a/static/docs/command-reference/remote/add.md b/static/docs/command-reference/remote/add.md index 948c20a194..891d150db0 100644 --- a/static/docs/command-reference/remote/add.md +++ b/static/docs/command-reference/remote/add.md @@ -24,8 +24,8 @@ positional arguments: ## Description `name` and `url` are required. `url` specifies a location to store your data. It -can be an SSH, S3 path, Azure, Google Drive path, Google Cloud address, Aliyun -OSS, local directory, etc. (See all the supported remote storage types in the +can be an SSH, S3 path, Azure, Google Drive path, Google Cloud path, Aliyun OSS, +local directory, etc. (See all the supported remote storage types in the examples below.) If `url` is a local relative path, it will be resolved relative to the current working directory but saved **relative to the config file location** (see LOCAL example below). Whenever possible DVC will create a remote @@ -236,22 +236,22 @@ $ dvc remote add myremote "azure://" ### Click for Google Drive -> Since Google Drive has tight API usage quotas, creation and configuration of -> your own `Google Project` is required: - -> 1. Log into the [Google Cloud Platform](https://console.developers.google.com) -> account associated with Google Drive you want to use as remote. -> 2. Create `New Project` or select available one. -> 3. Click `ENABLE APIS AND SERVICES` and search for `drive` to enable -> `Google Drive API` from search results. -> 4. 
Navigate to
-> [All Credentials](https://console.developers.google.com/apis/credentials)
-> page and click `Create Credentials` to select `OAuth client ID`. It might
-> ask you to setup a product name on the consent screen.
-> 5. Select `Other` for `Application type` and click `Create` to proceed with
-> default `Name`.
-> 6. `client id` and `client secret` should be showed to you. Use them for
-> further DVC's configuration.
+Since Google Drive has tight API usage quotas, creating and configuring your
+own `Google Project` is required:
+
+1. Log into the [Google Cloud Platform](https://console.developers.google.com)
+   account associated with the Google Drive you want to use as remote.
+2. Create a `New Project` or select an available one.
+3. Click `ENABLE APIS AND SERVICES` and search for `drive` to enable
+   `Google Drive API` from the search results.
+4. Navigate to the
+   [All Credentials](https://console.developers.google.com/apis/credentials)
+   page and click `Create Credentials` to select `OAuth client ID`. It might
+   ask you to set up a product name on the consent screen.
+5. Select `Other` for `Application type` and click `Create` to proceed with
+   the default `Name`.
+6. The `client id` and `client secret` will be shown to you. Use them for
+   further DVC configuration.
```dvc
$ dvc remote add myremote gdrive://root/my-dvc-root
@@ -259,6 +259,15 @@ $ dvc remote modify myremote gdrive_client_id my_gdrive_client_id
$ dvc remote modify myremote gdrive_client_secret gdrive_client_secret
```
+On first usage of the remote, you will be prompted to visit an access token
+generation link in your browser. It will ask you to log into the Google account
+associated with the Google Drive you want to use as the remote, and will guide
+you through granting Google Drive access permissions to the Google Project.
+
+On successful access token generation, the token data will be cached in a
+Git-ignored directory at `.dvc/tmp/gdrive-user-credentials.json`. 
Do not share the token
+data with anyone to prevent unauthorized access to your Google Drive.
+
diff --git a/static/docs/user-guide/contributing/core.md b/static/docs/user-guide/contributing/core.md index df99138626..7198a5663c 100644 --- a/static/docs/user-guide/contributing/core.md +++ b/static/docs/user-guide/contributing/core.md @@ -253,7 +253,8 @@ $ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountN ### Click for Google Drive testing instructions -#### WARNING: Do not share Google Drive access token with anyone to avoid unauthorized usage of your Google Drive. +❗Do not share Google Drive access token with anyone to avoid unauthorized usage +of your Google Drive. To avoid tests flow interruption by manual login, do authorization once and backup obtained Google Drive access token which is stored by default under