gdrive support #833

Merged: 2 commits, Dec 9, 2019
4 changes: 2 additions & 2 deletions pages/features.js
@@ -53,8 +53,8 @@ export default function FeaturesPage() {
</Icon>
<Name>Storage agnostic</Name>
<Description>
Use S3, Azure, GCP, SSH, SFTP, Aliyun OSS rsync or any
network-attached storage to store data. The list of supported
Use S3, Azure, Google Drive, GCP, SSH, SFTP, Aliyun OSS, rsync, or any
network-attached storage to store data. The list of supported
protocols is constantly expanding.
</Description>
</Feature>
4 changes: 2 additions & 2 deletions src/Diagram/index.js
@@ -36,8 +36,8 @@ const ColumnOne = () => (
<Description fullWidth>
<p>
Version control machine learning models, data sets and intermediate
files. DVC connects them with code and uses S3, Azure, GCP, SSH, Aliyun
OSS or to store file contents.
files. DVC connects them with code and uses S3, Azure, Google Drive,
GCP, SSH, or Aliyun OSS to store file contents.
</p>
<p>
Full code and data provenance help track the complete evolution of every
4 changes: 2 additions & 2 deletions static/docs/command-reference/get-url.md
@@ -47,8 +47,8 @@ DVC supports several types of (local or) remote locations (protocols):

> Depending on the remote locations type you plan to download data from you
> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`,
> `[gs]`, `[azure]`, and `[oss]` (or `[all]` to include them all) when
> [installing DVC](/doc/install) with `pip`.
> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all)
> when [installing DVC](/doc/install) with `pip`.

Another way to understand the `dvc get-url` command is as a tool for downloading
data files.
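
To illustrate that reading, a minimal invocation might look like this (the URL
and output file name here are hypothetical, not taken from this PR):

```dvc
$ dvc get-url https://example.com/path/to/data.csv data.csv
```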
4 changes: 2 additions & 2 deletions static/docs/command-reference/import-url.md
@@ -60,8 +60,8 @@ DVC supports several types of (local or) remote locations (protocols):

> Depending on the remote locations type you plan to download data from you
> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`,
> `[gs]`, `[azure]`, and `[oss]` (or `[all]` to include them all) when
> [installing DVC](/doc/install) with `pip`.
> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]` (or `[all]` to include them all)
> when [installing DVC](/doc/install) with `pip`.

<!-- Separate MD quote: -->

58 changes: 48 additions & 10 deletions static/docs/command-reference/remote/add.md
@@ -24,19 +24,19 @@ positional arguments:
## Description

`name` and `url` are required. `url` specifies a location to store your data. It
can be an SSH, S3 path, Azure, Google Cloud address, Aliyun OSS, local
directory, etc. (See all the supported remote storage types in the examples
below.) If `url` is a local relative path, it will be resolved relative to the
current working directory but saved **relative to the config file location**
(see LOCAL example below). Whenever possible DVC will create a remote directory
if it doesn't exists yet. It won't create an S3 bucket though and will rely on
default access settings.
can be an SSH, S3 path, Azure, Google Drive path, Google Cloud path, Aliyun OSS,
local directory, etc. (See all the supported remote storage types in the
examples below.) If `url` is a local relative path, it will be resolved relative
to the current working directory but saved **relative to the config file
location** (see LOCAL example below). Whenever possible, DVC will create a
remote directory if it doesn't exist yet. It won't create an S3 bucket, though,
and will rely on default access settings.

> If you installed DVC via `pip`, depending on the remote storage type you plan
> to use you might need to install optional dependencies: `[s3]`, `[ssh]`,
> `[gs]`, `[azure]`, and `[oss]`; or `[all]` to include them all. The command
> should look like this: `pip install "dvc[s3]"`. This installs `boto3` library
> along with DVC to support Amazon S3 storage.
> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all.
> The command should look like this: `pip install "dvc[s3]"`. This installs
> `boto3` library along with DVC to support Amazon S3 storage.
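
For example, to pull in the Google Drive dependency added by this PR, the
install command would be along these lines:

```dvc
$ pip install "dvc[gdrive]"
```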

This command creates a section in the <abbr>DVC project</abbr>'s
[config file](/doc/command-reference/config) and optionally assigns a default
@@ -234,6 +234,44 @@ $ dvc remote add myremote "azure://"

<details>

### Click for Google Drive

Since Google Drive has tight API usage quotas, creation and configuration of
your own `Google Project` is required:

1. Log into the [Google Cloud Platform](https://console.developers.google.com)
   account associated with the Google Drive you want to use as a remote.
2. Create a `New Project` or select an available one.
3. Click `ENABLE APIS AND SERVICES` and search for `drive` to enable
   `Google Drive API` from the search results.
4. Navigate to the
   [All Credentials](https://console.developers.google.com/apis/credentials)
   page and click `Create Credentials` to select `OAuth client ID`. It might
   ask you to set up a product name on the consent screen.
5. Select `Other` for `Application type` and click `Create` to proceed with the
   default `Name`.
6. The `client id` and `client secret` should be shown to you. Use them for
   DVC's further configuration.

```dvc
$ dvc remote add myremote gdrive://root/my-dvc-root
$ dvc remote modify myremote gdrive_client_id my_gdrive_client_id
$ dvc remote modify myremote gdrive_client_secret gdrive_client_secret
```
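
For reference, after the commands above the project's config file should
contain a section roughly like the following (a sketch using the placeholder
values above; the exact contents of `.dvc/config` may differ):

```dvc
$ cat .dvc/config
['remote "myremote"']
url = gdrive://root/my-dvc-root
gdrive_client_id = my_gdrive_client_id
gdrive_client_secret = gdrive_client_secret
```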

On the first usage of the remote, you will be prompted to visit an access token
generation link in your browser. It will ask you to log into the Google account
associated with the Google Drive you want to use as DVC's remote. The login
process will guide you through granting the required Google Drive access
permissions to the Google Project used.

On successful access token generation, the token data will be cached in a
Git-ignored directory, at `.dvc/tmp/gdrive-user-credentials.json`. Do not share
the token data with anyone else to prevent unauthorized access to your Google
Drive.

</details>

<details>

### Click for Google Cloud Storage

```dvc
6 changes: 3 additions & 3 deletions static/docs/command-reference/remote/index.md
@@ -39,9 +39,9 @@ more details.

> If you installed DVC via `pip`, depending on the remote storage type you plan
> to use you might need to install optional dependencies: `[s3]`, `[ssh]`,
> `[gs]`, `[azure]`, and `[oss]`; or `[all]` to include them all. The command
> should look like this: `pip install "dvc[s3]"`. This installs `boto3` library
> along with DVC to support S3 storage.
> `[gs]`, `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all.
> The command should look like this: `pip install "dvc[s3]"`. This installs
> `boto3` library along with DVC to support S3 storage.

Using DVC with a remote data storage is optional. By default, DVC is configured
to use a local data storage only (usually the `.dvc/cache` directory). This
24 changes: 23 additions & 1 deletion static/docs/command-reference/remote/modify.md
@@ -28,7 +28,7 @@ positional arguments:

Remote `name` and `option` name are required. Option names are remote type
specific. See below examples and a list of remote storage types: Amazon S3,
Google Cloud, Azure, SSH, ALiyun OSS, among others.
Google Cloud, Azure, Google Drive, SSH, Aliyun OSS, among others.

This command modifies a `remote` section in the project's
[config file](/doc/command-reference/config). Alternatively, `dvc config` or
@@ -185,6 +185,28 @@ For more information on configuring Azure Storage connection strings, visit

</details>

<details>

### Click for Google Drive available options

- `url` - remote location URL.

```dvc
$ dvc remote modify myremote url "gdrive://root/my-dvc-root"
```

- `gdrive_client_id` - Google Project's OAuth 2.0 client id.

```dvc
$ dvc remote modify myremote gdrive_client_id my_gdrive_client_id
```

- `gdrive_client_secret` - Google Project's OAuth 2.0 client secret.

```dvc
$ dvc remote modify myremote gdrive_client_secret gdrive_client_secret
```

</details>

<details>

### Click for Google Cloud Storage available options
7 changes: 4 additions & 3 deletions static/docs/get-started/configure.md
@@ -38,15 +38,16 @@ DVC currently supports seven types of remotes:
- `s3`: Amazon Simple Storage Service
- `gs`: Google Cloud Storage
- `azure`: Azure Blob Storage
- `gdrive`: Google Drive
- `ssh`: Secure Shell
- `hdfs`: Hadoop Distributed File System
- `http`: HTTP and HTTPS protocols

> If you installed DVC via `pip`, depending on the remote type you plan to use
> you might need to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`,
> `[azure]`, and `[oss]`; or `[all]` to include them all. The command should
> look like this: `pip install "dvc[s3]"`. This installs `boto3` library along
> with DVC to support Amazon S3 storage.
> `[azure]`, `[gdrive]`, and `[oss]`; or `[all]` to include them all. The
> command should look like this: `pip install "dvc[s3]"`. This installs `boto3`
> library along with DVC to support Amazon S3 storage.

For example, to setup an S3 remote we would use something like this (make sure
that `mybucket` exists):
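
The example command itself is collapsed in this diff view; a sketch of such a
command, with `mybucket` and `myproject` as placeholders, would be:

```dvc
$ dvc remote add -d myremote s3://mybucket/myproject
```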
2 changes: 1 addition & 1 deletion static/docs/install/linux.md
@@ -14,7 +14,7 @@ $ pip install dvc

Depending on the type of the [remote storage](/doc/command-reference/remote) you
plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`,
`[gs]`, `[azure]`, and `[oss]`. Use `[all]` to include them all.
`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all.

<details>

2 changes: 1 addition & 1 deletion static/docs/install/macos.md
@@ -37,7 +37,7 @@ $ pip install dvc

Depending on the type of the [remote storage](/doc/command-reference/remote) you
plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`,
`[gs]`, `[azure]`, and `[oss]`. Use `[all]` to include them all.
`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all.

<details>

2 changes: 1 addition & 1 deletion static/docs/install/windows.md
@@ -38,7 +38,7 @@ $ pip install dvc

Depending on the type of the [remote storage](/doc/command-reference/remote) you
plan to use, you might need to install optional dependencies: `[s3]`, `[ssh]`,
`[gs]`, `[azure]`, and `[oss]`. Use `[all]` to include them all.
`[gs]`, `[azure]`, `[gdrive]`, and `[oss]`. Use `[all]` to include them all.

<details>

4 changes: 2 additions & 2 deletions static/docs/understanding-dvc/core-features.md
@@ -15,5 +15,5 @@
- It's **Open-source** and **Self-serve**: DVC is free and doesn't require any
additional services.

- DVC supports cloud storage (Amazon S3, Azure Blob Storage, and Google Cloud
Storage) for **data sources and pre-trained model sharing**.
- DVC supports cloud storage (Amazon S3, Azure Blob Storage, Google Drive, and
Google Cloud Storage) for **data sources and pre-trained model sharing**.
2 changes: 1 addition & 1 deletion static/docs/understanding-dvc/how-it-works.md
@@ -73,7 +73,7 @@
```

- The cache of a DVC project can be shared with colleagues through Amazon S3,
Azure Blob Storage, and Google Cloud Storage, among others:
Azure Blob Storage, Google Drive, and Google Cloud Storage, among others:

```dvc
$ git push
8 changes: 4 additions & 4 deletions static/docs/use-cases/sharing-data-and-model-files.md
@@ -5,10 +5,10 @@ easy to consistently get all your data files and directories into any machine,
along with matching source code. All you need to do is to setup
[remote storage](/doc/command-reference/remote) for your <abbr>DVC
project</abbr>, and push the data there, so others can reach it. Currently DVC
supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, SSH,
HDFS, and other remote locations, and the list is constantly growing. (For a
complete list and configuration instructions, take a look at the examples in
`dvc remote add`.)
supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, Google
Drive, SSH, HDFS, and other remote locations, and the list is constantly
growing. (For a complete list and configuration instructions, take a look at the
examples in `dvc remote add`.)
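
As a rough sketch of that workflow (the remote name and URL below are
placeholders, not part of this page):

```dvc
$ dvc remote add -d myremote gdrive://root/dvc-storage   # set up a shared remote
$ dvc push                                               # upload cached data
# On a colleague's machine, after cloning the Git repository:
$ dvc pull                                               # download the shared data
```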

![](/static/img/model-sharing-digram.png)

3 changes: 2 additions & 1 deletion static/docs/use-cases/versioning-data-and-model-files.md
@@ -20,7 +20,8 @@ In this basic scenario, DVC is a better replacement for `git-lfs` (see
ad-hoc scripts on top of Amazon S3 (or any other cloud) used to manage ML
<abbr>data artifacts</abbr> like raw data, models, etc. Unlike `git-lfs`, DVC
doesn't require installing a dedicated server; It can be used on-premises (NAS,
SSH, for example) or with any major cloud provider (S3, Google Cloud, Azure).
SSH, for example) or with any major cloud provider (S3, Google Cloud, Azure,
Google Drive).

Let's say you already have a Git repository that uses a bunch of images stored
in the `images/` directory and has a `model.pkl` file – a model file deployed to
21 changes: 21 additions & 0 deletions static/docs/user-guide/contributing/core.md
@@ -143,6 +143,7 @@ Install requirements for whatever remotes you are going to test:
$ pip install -e ".[s3]"
$ pip install -e ".[gs]"
$ pip install -e ".[azure]"
$ pip install -e ".[gdrive]"
$ pip install -e ".[ssh]"
# or
$ pip install -e ".[all]"
@@ -250,6 +251,26 @@ $ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountN

<details>

### Click for Google Drive testing instructions

❗Do not share your Google Drive access token with anyone to avoid unauthorized
use of your Google Drive.
Comment on lines +256 to +257

**Contributor:** Continuation of #833 (review)

But is it DVC's responsibility to alert about G Drive security? If not, maybe
just a note (md quote starting with `>`) will suffice after all @shcheklein?

Regardless, please use this text:

> Please remember that Google Drive access tokens are personal credentials and
> should not be shared with anyone, otherwise risking unauthorized usage of the
> Google Drive.

**Member:** I think it's good to have this for GD; it's not common knowledge
with all these tokens involved in GD.

**Member:** Quotes with `>` are not visible. They defeat the purpose to some
extent for notes like this. I'm fine with `>` though. I don't like when we mix
`>` with some "Note!" or something; they look not very clean to me.

But tbh, I'm fine with all of these options. Jorge, I think we'll let you decide
on this :)

**Contributor:** I agree notes with `>` are not visible and perhaps we shouldn't
mix them with a bold "Note!". That's kind of contradictory! Opened #848 for
this.

In this case though, while I like having the note, I do see it as a side note
that could be completely omitted and not something that DVC can really control.
So for that reason, and since we don't really have emojis ATM, I'm going to take
it back to a `>` note...

Should we open an issue to apply some basic emoji symbols throughout with some
consistency though? This could be a nice improvement to our docs.

**Contributor:** Addressed in d9ab97f

To avoid the test flow being interrupted by a manual login, do the
authorization once and back up the obtained Google Drive access token, which is
stored by default under `.dvc/tmp/gdrive-user-credentials.json`. Restore
`gdrive-user-credentials.json` from that backup in any new DVC repo setup to
avoid logging in manually again.
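
One possible way to do that backup and restore manually (the backup path is
only an example):

```dvc
$ cp .dvc/tmp/gdrive-user-credentials.json ~/gdrive-creds-backup.json  # after the first login
$ mkdir -p .dvc/tmp                                                    # in a fresh repo
$ cp ~/gdrive-creds-backup.json .dvc/tmp/gdrive-user-credentials.json
```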

Alternatively, add this to your environment (use encryption for CI setups):

```dvc
$ export GDRIVE_USER_CREDENTIALS_DATA='CONTENT_of_gdrive-user-credentials.json'
```

</details>

<details>

### Click for HDFS testing instructions

Tests currently only work on Linux. First you need to set up passwordless ssh