diff --git a/README.md b/README.md
index 194956f..c9cd445 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
[![Build Status](https://travis-ci.org/tumugi/tumugi-plugin-bigquery.svg?branch=master)](https://travis-ci.org/tumugi/tumugi-plugin-bigquery) [![Code Climate](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery/badges/gpa.svg)](https://codeclimate.com/github/tumugi/tumugi-plugin-bigquery) [![Coverage Status](https://coveralls.io/repos/github/tumugi/tumugi-plugin-bigquery/badge.svg?branch=master)](https://coveralls.io/github/tumugi/tumugi-plugin-bigquery) [![Gem Version](https://badge.fury.io/rb/tumugi-plugin-bigquery.svg)](https://badge.fury.io/rb/tumugi-plugin-bigquery)
-# tumugi-plugin-bigquery
+# Google BigQuery plugin for [tumugi](https://github.com/tumugi/tumugi)
-tumugi-plugin-bigquery is a plugin for integrate [Google BigQuery](https://cloud.google.com/bigquery/) and [Tumugi](https://github.com/tumugi/tumugi).
+tumugi-plugin-bigquery is a plugin to integrate [Google BigQuery](https://cloud.google.com/bigquery/) with [tumugi](https://github.com/tumugi/tumugi).
## Installation
@@ -12,17 +12,7 @@ Add this line to your application's Gemfile:
gem 'tumugi-plugin-bigquery'
```
-And then execute:
-
-```sh
-$ bundle
-```
-
-Or install it yourself as:
-
-```sb
-$ gem install tumugi-plugin-bigquery
-```
+And then execute `bundle install`.
## Target
@@ -30,21 +20,65 @@ $ gem install tumugi-plugin-bigquery
-`Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery dataset.
+`Tumugi::Plugin::BigqueryDatasetTarget` is a target for a BigQuery dataset.
+#### Parameters
+
+| name       | type   | required? | default | description                                                      |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+ output target(:bigquery_dataset, dataset_id: "your_dataset_id")
+end
+```
+
+```rb
+task :task1 do
+ output target(:bigquery_dataset, project_id: "project_id", dataset_id: "dataset_id")
+end
+```
+
#### Tumugi::Plugin::BigqueryTableTarget
-`Tumugi::Plugin::BigqueryDatasetTarget` is target for BigQuery table.
+`Tumugi::Plugin::BigqueryTableTarget` is a target for a BigQuery table.
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| table_id | string | required | | Table ID |
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
+
+```rb
+task :task1 do
+ output target(:bigquery_table, table_id: "table_id", dataset_id: "your_dataset_id")
+end
+```
+
## Task
### Tumugi::Plugin::BigqueryDatasetTask
-`Tumugi::Plugin::BigqueryDatasetTask` is task to create a dataset.
+`Tumugi::Plugin::BigqueryDatasetTask` is a task to create a dataset.
-#### Usage
+#### Parameters
+
+| name | type | required? | default | description |
+|------------|--------|-----------|---------|------------------------------------------------------------------|
+| dataset_id | string | required | | Dataset ID |
+| project_id | string | optional | | [Project](https://cloud.google.com/compute/docs/projects) ID |
+
+#### Examples
```rb
task :task1, type: :bigquery_dataset do
- param_set :dataset_id, 'test'
+ dataset_id 'test'
end
```
@@ -52,28 +86,41 @@ end
-`Tumugi::Plugin::BigqueryQueryTask` is task to run `query` and save the result into the table which specified by parameter.
+`Tumugi::Plugin::BigqueryQueryTask` is a task to run a `query` and save the result into the table specified by the parameters.
-#### Usage
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|---------|-----------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
+| query | string | required | | query to execute |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| mode            | string  | optional  | "truncate" | specifies the action that occurs if the destination table already exists. [see](#mode)                                                         |
+| flatten_results | bool    | optional  | true       | whether BigQuery automatically flattens nested and repeated data in the query results. [see](https://cloud.google.com/bigquery/docs/data#flatten) |
+| use_legacy_sql  | bool    | optional  | true       | whether to use BigQuery's legacy SQL syntax                                                                                                     |
+| wait | integer | optional | 60 | wait time (seconds) for query execution |
+
+#### Examples
##### truncate mode (default)
```rb
task :task1, type: :bigquery_query do
- param_set :query, "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
- param_set :dataset_id, 'test'
- param_set :table_id, "dest_table#{Time.now.to_i}"
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ table_id "dest_table#{Time.now.to_i}"
+ dataset_id "test"
end
```
##### append mode
-If you set `mode` to `'append'`, query result append to existing table.
+If you set `mode` to `append`, the query result is appended to the existing table.
```rb
task :task1, type: :bigquery_query do
- param_set :query, "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
- param_set :dataset_id, 'test'
- param_set :table_id, "dest_table#{Time.now.to_i}"
- param_set :mode, 'append'
+ query "SELECT COUNT(*) AS cnt FROM [bigquery-public-data:samples.wikipedia]"
+ table_id "dest_table#{Time.now.to_i}"
+ dataset_id "test"
+ mode "append"
end
```
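+
+##### standard SQL
+
+Setting `use_legacy_sql` to `false` runs the query with standard SQL syntax instead of legacy SQL. A minimal sketch based on the parameter table above (note the standard SQL style table reference):
+
+```rb
+task :task1, type: :bigquery_query do
+ query "SELECT COUNT(*) AS cnt FROM `bigquery-public-data.samples.wikipedia`"
+ table_id "dest_table#{Time.now.to_i}"
+ dataset_id "test"
+ use_legacy_sql false
+end
+```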
@@ -81,7 +128,20 @@ end
-`Tumugi::Plugin::BigqueryCopyTask` is task to copy table which specified by parameter.
+`Tumugi::Plugin::BigqueryCopyTask` is a task to copy a table specified by the parameters.
-#### Usage
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------|--------|-----------|---------|---------------------------------------------------------|
+| src_table_id | string | required | | source table ID |
+| src_dataset_id | string | required | | source dataset ID |
+| src_project_id | string | optional | | source project ID |
+| dest_table_id | string | required | | destination table ID |
+| dest_dataset_id | string | required | | destination dataset ID |
+| dest_project_id | string | optional | | destination project ID |
+| force_copy | bool | optional | false | force copy when destination table already exists or not |
+| wait            | integer | optional  | 60      | wait time (seconds) for the copy job to complete         |
+
+#### Examples
Copy `test.src_table` to `test.dest_table`.
@@ -89,24 +149,25 @@ Copy `test.src_table` to `test.dest_table`.
```rb
task :task1, type: :bigquery_copy do
- param_set :src_dataset_id, 'test'
- param_set :src_table_id, 'src_table'
- param_set :dest_dataset_id, 'test'
- param_set :dest_table_id, 'dest_table'
+ src_table_id "src_table"
+ src_dataset_id "test"
+ dest_table_id "dest_table"
+ dest_dataset_id "test"
end
```
##### force_copy
-If `force_copy` is `true`, copy operation always execute even if target table is existed. Data of target table is truncate.
+If `force_copy` is `true`, the copy operation always executes even if the destination table exists.
+This means the destination table's data is deleted, so be careful when enabling this parameter.
```rb
task :task1, type: :bigquery_copy do
- param_set :src_dataset_id, 'test'
- param_set :src_table_id, 'src_table'
- param_set :dest_dataset_id, 'test'
- param_set :dest_table_id, 'dest_table'
- param_set :force_copy, true
+ src_table_id "src_table"
+ src_dataset_id "test"
+ dest_table_id "dest_table"
+ dest_dataset_id "test"
+ force_copy true
end
```
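+
+##### copy across projects
+
+`src_project_id` and `dest_project_id` are optional, so you only need them when the source or destination table lives in another project. A sketch of a cross-project copy (the project IDs below are placeholders):
+
+```rb
+task :task1, type: :bigquery_copy do
+ src_project_id "src_project"
+ src_dataset_id "test"
+ src_table_id "src_table"
+ dest_project_id "dest_project"
+ dest_dataset_id "test"
+ dest_table_id "dest_table"
+end
+```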
@@ -114,25 +175,98 @@ end
-`Tumugi::Plugin::BigqueryLoadTask` is task to load structured data from GCS into BigQuery.
+`Tumugi::Plugin::BigqueryLoadTask` is a task to load structured data from GCS into BigQuery.
-#### Usage
+#### Parameters
+
+| name | type | required? | default | description |
+|-----------------------|-----------------|------------------------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket | string | required | | source GCS bucket name |
+| key | string | required | | source path of file like "/path/to/file.csv" |
+| table_id | string | required | | destination table ID |
+| dataset_id | string | required | | destination dataset ID |
+| project_id | string | optional | | destination project ID |
+| schema                | array of object | required when mode is not "append" |                     | see [schema format](#schema)                                                                                                                   |
+| mode                  | string          | optional                           | "append"            | specifies the action that occurs if the destination table already exists. [see](#mode)                                                        |
+| source_format         | string          | optional                           | "CSV"               | source file format. [see](#format)                                                                                                            |
+| ignore_unknown_values | bool | optional | false | indicates if BigQuery should allow extra values that are not represented in the table schema |
+| max_bad_records | integer | optional | 0 | maximum number of bad records that BigQuery can ignore when running the job |
+| field_delimiter | string | optional | "," | separator for fields in a CSV file. used only when source_format is "CSV" |
+| allow_jagged_rows | bool | optional | false | accept rows that are missing trailing optional columns. The missing values are treated as null. used only when source_format is "CSV" |
+| allow_quoted_newlines | bool | optional | false | indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. used only when source_format is "CSV" |
+| quote | string | optional | "\"" (double-quote) | value that is used to quote data sections in a CSV file. used only when source_format is "CSV" |
+| skip_leading_rows     | integer         | optional                           | 0                   | number of rows at the top of a CSV file that BigQuery will skip when loading the data. used only when source_format is "CSV"                  |
+| wait                  | integer         | optional                           | 60                  | wait time (seconds) for the load job to complete                                                                                              |
+
+#### Examples
Load `gs://test_bucket/load_data.csv` into `dest_project:dest_dataset.dest_table`
```rb
task :task1, type: :bigquery_load do
- param_set :bucket, 'test_bucket'
- param_set :key, 'load_data.csv'
- param_set :project_id, 'dest_project'
- param_set :datset_id, 'dest_dataset'
- param_set :table_id, 'dest_table'
+ bucket "test_bucket"
+ key "load_data.csv"
+ table_id "dest_table"
+ dataset_id "dest_dataset"
+ project_id "dest_project"
end
```
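+
+Load a CSV file which has a header row, skipping the header and giving an explicit schema. This is a sketch assuming the DSL accepts the schema as a plain Ruby array of hashes mirroring the [schema format](#schema) below; the two columns are hypothetical:
+
+```rb
+task :task1, type: :bigquery_load do
+ bucket "test_bucket"
+ key "load_data.csv"
+ table_id "dest_table"
+ dataset_id "dest_dataset"
+ mode "truncate"
+ skip_leading_rows 1
+ schema [
+   { "name" => "column1", "type" => "string" },
+   { "name" => "column2", "type" => "integer" }
+ ]
+end
+```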
-### Config Section
+## Common parameter values
+
+### mode
+
+| value | description |
+|----------|-------------|
+| truncate | If the table already exists, BigQuery overwrites the table data. |
+| append | If the table already exists, BigQuery appends the data to the table. |
+| empty | If the table already exists and contains data, a 'duplicate' error is returned in the job result. |
+
+### format
+
+| value | description |
+|------------------------|--------------------------------------------|
+| CSV                    | comma-separated values                       |
+| NEWLINE_DELIMITED_JSON | newline-delimited JSON (one object per line) |
+| AVRO | [see](https://avro.apache.org/docs/1.2.0/) |
+
+### schema
+
+The `schema` parameter is an array of (possibly nested) objects like below:
+
+```json
+[
+ {
+ "name": "column1",
+ "type": "string"
+ },
+ {
+ "name": "column2",
+ "type": "integer",
+ "mode": "repeated"
+ },
+ {
+ "name": "record1",
+ "type": "record",
+ "fields": [
+ {
+ "name": "key1",
+ "type": "integer",
+ },
+ {
+ "name": "key2",
+ "type": "integer"
+ }
+ ]
+ }
+]
+```
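+
+Since the schema is plain JSON, one option is to keep it in a separate file and parse it when the task is defined. A sketch assuming Ruby's standard `json` library and a hypothetical `schema.json` containing the array above:
+
+```rb
+require 'json'
+
+task :task1, type: :bigquery_load do
+ bucket "test_bucket"
+ key "load_data.csv"
+ table_id "dest_table"
+ dataset_id "dest_dataset"
+ mode "truncate"
+ schema JSON.parse(File.read("schema.json"))
+end
+```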
+
+## Config Section
-tumugi-plugin-bigquery provide config section named "bigquery" which can specified BigQuery autenticaion info.
+tumugi-plugin-bigquery provides a config section named "bigquery" which specifies BigQuery authentication info.
-#### Authenticate by client_email and private_key
+### Authenticate by client_email and private_key
```rb
Tumugi.configure do |config|
@@ -144,7 +278,7 @@ Tumugi.configure do |config|
end
```
-#### Authenticate by JSON key file
+### Authenticate by JSON key file
```rb
Tumugi.configure do |config|