
spanner-loader

This directory contains a Python script that imports data into Cloud Spanner. The script reads a gzipped CSV file from a Google Cloud Storage bucket and a local schema file, and then inserts the data into the specified Spanner table in batches.
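The read-and-batch step can be pictured with a short sketch. It assumes the gzipped object has already been downloaded from Cloud Storage into memory as bytes (for example with the google-cloud-storage client, not shown here), treats every line as a data row, and uses a hypothetical helper name, iter_csv_batches. The actual script may stream, parse, or handle header rows differently.

```python
import csv
import gzip
import io
from typing import Iterator, List


def iter_csv_batches(
    gz_bytes: bytes, batch_size: int, delimiter: str = ","
) -> Iterator[List[List[str]]]:
    """Yield rows of a gzipped CSV payload in lists of at most batch_size rows.

    Hypothetical helper: assumes the whole object is already in memory and
    that every line, including the first, is a data row.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # gzip.open accepts a file object; text mode decodes while decompressing.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as text:
        batch: List[List[str]] = []
        for row in csv.reader(text, delimiter=delimiter):
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            # Flush the final, possibly shorter, batch.
            yield batch


if __name__ == "__main__":
    payload = gzip.compress(b"a,b\nc,d\ne,f\n")
    for batch in iter_csv_batches(payload, batch_size=2):
        print(batch)
    # [['a', 'b'], ['c', 'd']]
    # [['e', 'f']]
```

Yielding fixed-size batches keeps memory bounded by the batch size rather than the file size once the rows are parsed.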

Table of Contents

  1. Create a Cloud Spanner Table
  2. Create a schema for your Spanner Table
  3. Create a Service Account (optional)
  4. Usage

1. Create a Cloud Spanner Table

Follow the steps in the Spanner Quickstart to create your Cloud Spanner instance, database, and table.

2. Create a schema for your Spanner Table

Use sample.schema to define the schema of the table you are going to load. Separate each column name from its data type with a colon (:), and separate the columns from one another with a comma (,).

For example, the schema for a table with two STRING columns named one and two is:

one:STRING,two:STRING
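A minimal sketch of how a schema line in this format could be parsed and then used to convert CSV strings into typed values. The helper names (parse_schema, convert_row, CONVERTERS) are hypothetical, and the set of supported types and the BOOL parsing rule are assumptions for illustration; the real script may accept other types or convert values differently.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical mapping from schema type names to converters for CSV strings.
# Only a few Spanner types are covered; others would need their own handling.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "STRING": str,
    "INT64": int,
    "FLOAT64": float,
    "BOOL": lambda s: s.strip().lower() == "true",
}


def parse_schema(text: str) -> List[Tuple[str, str]]:
    """Turn 'one:STRING,two:STRING' into [('one', 'STRING'), ('two', 'STRING')]."""
    columns = []
    for field in text.strip().split(","):
        # The colon separates the column name from its data type.
        name, sep, col_type = field.partition(":")
        if not sep or not name.strip() or not col_type.strip():
            raise ValueError(f"malformed column definition: {field!r}")
        columns.append((name.strip(), col_type.strip()))
    return columns


def convert_row(row: Sequence[str], columns: List[Tuple[str, str]]) -> List[object]:
    """Convert one CSV row to Python values using the parsed schema."""
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
    values = []
    for raw, (name, col_type) in zip(row, columns):
        if col_type not in CONVERTERS:
            raise ValueError(f"unsupported type {col_type!r} for column {name!r}")
        values.append(CONVERTERS[col_type](raw))
    return values


if __name__ == "__main__":
    columns = parse_schema("one:STRING,two:STRING")
    print(columns)                           # [('one', 'STRING'), ('two', 'STRING')]
    print(convert_row(["x", "y"], columns))  # ['x', 'y']
```

The column order in the schema has to match the column order in the CSV file, since values are matched to columns by position.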

3. Create a Service Account (optional)

Note: This step is not required if you have already configured an appropriate account and project with the gcloud SDK, or if you are running the tool on a GCE instance in the target project whose service account has the required permissions on the target Spanner instance. In these cases, the tool picks up the configuration from the environment automatically.

Optionally, create a service account for the Spanner client library to use when authenticating against your Spanner instance.

Follow the steps described in Creating a Service Account to create a service account for this purpose. Then follow the steps described in Creating a Service Account Key to create a key for that account, and finally follow these steps to grant permissions to the service account.

Make sure to use a role with read and write access to Spanner, such as Cloud Spanner Database User. For more information, see Cloud Spanner Roles.
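The environment-based setup in the note above presumably works through Google's Application Default Credentials (ADC). The sketch below approximates the order in which credentials would be chosen, assuming (this README does not confirm it) that a key file passed with --path_to_credentials takes priority over the environment. credential_source is a hypothetical helper that only describes the choice; it does not authenticate.

```python
import os
from typing import Mapping, Optional


def credential_source(path_to_credentials: Optional[str], env: Mapping[str, str]) -> str:
    """Describe which credentials would be used (hypothetical helper, no auth performed)."""
    if path_to_credentials:
        # Assumed: an explicit key file passed on the command line wins.
        return f"service account key: {path_to_credentials}"
    key_from_env = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_from_env:
        # ADC checks this environment variable first.
        return f"GOOGLE_APPLICATION_CREDENTIALS: {key_from_env}"
    # Otherwise ADC falls back to credentials set up with the gcloud SDK
    # (for example via `gcloud auth application-default login`) or, on a
    # GCE instance, the instance's attached service account.
    return "application default credentials (gcloud SDK or GCE service account)"


if __name__ == "__main__":
    print(credential_source(None, os.environ))
```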

4. Usage

Note: This tool requires Python 3.

Install the script's requirements by running the following command:

pip3 install -r requirements.txt

Run the spanner-loader Python script with the required arguments:

python spanner-loader.py \
  --instance_id=[Your Cloud Spanner instance ID] \
  --database_id=[Your Cloud Spanner database ID] \
  --table_id=[Your table name] \
  --batchsize=[The number of rows to insert in a batch] \
  --bucket_name=[The name of the bucket for the source file] \
  --file_name=[The gzipped CSV input data file] \
  --schema_file=[The format file describing the input data file]
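--batchsize controls how many rows go into each write. Below is a minimal sketch of batched inserts, assuming the google-cloud-spanner client's Database.batch() context manager (which commits its buffered mutations when the with block exits) and Batch.insert(table, columns, values). insert_in_batches and chunked are hypothetical names, and database can be any object with that interface, so the sketch runs without the library installed.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence


def chunked(rows: Iterable[Sequence[Any]], size: int) -> Iterator[List[Sequence[Any]]]:
    """Split an iterable of rows into lists of at most `size` rows."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(
    database: Any,
    table_id: str,
    columns: Sequence[str],
    rows: Iterable[Sequence[Any]],
    batch_size: int,
) -> int:
    """Insert rows with one commit per chunk; returns the number of commits.

    With the real library (not executed here), `database` would come from:
        from google.cloud import spanner
        database = spanner.Client(project=project_id).instance(instance_id).database(database_id)
    """
    commits = 0
    for chunk in chunked(rows, batch_size):
        # Mutations buffered in the batch are committed when the block exits.
        with database.batch() as batch:
            batch.insert(table=table_id, columns=columns, values=chunk)
        commits += 1
    return commits
```

Larger batches mean fewer commits, but Spanner limits the number of mutations in a single commit, so a very large --batchsize on a wide table may be rejected.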

Optional parameters:

--delimiter=[The delimiter used between columns in the source file]
--project_id=[Your Google Cloud project ID]
--path_to_credentials=[Path to the JSON key file with the credentials]
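The command-line interface described above could be declared with argparse roughly as follows. The flag names come from this README; the help strings, the int type for --batchsize, and the "," default for --delimiter are assumptions, not taken from spanner-loader.py, and build_parser is a hypothetical name.

```python
import argparse

# Required flags as listed in this README; help text is paraphrased.
REQUIRED_FLAGS = {
    "--instance_id": "Cloud Spanner instance ID",
    "--database_id": "Cloud Spanner database ID",
    "--table_id": "Table name",
    "--bucket_name": "Name of the bucket holding the source file",
    "--file_name": "Gzipped CSV input data file",
    "--schema_file": "Schema file describing the input data",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Load a gzipped CSV file from Cloud Storage into a Cloud Spanner table."
    )
    for flag, help_text in REQUIRED_FLAGS.items():
        parser.add_argument(flag, required=True, help=help_text)
    parser.add_argument("--batchsize", type=int, required=True,
                        help="Number of rows to insert per batch")
    # Optional parameters; the "," default for --delimiter is an assumption.
    parser.add_argument("--delimiter", default=",", help="Column delimiter in the source file")
    parser.add_argument("--project_id", default=None, help="Google Cloud project ID")
    parser.add_argument("--path_to_credentials", default=None,
                        help="Path to the service account JSON key file")
    return parser


if __name__ == "__main__":
    # Hypothetical example values, parsed explicitly instead of from sys.argv.
    example = [
        "--instance_id=my-instance", "--database_id=my-db", "--table_id=my_table",
        "--batchsize=500", "--bucket_name=my-bucket", "--file_name=data.csv.gz",
        "--schema_file=my_table.schema",
    ]
    print(vars(build_parser().parse_args(example)))
```

Omitting any required flag makes argparse print a usage message and exit with an error, before any connection to Spanner is attempted.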