
dstransfer - simple SQL-based cross-datastore ETL transfer

Motivation

Traditionally, transferring data between data sources from different vendors has involved exporting the data to a text file (e.g. CSV) and then importing it into the destination database. While this works for most scenarios, representing null values and converting incompatible data types such as DATE/TIMESTAMP can be challenging.

This project provides a simple SQL-based alternative that addresses these concerns. On top of that, it copies data between arbitrary databases/datastores (RDBMS or NoSQL) in a way that is optimized for both memory and writes: memory usage is reduced by using compacted slices instead of a generic slice of maps, while writes are sped up with batch inserts and concurrent writers.
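To make the memory point concrete, here is a rough Go sketch (illustrative only, not the project's actual types) contrasting a generic slice of maps with a compacted, column-oriented representation:

package main

import "fmt"

// compacted stores the column names once; each row is a positional slice,
// avoiding the per-row map overhead and key duplication of the generic form.
type compacted struct {
	columns []string
	rows    [][]interface{}
}

func main() {
	// Generic form: every row carries its own map with repeated keys.
	generic := []map[string]interface{}{
		{"id": 1, "name": "alice"},
		{"id": 2, "name": "bob"},
	}

	// Compacted form: the same data with the header factored out.
	c := compacted{
		columns: []string{"id", "name"},
		rows:    [][]interface{}{{1, "alice"}, {2, "bob"}},
	}
	fmt.Println(generic, c.columns, c.rows)
}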

Installation

With Docker

  • Building app
 cd /tmp/ && git clone https://github.com/adrianwit/dstransfer
 cd dstransfer
 docker build -t adrianwit/dstransfer:0.1.0 .
  • Starting app
 cd /tmp/dstransfer/config/ && docker-compose up -d

Standalone

  • Building app

Prerequisites: go 1.11+

go install github.com/adrianwit/dstransfer/dstransfer
  • Starting app
export GOPATH=~/go
$GOPATH/bin/dstransfer -port=8080

Cloud Functions

 cd /tmp/ && git clone https://github.com/adrianwit/dstransfer
 cd dstransfer/gcp
 go mod vendor
 gcloud functions deploy Transfer --entry-point TransferFn --runtime go111 --trigger-http
 # OR 
 endly -r=deploy 

Usage

 curl --header "Content-Type: application/json" -d "@transfer.json" -X POST http://localhost:8080/v1/api/transfer
 
 curl http://127.0.0.1:8080/v1/api/tasks
 
 while :; do clear; curl http://127.0.0.1:8080/v1/api/tasks; sleep 2; done

@transfer.json

{
  "BatchSize": 256,
  "WriterThreads": 4,
  "Mode": "insert",
  "Source": {
    "Credentials": "source_mysql",
    "Descriptor": "[username]:[password]@tcp(xxxxx:3306)/[dbname]?parseTime=true",
    "DriverName": "mysql",
    "Parameters": {
      "dbname": "db1"
    },
    "Query": "SELECT * FROM source_table"
  },
  "Dest": {
    "Credentials": "bq",
    "DriverName": "bigquery",
    "Parameters": {
      "datasetId": "db2"
    },
    "Table": "target_table"
  }
}
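The [username], [password] and [dbname] tokens in Descriptor are placeholders: they are filled in from the named credentials and the Parameters map. As a minimal sketch of the idea (the expand helper below is hypothetical, not dstransfer's actual API), the substitution amounts to:

package main

import (
	"fmt"
	"strings"
)

// expand replaces every [key] placeholder in the descriptor with its
// value; a hypothetical helper for illustration, not dstransfer's API.
func expand(descriptor string, values map[string]string) string {
	for k, v := range values {
		descriptor = strings.Replace(descriptor, "["+k+"]", v, -1)
	}
	return descriptor
}

func main() {
	dsn := expand("[username]:[password]@tcp(xxxxx:3306)/[dbname]?parseTime=true",
		map[string]string{"username": "root", "password": "dev", "dbname": "db1"})
	fmt.Println(dsn) // root:dev@tcp(xxxxx:3306)/db1?parseTime=true
}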

Credentials

Credentials are stored in ~/.secret/CREDENTIAL_NAME.json using the toolbox/cred/config.go format.

For example:

@source_mysql

{"Username":"root","Password":"dev"}

To generate encrypted credentials, download and install the latest endly, then run the following:

endly -c=source_mysql

For BigQuery, use service-account JSON key credentials.
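For reference, a minimal Go sketch of reading a plain (unencrypted) credential file in the format shown above; the field names follow the example, while real toolbox/cred configs may also carry encrypted values:

package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
)

// Credential mirrors the plain JSON format shown above.
type Credential struct {
	Username string
	Password string
}

func main() {
	secretPath := filepath.Join(os.Getenv("HOME"), ".secret", "source_mysql.json")
	data, err := ioutil.ReadFile(secretPath)
	if err != nil {
		panic(err)
	}
	var cred Credential
	if err := json.Unmarshal(data, &cred); err != nil {
		panic(err)
	}
	fmt.Println(cred.Username)
}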

Supported datastores

Already imported drivers:

  • mysql
  • postgresql
  • aerospike
  • bigquery
  • mongodb
  • cassandra
  • dynamodb
  • firebase
  • firestore

Supported but not imported drivers (CGO dependency):

  • oracle
  • vertica (via odbc)

Transfer mode

  • insert - uses only INSERT INTO statements (suitable for append-only transfers; see the batching sketch after this list)
  • persist - determines which records need to be updated and which inserted (slower option)
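The write side of insert mode can be sketched as follows: rows are grouped into batches of BatchSize and written as one multi-row INSERT per round trip. The helper below is illustrative only, not dstransfer's internals:

package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert renders one multi-row INSERT statement so that a whole
// batch of rows is written per round trip (illustrative sketch).
func buildBatchInsert(table string, columns []string, batch [][]interface{}) (string, []interface{}) {
	placeholders := "(" + strings.TrimRight(strings.Repeat("?,", len(columns)), ",") + ")"
	values := make([]string, len(batch))
	var args []interface{}
	for i, row := range batch {
		values[i] = placeholders
		args = append(args, row...)
	}
	query := fmt.Sprintf("INSERT INTO %v(%v) VALUES %v",
		table, strings.Join(columns, ","), strings.Join(values, ","))
	return query, args
}

func main() {
	query, args := buildBatchInsert("target_table",
		[]string{"id", "name"},
		[][]interface{}{{1, "alice"}, {2, "bob"}})
	fmt.Println(query, args)
	// INSERT INTO target_table(id,name) VALUES (?,?),(?,?) [1 alice 2 bob]
}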