Singer tap that extracts data from a MySQL database and produces JSON-formatted data following the Singer spec.
$ mkvirtualenv -p python3 tap-mysql
$ pip install tap-mysql
$ tap-mysql --config config.json --discover
$ tap-mysql --config config.json --properties properties.json --state state.json
This section dives into basic usage of tap-mysql
by walking through extracting
data from a table. It assumes that you can connect to and read from a MySQL
database.
$ mkvirtualenv -p python3 tap-mysql
$ pip install tap-mysql
or
$ git clone [email protected]:singer-io/tap-mysql.git
$ cd tap-mysql
$ mkvirtualenv -p python3 tap-mysql
$ python install .
There's some important business data siloed in this MySQL database -- we need to extract it. Here's the table we'd like to sync:
mysql> select * from example_db.animals;
+----|----------|----------------------+
| id | name | likes_getting_petted |
+----|----------|----------------------+
| 1 | aardvark | 0 |
| 2 | bear | 0 |
| 3 | cow | 1 |
+----|----------|----------------------+
3 rows in set (0.00 sec)
Create a config file containing the database connection credentials, e.g.:
{
"host": "localhost",
"port": "3306",
"user": "root",
"password": "password"
}
These are the same basic configuration properties used by the MySQL command-line
client (mysql
).
The tap can be invoked in discovery mode to find the available tables and columns in the database:
$ tap-mysql --config config.json --discover
A discovered catalog is output, with a JSON-schema description of each table. A source table directly corresponds to a Singer stream.
{
"streams": [
{
"key_properties": [
"id"
],
"database_name": "example_db",
"is_view": false,
"tap_stream_id": "example_db-animals",
"row_count": 3,
"schema": {
"properties": {
"likes_getting_petted": {
"type": [
"null",
"boolean"
],
"inclusion": "available"
},
"name": {
"maxLength": 255,
"type": [
"null",
"string"
],
"inclusion": "available"
},
"id": {
"type": [
"null",
"integer"
],
"maximum": 2147483647,
"minimum": -2147483648,
"inclusion": "automatic"
}
},
"type": "object"
},
"table_name": "animals",
"metadata": [
{
"metadata": {
"selected-by-default": true,
"sql-datatype": "int(11)"
},
"breadcrumb": [
"properties",
"id"
]
},
{
"metadata": {
"selected-by-default": false
},
"breadcrumb": []
},
{
"metadata": {
"selected-by-default": true,
"sql-datatype": "varchar(255)"
},
"breadcrumb": [
"properties",
"name"
]
},
{
"metadata": {
"selected-by-default": true,
"sql-datatype": "tinyint(1)"
},
"breadcrumb": [
"properties",
"likes_getting_petted"
]
}
],
"stream": "animals"
}
]
}
In sync mode, tap-mysql
consumes a modified version of the catalog where
tables and fields have been marked as selected.
Redirect output from the tap's discovery mode to a file so that it can be modified:
$ tap-mysql -c config.json --discover > properties.json
Then edit properties.json
to make selections. In this example we want the
animals
table. The stream's schema gets a top-level selected
flag, as does
its columns' schemas:
{
"selected": "true",
"properties": {
"likes_getting_petted": {
"selected": "true",
"inclusion": "available",
"type": [
"null",
"boolean"
]
},
"name": {
"selected": "true",
"maxLength": 255,
"inclusion": "available",
"type": [
"null",
"string"
]
},
"id": {
"selected": "true",
"minimum": -2147483648,
"inclusion": "automatic",
"maximum": 2147483647,
"type": [
"null",
"integer"
]
}
},
"type": "object"
}
With an annotated properties catalog, the tap can be invoked in sync mode:
$ tap-mysql -c config.json --properties properties.json
Messages are written to standard output following the Singer specification. The resultant stream of JSON data can be consumed by a Singer target.
{"value": {"currently_syncing": "example_db-animals"}, "type": "STATE"}
{"key_properties": ["id"], "stream": "animals", "schema": {"properties": {"name": {"inclusion": "available", "maxLength": 255, "type": ["null", "string"]}, "likes_getting_petted": {"inclusion": "available", "type": ["null", "boolean"]}, "id": {"inclusion": "automatic", "minimum": -2147483648, "type": ["null", "integer"], "maximum": 2147483647}}, "type": "object"}, "type": "SCHEMA"}
{"stream": "animals", "version": 1509133344771, "type": "ACTIVATE_VERSION"}
{"record": {"name": "aardvark", "likes_getting_petted": false, "id": 1}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}
{"record": {"name": "bear", "likes_getting_petted": false, "id": 2}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}
{"record": {"name": "cow", "likes_getting_petted": true, "id": 3}, "stream": "animals", "version": 1509133344771, "type": "RECORD"}
{"stream": "animals", "version": 1509133344771, "type": "ACTIVATE_VERSION"}
{"value": {"currently_syncing": "example_db-animals", "bookmarks": {"example_db-animals": {"version": null}}}, "type": "STATE"}
{"value": {"currently_syncing": null, "bookmarks": {"example_db-animals": {"version": null}}}, "type": "STATE"}
In the above example, we invoked tap-mysql
without providing a state file
and without specifying a replication method. The two ways to replicate a given
table are FULL_TABLE
and INCREMENTAL
. FULL_TABLE
replication is used by
default.
Full-table replication extracts all data from the source table each time the tap is invoked.
Incremental replication works in conjunction with a state file to only extract new records each time the tap is invoked.
Let's sync the animals
table again, but this time using incremental
replication. The replication method and replication key are set in the
properties file:
{
"streams": [
{
"replication_method": "INCREMENTAL",
"replication_key": "id",
"key_properties": [
"id"
],
"database_name": "example_db",
"is_view": false,
"tap_stream_id": "example_db-animals",
"row_count": 3,
"schema": {
"selected": "true",
"properties": {
"likes_getting_petted": {
"selected": "true",
"type": [
"null",
"boolean"
],
"inclusion": "available"
},
"name": {
"selected": "true",
"maxLength": 255,
"type": [
"null",
"string"
],
"inclusion": "available"
},
"id": {
"selected": "true",
"type": [
"null",
"integer"
],
"maximum": 2147483647,
"minimum": -2147483648,
"inclusion": "automatic"
}
},
"type": "object"
},
"table_name": "animals",
"metadata": [...],
"stream": "animals"
}
]
}
We have no meaningful state so far, so just invoke the tap in sync mode again without a state file:
$ tap-mysql -c config.json --properties properties.json
The output messages look very similar to when the table was replicated using the
default FULL_TABLE
replication method. One important difference is that the
STATE
messages now contain a replication_key_value
-- a bookmark or
high-water mark -- for data that was extracted:
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id"}}, "currently_syncing": "example_db-animals"}}
{"stream": "animals", "type": "SCHEMA", "schema": {"type": "object", "properties": {"id": {"type": ["null", "integer"], "minimum": -2147483648, "maximum": 2147483647, "inclusion": "automatic"}, "name": {"type": ["null", "string"], "inclusion": "available", "maxLength": 255}, "likes_getting_petted": {"type": ["null", "boolean"], "inclusion": "available"}}}, "key_properties": ["id"]}
{"stream": "animals", "type": "ACTIVATE_VERSION", "version": 1509135204169}
{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 1, "name": "aardvark", "likes_getting_petted": false}}
{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 2, "name": "bear", "likes_getting_petted": false}}
{"stream": "animals", "type": "RECORD", "version": 1509135204169, "record": {"id": 3, "name": "cow", "likes_getting_petted": true}}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"version": 1509135204169, "replication_key_value": 3, "replication_key": "id"}}, "currently_syncing": "example_db-animals"}}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"version": 1509135204169, "replication_key_value": 3, "replication_key": "id"}}, "currently_syncing": null}}
Note that the final STATE
message has a replication_key_value
of 3
,
reflecting that the extraction ended on a record that had an id
of 3
.
Subsequent invocations of the tap will pick up from this bookmark.
Normally, the target will echo the last STATE
after it's finished processing
data. For this example, let's manually write a state.json
file using the
STATE
message:
{
"bookmarks": {
"example_db-animals": {
"version": 1509135204169,
"replication_key_value": 3,
"replication_key": "id"
}
},
"currently_syncing": null
}
Let's add some more animals to our farm:
mysql> insert into animals (name, likes_getting_petted) values ('dog', true), ('elephant', true), ('frog', false);
$ tap-mysql -c config.json --properties properties.json --state state.json
This invocation extracts any data since (and including) the
replication_key_value
:
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 3}}, "currently_syncing": "example_db-animals"}}
{"key_properties": ["id"], "schema": {"properties": {"name": {"maxLength": 255, "inclusion": "available", "type": ["null", "string"]}, "id": {"maximum": 2147483647, "minimum": -2147483648, "inclusion": "automatic", "type": ["null", "integer"]}, "likes_getting_petted": {"inclusion": "available", "type": ["null", "boolean"]}}, "type": "object"}, "type": "SCHEMA", "stream": "animals"}
{"type": "ACTIVATE_VERSION", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "cow", "id": 3, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "dog", "id": 4, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "elephant", "id": 5, "likes_getting_petted": true}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"record": {"name": "frog", "id": 6, "likes_getting_petted": false}, "type": "RECORD", "version": 1509135204169, "stream": "animals"}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 6}}, "currently_syncing": "example_db-animals"}}
{"type": "STATE", "value": {"bookmarks": {"example_db-animals": {"replication_key": "id", "version": 1509135204169, "replication_key_value": 6}}, "currently_syncing": null}}
Copyright © 2017 Stitch