Initial MySQL support (#31)
vinceatbluelabs authored Apr 21, 2020
1 parent 421b3d0 commit 98638e1
Showing 81 changed files with 2,579 additions and 312 deletions.
73 changes: 63 additions & 10 deletions .circleci/config.yml
@@ -46,6 +46,29 @@ commands:
key: deps-v1-<<parameters.python_version>>-<<parameters.pandas_version>>-<<parameters.extras>>-{{ .Branch }}-{{ checksum "requirements.txt" }}-{{ checksum "setup.py" }}
paths:
- "venv"
wait_for_db:
description: "Pause until database answers allowing time to startup. Abort if startup got hung in CircleCI."
parameters:
db_name:
type: string
connect_command:
type: string
steps:
- run:
name: Waiting for <<parameters.db_name>>
command: |
# Bail out trying after 30 seconds
end=$((SECONDS+30))
echo "Starting at second ${SECONDS:?} - ending at ${end:?}"
db_connect() {
<<parameters.connect_command>>
}
while ! db_connect && [[ "${SECONDS:?}" -lt "${end:?}" ]]
do
echo "Waiting for <<parameters.db_name>>..."
sleep 5
done
db_connect
jobs:
test:
@@ -136,6 +159,17 @@ jobs:
- image: postgres:latest
environment:
POSTGRES_PASSWORD: 'hunter2'
# MySQL after 5 (they bumped version to 8) uses a new auth protocol
# that is not well supported by clients - including the
# Debian-installable client packages.
#
# https://mysqlserverteam.com/mysql-8-0-4-new-default-authentication-plugin-caching_sha2_password/
- image: mysql:5
environment:
MYSQL_ROOT_PASSWORD: 'hunter2root'
MYSQL_DATABASE: 'mysqlitest'
MYSQL_USER: mysqluser
MYSQL_PASSWORD: 'hunter2'
steps:
- checkout
- add_ssh_keys:
@@ -175,17 +209,18 @@ jobs:
command: |
sudo apt-get update -y && sudo apt-get install -y postgresql-client
- run:
name: Wait for vertica
name: Install mysql
command: |
# Bail out trying after 30 seconds
end=$((SECONDS+30))
echo "Starting at second ${SECONDS:?} - ending at ${end:?}"
while ! vsql -h 127.0.0.1 -U dbadmin -c 'select 1;' && [[ "${SECONDS:?}" -lt "${end:?}" ]]
do
echo "Waiting for vertica..."
sleep 5
done
vsql -h 127.0.0.1 -U dbadmin -c 'select 1;'
sudo apt-get update -y && sudo apt-get install -y default-mysql-client
- wait_for_db:
db_name: Vertica
connect_command: vsql -h 127.0.0.1 -U dbadmin -c 'select 1;'
- wait_for_db:
db_name: MySQL
connect_command: echo 'select 1;' | mysql --password=hunter2 --host=127.0.0.1 -u mysqluser mysqlitest
- wait_for_db:
db_name: Postgres
connect_command: psql -h 127.0.0.1 -U postgres -c 'select 1;'
- run:
name: Run tests
command: "<<parameters.command>>"
@@ -363,6 +398,23 @@ workflows:
filters:
tags:
only: /v\d+\.\d+\.\d+(-[\w]+)?/
- integration_test_with_dbs:
name: mysql-itest
extras: '[mysql,itest]'
python_version: "3.6"
command: |
. venv/bin/activate
export PATH=${PATH}:${PWD}/tests/integration/bin:/opt/vertica/bin
export DB_FACTS_PATH=${PWD}/tests/integration/circleci-dbfacts.yml
export RECORDS_MOVER_SESSION_TYPE=env
mkdir -p test-reports/itest
cd tests/integration/records/single_db
with-db dockerized-mysql nosetests --with-xunit --xunit-file=../../../../test-reports/itest/junit.xml .
requires:
- redshift-itest
filters:
tags:
only: /v\d+\.\d+\.\d+(-[\w]+)?/
- integration_test_with_dbs:
name: vertica-s3-itest
extras: '[vertica,aws,itest]'
@@ -502,6 +554,7 @@ workflows:
- redshift-itest-old-pandas
- redshift-itest-no-pandas
- postgres-itest
- mysql-itest
- cli-1-itest
- cli-2-itest
- cli-3-itest
156 changes: 156 additions & 0 deletions DRIVERS.md
@@ -0,0 +1,156 @@
# Database drivers

Adding a database driver to records_mover can be divided up into three steps:

* Add integration testing and subclass DBDriver to get tests to pass
* Add code for native bulk import support
* Add code for native bulk export support

Here are the basic things you'll need to do to get through the
process. Every database is different, and this document only gets
updated periodically, so you may run into additional things you need
to do or figure out. If they seem like things people will hit in the
future, add them to this document!

## Basic support

1. Create a feature branch
2. Get a test database set up
* Modify `tests/integration/docker-compose.yml` to include a Docker
image for your new database. If your database can't be run as a
Docker container, you'll need some other way to reach a database
that can be used during integration testing. Be sure to add a link
from the 'records_mover' container to your new container.
* Run `./itest-dc up -d` to bring up the docker-compose environment.
* Run `./itest-dc start` to start the docker-compose environment.
* Watch logs with `./itest-dc logs -f`
* Fix any issues and repeat until it is successful and the logs look right.
3. Set up the `itest` script to be able to test your new database.
* Modify the `local_dockerized_dbfacts` function in `./itest` to
point to the new database.
* Create a `wait-for-${your-new-db-type:?}.sh` script matching
`wait-for-postgres.sh`.
* Modify `tests/integration/inside-docker-dbfacts.yml` to include an
entry for your new database.
* Modify `tests/integration/bin/db-connect` to handle your new
database type if needed.
* Modify `Dockerfile` to add any new client binaries needed for your
database and run `./itest --docker build` to build the new image.
* Run `./itest shell`, which will start the docker-compose and start
a shell with the db-facts you just created set.
* Run `db ${your-new-db-name:?}` within that shell and verify it
connects.
* Exit out of the `./itest shell` session.
* Run `./itest ${your-new-db-type:?}` and verify it doesn't
recognize the argument.
* Search down for instances of 'postgres' in the `itest` script and
come up with the equivalent for your new database.
* Run `./itest ${your-new-db-type:?}` again and verify things fail
somewhere else (most likely a Python package not being installed
or a test failing).
* Push up your changes to the feature branch.
4. Now work to get the same failure out of CircleCI:
* Replicate the current `postgres_itest` in `.circleci/config.yml`,
including matching all of the references to it.
* Be sure to change the `with-db dockerized-postgres` line to refer
to your database type.
* Push up changes and verify that tests fail because your new
database "is not a valid DB name".
* Note that you can (temporarily!) allow your new integration test
to run without waiting for unit and Redshift tests to run by
commenting out the dependency like this - just be sure to leave an
annotation comment reminding you to fix it before the PR is
merged!
```yaml
# requires: # TODO restore this
# - redshift-itest
```
* Modify the `integration_test_with_dbs` job to include a Docker
image for your new database, similar to `docker-compose.yml`
above.
* Modify `tests/integration/circleci-dbfacts.yml` to point to your
new integration test database account, whether in Docker or
cloud-hosted.
* Iterate on the errors until you get the same errors you got in
your `./itest` runs.
5. Fix these "singledb" tests! Now that you have tests running (and
failing), you can address the problems one by one. Here are things
you are likely to need to do; I'd suggest waiting for the problem
to come up via the test and then applying the fix until the tests
pass. If you encounter things not on the list below, add them here
for the next person (unless the fix you put in will address it for
all future databases with the same issue).
* Add the Python driver (either the SQLAlchemy dialect or, if
SQLAlchemy supports your database natively, maybe just the DBAPI
driver) as a transitive dependency in `setup.py`. Rerun `./deps.sh`
and then `./itest --docker build` to re-install locally.
* If database connections aren't working, you may want to insert
some debugging into `records_mover/db/connect.py` to figure out
what's going on.
* Access errors trying to drop a table in the `public` schema:
this probably means the default schema that comes with your
database user doesn't match the tests' default assumption - modify
`tests/integration/records/single_db/base_records_test.py` to
match.
* `NotImplementedError: Please teach me how to integration test
mysql`: Add information for your new database in
`tests/integration/records/expected_column_types.py`,
`tests/integration/records/mover_test_case.py`,
`tests/integration/records/records_database_fixture.py` and
`tests/integration/records/records_numeric_database_fixture.py`.
This is where you'll start to get familiar with the different
column types available for your database. Be as thorough as
practical so that we can both export a wide variety of column
types and make space-efficient choices on import.

For the numeric tests, when re-running you'll probably need to
start filling out a subclass of DBDriver. Relevant methods:
`type_for_fixed_point()`, `type_for_floating_point()`,
`fp_constraints()`, and `integer_limits()`.
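A minimal sketch of how such a subclass might start out; the
method names come from this guide, but the signatures, return
types, and import path below are assumptions - check the
`DBDriver` base class for the real API:
```python
# mysql_db_driver.py (sketch): signatures are assumptions, not the
# real DBDriver API.
import sqlalchemy

from ..driver import DBDriver  # hypothetical import path


class MySQLDBDriver(DBDriver):
    def integer_limits(self, type_):
        # Smallest/largest values MySQL can store in this integer type
        if isinstance(type_, sqlalchemy.sql.sqltypes.SmallInteger):
            return (-2**15, 2**15 - 1)
        return (-2**63, 2**63 - 1)

    def fp_constraints(self, type_):
        # Total bits and significand bits for a floating-point type
        # (MySQL DOUBLE is a 64-bit IEEE float, 53-bit significand)
        return (64, 53)

    def type_for_fixed_point(self, precision, scale):
        return sqlalchemy.sql.sqltypes.Numeric(precision=precision,
                                               scale=scale)

    def type_for_floating_point(self, fp_total_bits, fp_significand_bits):
        return sqlalchemy.sql.sqltypes.Float(precision=fp_significand_bits)
```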
* `KeyError: 'mysql'` in
`tests/integration/records/single_db/test_records_numeric.py`:
There are test expectations to set here based on the numeric types
supported by your database. Once you set them, you'll probably
need to add a `type_for_integer()` method covering things
correctly.
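One possible shape for that method, again with an assumed
`(min_value, max_value)` signature:
```python
# Sketch: pick the narrowest type that fits the column's range.
def type_for_integer(self, min_value, max_value):
    if min_value is not None and max_value is not None:
        if min_value >= -2**31 and max_value <= 2**31 - 1:
            return sqlalchemy.sql.sqltypes.Integer()
        return sqlalchemy.sql.sqltypes.BigInteger()
    # Fall back to the base class default when the range is unknown
    return super().type_for_integer(min_value, max_value)
```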
* `AssertionError: ['INTEGER(11)', 'VARCHAR(3)', 'VARCHAR(3)',
'VARCHAR(1)', 'VARCHAR(1)', 'VARCHAR(3)', 'VARCHAR(111)', 'DATE',
'TIME', 'DATETIME', 'DATETIME']`: Double-check the types assigned.
You may need to subclass DBDriver and implement the relevant
methods to convince records mover to create the types you expect.
* Errors from `tests/integration/records/directory_validator.py`:
```console
AssertionError:
received ['integer', 'string', 'string', 'string', 'string', 'string', 'string', 'date', 'time', 'datetime', 'datetime'],
expected [['integer', 'string', 'string', 'string', 'string', 'string', 'string', 'date', 'time', 'datetime', 'datetimetz'], ['integer', 'string', 'string', 'string', 'string', 'string', 'string', 'date', 'string', 'datetime', 'datetimetz']]
```

To address, make sure the types returned are as expected for this database.
* `KeyError: 'mysql'`:
`tests/integration/records/single_db/test_records_numeric.py`
needs to be modified to set expectations for this database type.
You can set this to 'bluelabs' for now, since we haven't yet
taught records-mover to do bulk imports and so don't yet know the
ideal records format variant for this database.
* `AssertionError` in
`tests/integration/records/table_validator.py`: There are various
checks here, including some dealing with how datetimes get
rendered. Examine the existing predicates carefully and add new
ones judiciously if the behavior you are seeing appears correct
but is not currently anticipated.
6. If there are items in the list above that you know your database
needs but the tests are passing anyway, consider adding an
integration test to match.
7. Edit
`tests/integration/records/multi_db/test_records_table2table.py` to
include the new test database and run `./itest table2table` to run
tests. Fix errors as they pop up.
8. Add support for bulk import if the database supports it (and add
more detail here on how to do that!).
* `tests/integration/records/single_db/test_records_numeric.py`
needs to be modified to set the best loading records type for
this database type - pick a type which can be loaded natively
without using Pandas.
9. Add support for bulk export if the database supports it (and add
more detail here on how to do that!).
6 changes: 2 additions & 4 deletions Dockerfile
@@ -1,12 +1,10 @@
FROM python:3.6

#
# database connection scripts, psql CLI client for postgres and
# Redshift, Vertica vsql client and misc shell tools for
# Redshift, Vertica vsql client, MySQL client and misc shell tools for
# integration tests
#

RUN apt-get update && apt-get install -y netcat jq postgresql-client curl
RUN apt-get update && apt-get install -y netcat jq postgresql-client curl default-mysql-client

# google-cloud-sdk for dbcli and bigquery in integration tests
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && apt-get update -y && apt-get install google-cloud-sdk -y
6 changes: 5 additions & 1 deletion deps.sh
@@ -16,4 +16,8 @@ pyenv virtualenv "${python_version:?}" records-mover-"${python_version:?}" || tr
pyenv local records-mover-"${python_version:?}"

pip3 install --upgrade pip
pip3 install -r requirements.txt -e '.[unittest,itest]'
#
# It's nice to unit test, integration test, and run the CLI in
# a development pyenv.
#
pip3 install -r requirements.txt -e '.[unittest,itest,cli]'
41 changes: 38 additions & 3 deletions itest
@@ -59,6 +59,10 @@ def local_dockerized_dbfacts():
subprocess.check_output([
'./itest-dc', 'port', 'postgresdb', '5432'
]).decode('utf8').rstrip().split(':')[1]
mysql_port =\
subprocess.check_output([
'./itest-dc', 'port', 'mysqldb', '3306'
]).decode('utf8').rstrip().split(':')[1]
db_facts = {
'dbs': {
'dockerized-vertica': {
@@ -82,7 +86,23 @@
'user': 'postgres',
'password': 'hunter2',
}
}
},
'dockerized-mysql': {
'exports': {
# This needs to be 127.0.0.1, because if
# this is the string 'localhost', the
# MySQL driver wants to use Unix domain
# sockets to connect, which won't work
# because this is a tunnelled port.
'host': '127.0.0.1',
'port': mysql_port,
'database': 'mysqlitest',
'type': 'mysql',
'protocol': 'mysql',
'user': 'mysqluser',
'password': 'hunter2',
}
},
}
}
yaml_output = yaml.dump(db_facts)
@@ -122,13 +142,15 @@ def docker_compose_shell() -> None:


def docker_compose_start() -> None:
print("Running docker_compose start verticadb postgresdb...", file=sys.stderr)
print("Running docker_compose start verticadb postgresdb mysqldb...", file=sys.stderr)
docker_compose(["up", "--no-start"])
docker_compose(["start", "verticadb", "postgresdb"])
docker_compose(["start", "verticadb", "postgresdb", "mysqldb"])
docker_compose_run(['./wait-for-vertica.sh'])
print("Verified Vertica is up and listening", file=sys.stderr)
docker_compose_run(['./wait-for-postgres.sh'])
print("Verified Postgres is up and listening", file=sys.stderr)
docker_compose_run(['./wait-for-mysql.sh'])
print("Verified MySQL is up and listening", file=sys.stderr)


def run_test(args, target, parser):
@@ -196,6 +218,18 @@ def run_test(args, target, parser):
"with-aws-creds", "circleci",
"nosetests", "--xunit-file=nosetests.xml", "."],
cwd="tests/integration/records/single_db")
elif (target == 'mysql'):
with dockerized_dbs():
if (args.docker):
docker_compose_run(['with-db', 'dockerized-mysql',
'nosetests', '--xunit-file=nosetests.xml', '.'],
prefixes=["with-aws-creds", "circleci"],
cwd="/usr/src/app/tests/integration/records/single_db")
else:
with local_dockerized_dbfacts():
subprocess.check_call(["with-db", "dockerized-mysql",
"nosetests", "--xunit-file=nosetests.xml", "."],
cwd="tests/integration/records/single_db")
elif (target == 'postgres'):
with dockerized_dbs():
if (args.docker):
@@ -250,6 +284,7 @@ def run_test(args, target, parser):
def main():
tests = {
'cli': 'Run bash-based multi-source/target copy tests',
'mysql': 'Run load/unload suite against Dockerized MySQL',
'postgres': 'Run load/unload suite against Dockerized PostgreSQL',
'vertica-s3': 'Run load/unload suite against Dockerized Vertica, using S3',
'vertica-no-s3': 'Run load/unload suite against Dockerized Vertica, using streams',
2 changes: 1 addition & 1 deletion metrics/bigfiles_high_water_mark
@@ -1 +1 @@
897
967
2 changes: 1 addition & 1 deletion metrics/coverage_high_water_mark
@@ -1 +1 @@
93.6200
93.700
2 changes: 1 addition & 1 deletion metrics/flake8_high_water_mark
@@ -1 +1 @@
189
177
2 changes: 1 addition & 1 deletion metrics/mdl_high_water_mark
@@ -1 +1 @@
3
5
2 changes: 1 addition & 1 deletion metrics/mypy_high_water_mark
@@ -1 +1 @@
90.1200
91.5500
4 changes: 4 additions & 0 deletions records_mover/db/factory.py
@@ -24,5 +24,9 @@ def db_driver(db: Union[sqlalchemy.engine.Engine,
from .postgres.postgres_db_driver import PostgresDBDriver

return PostgresDBDriver(db, **kwargs)
elif engine.name == 'mysql':
from .mysql.mysql_db_driver import MySQLDBDriver

return MySQLDBDriver(db, **kwargs)
else:
return DBDriver(db, **kwargs)