Records mover is a command-line tool and Python library you can use to move relational data from one place to another.
Relational data here means anything roughly "rectangular" - data with columns and rows. For example, it supports reading from and writing to:
- Databases, including via their native high-speed bulk import/export methods. Redshift, Vertica, and PostgreSQL are well supported, with some support for BigQuery and MySQL.
- CSV files
- Parquet files (initial support)
- Google Sheets
- Pandas DataFrames
- Records directories - a structured directory of CSV/Parquet/etc. files containing some JSON metadata about their format and origins (see the sketch below). Records directories are especially helpful for the ever-ambiguous CSV format, where they solve the problem of 'hey, this may be a CSV - but what's the schema? What's the format of the CSV itself? How is it escaped?'
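To make that concrete, here's an illustrative layout of a records directory. The metadata file names below are a loose sketch, not the normative spec - consult the project's records directory documentation for the exact details:

```
mytable/
  data001.csv         # the rectangular data itself, possibly split into parts
  _manifest           # JSON: which data files make up this dataset
  _schema.json        # JSON: column names and types
  _format_delimited   # JSON: CSV dialect details (delimiter, quoting, escaping)
```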
Records mover can be extended to handle additional databases and data file types. Databases are supported by building on top of their SQLAlchemy drivers. Records mover is able to auto-negotiate the most efficient way of moving data from one to the other.
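For instance, because database support rides on SQLAlchemy, pointing records mover at another database is mostly a matter of constructing the right engine. A minimal sketch - the mysql+pymysql URL is a placeholder, and the PyMySQL driver would need to be installed separately:

```python
# Sketch: records mover reaches databases through SQLAlchemy engines, so any
# database with an installed SQLAlchemy driver can be wired in. The URL below
# is a placeholder - substitute your own details and `pip install pymysql` first.
import sqlalchemy

db_engine = sqlalchemy.create_engine("mysql+pymysql://username:password@hostname/database_name")
# This engine can then be passed anywhere the examples below accept db_engine=...
```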
Installing:
```sh
pip3 install 'records_mover[cli,postgres-binary,redshift-binary]'
```
Loading a CSV into a database:
```sh
mvrec file2table foo.csv redshiftdb1 myschema1 mytable1
```
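If you'd rather do the same load from Python, one option is to go by way of a pandas DataFrame. This sketch reuses only APIs shown elsewhere in this README, but note that pandas (not records mover) interprets the CSV dialect on the `read_csv` side:

```python
#!/usr/bin/env python3
# Sketch: loading a CSV via the library, by way of a pandas DataFrame.
import pandas
import sqlalchemy
from records_mover import sources, targets, move

# Placeholder URL; the redshift+psycopg2 dialect comes from the
# sqlalchemy-redshift package.
db_engine = sqlalchemy.create_engine("redshift+psycopg2://username:password@hostname:5439/database_name")

df = pandas.read_csv('foo.csv')
results = move(sources.dataframe(df=df),
               targets.table(schema_name='myschema1',
                             table_name='mytable1',
                             db_engine=db_engine))
```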
Copying a table from a PostgreSQL database to a Redshift database:

```sh
mvrec table2table postgresdb1 myschema1 mytable1 redshiftdb2 myschema2 mytable2
```

To see the full list of subcommands and options:

```sh
mvrec --help
```
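The library-level equivalent of that copy pairs a table source with a table target. A sketch, under two assumptions worth verifying against the API docs: that `sources.table` takes the same keyword arguments as the `targets.table` call shown later in this README, and that the placeholder connection URLs are replaced with your own:

```python
#!/usr/bin/env python3
# Sketch of the library-level equivalent of 'mvrec table2table ...'.
# Assumption: sources.table mirrors targets.table's keywords - verify
# against the sources documentation for your installed version.
import sqlalchemy
from records_mover import sources, targets, move

pg_engine = sqlalchemy.create_engine("postgresql+psycopg2://username:password@pghost/database1")
# The redshift+psycopg2 dialect comes from the sqlalchemy-redshift package.
rs_engine = sqlalchemy.create_engine("redshift+psycopg2://username:password@rshost:5439/database2")

source = sources.table(schema_name='myschema1', table_name='mytable1',
                       db_engine=pg_engine)
target = targets.table(schema_name='myschema2', table_name='mytable2',
                       db_engine=rs_engine)
results = move(source, target)
```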
Note that records mover will automatically build an appropriate CREATE TABLE statement on the target end if the table doesn't already exist.
Note that the connection details for the database names here must be configured using db-facts.
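From Python, the same db-facts names can be resolved using the Session API introduced later in this README. A sketch, assuming `get_db_engine()` takes the db-facts name (verify against the API docs):

```python
# Sketch: resolving a db-facts-configured database by name from Python.
from records_mover import Session

session = Session()
db_engine = session.get_db_engine('redshiftdb1')  # name as configured in db-facts
```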
For more installation notes, see INSTALL.md. To understand the security model here, see SECURITY.md.
To use records mover as a Python library, first install records_mover. We'll also use pandas in this example, so we'll install that too, along with a driver for Postgres:
```sh
pip3 install 'records_mover[pandas,postgres-source]'
```
Now we can run this code:
```python
#!/usr/bin/env python3

# Pull in the records-mover library - be sure to run the pip install above first!
from records_mover import sources, targets, move
from pandas import DataFrame
import sqlalchemy
import os

# Supply the database password via the DB_PASSWORD environment variable
# rather than hard-coding it.
sqlalchemy_url = f"postgresql+psycopg2://username:{os.environ['DB_PASSWORD']}@hostname/database_name"
db_engine = sqlalchemy.create_engine(sqlalchemy_url)

df = DataFrame.from_dict([{'a': 1}])  # or make your own!
source = sources.dataframe(df=df)
target = targets.table(schema_name='myschema',
                       table_name='mytable',
                       db_engine=db_engine)
results = move(source, target)
```
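The `move()` call returns a result object describing what happened. A minimal, defensive way to inspect it - the `move_count` attribute is an assumption based on the project's API docs, and the `getattr` guard keeps this safe if your version differs:

```python
# 'results' comes from the move() call above. Printing the object is always
# safe; 'move_count' is an assumption from records-mover's documented result
# type - verify against your installed version.
print(results)
count = getattr(results, 'move_count', None)
if count is not None:
    print(f"Moved {count} records")
```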
When moving data, the full lists of supported sources and targets can be found in the API documentation.
Here's another example, using some additional features:
- Loading from an existing dataframe.
- Secrets management using db-facts, which is a way to configure credentials in YAML files or even fetch them dynamically from your secrets store.
- Logging configuration to show the internal processing steps (helpful in optimizing performance or debugging issues)
Putting those pieces together:
```python
#!/usr/bin/env python3

# Pull in the records-mover library - be sure to run the pip install above first!
from records_mover import Session
from pandas import DataFrame

session = Session()
# Stream records mover's internal log messages to the console so you can
# watch each processing step.
session.set_stream_logging()
records = session.records

# Uses your default database credentials, as configured via db-facts.
db_engine = session.get_default_db_engine()

df = DataFrame.from_dict([{'a': 1}])  # or make your own!
source = records.sources.dataframe(df=df)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)
```
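The stream logging enabled above is what surfaces the internal processing steps mentioned earlier: as the move runs, records mover logs the steps it takes, including which strategy it negotiated for the transfer - a good first place to look when optimizing performance or debugging a failed move.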
You can find more API documentation here. In particular, note: