Deal with, document dependencies issue, add first README.md (#22)

* Add README.md, INSTALL.md and move psycopg2 to mover-cli extra
* Add psycopg2-binary to movercli extra
* deps-v2 -> deps-v1
* Make PyYAML constraint transitive
1 parent 058decb, commit b1997bb
Showing 7 changed files with 163 additions and 24 deletions.
@@ -0,0 +1,60 @@
# Installing Records Mover

You can install records-mover with the following 'extras':

* `pip3 install records-mover` - Install the minimal version, not
  including `pandas` (needed only for local data copy), `psycopg2`
  (needed for Redshift or PostgreSQL connections) or `pyarrow` (needed
  for local Parquet manipulation).
* `pip3 install records-mover[gsheets]` - Minimal install plus the API
  libraries needed to access Google Sheets.
* `pip3 install records-mover[movercli]` - Install everything and
  make assumptions compatible with using mvrec on the command line.
  Installs `pandas`, `psycopg2-binary` and `pyarrow`.

Don't use the `movercli` extra if you plan on using records-mover as a
library, because of the `psycopg2-binary` risk described below.
## Why this is complicated

Records Mover relies on a number of external libraries. Here are some
things to keep in mind when using `pip install`:

### pandas

Only when installing with `pip3 install 'records-mover[movercli]'`
will pandas be installed by default.

Pandas is a large dependency, and it is needed only in cases where data
must be processed locally. If you are using cloud-native import/export
functionality only, you shouldn't need it and can avoid the bloat.
### psycopg2

psycopg2 is the library used to access both Redshift and PostgreSQL
databases.

The psycopg2 project is
[dealing](https://www.postgresql.org/message-id/CA%2Bmi_8bd6kJHLTGkuyHSnqcgDrJ1uHgQWvXCKQFD3tPQBUa2Bw%40mail.gmail.com)
[with](https://www.psycopg.org/articles/2018/02/08/psycopg-274-released/)
a thorny compatibility issue involving native code and threading. As a
result, they've published three separate versions of the library to
PyPI:

* `psycopg2` - requires local compilation, so you need certain build
  tools and perhaps some configuration set up. This makes it the
  hardest one to install.
* `psycopg2-binary` - a pre-compiled version that may have threading
  issues if you use it in a multi-threaded environment alongside other
  code that may be using libssl from a different source.
* `psycopg2cffi` - the version to use if you run `pypy`.

If you are using the mvrec command line only, you can use `pip3
install 'records-mover[movercli]'`, which simply uses `psycopg2-binary`.
### pyarrow

`pyarrow` is a Python wrapper around the Apache Arrow native library.
It's used by Records Mover to manipulate Parquet files locally. The
Apache Arrow native library can require build tools to install and is
large; if you don't need to deal with Parquet files in the local
environment, you can work without it.
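If you're unsure which of these optional dependencies ended up in your
environment, you can probe for them from Python. This is a generic
sketch, not part of the records-mover API; it simply checks whether
each optional package can be imported:

```python
# Generic check of the optional dependencies discussed above.
# Not part of the records-mover API - just standard-library Python.
from importlib import import_module

for package in ("pandas", "psycopg2", "pyarrow"):
    try:
        import_module(package)
        print(f"{package}: installed")
    except ImportError:
        print(f"{package}: not installed")
```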
@@ -1 +1,85 @@
# Records Mover - mvrec

Records Mover is a command-line tool and Python library you can
use to move relational data from one place to another.

Relational data here means anything roughly "rectangular" - with
columns and rows. For example, it supports reading and writing
data in:

* Databases, including using native high-speed methods of
  import/export of bulk data. Redshift and Vertica are
  well-supported, with some support for BigQuery and PostgreSQL.
* Google Sheets
* Pandas DataFrames
* CSV files, either alone or in a records directory - a structured
  directory of CSV/Parquet/etc. files containing some JSON metadata
  about their format and origins. Records directories are especially
  helpful for the ever-ambiguous CSV format, where they solve the
  problem of 'hey, this may be a CSV - but what's the schema? What's
  the format of the CSV itself? How is it escaped?'

Records Mover can be extended to handle additional database and data
file types by building on top of their
[SQLAlchemy](https://www.sqlalchemy.org/) drivers, and it is able to
auto-negotiate the most efficient way of moving data from one to the
other.

Example CLI use:
```sh
pip3 install 'records_mover[movercli]'
mvrec --help
mvrec table2table mydb1 myschema1 mytable1 mydb2 myschema2 mytable2
```
For more installation notes, see [INSTALL.md](./INSTALL.md).

Note that the connection details for the database names here must be
configured using
[db-facts](https://github.com/bluelabsio/db-facts/blob/master/CONFIGURATION.md).

Example Python library use:

First, install records_mover. We'll also use Pandas, so we'll install
that, too:

```sh
pip3 install records_mover pandas
```

Now we can run this code:
```python
#!/usr/bin/env python3

# Pull in the records_mover library - be sure to run the pip install above first!
from records_mover import Session
from pandas import DataFrame

session = Session()
records = session.records

# This is a SQLAlchemy database engine.
#
# You can instead call session.get_db_engine('cred name').
#
# On your laptop, 'cred name' is the same thing passed to dbcli (mapping to something in LastPass).
#
# In Airflow, 'cred name' maps to the connection ID in the admin Connections UI.
#
# Or you can build your own and pass it in!
db_engine = session.get_default_db_engine()

df = DataFrame.from_dict([{'a': 1}])  # or make your own!

source = records.sources.dataframe(df=df)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)
```
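As the comments above note, you don't have to rely on
`session.get_default_db_engine()` or the credential lookup - you can
build your own SQLAlchemy engine and pass it in wherever a `db_engine`
is expected. A minimal sketch, using a placeholder connection string
(substitute your own host, database, and credentials):

```python
import sqlalchemy

from records_mover import Session

# Build an engine yourself instead of using records_mover's credential
# lookup. The URL below is a placeholder - substitute your own.
db_engine = sqlalchemy.create_engine(
    'postgresql://myuser:mypassword@localhost:5432/mydb'
)

session = Session()
records = session.records

# The engine can then be passed to e.g. records.targets.table(...) just
# as in the example above.
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
```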
When moving data, the sources supported can be found
[here](./records_mover/records/sources/factory.py), and the
targets supported can be found [here](./records_mover/records/targets/factory.py).
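For instance, a table-to-table move from Python looks much like the
CLI `table2table` invocation above. The sketch below assumes a
`records.sources.table(...)` factory whose parameters mirror the
`records.targets.table(...)` call shown earlier - check the sources
factory linked above for the exact signature:

```python
from records_mover import Session

session = Session()
records = session.records

# Engines for the source and target databases - from the configured
# credential names, or built yourself with SQLAlchemy.
source_engine = session.get_db_engine('mydb1')
target_engine = session.get_db_engine('mydb2')

# Assumed factory: see records_mover/records/sources/factory.py for the
# parameters it actually accepts.
source = records.sources.table(schema_name='myschema1',
                               table_name='mytable1',
                               db_engine=source_engine)
target = records.targets.table(schema_name='myschema2',
                               table_name='mytable2',
                               db_engine=target_engine)
results = records.move(source, target)
```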
@@ -1 +1 @@
88.8400
88.8500