You can install records-mover with a number of 'extras'. See the
extras_require section of setup.py for a full list. In general, you'll
need to explicitly name the databases you want to be able to move data
between (check setup.py for the available list), and the same goes for
most other sources and targets.
Examples:
- pip3 install records-mover - Install minimal version. Not able to connect to databases or deal with dataframes, either as input or as an intermediary format, etc. Generally not terribly useful.
- pip3 install records-mover[gsheets,pandas] - Minimal install plus libraries to access Google Sheets and manipulate Pandas DataFrames.
- pip3 install records-mover[cli,redshift-binary,pandas,parquet] - Install enough things to be able to use the mvrec command line, talk to the Redshift database, and use Parquet internally and/or as an input/output. Installs pandas, psycopg2-binary and pyarrow, among others. You might consider redshift-source if you plan on using records-mover as a library because of the psycopg2-binary risk below.
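If records-mover is already installed, the extras it declares can also be read from the installed package metadata rather than from setup.py. The following is a minimal sketch using the standard library's importlib.metadata; the helper name declared_extras is hypothetical and not part of records-mover, and the list it prints depends on the installed version.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras a locally installed distribution declares, sorted.

    Hypothetical helper for illustration; not part of records-mover.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # Each "Provides-Extra" metadata field names one installable extra.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Prints nothing if records-mover is not installed.
    for extra in declared_extras("records-mover"):
        print(extra)
```

An empty result means either that the distribution is not installed or that it declares no extras, so setup.py remains the authoritative list.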
Records mover relies on a number of external libraries. Each database comes with its own driver, and some of those drivers depend on binary libraries which may need to be installed on your OS.
Some of these are difficult to install. Some examples:
Only when installing with pip3 install 'records-mover[pandas]' will you get pandas installed by default.
Pandas is a large dependency which is needed in cases where we need to process data locally. If you are using cloud-native import/export functionality only, you shouldn't need it and can avoid the bloat.
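Because pandas is optional, code that wraps records-mover may want to check for it before attempting local processing. This is a sketch only, using importlib.util.find_spec from the standard library; is_installed is a hypothetical helper name and records-mover's own handling of a missing pandas may differ.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module is importable, without importing it.

    Hypothetical helper for illustration; not part of records-mover.
    """
    return find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pandas"):
        print("pandas is available for local processing")
    else:
        print("pandas is missing; try: pip3 install 'records-mover[pandas]'")
```

The same check works for any other optional top-level module, such as pyarrow.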
psycopg2 is a library used for access to both Redshift and PostgreSQL databases.
The project is dealing with a thorny compatibility issue with native code and threading. They've published three separate versions of their library to PyPI as a result:
- psycopg2 - requires local compilation, and as such you need certain tools and maybe configuration set up. This is the hardest one to install as a result.
- psycopg2-binary - pre-compiled version that might have threading issues if you try to use it in a multi-threaded environment with other code that might be using libssl from a different source.
- psycopg2cffi - The version to use if you use pypy.
If you are using the mvrec command line only, you can use pip3 install 'records-mover[cli,postgres-binary,redshift-binary]' and it will just use psycopg2-binary.
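The choice among the three psycopg2 packages can be summarised as a small decision function. This is an illustrative sketch of the guidance above, not logic taken from records-mover; the function name and its cli_only parameter are assumptions made for the example.

```python
import platform


def suggest_psycopg2_variant(implementation: str, cli_only: bool) -> str:
    """Map the guidance above onto a PyPI package name (illustrative only)."""
    if implementation == "PyPy":
        # PyPy users need the cffi-based build.
        return "psycopg2cffi"
    if cli_only:
        # Command-line-only use can take the pre-compiled wheel.
        return "psycopg2-binary"
    # Library use alongside other native code: build from source to
    # avoid the libssl/threading risk of the binary package.
    return "psycopg2"


if __name__ == "__main__":
    # platform.python_implementation() returns e.g. "CPython" or "PyPy".
    print(suggest_psycopg2_variant(platform.python_implementation(), cli_only=True))
```

Treat the result as a starting point; whether the source build is worth the extra setup depends on what other native libraries share the process.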
pyarrow is a Python wrapper around the Apache Arrow native library.
It's used by records mover to manipulate Parquet files locally. The
Apache Arrow native library can require build tools to install and is
large; if you don't need to deal with Parquet files in the local
environment you can work without it.