Allow configuration to be set for session type #70
Changes from all commits
9a0deab
f83a817
94d3a00
b43d1eb
c690248
3097c23
f552b5c
c022e20
d77e45f
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,7 @@ There are key areas where records mover needs configuration: | |
|
||
1. Database connection details | ||
2. Temporary locations | ||
3. Cloud credentials for object stores | ||
3. Cloud credentials (e.g., S3/GCS/Google Sheets) | ||
|
||
## Database connection details | ||
|
||
|
@@ -87,7 +87,7 @@ object store - see below for how to configure that. | |
|
||
To specify the temporary location for Redshift exports and imports, | ||
you can either set the environment variable `SCRATCH_S3_URL` to your | ||
URL or configure a TOML-style file in one of the following locations: | ||
URL or configure an INI-style file in one of the following locations: | ||
|
||
* `/etc/bluelabs/records_mover/app.ini` | ||
* `/etc/xdg/bluelabs/records_mover/app.ini` | ||
|
@@ -96,7 +96,7 @@ URL or configure a TOML-style file in one of the following locations: | |
|
||
Example file: | ||
|
||
```toml | ||
```ini | ||
[aws] | ||
s3_scratch_url = "s3://mybucket/path/" | ||
``` | ||
|
@@ -124,3 +124,72 @@ downloaded for local processing) will be stored per Python's | |
which allow for configuration via the `TMPDIR`, `TEMP` or `TMP` env | ||
variables, and generally default to | ||
[something reasonable per your OS](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir). | ||
|
||
## Cloud credentials (e.g., S3/GCS/Google Sheets) | ||
Review comment: Since database credentials come out of db-facts, the major difference made in switching the default creds provider from CredsViaLastPass to CredsViaEnv is how non-default GCP credentials are passed. This text should address that and provide guidance.
||
|
||
To be able to access cloud resources, including S3, GCS and Google | ||
Sheets, Records Mover requires credentials. | ||
|
||
There are multiple ways to configure these: | ||
|
||
1. Vendor system configuration (Python and mvrec) | ||
2. Setting environment variables (Python and mvrec) | ||
3. Passing in pre-configured default credential objects (Python only) | ||
4. Using a third-party secrets manager (Python and mvrec) | ||
5. Airflow connections (Python via Airflow) | ||
|
||
### Vendor system configuration (Python and mvrec) | ||
|
||
Both AWS and GCP have Python libraries which support using credentials | ||
you configure in different ways. Unless told otherwise, Records Mover | ||
will use these credentials as the "default credentials" available via | ||
the 'creds' property under the Session object. | ||
|
||
### Setting environment variables (Python and mvrec) | ||
|
||
AWS natively supports setting credentials using the | ||
`AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`/`AWS_SESSION_TOKEN` | ||
environment variables. | ||
|
||
Similarly, GCP supports pointing to a file with application | ||
credentials via the `GOOGLE_APPLICATION_CREDENTIALS` environment | ||
variable. | ||
|
||
When using the default 'env' session type, Records Mover also supports | ||
providing a base64-encoded version of the GCP service account credentials via | ||
the `GCP_SERVICE_ACCOUNT_JSON_BASE64` env variable. | ||
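
As a rough illustration of preparing such a value, the sketch below base64-encodes a service account JSON document using only the Python standard library. It assumes that Records Mover expects standard base64 of the UTF-8 JSON key contents; the docs above only say "base64ed", so treat the exact encoding as an assumption. The helper names and the placeholder key fields are hypothetical.

```python
import base64
import json


def encode_service_account_json(service_account_info: dict) -> str:
    """Serialize the key data to JSON and return it as a base64 ASCII string."""
    raw = json.dumps(service_account_info).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_service_account_json(encoded: str) -> dict:
    """Reverse the encoding; useful for checking that the value round-trips."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Placeholder values only; a real service account key file has more fields.
example_info = {"type": "service_account", "project_id": "my-project"}
encoded_value = encode_service_account_json(example_info)

# The resulting string could then be exported in the shell, e.g.
#   export GCP_SERVICE_ACCOUNT_JSON_BASE64=<encoded_value>
```

Decoding the value again before exporting it is a cheap way to catch copy-and-paste or newline problems.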
|
||
### Passing in pre-configured default credential objects (Python only) | ||
|
||
You can pass in credentials objects directly to a Session() object | ||
using the `default_gcs_client`, `default_gcp_creds` and/or | ||
`default_boto3_session` arguments. | ||
|
||
### Using a third-party secrets manager (Python and mvrec) | ||
|
||
To use a secrets manager of some type, you can instruct Records Mover | ||
to use a different instance of the 'BaseCreds' class which knows how | ||
to use your specific type of secrets manager. | ||
|
||
An [example implementation](https://github.com/bluelabsio/records-mover/blob/master/records_mover/creds/creds_via_lastpass.py) | ||
ships with Records Mover to use LastPass' CLI tool to fetch (for | ||
instance) GCP credentials via LastPass. | ||
|
||
You can either pass in an instance of a BaseCreds subclass as the | ||
'creds' argument to the Session() constructor in Python, pass in the | ||
string 'lpass' as the value of the 'session_type' parameter to the | ||
Session() constructor, or provide the following config in the `.ini` | ||
file referenced above: | ||
|
||
```ini | ||
[session] | ||
session_type = "lpass" | ||
``` | ||
|
||
### Airflow connections (Python via Airflow) | ||
|
||
Similarly, Records Mover ships with a BaseCreds instance which knows | ||
how to fetch credentials using Airflow connections. While Records | ||
Mover will attempt to auto-detect whether it is running under | ||
Airflow, you can explicitly tell Records Mover to use this mode by | ||
setting session_type to "airflow" using one of the above methods. |
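
The order in which the session type appears to be resolved in this PR (explicit environment variable, then config file, then the Airflow guess, then the `env` default) can be sketched as a standalone function. This is a simplified illustration and not the actual `_infer_session_type()` code: the function name and its two parameters are hypothetical, and the config is modeled as a plain nested mapping rather than a `config_resolver` result.

```python
from typing import Mapping, Optional


def infer_session_type(environ: Mapping[str, str],
                       config: Mapping[str, Mapping[str, str]]) -> str:
    """Resolve a session type from the environment and a parsed config."""
    # 1. An explicit environment variable wins over everything else.
    if "RECORDS_MOVER_SESSION_TYPE" in environ:
        return environ["RECORDS_MOVER_SESSION_TYPE"]

    # 2. Next, a [session] section in the config file, if it sets session_type.
    session_type: Optional[str] = config.get("session", {}).get("session_type")
    if session_type is not None:
        return session_type

    # 3. Guess Airflow from an env variable that Airflow sometimes sets.
    if "AIRFLOW__CORE__EXECUTOR" in environ:
        return "airflow"

    # 4. Otherwise fall back to the default session type.
    return "env"
```

Taking the environment and config as arguments rather than reading `os.environ` directly keeps the precedence easy to check in isolation.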
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
93.6900 | ||
93.700 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
92.1900 | ||
92.2700 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
from config_resolver import get_config | ||
from db_facts.db_facts_types import DBFacts | ||
from .creds.base_creds import BaseCreds | ||
from .records.records import Records | ||
|
@@ -27,11 +28,20 @@ def _infer_session_type() -> str: | |
if 'RECORDS_MOVER_SESSION_TYPE' in os.environ: | ||
return os.environ['RECORDS_MOVER_SESSION_TYPE'] | ||
|
||
config_result = get_config('records_mover', 'bluelabs') | ||
cfg = config_result.config | ||
if 'session' in cfg: | ||
session_cfg = cfg['session'] | ||
session_type: Optional[str] = session_cfg.get('session_type') | ||
Review comment: Could
||
if session_type is not None: | ||
logger.info(f"Using session_type={session_type} from config file") | ||
return session_type | ||
|
||
if 'AIRFLOW__CORE__EXECUTOR' in os.environ: | ||
# Guess based on an env variable sometimes set by Airflow | ||
return 'airflow' | ||
|
||
return 'cli' | ||
return 'env' | ||
|
||
|
||
def _infer_default_aws_creds_name(session_type: str) -> Optional[str]: | ||
|
@@ -73,13 +83,14 @@ def _infer_creds(session_type: str, | |
default_gcs_client=default_gcs_client, | ||
scratch_s3_url=scratch_s3_url) | ||
elif session_type == 'cli': | ||
# | ||
# https://app.asana.com/0/1128138765527694/1163219515343393 | ||
# | ||
# Most people don't use LastPass; other secrets managements | ||
# should be supported and configurable at the system- and | ||
# user- level. | ||
# | ||
return CredsViaEnv(default_db_creds_name=default_db_creds_name, | ||
default_aws_creds_name=default_aws_creds_name, | ||
default_gcp_creds_name=default_gcp_creds_name, | ||
default_db_facts=default_db_facts, | ||
default_boto3_session=default_boto3_session, | ||
default_gcp_creds=default_gcp_creds, | ||
default_gcs_client=default_gcs_client) | ||
elif session_type == 'lpass': | ||
return CredsViaLastPass(default_db_creds_name=default_db_creds_name, | ||
default_aws_creds_name=default_aws_creds_name, | ||
default_gcp_creds_name=default_gcp_creds_name, | ||
|
@@ -107,7 +118,7 @@ def _infer_creds(session_type: str, | |
default_gcs_client=default_gcs_client, | ||
scratch_s3_url=scratch_s3_url) | ||
elif session_type is not None: | ||
raise ValueError("Valid session types: cli, airflow, itest, env - " | ||
raise ValueError("Valid session types: cli, lpass, airflow, itest, env - " | ||
"consider upgrading records-mover if you're looking for " | ||
f"{session_type}.") | ||
|
||
|
Review comment: Turns out config_resolver uses https://docs.python.org/3/library/configparser.html, which reads a simplified INI format that is not actually TOML. TIL!
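
One practical difference between the two formats can be seen with the standard library alone: `configparser` returns every value as a raw string and does not strip surrounding quotes, whereas TOML would treat the quotes as string delimiters. The sketch below only demonstrates `configparser` itself; whether `config_resolver` or Records Mover post-processes the values is not shown in this PR, so that part remains an open assumption.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("""
[session]
session_type = lpass

[aws]
s3_scratch_url = "s3://mybucket/path/"
""")

# Unquoted values come back exactly as written.
unquoted = parser["session"]["session_type"]

# Quoted values keep their quote characters as part of the string.
quoted = parser["aws"]["s3_scratch_url"]

print(repr(unquoted))
print(repr(quoted))
```

If the quotes are not wanted in the parsed value, the simplest option under plain `configparser` is to leave them out of the file.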