This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add new config mechanism and use for S3 scratch bucket (#47)
This is a more generic configuration mechanism for records-mover that can be set system-, user-, or session-wide as needed. It would replace the current `/usr/local/bin/scratch-s3-url` mechanism.
Commit 6ec62d5 (1 parent: ef502cb)
Showing 15 changed files with 291 additions and 15 deletions.
@@ -0,0 +1,111 @@
# Configuring records mover

There are three key areas where records mover needs configuration:

1. Database connection details
2. Temporary locations
3. Cloud credentials for object stores

## Database connection details

There are four ways to configure database connection details; some are
applicable only when using records mover as a Python library:

1. Setting environment variables (Python only)
2. Passing in pre-configured SQLAlchemy Engine objects (Python only)
3. Configuring db-facts (Python and mvrec)
4. Airflow connections (Python via Airflow)

### Setting environment variables (Python only)

The `Session` object contains a method called
`get_default_db_engine()`, which returns a database engine as
configured by a set of env variables. Note that using this method
limits you to dealing with one database at a time, and often requires
that the env variables exist in your OS environment; if these trade-offs
aren't acceptable, please see the other options below.

The default environment variables match the semantics of
[sqlalchemy.engine.url.URL](https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.engine.url.URL)
and are as follows:

* `DB_HOST`
* `DB_DATABASE`
* `DB_PORT`
* `DB_USERNAME`
* `DB_PASSWORD`
* `DB_TYPE`

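To make the mapping concrete, here is a minimal sketch that assembles a SQLAlchemy-style URL string from these variables using only the standard library. It is not how records mover builds its engine. `db_url_from_env` is a hypothetical helper. The sketch assumes `DB_TYPE` holds the SQLAlchemy dialect name (for example `postgresql`) and that any of the other variables may be absent.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def db_url_from_env(environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: build a SQLAlchemy-style URL from DB_* variables.

    Assumes DB_TYPE holds the SQLAlchemy dialect name (e.g. "postgresql").
    """
    env = os.environ if environ is None else environ
    drivername = env["DB_TYPE"]
    # Percent-encode credentials so characters such as '@' or ':' stay intact.
    username = quote(env.get("DB_USERNAME", ""), safe="")
    password = quote(env.get("DB_PASSWORD", ""), safe="")
    host = env.get("DB_HOST", "")
    port = env.get("DB_PORT", "")
    database = env.get("DB_DATABASE", "")

    auth = ""
    if username:
        auth = username + (":" + password if password else "") + "@"
    hostport = host + (":" + port if port else "")
    return f"{drivername}://{auth}{hostport}/{database}"


example = {
    "DB_TYPE": "postgresql",
    "DB_HOST": "db.example.com",
    "DB_PORT": "5432",
    "DB_DATABASE": "analytics",
    "DB_USERNAME": "mover",
    "DB_PASSWORD": "p@ss",
}
print(db_url_from_env(example))
# → postgresql://mover:p%40ss@db.example.com:5432/analytics
```

In real code you would more likely pass these same fields to `sqlalchemy.engine.url.URL` (linked above) and let SQLAlchemy handle quoting. The string form is shown only to make the correspondence between fields and variables visible.
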
Redshift adds the following optional env variable(s):

* `REDSHIFT_SPECTRUM_BASE_URL_`: (optional) Specifies an `s3://` URL
  where Redshift Spectrum files should be stored when using the
  'spectrum' records target.

BigQuery has an alternate set of env variables that should be used
instead of the `DB_` values above:

* `BQ_DEFAULT_PROJECT_ID`: Google Cloud project to be accessed.
* `BQ_DEFAULT_DATASET_ID`: BigQuery dataset to use if not otherwise
  overridden.
* `BQ_SERVICE_ACCOUNT_JSON`: (optional) JSON (not a filename)
  representing BigQuery service account credentials.

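Because `BQ_SERVICE_ACCOUNT_JSON` carries the key document itself rather than a path, a consumer parses the variable's value directly instead of opening a file. The sketch below reads the three BigQuery variables with the standard library. `bigquery_settings_from_env` and `BigQuerySettings` are hypothetical names. The sketch assumes the project and dataset variables are set, and it does not reflect how records mover itself loads them.

```python
import json
import os
from typing import Any, Dict, Mapping, NamedTuple, Optional


class BigQuerySettings(NamedTuple):
    project_id: str
    dataset_id: str
    # Parsed service-account key, or None when the optional variable is unset.
    service_account_info: Optional[Dict[str, Any]]


def bigquery_settings_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> BigQuerySettings:
    """Hypothetical helper: read the BQ_* variables described above."""
    env = os.environ if environ is None else environ
    raw_json = env.get("BQ_SERVICE_ACCOUNT_JSON")
    info = None
    if raw_json:
        # The variable holds the JSON text itself, not a filename.
        info = json.loads(raw_json)
        if not isinstance(info, dict):
            raise ValueError("BQ_SERVICE_ACCOUNT_JSON must be a JSON object")
    return BigQuerySettings(
        project_id=env["BQ_DEFAULT_PROJECT_ID"],
        dataset_id=env["BQ_DEFAULT_DATASET_ID"],
        service_account_info=info,
    )
```

If you use the `google-auth` package, a dict like `service_account_info` is the form accepted by `google.oauth2.service_account.Credentials.from_service_account_info()`.
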
### Passing in pre-configured SQLAlchemy Engine objects (Python only)

The `database` factory methods for records sources and targets allow a
SQLAlchemy Engine to be passed in directly.

### Configuring db-facts (Python and mvrec)

[db-facts](https://github.com/bluelabsio/db-facts) is a complementary
project used to configure database credentials. Please see the
[db-facts documentation](https://github.com/bluelabsio/db-facts/blob/master/CONFIGURATION.md)
for configuration details.

### Airflow connections (Python via Airflow)

If you are running under Airflow, the
`session.creds.get_db_engine(name)` method will look up `name` in your
Airflow connections rather than use `db-facts`. This behavior can be
configured via the `session_type` parameter passed to the `Session()`
constructor.

## Temporary locations

Cloud-based databases are often more efficient when exporting to
cloud-native object stores (e.g., S3) than to other destinations. Indeed, some
(e.g., Redshift) *only* support exporting to and importing from an
object store. To support moves between such databases and
incompatible targets, records mover must first export to the
compatible object store in a temporary location.

Note that you'll need credentials with permission to write to this
object store; see below for how to configure that.

### S3 (Redshift)

To specify the temporary location for Redshift exports and imports,
you can either set the environment variable `SCRATCH_S3_URL` to your
URL or configure a TOML-style file in one of the following locations:

* `/etc/bluelabs/records_mover/app.ini`
* `/etc/xdg/bluelabs/records_mover/app.ini`
* `$HOME/.config/bluelabs/records_mover/app.ini`
* `./.bluelabs/records_mover/app.ini`

Example file:

```toml
[aws]
s3_scratch_url = "s3://mybucket/path/"
```

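The lookup above can be approximated with `configparser`. This is a sketch under stated assumptions, not records mover's implementation. `resolve_scratch_s3_url` is a hypothetical helper. The sketch assumes that `SCRATCH_S3_URL` takes precedence over the files (the text above does not specify an order) and that files later in the list override earlier ones. Because an INI parser keeps the double quotes from the TOML-style example as part of the value, the sketch strips them.

```python
import configparser
import os
from typing import Iterable, Mapping, Optional
from urllib.parse import urlparse

# Search locations from the list above, least specific first (assumed order).
DEFAULT_PATHS = [
    "/etc/bluelabs/records_mover/app.ini",
    "/etc/xdg/bluelabs/records_mover/app.ini",
    os.path.expanduser("~/.config/bluelabs/records_mover/app.ini"),
    "./.bluelabs/records_mover/app.ini",
]


def resolve_scratch_s3_url(
    paths: Iterable[str] = DEFAULT_PATHS,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: find the S3 scratch URL, or None if unset."""
    env = os.environ if environ is None else environ
    url = env.get("SCRATCH_S3_URL")  # assumed to win over config files
    if not url:
        # interpolation=None so a '%' in a URL is not treated specially.
        parser = configparser.ConfigParser(interpolation=None)
        # read() silently skips missing files; later files override earlier ones.
        parser.read(list(paths))
        url = parser.get("aws", "s3_scratch_url", fallback=None)
        if url:
            # An INI parser keeps TOML-style quotes, so strip them.
            url = url.strip().strip('"')
    if url and urlparse(url).scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    return url or None
```

Passing `paths` and `environ` explicitly, as a test would, keeps the helper independent of the machine it runs on. With no arguments, it consults the real environment and the four default locations.
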
### Filesystem

Temporary files written to the filesystem (including large data files
downloaded for local processing) will be stored per Python's
[tempfile](https://docs.python.org/3/library/tempfile.html) default,
which allows configuration via the `TMPDIR`, `TEMP`, or `TMP` env
variables and generally defaults to
[something reasonable per your OS](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir).
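To check where those files will land, or to redirect them within a single process, you can query `tempfile` directly. The example below is a standard-library sketch, and `tmpdir_override` is a hypothetical helper name. `tempfile.gettempdir()` caches its answer in `tempfile.tempdir`, so the cache has to be cleared before a new `TMPDIR` value takes effect. A directory that does not exist or is not writable is silently skipped in favor of the next candidate.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def tmpdir_override(path: str) -> Iterator[str]:
    """Temporarily point Python's tempfile default at `path` via TMPDIR.

    `path` must exist and be writable; otherwise tempfile quietly falls
    back to the next candidate (TEMP, TMP, then OS defaults).
    """
    old_env = os.environ.get("TMPDIR")
    old_cache = tempfile.tempdir
    os.environ["TMPDIR"] = path
    tempfile.tempdir = None  # gettempdir() caches its answer; force a re-check
    try:
        yield tempfile.gettempdir()
    finally:
        # Restore the previous environment and cached directory.
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cache


print("Default temp dir:", tempfile.gettempdir())
```

Outside Python, the equivalent is to export `TMPDIR` before starting the process.
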
@@ -1 +1 @@
-969
+968
@@ -1 +1 @@
-93.9600
+93.9600
@@ -1 +1 @@
-91.8600
+91.8200
@@ -0,0 +1,2 @@
from .core import from_string as from_string, get_config as get_config # noqa
from .exc import NoVersionError as NoVersionError # noqa
@@ -0,0 +1,63 @@
from .exc import NoVersionError as NoVersionError # noqa
from .handler.base import Handler as Handler
from .util import PrefixFilter as PrefixFilter # noqa
from collections import namedtuple
from config_resolver.handler.ini import IniHandler as IniHandler # noqa
from logging import Filter, Logger
from packaging.version import Version
from typing import Any, Dict, Generator, List, Optional, Tuple, Type

ConfigID = namedtuple("ConfigID", "group app")

LookupResult = namedtuple("LookupResult", "config meta")

LookupMetadata = namedtuple(
    "LookupMetadata", ["active_path", "loaded_files", "config_id", "prefix_filter"]
)

FileReadability = namedtuple("FileReadability", "is_readable filename reason version")


def from_string(data: str, handler: Optional[Handler[Any]] = ...) -> LookupResult: ...


def get_config(
    app_name: str,
    group_name: str = ...,
    lookup_options: Optional[Dict[str, Any]] = ...,
    handler: Optional[Type[Handler[Any]]] = ...,
) -> LookupResult: ...


def prefixed_logger(
    config_id: Optional[ConfigID],
) -> Tuple[Logger, Optional[Filter]]: ...


def get_xdg_dirs(config_id: ConfigID) -> List[str]: ...


def get_xdg_home(config_id: ConfigID) -> str: ...


def effective_path(config_id: ConfigID, search_path: str = ...) -> List[str]: ...


def find_files(
    config_id: ConfigID, search_path: Optional[List[str]] = ..., filename: str = ...
) -> Generator[str, None, None]: ...


def effective_filename(config_id: ConfigID, config_filename: str) -> str: ...


def env_name(config_id: ConfigID) -> str: ...


def is_readable(
    config_id: ConfigID,
    filename: str,
    version: Optional[Version] = ...,
    secure: bool = ...,
    handler: Optional[Type[Handler[Any]]] = ...,
) -> FileReadability: ...
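The four `app.ini` locations in the configuration doc look like the kind of search path that `get_xdg_dirs`, `get_xdg_home`, and `effective_path` in this stub describe. The sketch below rebuilds that order from a `ConfigID(group, app)` pair. It is an approximation for illustration, not `config_resolver`'s actual code. `search_path_for` is a hypothetical name, and paths are POSIX-style. The sketch assumes the XDG Base Directory defaults: `/etc/xdg` when `XDG_CONFIG_DIRS` is unset and `~/.config` when `XDG_CONFIG_HOME` is unset. Directories are listed from least to most specific.

```python
import os
import posixpath
from collections import namedtuple
from typing import List, Mapping, Optional

# Mirrors the ConfigID namedtuple declared in the stub above.
ConfigID = namedtuple("ConfigID", "group app")


def search_path_for(
    config_id: ConfigID,
    environ: Optional[Mapping[str, str]] = None,
    home: Optional[str] = None,
    cwd: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: config directories, least specific first (POSIX)."""
    env = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    cwd = os.getcwd() if cwd is None else cwd
    group, app = config_id.group, config_id.app

    # System-wide location, e.g. /etc/bluelabs/records_mover
    paths = [posixpath.join("/etc", group, app)]
    # XDG_CONFIG_DIRS lists the most important directory first; reverse it so
    # that later entries in our list take precedence.
    xdg_dirs = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    for base in reversed(xdg_dirs.split(":")):
        paths.append(posixpath.join(base, group, app))
    # Per-user location, e.g. ~/.config/bluelabs/records_mover
    xdg_home = env.get("XDG_CONFIG_HOME") or posixpath.join(home, ".config")
    paths.append(posixpath.join(xdg_home, group, app))
    # Working-directory location, e.g. ./.bluelabs/records_mover
    paths.append(posixpath.join(cwd, "." + group, app))
    return paths


for directory in search_path_for(ConfigID("bluelabs", "records_mover")):
    print(posixpath.join(directory, "app.ini"))
```

With no XDG variables set, `ConfigID("bluelabs", "records_mover")` yields the same four directories listed in the configuration doc, in the same order.
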
@@ -0,0 +1,2 @@
class NoVersionError(Exception):
    ...
Empty file.
@@ -0,0 +1,18 @@
from packaging.version import Version as Version
from typing import Any, Optional, TypeVar, Generic

TConfig = TypeVar('TConfig', bound=Any)


class Handler(Generic[TConfig]):
    DEFAULT_FILENAME: str = ...
    @staticmethod
    def empty() -> TConfig: ...
    @staticmethod
    def from_string(data: str) -> TConfig: ...
    @staticmethod
    def from_filename(filename: str) -> TConfig: ...
    @staticmethod
    def get_version(config: TConfig) -> Optional[Version]: ...
    @staticmethod
    def update_from_file(config: TConfig, filename: str) -> None: ...
@@ -0,0 +1,18 @@
from .base import Handler as Handler
from configparser import ConfigParser
from packaging.version import Version
from typing import Optional


class IniHandler(Handler[ConfigParser]):
    DEFAULT_FILENAME: str = ...
    @staticmethod
    def empty() -> ConfigParser: ...
    @staticmethod
    def from_string(data: str) -> ConfigParser: ...
    @staticmethod
    def from_filename(filename: str) -> ConfigParser: ...
    @staticmethod
    def get_version(config: ConfigParser) -> Optional[Version]: ...
    @staticmethod
    def update_from_file(config: ConfigParser, filename: str) -> None: ...
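The `Handler` interface is small enough to sketch with `configparser` alone. The class below is a hypothetical, dependency-free stand-in with the same static methods as `IniHandler`, not the real `config_resolver` code. To avoid needing `packaging`, `get_version` returns a tuple of ints rather than a `Version`. The `[meta]` section and `version` key it reads are assumptions, as is the `app.ini` default filename (chosen to match the filenames in the configuration doc).

```python
from configparser import ConfigParser
from typing import Optional, Tuple


class SketchIniHandler:
    """Hypothetical stand-in mirroring the static methods of IniHandler."""

    DEFAULT_FILENAME: str = "app.ini"  # assumed, matching the doc's filenames

    @staticmethod
    def empty() -> ConfigParser:
        return ConfigParser()

    @staticmethod
    def from_string(data: str) -> ConfigParser:
        config = ConfigParser()
        config.read_string(data)
        return config

    @staticmethod
    def from_filename(filename: str) -> ConfigParser:
        config = ConfigParser()
        with open(filename, encoding="utf-8") as fp:
            config.read_file(fp)
        return config

    @staticmethod
    def get_version(config: ConfigParser) -> Optional[Tuple[int, ...]]:
        # Assumed convention: a "[meta]" section with a "version" option.
        raw = config.get("meta", "version", fallback=None)
        if raw is None:
            return None
        return tuple(int(part) for part in raw.strip().split("."))

    @staticmethod
    def update_from_file(config: ConfigParser, filename: str) -> None:
        # Values read from `filename` override values already in `config`.
        with open(filename, encoding="utf-8") as fp:
            config.read_file(fp)


config = SketchIniHandler.from_string(
    '[meta]\nversion = 1.0\n[aws]\ns3_scratch_url = "s3://mybucket/path/"\n'
)
print(SketchIniHandler.get_version(config))  # → (1, 0)
print(config.get("aws", "s3_scratch_url"))  # prints the value with its double quotes
```

The last line shows why the TOML-style example in the configuration doc needs care: when read through `ConfigParser`, a quoted value keeps its quotes.
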