-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new config mechanism and use for S3 scratch bucket #47
Conversation
records_mover/session.py
Outdated
@@ -41,6 +42,33 @@ def _infer_scratch_s3_url(session_type: str) -> Optional[str]: | |||
if "SCRATCH_S3_URL" in os.environ: | |||
return os.environ["SCRATCH_S3_URL"] | |||
|
|||
# TODO: do I want this in 'app.ini'? | |||
# TODO: where else to document this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncompleted punchlist item detected--consider resolving or moving this to your issue tracker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kind of thinking we create a CONFIG.md describing (for now) how to configure the S3 bucket, including the content below, and point to it on github.com in the error message when we raise a NoTemporaryBucketConfiguration or CredsDoNotSupportS3Export. Seem reasonable?
Hey @cwegrzyn, could you take a look and provide some feedback on this so far? I haven't yet looked into the db-facts aspect of this, but it seems to capture the essence of what we talked about for configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a first glance, this seems pretty reasonable. It did get me to think about WHEN we'd expect to need to override the scratch URL, and, in a multi-environment setup like we have, it seems like something that is roughly associated with a specific database. To be slightly more specific, imagine our staff who might be working against an internal redshift or a client-hosted Redshift-- they probably need (or at least ought) to use different intermediate S3 buckets for each. What's the way that we'd expect to handle that?
Just spitballing, but two ideas for extensions that might solve this and other general:
-
Let db_facts supply additional config for records mover in maybe a records_mover key in the JSON blob. Could use similar paths to what you have right now (e.g.,
records_mover.aws.s3_scratch_url
) -
Alternatively, add a mechanism to the app.ini config so that settings can be overridden on a per-dbname basis, e.g., so you can create a
[db.<dbname>.aws]
section to set the scratch url for that particular db.
Haven't really thought much about the pros and cons of those (or other possible solutions yet)
Totally! Given AWS takes credentials alongside the S3 bucket, (except maybe in GovCloud?), I could certainly see reasons for folks to want to have that level of configurability. One flag is that AWS credentials would often need to change along with that S3 bucket configuration - having a consistent way to address this would be great, too, instead of relying on folks handing records-mover AWS creds that match the database they pointed it to. Of course, sometimes this problem is thorny on its own - e.g., with third party tools to provide temporary AWS credentials. This sounds a little more in the realm of what something like db-facts would provide with the ability to call out to scripts to fetch back creds and all. |
metrics/bigfiles_high_water_mark
Outdated
@@ -1 +1 @@ | |||
968 | |||
969 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One extra line in setup.py for the new dependency
return s3_scratch_url | ||
else: | ||
logger.debug('No config ini file found') | ||
|
||
if session_type == 'cli': | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a second PR to retire this feature once we have internal uses switched over to the config file.
# | ||
# https://github.com/exhuma/config_resolver/blob/master/doc/intro.rst#logging | ||
# | ||
# https://github.com/exhuma/config_resolver/issues/69 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a PR open - hopefully it gets accepted: exhuma/config_resolver#70
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fine now. Let me know if there are any remaining issues.
OK! Updated and ready to merge. |
This is more generic configuration mechanism for records-mover that can be set system-, user-, or session-wide as needed. It'd replace the current
/usr/local/bin/scratch-s3-url
mechanism.