Support transient tables on Snowflake #1252

Merged (2 commits) on Feb 13, 2019


drewbanin (Contributor):

Closes #946

This PR adds support for transient tables on Snowflake using the transient config. Example usage:

# dbt_project.yml

models:
  transient: true

By default, models will be created as "transient". Users can override this by setting transient to false, which creates permanent tables that participate in Snowflake's Time Travel mechanism.
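For illustration, here is a minimal Python sketch of how the transient setting could be turned into Snowflake DDL. It assumes a hypothetical helper, create_table_as_sql, and a plain dict of already-resolved model configs. dbt renders this SQL from a Jinja materialization, so treat this as a sketch of the behaviour described above, not the PR's code. Only the CREATE [OR REPLACE] [TRANSIENT] TABLE syntax is Snowflake's own.

```python
def create_table_as_sql(relation: str, sql: str, config: dict) -> str:
    """Render a Snowflake CREATE TABLE AS statement (hypothetical helper).

    `transient` defaults to True, mirroring the PR description: models are
    transient unless a user explicitly sets transient to false.
    """
    transient = config.get("transient", True)
    table_type = "transient table" if transient else "table"
    return f"create or replace {table_type} {relation} as (\n{sql}\n)"


# Default behaviour vs. a model that opts back into a permanent table.
print(create_table_as_sql("analytics.orders", "select 1 as id", {}))
print(create_table_as_sql("analytics.orders", "select 1 as id", {"transient": False}))
```

Defaulting inside the helper keeps a missing key and an explicit `true` equivalent, which matches "by default, models will be created as transient".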

In addition to adding the transient config, I also ripped adapter-specific configs out of the SourceConfig class. These configs are now supplied by adapters. I think this is probably a good idea, but I'm unsure how sound my implementation is. Super open to tweaking things to make this fit nicely into the adapter plugin architecture :)
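A rough sketch of the "adapters supply their own configs" idea follows. It assumes a simplified SourceConfig that builds its set of recognised keys from its own generic field lists plus an AdapterSpecificConfigs class attribute on the adapter. The field lists are abbreviated and illustrative; the real lists in dbt may differ.

```python
class BaseAdapter:
    # Adapters that add no extra configs inherit this empty default.
    AdapterSpecificConfigs = frozenset()


class SnowflakeAdapter(BaseAdapter):
    AdapterSpecificConfigs = frozenset({"transient"})


class PostgresAdapter(BaseAdapter):
    pass


class SourceConfig:
    """Simplified stand-in for dbt's SourceConfig (illustrative only)."""

    # Generic configs shared by all adapters (abbreviated, hypothetical lists).
    ClobberFields = frozenset({"materialized", "enabled", "alias", "schema"})
    AppendListFields = frozenset({"pre-hook", "post-hook"})
    ExtendDictFields = frozenset({"vars"})

    def __init__(self, adapter_cls):
        # Adapter-specific keys now come from the adapter plugin itself.
        self.AdapterSpecificConfigs = adapter_cls.AdapterSpecificConfigs

    @property
    def config_keys(self):
        return (
            self.ClobberFields
            | self.AppendListFields
            | self.ExtendDictFields
            | self.AdapterSpecificConfigs
        )


print("transient" in SourceConfig(SnowflakeAdapter).config_keys)  # True
print("transient" in SourceConfig(PostgresAdapter).config_keys)   # False
```

The generic class no longer needs to know about any particular warehouse; a new adapter plugin only declares its own keys.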

drewbanin requested a review from beckjake on January 21, 2019 15:52
beckjake (Contributor) left a comment:

This looks good, I like the adapter-specific configs. One thing I found surprising in here was the lack of cluster_by/partition_by for bigquery. Is that because we only allow those fields in config blocks, or something?

@@ -110,6 +110,8 @@ class BaseAdapter(object):
    # This should be an implementation of BaseConnectionManager
    ConnectionManager = None

    AdapterSpecificConfigs = set()
beckjake (Contributor):

Maybe this (and redshift/snowflake's versions of this) should be a frozenset to avoid any mutability issues?
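To see why a frozenset is safer here: a class attribute is shared by every subclass that does not override it, so mutating a plain set through one adapter silently changes the base class and every other adapter that inherited the same object. The class names below are hypothetical.

```python
class BaseAdapterWithSet:
    AdapterSpecificConfigs = set()  # mutable, shared by all subclasses


class ThirdPartyAdapter(BaseAdapterWithSet):
    pass  # no override, so it holds a reference to the parent's set


# In-place mutation through the subclass leaks into the base class.
ThirdPartyAdapter.AdapterSpecificConfigs.add("transient")
print(BaseAdapterWithSet.AdapterSpecificConfigs)  # {'transient'}


class BaseAdapterWithFrozenset:
    AdapterSpecificConfigs = frozenset()  # immutable


class SaferAdapter(BaseAdapterWithFrozenset):
    # Extending builds a new frozenset; the parent's value is untouched.
    AdapterSpecificConfigs = (
        BaseAdapterWithFrozenset.AdapterSpecificConfigs | {"transient"}
    )


try:
    BaseAdapterWithFrozenset.AdapterSpecificConfigs.add("oops")
except AttributeError as exc:
    print("frozenset is immutable:", exc)
```

With a frozenset, the only way for an adapter to add keys is to assign its own attribute, which is exactly the override pattern the adapters are expected to follow.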

drewbanin (Contributor Author):

Yeah, that's a good idea, I will update.

@@ -107,7 +101,7 @@ def update_in_model_config(self, config):
                 )
                 current.update(value)
                 self.in_model_config[key] = current
-            else: # key in self.ClobberFields
+            else: # key in self.ClobberFields or self.AdapterSpecificConfigs
beckjake (Contributor):

This seems to mean that adapter specific configurations will always be clobber-only. If we wanted to change that in the future, would it be easy/possible without breaking existing adapters?
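A simplified Python sketch of the merge logic in question, assuming the same three-way dispatch as the diff: list fields append, dict fields extend, and everything else, including adapter-specific keys, clobbers. ModelConfig is a hypothetical stand-in and the field names are illustrative; the real method handles more fields and validates values.

```python
class ModelConfig:
    """Simplified sketch of update_in_model_config (illustrative only)."""

    AppendListFields = frozenset({"pre-hook", "post-hook"})
    ExtendDictFields = frozenset({"vars"})

    def __init__(self):
        self.in_model_config = {}

    def update_in_model_config(self, config):
        for key, value in config.items():
            if key in self.AppendListFields:
                # Hooks accumulate across config layers.
                current = self.in_model_config.get(key, [])
                if not isinstance(value, (list, tuple)):
                    value = [value]
                current.extend(value)
                self.in_model_config[key] = current
            elif key in self.ExtendDictFields:
                # Dicts are merged key by key.
                current = self.in_model_config.get(key, {})
                current.update(value)
                self.in_model_config[key] = current
            else:  # ClobberFields or adapter-specific configs
                # The most recent value wins outright.
                self.in_model_config[key] = value


cfg = ModelConfig()
cfg.update_in_model_config({"transient": True, "pre-hook": "select 1"})
cfg.update_in_model_config({"transient": False, "pre-hook": "select 2"})
print(cfg.in_model_config)  # {'transient': False, 'pre-hook': ['select 1', 'select 2']}
```

Because the fallthrough branch is "anything not listed", any key an adapter declares lands in the clobber path unless a new, explicitly checked set is added ahead of it.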

drewbanin (Contributor Author):

I gave this a tiny bit of thought, and I'm really not sure what an append/extend config could possibly look like for adapter configs. I think "clobber" will be the most common paradigm for new configs, but I do also like the idea of not needing to change this API in the future. I think it will be easy to just add a couple more class attributes like AdapterSpecificExtendConfigs as the need arises, and 3rd party adapters should just inherit these attributes from the base adapter. Do you buy that? Or do you think there's a better way of doing this?

beckjake (Contributor):

Yeah, that does sound OK. In the worst case we end up with 6 of these, which isn't terrible.

I do think adapter-specific append configs are potentially pretty reasonable - as a lazy example, imagine an adapter where the underlying database supports a concept like tags. It would feel pretty natural for those to append instead of clobbering.
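One way the "add another class attribute" approach could look, sketched under assumptions: AdapterSpecificAppendConfigs is a hypothetical attribute in the spirit of the AdapterSpecificExtendConfigs idea, and the tags config, TaggingAdapter, ThirdPartyAdapter, and merge_configs are invented for this example. Adapters written before the attribute existed simply inherit the empty default and keep clobber semantics.

```python
class BaseAdapter:
    # Clobber semantics: a later value replaces an earlier one.
    AdapterSpecificConfigs = frozenset()
    # Hypothetical: list-valued adapter configs that accumulate instead.
    AdapterSpecificAppendConfigs = frozenset()


class TaggingAdapter(BaseAdapter):
    """Hypothetical adapter for a database with a native 'tags' concept."""
    AdapterSpecificConfigs = frozenset({"transient"})
    AdapterSpecificAppendConfigs = frozenset({"tags"})


class ThirdPartyAdapter(BaseAdapter):
    """Written before the new attribute existed; inherits the empty default."""
    AdapterSpecificConfigs = frozenset({"tags"})


def merge_configs(adapter_cls, *layers):
    """Merge config layers (outermost first) using the adapter's rules."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if key in adapter_cls.AdapterSpecificAppendConfigs:
                values = value if isinstance(value, (list, tuple)) else [value]
                merged[key] = merged.get(key, []) + list(values)
            else:
                merged[key] = value
    return merged


project = {"tags": ["nightly"], "transient": True}
model = {"tags": "finance", "transient": False}
print(merge_configs(TaggingAdapter, project, model))
print(merge_configs(ThirdPartyAdapter, project, model))
```

The first call appends tags while transient still clobbers; the second, for an adapter unaware of the new attribute, falls back to clobbering tags as well.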

drewbanin (Contributor Author):

> This looks good, I like the adapter-specific configs. One thing I found surprising in here was the lack of cluster_by/partition_by for bigquery. Is that because we only allow those fields in config blocks, or something?

Really, really good point! I think this was probably an oversight when we initially added those configs, and I believe that they won't work if defined in dbt_project.yml at present. I can fix this as a part of this change for sure.
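A sketch of why undeclared adapter-specific keys might be lost at the project level. It rests on an assumption not confirmed in this thread: that when dbt walks the models: block of dbt_project.yml, it keeps only keys it recognises as configs at each level and treats any other key as the name of a nested folder or model. The function name, project name, and partition expression are hypothetical.

```python
def project_level_config(models_block, fqn, config_keys):
    """Collect recognised configs along a model's path in dbt_project.yml."""
    levels = [models_block]
    for part in fqn:
        nxt = levels[-1].get(part)
        if not isinstance(nxt, dict):
            break
        levels.append(nxt)
    config = {}
    for level in levels:  # outermost first, so deeper levels win
        # Unrecognised keys are assumed to be folder/model names, not configs.
        config.update({k: v for k, v in level.items() if k in config_keys})
    return config


BASE_KEYS = frozenset({"materialized", "enabled"})
models = {
    "my_project": {
        "materialized": "table",
        "partition_by": "date(created_at)",
        "cluster_by": ["customer_id"],
    }
}
fqn = ["my_project", "orders"]

before = project_level_config(models, fqn, BASE_KEYS)
after = project_level_config(models, fqn, BASE_KEYS | {"cluster_by", "partition_by"})
print(before)  # {'materialized': 'table'}
print(after)
```

Under this assumption, having the BigQuery adapter declare cluster_by and partition_by as adapter-specific configs is what lets them survive the project-level pass.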

drewbanin (Contributor Author):

@beckjake no rush, but 🏓 when you're ready!

beckjake (Contributor) left a comment:

lgtm!

drewbanin merged commit 9a74abf into dev/stephen-girard on Feb 13, 2019
drewbanin deleted the feature/snowflake-transient-tables branch on February 13, 2019 16:42
jon-rtr (Contributor) commented Feb 13, 2019:

This is awesome! Thank you all.
