Add support for Snowflake's transient tables #946

jon-rtr · 2018-08-21T23:07:11Z

Feature

Feature description

DBT currently has support for permanent and temporary tables. As an option to save space consumed by Snowflake tables, we should also support transient tables: https://docs.snowflake.net/manuals/user-guide/tables-temp-transient.html#comparison-of-table-types

permanent: CREATE TABLE ... AS
transient: CREATE TRANSIENT TABLE ... AS
temporary: CREATE TEMPORARY TABLE ... AS

Who will this benefit?

Anyone using the Snowflake data warehouse.

Because transient tables do not have a Fail-safe period, they provide a good option for managing the cost of very large tables used to store transitory data; however, the data in these tables cannot be recovered after the Time Travel retention period passes.

The text was updated successfully, but these errors were encountered:

drewbanin · 2018-08-22T13:04:36Z

@jon-rtr this is really cool! We've noticed that catalog queries get very slow on Snowflake after a series of dbt runs. Do you think this flag would also have any bearing on the catalog queries? My suspicion is that after permanent tables are dropped, their metadata still kicks around in the catalog.

Regardless, I think this is worth implementing on the basis of cost alone, and it should be a pretty straightforward update the to table materialization. Thanks for the report!

jon-rtr · 2018-08-22T14:41:39Z

I can ask our Snowflake rep. Could you give some more details on what you mean by "catalog queries"? Do you have a good example?

What are your thoughts on how the solution should be architected? Do you think it should be a snowflake customization of the table materialization? Should there be a new flag to dbt run like --transient?
https://github.com/fishtown-analytics/dbt/blob/bb5d211c942a0c2ea4892d620c248c83c77d6475/dbt/include/global_project/macros/materializations/table/table.sql#L5
https://github.com/fishtown-analytics/dbt/blob/bb5d211c942a0c2ea4892d620c248c83c77d6475/dbt/include/global_project/macros/materializations/table/table.sql#L23

drewbanin · 2018-08-22T15:07:50Z

dbt queries information_schema.tables to figure out if relations already exist. If a relation doesn't exist, then dbt will create it. If the relation does exist, then dbt will do a "smart" thing based on the type of the relation (view vs. table), flags provided to dbt (--non-destructive/--full-refresh), and the type of model being built (table/view/incremental).

I'm curious if selecting from information_schema.tables is made slower if a bunch of permanent tables are created and deleted in a given warehouse. I could imagine that transient tables might work differently under the hood, but that's just a guess on my end.

I think the right way to implement would be to add a config option for models. If transient is set to true, then dbt will use the transient option while building tables. dbt uses this macro to create tables on Snowflake. We can update this macro to account for model configs as we do for Redshift sort/dist keys here: https://github.com/fishtown-analytics/dbt/blob/development/dbt/include/global_project/macros/adapters/redshift.sql#L30

drewbanin · 2018-08-22T15:08:41Z

I'll add: we're moving away from flags like --non-destructive and --full-refresh because they apply to the whole graph. Model configs are a way more granular way of apply custom configs to a specific set of models.

jrandrews · 2019-01-04T17:33:21Z

Making everything transient on Snowflake is probably the right option, although I can see cases for very large models that are being generated and then incrementally updated it might be nice to have an override option.

e.g. if I have some giant 100 billion row table that I could in theory regenerate from raw tables but it might take 24 hours to do it, I might want an option to make it permanent so the DBAs (maybe I am the DBA 🙂 ?) can use fail-safe for that particular table for a faster recovery

jon-rtr changed the title ~~add support for snowflake's transient tables~~ Add support for Snowflake's transient tables Aug 21, 2018

drewbanin added snowflake good_first_issue Straightforward + self-contained changes, good for new contributors! labels Aug 22, 2018

drewbanin modified the milestones: Wilt Chamberlain, Stephen Girard Dec 12, 2018

drewbanin mentioned this issue Jan 20, 2019

Support transient tables on Snowflake #1252

Merged

drewbanin self-assigned this Jan 20, 2019

drewbanin closed this as completed in #1252 Feb 13, 2019

krishbox mentioned this issue Jul 17, 2019

Support snapshot configs in dbt_project.yml #1613

Closed

KeeonTabrizi mentioned this issue Nov 10, 2022

[CT-1487] [Feature] Allow DBT Models to Reference Created Temporary Tables prior to final result set & materialization. #6234

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Snowflake's transient tables #946

Add support for Snowflake's transient tables #946

jon-rtr commented Aug 21, 2018

drewbanin commented Aug 22, 2018

jon-rtr commented Aug 22, 2018 •

edited

Loading

drewbanin commented Aug 22, 2018

drewbanin commented Aug 22, 2018

jrandrews commented Jan 4, 2019

Add support for Snowflake's transient tables #946

Add support for Snowflake's transient tables #946

Comments

jon-rtr commented Aug 21, 2018

Feature

Feature description

Who will this benefit?

drewbanin commented Aug 22, 2018

jon-rtr commented Aug 22, 2018 • edited Loading

drewbanin commented Aug 22, 2018

drewbanin commented Aug 22, 2018

jrandrews commented Jan 4, 2019

jon-rtr commented Aug 22, 2018 •

edited

Loading