(#525) drop existing relation at end of full-refresh incremental build #1682
Conversation
Moving this issue to the LMA milestone. I think more work is required here to fix this same issue on BigQuery and Snowflake.
Okay! I've approved it as-is because I do think this PR is both a step in the right direction and very reasonable.
@drewbanin is there any support you need on this? It is the killer feature.
hey @darrenhaken - I'd love your help testing when we have some working code here!
Of course, we can do BQ testing. Having highly available datasets is a critical feature for us 🙂
strong agree @darrenhaken - this particular issue is long overdue!
@darrenhaken I might do a little more cleanup/refactoring work here, but the --full-refresh atomic replace logic has been implemented for BigQuery in this PR. Feel free to give it a spin and let us know how it goes!
This is awesome! I’ll take a look to see if I can do some testing this week.
@whittid4 FYI
@drewbanin how does it work with BigQuery? i.e. does it create a temp table first?
@darrenhaken for a full-refresh build of an incremental model:
So, this is still not atomic for a view --> incremental materialization switch, but I'm unsure that BQ provides any mechanisms for swapping a view for a table atomically. Edit: The logic is here and it's actually pretty readable as far as materialization code goes :)
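To picture the "atomic replace" mentioned earlier: on BigQuery, a full-refresh build can swap the table in a single statement, along the lines of the sketch below (the relation and column names are made up for illustration; this is not the PR's actual generated SQL).

create or replace table analytics.my_incremental_model
as (
    -- the model's full SELECT goes here; the existing table is replaced in one
    -- step, so downstream queries never see a dropped or empty relation
    select id, updated_at, payload
    from analytics.raw_events
);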
I'm assuming the answer is because of restrictions in dbt/BQ/Snowflake which I am not familiar with... but why isn't the workflow to just call the table materialization when a full refresh is required? Is there a need to reimplement the logic in the incremental as well?
@elexisvenator that's the big idea! I want to accomplish the approach you're describing in the materialization layer rather than in Python. The Python code in dbt doesn't really know about the materializations that exist in dbt. This is a really neat feature -- it means that each plugin provided for dbt is able to define its own implementation for tables/incrementals/full refreshes/etc. I'd rather push that logic into the materialization layer (and provide good abstractions that can be shared across materializations) than encode this type of information in the dbt Python code. That's just to say: your instinct is a good one, and I want to accomplish it with good abstractions in jinja instead of hard assumptions in Python!
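As a rough sketch of what that could look like in the Jinja layer (the macros build_table_atomically and incremental_upsert_into are illustrative placeholders, not real dbt macros; flags.FULL_REFRESH and config.get are assumed to be available in the materialization context):

{% materialization incremental, default %}
    {# hypothetical dispatch: on --full-refresh, defer to table-style logic
       instead of re-implementing it inside the incremental materialization #}
    {% if flags.FULL_REFRESH %}
        {{ build_table_atomically(this, sql) }}
    {% else %}
        {{ incremental_upsert_into(this, sql, config.get('unique_key')) }}
    {% endif %}
{% endmaterialization %}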
@beckjake can you give this another quick look? I think it's ready to roll now. The big change I made since last time involves... We should apply similar logic to the table materialization, but I figured that was out of scope for this issue. I'll create a separate issue to address it.
Looks good! I have a couple of questions, but nothing significant.
from {{ target_relation }}
where ({{ unique_key }}) in (
    select ({{ unique_key }})
    from {{ tmp_relation.include(schema=False, database=False) }}
Is this include() necessary (or even correct) for all databases? I think whatever made your tmp_relation should be giving you the correct include policy.
this is such a great catch! Yes - this macro should definitely expect the tmp_relation to already have a valid include policy for the given database.
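Concretely, the adjustment under discussion would look something like this (a sketch, not the final diff; it assumes the elided line above the hunk is a delete statement and that the caller builds tmp_relation with the correct include policy):

delete from {{ target_relation }}
where ({{ unique_key }}) in (
    select ({{ unique_key }})
    -- use the relation as constructed by the caller, rather than
    -- re-applying .include(schema=False, database=False) here
    from {{ tmp_relation }}
);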
@@ -346,6 +346,36 @@ def execute_model(self, model, materialization, sql_override=None,

        return res

    @available.parse_none
I'm going to be pedantic about types here: this should probably be @available.parse(lambda *a, **k: True) (or False).
    conf_cluster = [conf_cluster]

return table_partition == conf_partition \
    and table_cluster == conf_cluster
Does order matter? If not, we should compare set(table_cluster) == set(conf_cluster).
Yeah -- the order is significant -- clustering works like ordering whereby the table is clustered by the first clustering key, then the second, and so on.
This query fails if you run it twice, swapping the order of the clustering keys on the second run:
create or replace table dbt_dbanin.debug_clustering
partition by date_day
cluster by id, name
as (
select current_date as date_day, 1 as id, 'drew' as name
);
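For clarity, the failing second run would be the same statement with the keys swapped, something like the hypothetical query below; per the comment above, BigQuery rejects the replace because the clustering specification no longer matches the existing table, which is why dbt's comparison has to treat the key order as significant.

-- hypothetical second run with the clustering keys swapped; this fails
create or replace table dbt_dbanin.debug_clustering
partition by date_day
cluster by name, id
as (
    select current_date as date_day, 1 as id, 'drew' as name
);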
Well this is exciting news! Is this coming earlier than November?
…On Tue, 15 Oct 2019 at 20:44, Drew Banin wrote: Merged #1682 into dev/louisa-may-alcott.
@darrenhaken this is shipping in 0.15.0, due in November! We'll have a pre-release ready hopefully by the end of this week :)
Ok I’ll give the pre-release a whirl 🙂
fixes #525
Primary goal: minimize downtime for incremental models run in full-refresh mode
Secondary goal: encapsulate incremental upsert logic across adapters so it can be repurposed in higher-order macros
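A rough illustration of that secondary goal: the upsert could live in a shared macro along these lines. The macro name and signature here are assumptions for illustration, not the PR's actual code; adapter.get_columns_in_relation is used on the assumption that the target table already exists.

{% macro incremental_upsert(tmp_relation, target_relation, unique_key=none) %}
    {%- set dest_columns = adapter.get_columns_in_relation(target_relation) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute='name') | join(', ') -%}

    {%- if unique_key is not none %}
    -- delete rows in the target that will be replaced by the new batch
    delete from {{ target_relation }}
    where ({{ unique_key }}) in (
        select ({{ unique_key }})
        from {{ tmp_relation }}
    );
    {%- endif %}

    -- then append everything from the temp relation
    insert into {{ target_relation }} ({{ dest_cols_csv }})
    (
        select {{ dest_cols_csv }}
        from {{ tmp_relation }}
    );
{% endmacro %}

Adapter-specific materializations could then call such a macro for the normal incremental path and fall back to an atomic rebuild under --full-refresh.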