Fix missing sql_header for incremental models #2200

drewbanin · 2020-03-12T19:57:57Z

resolves #2136

Description

This PR adds support for injecting a sql_header above merge statements generated for incremental models.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

jtcohen6

Just one question, otherwise LGTM

jtcohen6 · 2020-03-12T21:45:31Z

core/dbt/include/global_project/macros/materializations/common/merge.sql

@@ -87,6 +90,9 @@
 {% macro default__get_insert_overwrite_merge_sql(target, source, dest_columns, predicates) -%}
    {%- set predicates = [] if predicates is none else [] + predicates -%}
    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}
+    {%- set sql_header = config.get('sql_header', none) -%}
+
+    {{ sql_header if sql_header is not none }}


I think, if we do this, then the sql_header will run twice for "dynamic" (scripting) insert_overwrite materializations. (The first time will be when creating the temp table.) I assume that's ok?

This is a good call. Temporary UDFs can't be defined twice in the same BQ script. Think we're going to need to find an alternative approach here.

I kicked around the idea of making this a materialization-level thing, and while I think that might be a good approach long-term, I don't want to touch every dbt materialization in support of this change. Going to try to figure out a nice way to make this sensible for a very specific use case (insert_overwrite on BigQuery incremental models) and localize the jank as much as possible

drewbanin · 2020-03-16T15:45:58Z

Let's get this merged for 0.16.1 - we should consider whether or not to keep this logic in adapter macros, or move it into materializations (the better long-term answer)

Edit: This is not going out in 0.16.1, let's get it merged for 0.17.0

…t into fix/sql-header-in-incrementals

beckjake

This looks good! We should be sure to note this change in the signature of get_merge_sql in the release notes - plugins should probably know about this/use it (I think spark overrides get_merge_sql).

jtcohen6

This is a fine approach that fixes the problem at hand, which is interpolating the sql_header in the right spots during incremental runs on BigQuery.

Thinking through the implications:

This PR does change the contract of get_insert_overwrite_merge_sql by adding the argument include_sql_header. Right now, only the BigQuery adapter uses that macro, and it just uses the default implementation.
This PR doesn't change the contract of get_merge_sql, which is good (this is a macro in wider use), but it also means it doesn't have the added lever of include_sql_header. The default__ and snowflake__ implementations of get_merge_sql must then always include the sql_header if provided, meaning that we can only pass model SQL (and never a temp table) as the source of get_merge_sql for those implementations. That's okay based on how they're used today, but it's feasible that we may want to pass a temp relation source someday—something that spark__get_merge_sql does currently.

Thinking far ahead: My personal belief is that get_merge_sql, get_delete_insert_merge_sql, get_insert_overwrite_merge_sql, etc. should eventually converge toward having the same basic contract, and that we should further leverage predicates to (a) handle the idiosyncrasies of each, and (b) enable end users to pass additional custom logic (match conditions, filters, etc). That may be an unrealistic aspiration, though I don't think it's out of proportion with this being some of the trickiest, highest-leverage, and most cost-impactful SQL that dbt generates.

drewbanin · 2020-05-01T14:11:48Z

@beckjake FYI merging this before the tests pass - it was a changelog merge conflict

(#2136) Fix missing sql_header for incremental models

367eff7

cla-bot bot added the cla:yes label Mar 12, 2020

drewbanin mentioned this pull request Mar 12, 2020

set_sql_header works well with 'table', but not 'incremental' #2136

Closed

4 tasks

drewbanin requested a review from jtcohen6 March 12, 2020 20:35

jtcohen6 approved these changes Mar 12, 2020

View reviewed changes

drewbanin mentioned this pull request Mar 18, 2020

SQL Headers Are Not Injected During BigQuery Incremental Loads #2194

Closed

5 tasks

drewbanin added 2 commits April 6, 2020 17:12

Merge branch 'dev/octavius-catto' of github.com:fishtown-analytics/db…

4045510

…t into fix/sql-header-in-incrementals

Fix for bq insert_overwrite incrementals, add additional tests

3a5eb4c

drewbanin changed the base branch from dev/barbara-gittings to dev/octavius-catto April 6, 2020 21:58

drewbanin changed the title ~~(#2136) Fix missing sql_header for incremental models~~ Fix missing sql_header for incremental models Apr 6, 2020

Update changelog

579a6d6

drewbanin requested review from jtcohen6 and beckjake April 7, 2020 00:36

beckjake approved these changes Apr 7, 2020

View reviewed changes

jtcohen6 approved these changes Apr 7, 2020

View reviewed changes

drewbanin added 2 commits April 29, 2020 15:52

Merge branch 'dev/octavius-catto' into fix/sql-header-in-incrementals

29e2bbc

Merge branch 'dev/octavius-catto' into fix/sql-header-in-incrementals

f3d4377

drewbanin merged commit 68babfb into dev/octavius-catto May 1, 2020

drewbanin deleted the fix/sql-header-in-incrementals branch May 1, 2020 14:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix missing sql_header for incremental models #2200

Fix missing sql_header for incremental models #2200

drewbanin commented Mar 12, 2020 •

edited

Loading

jtcohen6 left a comment

jtcohen6 Mar 12, 2020

drewbanin Apr 6, 2020

drewbanin commented Mar 16, 2020 •

edited

Loading

beckjake left a comment

jtcohen6 left a comment

drewbanin commented May 1, 2020

Fix missing sql_header for incremental models #2200

Fix missing sql_header for incremental models #2200

Conversation

drewbanin commented Mar 12, 2020 • edited Loading

Description

Checklist

jtcohen6 left a comment

Choose a reason for hiding this comment

jtcohen6 Mar 12, 2020

Choose a reason for hiding this comment

drewbanin Apr 6, 2020

Choose a reason for hiding this comment

drewbanin commented Mar 16, 2020 • edited Loading

beckjake left a comment

Choose a reason for hiding this comment

jtcohen6 left a comment

Choose a reason for hiding this comment

drewbanin commented May 1, 2020

drewbanin commented Mar 12, 2020 •

edited

Loading

drewbanin commented Mar 16, 2020 •

edited

Loading