Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing sql_header for incremental models #2200

Merged
merged 6 commits into from
May 1, 2020

Conversation

drewbanin
Copy link
Contributor

@drewbanin drewbanin commented Mar 12, 2020

resolves #2136

Description

This PR adds support for injecting a sql_header above merge statements generated for incremental models.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question, otherwise LGTM

@@ -87,6 +90,9 @@
{% macro default__get_insert_overwrite_merge_sql(target, source, dest_columns, predicates) -%}
{%- set predicates = [] if predicates is none else [] + predicates -%}
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}
{%- set sql_header = config.get('sql_header', none) -%}

{{ sql_header if sql_header is not none }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, if we do this, then the sql_header will run twice for "dynamic" (scripting) insert_overwrite materializations. (The first time will be when creating the temp table.) I assume that's ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call. Temporary UDFs can't be defined twice in the same BQ script. Think we're going to need to find an alternative approach here.

I kicked around the idea of making this a materialization-level thing, and while I think that might be a good approach long-term, I don't want to touch every dbt materialization in support of this change. Going to try to figure out a nice way to make this sensible for a very specific use case (insert_overwrite on BigQuery incremental models) and localize the jank as much as possible

@drewbanin
Copy link
Contributor Author

drewbanin commented Mar 16, 2020

Let's get this merged for 0.16.1 - we should consider whether or not to keep this logic in adapter macros, or move it into materializations (the better long-term answer)

Edit: This is not going out in 0.16.1, let's get it merged for 0.17.0

@drewbanin drewbanin changed the base branch from dev/barbara-gittings to dev/octavius-catto April 6, 2020 21:58
@drewbanin drewbanin changed the title (#2136) Fix missing sql_header for incremental models Fix missing sql_header for incremental models Apr 6, 2020
@drewbanin drewbanin requested review from jtcohen6 and beckjake April 7, 2020 00:36
Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! We should be sure to note this change in the signature of get_merge_sql in the release notes - plugins should probably know about this/use it (I think spark overrides get_merge_sql).

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fine approach that fixes the problem at hand, which is interpolating the sql_header in the right spots during incremental runs on BigQuery.

Thinking through the implications:

  • This PR does change the contract of get_insert_overwrite_merge_sql by adding the argument include_sql_header. Right now, only the BigQuery adapter uses that macro, and it just uses the default implementation.
  • This PR doesn't change the contract of get_merge_sql, which is good (this is a macro in wider use), but it also means it doesn't have the added lever of include_sql_header. The default__ and snowflake__ implementations of get_merge_sql must then always include the sql_header if provided, meaning that we can only pass model SQL (and never a temp table) as the source of get_merge_sql for those implementations. That's okay based on how they're used today, but it's feasible that we may want to pass a temp relation source someday—something that spark__get_merge_sql does currently.

Thinking far ahead: My personal belief is that get_merge_sql, get_delete_insert_merge_sql, get_insert_overwrite_merge_sql, etc. should eventually converge toward having the same basic contract, and that we should further leverage predicates to (a) handle the idiosyncrasies of each, and (b) enable end users to pass additional custom logic (match conditions, filters, etc). That may be an unrealistic aspiration, though I don't think it's out of proportion with this being some of the trickiest, highest-leverage, and most cost-impactful SQL that dbt generates.

@drewbanin
Copy link
Contributor Author

@beckjake FYI merging this before the tests pass - it was a changelog merge conflict

@drewbanin drewbanin merged commit 68babfb into dev/octavius-catto May 1, 2020
@drewbanin drewbanin deleted the fix/sql-header-in-incrementals branch May 1, 2020 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

set_sql_header works well with 'table', but not 'incremental'
3 participants