Add support for sql as a header to create or replace #1967
Conversation
This is a really slick implementation! I like it!
I think we should also make sql_header a "known" config - that will allow the config to be set for whole groups of models from the dbt_project.yml file. I actually don't have a good use-case in mind, but this isn't necessarily BigQuery specific. I think it makes sense to add this as a "clobber field" over here:
https://github.com/fishtown-analytics/dbt/blob/e51c942e91a94936f68f2965963d3b46f1257658/core/dbt/source_config.py#L11-L18
We'd also probably want to add the {{ sql_header if sql_header is not none }} expression to each of postgres/redshift/snowflake too, but that should be pretty straightforward.
For BigQuery (and really every plugin we implement this for), let's definitely include the query header in both bigquery__create_table_as as well as bigquery__create_view_as -- BQ views support temporary UDFs, right?
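For illustration, a rough sketch of where that header could slot into bigquery__create_table_as; this is simplified (it omits the real macro's partitioning and clustering handling), so treat it as a sketch of the placement rather than the final source:

```sql
{% macro bigquery__create_table_as(temporary, relation, sql) -%}
  {%- set sql_header = config.get('sql_header', none) -%}

  {# emit the header (e.g. a temporary UDF) before the DDL statement #}
  {{ sql_header if sql_header is not none }}

  create or replace table {{ relation }}
  as (
    {{ sql }}
  );
{%- endmacro %}
```

The view variant would get the same two lines ahead of its create or replace view statement.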
Last, I'd call the sql_header macro something more explicit, like set_sql_header. I think that will help clarify the difference between:
{% call sql_header() %}
...
{% endcall %}
and
{{ config(sql_header=my_temp_udf()) }}
both of which are supported in this implementation.
Really cool PR - thanks for opening it!!
@drewbanin Think I got to all of your suggestions. Let me know if 'common.sql' is a reasonable place to put the macro. Attempted to add an integration test for bigquery that creates an innocuous table as a header.
Would this work for anything database specific, like BigQuery SQL scripting?
Ok, this is looking really good! One quick note on the test. Can you also update create_table_as and create_view_as for the Snowflake plugin?
@drewbanin Can't believe I would forget Snowflake :/ Better test + Snowflake should be added now. @sdepablos Tried it out and it seems like this works for BigQuery SQL scripting! Originally only had temporary UDFs as a clear use case, but glad there are others. The following should work, as an example:
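As a sketch of what such a model could look like (assuming the set_sql_header(config) call-block form this PR introduces; the model, ref, and column names below are made up), the header can carry a scripting statement such as DECLARE ahead of the generated create or replace table:

```sql
-- models/recent_events.sql (illustrative)
{% call set_sql_header(config) %}
    -- BigQuery SQL scripting: runs before the create or replace table statement
    DECLARE cutoff_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
{% endcall %}

select *
from {{ ref('events') }}        -- illustrative upstream model
where event_date >= cutoff_date
```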
d'oh @kconvey I think there are a couple more macros to update: the "default" materializations, which are used if a plugin does not supply its own materialization. I think that, slightly longer term, we're going to want a better answer here than copy/pasting this snippet around everywhere. For now though, if you don't mind making that change, we'll be able to get this one merged :)
Ok - thanks! I just kicked off the tests, but it does look like there's a small merge conflict introduced by the addition of BigQuery labels. Can you take a quick look at that?
Pulled the latest changes and labeled the column in the integration test (so hopefully that passes). May need you to kick off tests again. Thanks!
Meep! I'm seeing an error in the integration tests. Should we configure that model to be materialized as a table instead of a view? I didn't realize this was a limitation of temporary UDFs. I think you can ignore the postgres error - that seems like a network blip to me
Should be configured as a table now.
Gosh, lots of little tweaks to make around the test suite here. FYI, we plan on making builds run for community contributed PRs to help tighten this CI loop. I think we need to bump the number of expected tests from 7 to 8 here. We do this to make sure that all of the specified models in a test actually run, and this PR adds a new model; we'll want to change each instance of that expected count. You can see this in the check code and in the corresponding error from the CI build.
Should have remembered that one from the labels PR. Thanks for catching!
/azp run
@beckjake can you try kicking off the tests here? Azure doesn't seem to want to run this with env vars for me :/
@drewbanin I tried to kick it off manually - we'll see if it works, I guess.
The tests are successfully running here: https://dev.azure.com/fishtown-analytics/dbt/_build/results?buildId=1260&view=results
LGTM - thanks for your contribution @kconvey!
We've already cut a release candidate for 0.15.1, but this is pretty tightly scoped and well tested so I feel comfortable merging it. It will ship in 0.15.1 after the holidays :D
For future visitors to this issue: I tried a few ways of defining temporary BigQuery functions for use in model definitions, and eventually arrived at the following setup, which you might also enjoy.
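As a sketch of one such setup (the macro name mirrors the config(sql_header=my_temp_udf()) form shown earlier in this thread; the UDF, model, and ref names are illustrative): keep the temporary-UDF DDL in a macro so it can be reused across models, then pass it in as the sql_header.

```sql
-- macros/my_temp_udf.sql (illustrative)
{% macro my_temp_udf() %}
    CREATE TEMPORARY FUNCTION yes_no_to_boolean(answer STRING)
    RETURNS BOOLEAN AS (
        CASE
            WHEN LOWER(answer) = 'yes' THEN TRUE
            WHEN LOWER(answer) = 'no' THEN FALSE
            ELSE NULL
        END
    );
{% endmacro %}
```

```sql
-- models/survey_answers.sql (illustrative)
{{ config(sql_header=my_temp_udf()) }}

select
    respondent_id,
    yes_no_to_boolean(answer) as answer_bool
from {{ ref('raw_survey_answers') }}
```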
Advantages of this method:
Disadvantages of this method:
#1879
Any SQL that does not work within create or replace table can be extracted and injected ahead of the create or replace table statement by using a call block; a sketch of the call block and the resulting SQL follows.
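A minimal sketch (the set_sql_header(config) name follows the rename suggested in review; the UDF, model, project, and dataset names are placeholders):

```sql
-- models/orders.sql (illustrative)
{% call set_sql_header(config) %}
    CREATE TEMPORARY FUNCTION cents_to_dollars(amount_cents INT64)
    RETURNS NUMERIC AS (CAST(amount_cents AS NUMERIC) / 100);
{% endcall %}

select
    order_id,
    cents_to_dollars(amount_cents) as amount_dollars
from {{ ref('raw_orders') }}
```

The resulting SQL would then look roughly like:

```sql
CREATE TEMPORARY FUNCTION cents_to_dollars(amount_cents INT64)
RETURNS NUMERIC AS (CAST(amount_cents AS NUMERIC) / 100);

create or replace table `my-project`.`analytics`.`orders` as (
    select
        order_id,
        cents_to_dollars(amount_cents) as amount_dollars
    from `my-project`.`analytics`.`raw_orders`
);
```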
This is most obviously useful for BigQuery's temporary UDFs, which do not have a perfect dbt analogue. dbt's support for persistent UDFs or use of macros can likely stand in in most cases, but this feature facilitates dbt execution of SQL that was written (or generated) to BQ's SQL standard, rather than Jinja-templated SQL.