(#525) drop existing relation at end of full-refresh incremental build #1682
Conversation
Moving this issue to the LMA milestone. I think more work is required here to fix this same issue on BigQuery and Snowflake.
Okay! I've approved it as-is because I do think this PR is both a step in the right direction and very reasonable.
@drewbanin is there any support you need on this? It is the killer feature.
hey @darrenhaken - I'd love your help testing when we have some working code here!
Of course, we can do BQ testing. Having highly available datasets is a critical feature for us 🙂
strong agree @darrenhaken - this particular issue is long overdue!
@darrenhaken I might do a little more cleanup/refactoring work here, but the --full-refresh atomic replace logic has been implemented for BigQuery in this PR. Feel free to give it a spin and let us know how it goes!
This is awesome! I’ll take a look to see if I can do some testing this week.
@whittid4 FYI
@drewbanin how does it work with BigQuery? i.e. does it create a temp table first?
@darrenhaken for a full-refresh build of an incremental model:
So, this is still not atomic for a view --> incremental materialization switch, but I'm unsure that BQ provides any mechanisms for swapping a view for a table atomically. Edit: The logic is here and it's actually pretty readable as far as materialization code goes :)
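To picture the "atomic replace" mentioned earlier: on BigQuery, a full-refresh build can swap the table in a single statement, along the lines of the sketch below (the relation and column names are made up for illustration; this is not the PR's actual generated SQL).

create or replace table analytics.my_incremental_model
as (
    -- the model's full SELECT goes here; the existing table is replaced in one
    -- step, so downstream queries never see a dropped or empty relation
    select id, updated_at, payload
    from analytics.raw_events
);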
I'm assuming the answer is because of restrictions in dbt/BQ/Snowflake which I am not familiar with... but why isn't the workflow to just call the table materialization when a full refresh is required? Is there a need to reimplement the logic in the incremental as well?
@elexisvenator that's the big idea! I want to accomplish the approach you're describing in the materialization layer rather than in Python. The Python code in dbt doesn't really know about the materializations that exist in dbt. This is a really neat feature -- it means that each plugin provided for dbt is able to define its own implementation for tables/incrementals/full refreshes/etc. I'd rather push that logic into the materialization layer (and provide good abstractions that can be shared across materializations) than encode this type of information in the dbt Python code. That's just to say: your instinct is a good one, and I want to accomplish it with good abstractions in jinja instead of hard assumptions in Python!
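As a rough sketch of what that could look like in the Jinja layer (the macros build_table_atomically and incremental_upsert_into are illustrative placeholders, not real dbt macros; flags.FULL_REFRESH and config.get are assumed to be available in the materialization context):

{% materialization incremental, default %}
    {# hypothetical dispatch: on --full-refresh, defer to table-style logic
       instead of re-implementing it inside the incremental materialization #}
    {% if flags.FULL_REFRESH %}
        {{ build_table_atomically(this, sql) }}
    {% else %}
        {{ incremental_upsert_into(this, sql, config.get('unique_key')) }}
    {% endif %}
{% endmaterialization %}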
@beckjake can you give this another quick look? I think it's ready to roll now. The big change I made since last time involves... We should apply similar logic to the table materialization, but I figured that was out of scope for this issue. I'll create a separate issue to address it.
Looks good! I have a couple of questions, but nothing significant.
from {{ target_relation }}
where ({{ unique_key }}) in (
    select ({{ unique_key }})
    from {{ tmp_relation.include(schema=False, database=False) }}
Is this include() necessary (or even correct) for all databases? I think whatever made your tmp_relation should be giving you the correct include policy.
this is such a great catch! Yes - this macro should definitely expect the tmp_relation to already have a valid include policy for the given database.
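Concretely, the adjustment under discussion would look something like this (a sketch, not the final diff; it assumes the elided line above the hunk is a delete statement and that the caller builds tmp_relation with the correct include policy):

delete from {{ target_relation }}
where ({{ unique_key }}) in (
    select ({{ unique_key }})
    -- use the relation as constructed by the caller, rather than
    -- re-applying .include(schema=False, database=False) here
    from {{ tmp_relation }}
);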
@@ -346,6 +346,36 @@ def execute_model(self, model, materialization, sql_override=None,

        return res

    @available.parse_none
I'm going to be pedantic about types here: this should probably be @available.parse(lambda *a, **k: True) (or False).
    conf_cluster = [conf_cluster]

return table_partition == conf_partition \
    and table_cluster == conf_cluster
Does order matter? If not, we should compare set(table_cluster) == set(conf_cluster).
Yeah -- the order is significant -- clustering works like ordering whereby the table is clustered by the first clustering key, then the second, and so on.
This query fails if you run it twice, swapping the order of the clustering keys on the second run:
create or replace table dbt_dbanin.debug_clustering
partition by date_day
cluster by id, name
as (
select current_date as date_day, 1 as id, 'drew' as name
);
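For clarity, the failing second run would be the same statement with the keys swapped, something like the hypothetical query below; per the comment above, BigQuery rejects the replace because the clustering specification no longer matches the existing table, which is why dbt's comparison has to treat the key order as significant.

-- hypothetical second run with the clustering keys swapped; this fails
create or replace table dbt_dbanin.debug_clustering
partition by date_day
cluster by name, id
as (
    select current_date as date_day, 1 as id, 'drew' as name
);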
Well this is exciting news! Is this coming earlier than November?
…On Tue, 15 Oct 2019 at 20:44, Drew Banin wrote: Merged #1682 into dev/louisa-may-alcott.
@darrenhaken this is shipping in 0.15.0, due in November! We'll have a pre-release ready hopefully by the end of this week :)
Ok I’ll give the pre-release a whirl 🙂
fixes #525
Primary goal: minimize downtime for incremental models run in full-refresh mode
Secondary goal: encapsulate incremental upsert logic across adapters so it can be repurposed in higher-order macros
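A rough illustration of that secondary goal: the upsert could live in a shared macro along these lines. The macro name and signature here are assumptions for illustration, not the PR's actual code; adapter.get_columns_in_relation is used on the assumption that the target table already exists.

{% macro incremental_upsert(tmp_relation, target_relation, unique_key=none) %}
    {%- set dest_columns = adapter.get_columns_in_relation(target_relation) -%}
    {%- set dest_cols_csv = dest_columns | map(attribute='name') | join(', ') -%}

    {%- if unique_key is not none %}
    -- delete rows in the target that will be replaced by the new batch
    delete from {{ target_relation }}
    where ({{ unique_key }}) in (
        select ({{ unique_key }})
        from {{ tmp_relation }}
    );
    {%- endif %}

    -- then append everything from the temp relation
    insert into {{ target_relation }} ({{ dest_cols_csv }})
    (
        select {{ dest_cols_csv }}
        from {{ tmp_relation }}
    );
{% endmacro %}

Adapter-specific materializations could then call such a macro for the normal incremental path and fall back to an atomic rebuild under --full-refresh.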