Rework insert_overwrite incremental strategy #2198
Some quick comments here - let me know what you think, @jtcohen6!
plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql
@jtcohen6 I played around with this some more today. Are you opposed to updating the Right now, I'm seeing a generated merge statement like this:
but we could instead make this merge statement run:
This just has the benefit of avoiding a weird failure mode where the user specifies a list of partitions to overwrite, but their SQL select statement returns an incongruent set of partitions. What do you think?
Not opposed! I just spent a few minutes trying to figure out what would happen to that incongruous data. I don't think it would go... anywhere? Better to make that explicit.
@drewbanin As far as the specific implementation: Given that we define
@jtcohen6 can you make a separate issue to add that check? We should ship this PR as-is for 0.16.0, but:
Ship it
Opened #2210.
I can't merge until the Azure pipeline completes. Is it stuck, or did it never kick off? I pushed the most recent commit 2 days ago.
I logged in and manually kicked it; I think it is running now.
Ok, lesson learned - kicking it made GitHub act weird, and now it only wants tests on the merge commit instead of the final commit...
resolves #2196
Description
Reimplement the `insert_overwrite` incremental strategy on BigQuery, with two options:
`partitions` model config

Benchmarks calculated in this sheet. TL;DR:
- `insert_overwrite` with static (user-supplied) partitions is always cheapest and usually fastest
- `merge` (default) with a cluster key is faster and cheaper at small data volumes
- `insert_overwrite` dynamic is always slowest, but its cost scales better than either `merge` variant at larger data volumes (> 50 GB)

Checklist
- Updated `CHANGELOG.md` and added information about my change to the "dbt next" section.
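
For readers skimming the thread, the difference between the static and dynamic flavors of `insert_overwrite` can be illustrated with a toy in-memory model. This is only a sketch of the partition-replacement semantics, not dbt's implementation or the SQL it generates: the function name, the row format, and the sample data are all hypothetical. In particular, dropping incoming rows that fall outside a user-supplied partition list is an assumption made here to keep the "incongruent partitions" case explicit; the PR may handle it differently.

```python
def insert_overwrite(dest_rows, new_rows, partition_key, partitions=None):
    """Toy model of partition-level overwrite (hypothetical helper, not dbt code).

    dest_rows / new_rows: lists of dicts representing table rows.
    partitions: optional static list of partition values to replace;
                None means "dynamic", i.e. derive them from new_rows.
    """
    if partitions is None:
        # Dynamic: replace exactly the partitions present in the new data.
        to_replace = {row[partition_key] for row in new_rows}
    else:
        # Static: replace only the partitions the user listed.
        to_replace = set(partitions)

    # Assumption: new rows outside the replaced partitions are discarded,
    # so they can never be appended next to stale data.
    incoming = [row for row in new_rows if row[partition_key] in to_replace]

    # Existing rows survive only if their partition is not being replaced.
    kept = [row for row in dest_rows if row[partition_key] not in to_replace]
    return kept + incoming


# Hypothetical sample data.
existing = [{"day": "2020-01-01", "n": 1}, {"day": "2020-01-02", "n": 2}]
fresh = [{"day": "2020-01-02", "n": 20}, {"day": "2020-01-03", "n": 30}]

dynamic_result = insert_overwrite(existing, fresh, "day")
static_result = insert_overwrite(existing, fresh, "day", partitions=["2020-01-02"])
```

In the dynamic call, both partitions found in `fresh` are replaced or created. In the static call, only `2020-01-02` is replaced, and the `2020-01-03` row is the incongruent data discussed in the review comments.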