[Bug] Initial models builds with microbatch incremental strategy resulting in row duplication #10924
Closed
Labels
bug: Something isn't working
incremental: Incremental modeling with dbt
microbatch: Issues related to the microbatch incremental strategy
pre-release: Bug not yet in a stable release
wontfix: Not a bug or out of scope for dbt-core
Is this a new bug in dbt-core?
Current Behavior
When creating a new model using the `microbatch` incremental strategy, the initial tmp table is a full copy of the source data and is re-inserted, in full, for every batch. The delete queries remove data from the appropriate batch range, but the inserts are full-table loads.
The impact is heavily duplicated data on initial model builds, and batch query times that increase with each batch.
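To make the effect concrete, here is a rough simulation in plain Python. It is only a sketch of the delete-then-insert pattern described above, not dbt's or Snowflake's actual implementation: the `build` function, the one-day batch size, and the three-row source are all made up for illustration. With `filter_tmp_by_window=False` the tmp table is the whole source on every batch, and the same run with the tmp table restricted to the batch window is included for comparison.

```python
from datetime import date, timedelta


def build(source_rows, batch_starts, filter_tmp_by_window):
    """Simulate a batch-by-batch build: delete the batch window, then insert from tmp."""
    target = []
    for start in batch_starts:
        end = start + timedelta(days=1)  # assumed one-day batches

        # tmp table: the full source (behavior reported in this issue),
        # or only rows whose event_time falls inside the batch window.
        if filter_tmp_by_window:
            tmp = [r for r in source_rows if start <= r["event_time"] < end]
        else:
            tmp = list(source_rows)

        # Delete step: remove target rows inside the batch window only.
        target = [r for r in target if not (start <= r["event_time"] < end)]

        # Insert step: load everything that is in tmp.
        target.extend(tmp)
    return target


# Hypothetical source: one row per day over three days.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(3)]
source = [{"id": i, "event_time": d} for i, d in enumerate(days)]

unfiltered = build(source, days, filter_tmp_by_window=False)
filtered = build(source, days, filter_tmp_by_window=True)

print(len(source), len(unfiltered), len(filtered))  # prints: 3 6 3
```

In this toy case the unfiltered build ends with 6 rows for a 3-row source, because each batch deletes only its own window but re-inserts every row. The earliest day's row survives every later batch, so the duplication grows with the number of batches.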
Expected Behavior
There are generally two ways I'd expect this to work:
1. The initial build runs as a `full_refresh`, similar to existing incremental models (perhaps not ideal for very large tables).
2. `tmp` table creation mirrors the `event_time` windows in the batch and only inserts values for the same period that were deleted. Currently there is no `event_time` filter on these tmp tables (probably ideal).
Steps To Reproduce
1. Create a new model using the `microbatch` incremental strategy. Ensure a table doesn't already exist with the same name.
2. Build the model without any `event-time` conditions, such as `dbt build -s model_name`.
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
No response