
Implement archival with a "merge" statement #1339

Closed
drewbanin opened this issue Mar 6, 2019 · 2 comments · Fixed by #1478
Labels
enhancement New feature or request snapshots Issues related to dbt's snapshot functionality

Comments


drewbanin commented Mar 6, 2019

Feature

Feature description

The current implementation of archival (0.13.0) effectively implements a merge using update and insert statements. Instead, we should leverage a merge abstraction (also used by incremental models) to help normalize the implementation of archival.

There are a few benefits of using a merge here:

  1. It is an atomic way of performing archival on databases like Snowflake[1] and BigQuery
  2. The database can presumably do less work if the inserts and updates are specified in a single query
  3. It should greatly simplify the archival materialization SQL

Reasons not to do this:

  1. if it ain't broke.....
  2. merge is not implemented for all adapters. We'll need to build out a merge abstraction for Redshift, Postgres, et al., which could be complicated
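
A merge abstraction for adapters without a native merge could emulate the same semantics with multiple statements in one transaction. This is a hypothetical sketch only, assuming illustrative table and column names (`archive_dest`, `staging_changes`, `unique_key`, `valid_to`, `updated_at`) rather than dbt's actual generated SQL:

```sql
-- Hypothetical sketch: emulating merge semantics on Postgres/Redshift,
-- which lack a native MERGE statement. All names are illustrative.
begin;

-- close out existing current records that have a newer version in staging
update archive_dest
set valid_to = src.updated_at
from staging_changes as src
where archive_dest.unique_key = src.unique_key
  and archive_dest.valid_to is null
  and src.updated_at > archive_dest.updated_at;

-- insert the new and changed records as the current versions
insert into archive_dest (unique_key, payload, updated_at, valid_to)
select src.unique_key, src.payload, src.updated_at, null
from staging_changes as src;

commit;

```

Wrapping both statements in one transaction keeps the two steps from interleaving with a concurrent run on databases with transactional DDL/DML, which is the property a single merge gives for free.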

[1] Original issue description, snowflake specific

If two archive jobs run simultaneously on Snowflake, duplicate records can be inserted into the archive destination table. This problem can be circumvented with a merge.

The problem here is that archival is currently implemented as:

  1. create temp table
  2. insert
  3. update

If two jobs run at the same time, they will both create identical temp tables (how do these not conflict?). When the jobs proceed to insert/update data, they will both duplicate work in the insert + update steps, resulting in duplicated data being inserted into the destination table.

Because a proper Snowflake merge would happen as a single atomic operation, two merges that are serialized would still result in the intended behavior. In this approach, dbt wouldn’t use a temp table. Instead, the merge would be responsible for finding new records to merge, inserting, and updating all at once. The second serialized merge would find no changes to merge, and would exit without modifying the destination table.
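
The single-statement approach described above might look roughly like the following. This is a simplified, hypothetical sketch (not dbt's actual generated SQL); the table and column names are illustrative:

```sql
-- Hypothetical sketch of atomic archival via MERGE on Snowflake.
-- All names are illustrative.
merge into archive_dest as dest
using staging_changes as src
  on dest.unique_key = src.unique_key
 and dest.valid_to is null
-- a newer version exists: close out the current record
when matched and src.updated_at > dest.updated_at then
  update set dest.valid_to = src.updated_at
-- no current record exists: insert the new version
when not matched then
  insert (unique_key, payload, updated_at, valid_to)
  values (src.unique_key, src.payload, src.updated_at, null);
```

Because the whole statement executes atomically, a second concurrent or serialized run finds no rows left to match or insert and exits without modifying the destination table.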

@drewbanin drewbanin added enhancement New feature or request estimate: 8 snapshots Issues related to dbt's snapshot functionality labels Mar 6, 2019
@drewbanin drewbanin added this to the Wilt Chamberlain milestone Mar 6, 2019
@drewbanin drewbanin self-assigned this Mar 6, 2019
@tayloramurphy

Merge docs for reference https://docs.snowflake.net/manuals/sql-reference/sql/merge.html

Does dbt archive have tests for the archived data as well? It would be nice to either have built-in ones or be able to specify custom ones.

@drewbanin
Contributor Author

We have a whole slew of archive-related issues coming down in 0.13.1: https://github.com/fishtown-analytics/dbt/issues?q=is%3Aopen+is%3Aissue+label%3Aarchive

I don't think testing for archived tables exists yet, though that will certainly be doable once they act more like proper dbt resources! Check out this issue in particular. I just added a note to that issue to investigate testing for archives.

@drewbanin drewbanin changed the title Use a merge statement for archival on Snowflake Implement archival with a "merge" statement Mar 23, 2019