Add an archival strategy that operates by checking for column value diffs by PK #706

drewbanin · 2018-03-26T19:40:50Z

Edit: See the 'check' strategy defined in #1175

dbt's implementation of archive could be scoped to specific columns. If none of these columns has changed for a given primary key between invocations of dbt archive, then no new row would need to be inserted for that key.
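Here is a minimal Python sketch of the idea. It is not dbt's implementation. It assumes source rows arrive as dicts and that the latest archived version of each key is available. The function and argument names are hypothetical:

```python
def rows_to_archive(source_rows, current_versions, unique_key, check_cols):
    """Return the source rows that need a new archive version.

    source_rows: list of dicts, the current state of the source.
    current_versions: dict mapping each key to its latest archived row.
    A row qualifies if its key is new or any checked column differs.
    """
    changed = []
    for row in source_rows:
        previous = current_versions.get(row[unique_key])
        if previous is None or any(row[col] != previous[col] for col in check_cols):
            changed.append(row)
    return changed


source = [
    {"id": 1, "status": "shipped", "note": "a"},
    {"id": 2, "status": "pending", "note": "b"},
    {"id": 3, "status": "new", "note": "c"},
]
current = {
    1: {"id": 1, "status": "pending", "note": "a"},
    2: {"id": 2, "status": "pending", "note": "x"},
}
# Only "status" is checked, so the edited note on id 2 does not trigger a new row.
print([r["id"] for r in rows_to_archive(source, current, "id", ["status"])])  # [1, 3]
```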

Most of this logic can be implemented in the materialization, though we'll also need to add a field to the archive: block in the dbt_project.yml file.
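One way the new field might be validated after the archive: block is loaded is sketched below in Python. It assumes each entry carries source_table, target_table, and unique_key keys. It also assumes the new field is called check_cols, which is a hypothetical name here. The rule that an entry sets exactly one of updated_at or check_cols is an illustrative assumption, not dbt's documented behavior:

```python
# Hypothetical validation of one entry from the archive: block in dbt_project.yml.
# Key names (including the new "check_cols" field) are assumptions for illustration.
REQUIRED_KEYS = ("source_table", "target_table", "unique_key")


def archive_strategy(entry: dict) -> str:
    """Return "timestamp" or "check" for an archive entry, or raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise ValueError("archive entry is missing: " + ", ".join(missing))

    has_updated_at = "updated_at" in entry
    has_check_cols = "check_cols" in entry
    # Assumed rule: exactly one change-detection option per entry.
    if has_updated_at == has_check_cols:
        raise ValueError("set exactly one of 'updated_at' or 'check_cols'")
    if has_check_cols and not entry["check_cols"]:
        raise ValueError("'check_cols' must list at least one column")
    return "timestamp" if has_updated_at else "check"


entry = {
    "source_table": "orders",
    "target_table": "orders_archived",
    "unique_key": "id",
    "check_cols": ["status"],
}
print(archive_strategy(entry))  # check
```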


drewbanin · 2018-04-26T18:22:02Z

use archive block?
specify key columns to check?

mplovepop · 2018-05-09T18:27:33Z

Possibly separate issues, but I was thinking along two lines:

1. A column excludes list. Our ETL brings in dozens of denormalized columns per order that would bloat the archive to no purpose.
2. Column data transforms. One of our ETL jobs brings in datetimes as varchars, and it would be nice to massage them into timestamps first. I already do this with the updated_at option: coalesce(nullif(updated_at, ''), '2015-01-01T00:00:00+00:00')::timestamptz
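Both ideas are easy to express. Below is a minimal Python sketch of an exclude list and of the coalesce(nullif(...)) normalization applied to a varchar timestamp. The function names are hypothetical. In dbt this logic would live in SQL, so the Python only illustrates the behavior:

```python
from datetime import datetime

# Fallback used by the coalesce(...) expression above.
DEFAULT_UPDATED_AT = "2015-01-01T00:00:00+00:00"


def tracked_columns(all_columns, exclude):
    """Keep every column except the excluded ones, preserving order."""
    excluded = set(exclude)
    return [col for col in all_columns if col not in excluded]


def normalize_updated_at(raw):
    """Python analogue of coalesce(nullif(updated_at, ''), DEFAULT)::timestamptz."""
    # nullif turns '' into NULL; coalesce then replaces NULL with the default.
    value = DEFAULT_UPDATED_AT if raw in (None, "") else raw
    return datetime.fromisoformat(value)  # needs Python 3.7+


print(tracked_columns(["id", "status", "customer_name", "updated_at"], ["customer_name"]))
# ['id', 'status', 'updated_at']
print(normalize_updated_at(""))  # 2015-01-01 00:00:00+00:00
```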

drewbanin · 2018-05-09T20:05:57Z

@mplovepop our current thinking involves 1) a new "archive" block and 2) archiving a query, instead of a table. This might look like:


```sql
{% archive(target_schema='dbt_archived', target_table='orders_archived') %}

select
  id,
  status,
  coalesce(nullif(updated_at, ''), '2015-01-01T00:00:00+00:00')::timestamptz as updated_at

from source_data.status

{% endarchive %}
```

Still some more thinking required here about the exact interface, but do you buy the general approach? I think it might work for the use cases you mentioned.
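To make "archiving a query" concrete, here is a sketch in Python using the standard-library sqlite3 module. It assumes a check-style comparison on a chosen list of columns and assumes valid_from/valid_to bookkeeping columns on the target. The function name, the table names, and the sqlite-friendly version of the query (no ::timestamptz cast) are illustrative assumptions. Rows deleted from the source are simply ignored:

```python
import sqlite3


def archive_query(conn, query, target, unique_key, check_cols, now):
    """Snapshot the rows returned by `query` into `target`, check-strategy style.

    Assumes `target` already exists with the query's columns plus valid_from
    and valid_to. Identifiers are interpolated directly, so they must be trusted.
    """
    conn.execute("drop table if exists staged")
    conn.execute(f"create temp table staged as {query}")
    cols = [info[1] for info in conn.execute("pragma table_info(staged)")]
    col_list = ", ".join(cols)
    # NULL-safe "did any checked column change?" test (sqlite's IS NOT).
    changed = " or ".join(f"s.{c} is not {target}.{c}" for c in check_cols)

    # 1. Close the open version of every key whose checked columns changed.
    conn.execute(
        f"update {target} set valid_to = ? where valid_to is null and exists ("
        f"select 1 from staged as s where s.{unique_key} = {target}.{unique_key} "
        f"and ({changed}))",
        (now,),
    )
    # 2. Keys with no open version are new or were just closed: add a version.
    conn.execute(
        f"insert into {target} ({col_list}, valid_from, valid_to) "
        f"select {col_list}, ?, null from staged as s where not exists ("
        f"select 1 from {target} as t where t.{unique_key} = s.{unique_key} "
        f"and t.valid_to is null)",
        (now,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table status (id integer, status text, updated_at text);
    create table orders_archived (id integer, status text, updated_at text,
                                  valid_from text, valid_to text);
    insert into status values (1, 'pending', ''), (2, 'pending', '2018-05-01');
""")
# sqlite-friendly version of the query in the archive block above.
query = (
    "select id, status, "
    "coalesce(nullif(updated_at, ''), '2015-01-01T00:00:00+00:00') as updated_at "
    "from status"
)
archive_query(conn, query, "orders_archived", "id", ["status"], "2018-05-09")
conn.execute("update status set status = 'shipped' where id = 1")
archive_query(conn, query, "orders_archived", "id", ["status"], "2018-05-10")
rows = conn.execute(
    "select id, status, valid_from, valid_to from orders_archived "
    "order by id, valid_from"
).fetchall()
print(rows)
```

Closing changed versions first means the insert step only needs one rule. Any staged key without an open version gets a new row, which covers both brand-new keys and keys that just changed.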

mplovepop · 2018-05-09T20:34:30Z

Where would that go? I like the general approach. Maybe it could also require certain archive column names, _dbt_unique_key and _dbt_updated_at, which would be used when computing _dbt_sdc_id, _dbt_valid_from, and _dbt_valid_to (as a suggestion to go along with another archive issue I saw here).
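The suggested columns are sketched below in Python. The sketch assumes each row already exposes _dbt_unique_key and _dbt_updated_at. Two further choices are assumptions for illustration. The id is an MD5 hash of the key and updated_at joined with "|". _dbt_valid_to is the next version's _dbt_updated_at for the same key:

```python
import hashlib
from itertools import groupby


def build_versions(rows):
    """Derive the proposed _dbt_* columns from rows that expose the two
    required columns, _dbt_unique_key and _dbt_updated_at."""
    out = []
    ordered = sorted(rows, key=lambda r: (r["_dbt_unique_key"], r["_dbt_updated_at"]))
    for key, group in groupby(ordered, key=lambda r: r["_dbt_unique_key"]):
        versions = list(group)
        for i, row in enumerate(versions):
            # The next version of the same key (if any) closes this one.
            nxt = versions[i + 1] if i + 1 < len(versions) else None
            raw_id = f"{key}|{row['_dbt_updated_at']}"  # assumed id recipe
            out.append({
                **row,
                "_dbt_sdc_id": hashlib.md5(raw_id.encode("utf-8")).hexdigest(),
                "_dbt_valid_from": row["_dbt_updated_at"],
                "_dbt_valid_to": nxt["_dbt_updated_at"] if nxt else None,
            })
    return out


rows = [
    {"_dbt_unique_key": 1, "_dbt_updated_at": "2018-05-02"},
    {"_dbt_unique_key": 1, "_dbt_updated_at": "2018-05-01"},
    {"_dbt_unique_key": 2, "_dbt_updated_at": "2018-05-01"},
]
versions = build_versions(rows)
print([(v["_dbt_unique_key"], v["_dbt_valid_from"], v["_dbt_valid_to"]) for v in versions])
# [(1, '2018-05-01', '2018-05-02'), (1, '2018-05-02', None), (2, '2018-05-01', None)]
```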


drewbanin added the enhancement and good_first_issue labels Mar 26, 2018

drewbanin mentioned this issue Apr 26, 2018

bad archive error when required args are not supplied #671

Closed

drewbanin mentioned this issue Jul 11, 2018

Allow archival of tables from one database.schema to a different database.schema #838

Closed

drewbanin added the snapshots label Jul 11, 2018

drewbanin added this to the Wilt Chamberlain milestone Nov 28, 2018

drewbanin mentioned this issue Dec 5, 2018

Implement archival with blocks #1175

Closed

drewbanin changed the title ~~Make archive column-aware~~ Add an archival strategy that operates by checking for column value diffs by PK Mar 23, 2019

beckjake mentioned this issue Apr 3, 2019

Feature/archive blocks #1361

Merged

beckjake mentioned this issue Apr 10, 2019

Add the check archive strategy (#706) #1394

Merged

beckjake added a commit that referenced this issue Apr 12, 2019

Merge pull request #1394 from fishtown-analytics/feature/check-cols-s…

bf0f909

…trategy Add the check archive strategy (#706)

beckjake closed this as completed in #1361 Apr 26, 2019
