# [CT-2701] Snapshot dbt_updated_at not updated #7869
Thanks for opening this @patkearns10! See below for the reproducible example I'm using (with dbt-postgres) that shows the same thing that you are reporting. But the thing I'm wondering: is this actually a problem that needs to be solved in the implementation of snapshots that use the `timestamp` strategy?
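For reference, here's a minimal sketch of the kind of snapshot definition that reproduces this. The model name `my_snapshot`, the `my_data_version` var, and the inline source row are assumptions chosen to line up with the commands quoted below, not the exact code from the original report:

```sql
-- snapshots/my_snapshot.sql (hypothetical reproduction)
{% snapshot my_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- A var-driven "source" row so the data changes between snapshot runs
select
    1 as id,
    case
        when {{ var('my_data_version', 1) }} = 1 then '2023-01-01 00:00:00'::timestamp
        else '2023-01-02 00:00:00'::timestamp
    end as updated_at

{% endsnapshot %}
```

Taking a 1st snapshot and inspecting it (my guess at the exact commands, mirroring the ones for the 2nd snapshot below):

```shell
dbt snapshot --vars '{"my_data_version": 1}'
dbt show --inline "select * from {{ ref('my_snapshot') }}"
```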
| id | updated_at | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
|---|---|---|---|---|---|
| 1 | 2023-01-01 00:00:00 | 5aec37f35c393a7ef... | 2023-01-01 00:00:00 | 2023-01-01 00:00:00 | |
Take a 2nd snapshot:
```shell
dbt snapshot --vars '{"my_data_version": 2}'
dbt show --inline "select * from {{ ref('my_snapshot') }}"
```
| id | updated_at | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
|---|---|---|---|---|---|
| 1 | 2023-01-01 00:00:00 | 5aec37f35c393a7ef... | 2023-01-01 00:00:00 | 2023-01-01 00:00:00 | 2023-01-02 00:00:00 |
| 1 | 2023-01-02 00:00:00 | 8ff9567d22a84a2c9... | 2023-01-02 00:00:00 | 2023-01-02 00:00:00 | |
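A quick way to see the reported behavior in the output above is to compare the meta-fields directly (for example via `dbt show --inline`). The query below is only an illustration against the hypothetical `my_snapshot` from earlier:

```sql
-- For closed-out rows, dbt_updated_at still equals dbt_valid_from instead of
-- reflecting the time the row was invalidated.
select
    id,
    dbt_valid_from,
    dbt_valid_to,
    dbt_updated_at,
    dbt_updated_at = dbt_valid_from as updated_at_never_refreshed
from {{ ref('my_snapshot') }}
where dbt_valid_to is not null
```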
Here's the default implementation, which we can see only updates the `dbt_valid_to` (but not `dbt_updated_at`) for rows that are no longer valid:
`dbt-core/core/dbt/include/global_project/macros/materializations/snapshots/snapshot_merge.sql` (lines 14 to 18 in f767943):

```sql
    when matched
     and DBT_INTERNAL_DEST.dbt_valid_to is null
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
```
Here's the associated implementation for dbt-postgres:
`dbt-core/plugins/postgres/dbt/include/postgres/macros/materializations/snapshot_merge.sql` (lines 5 to 10 in f767943):

```sql
    update {{ target }}
    set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
    from {{ source }} as DBT_INTERNAL_SOURCE
    where DBT_INTERNAL_SOURCE.dbt_scd_id::text = {{ target }}.dbt_scd_id::text
      and DBT_INTERNAL_SOURCE.dbt_change_type::text in ('update'::text, 'delete'::text)
      and {{ target }}.dbt_valid_to is null;
```
If we do nothing, we should update the documentation. In my opinion it does not make sense that the `dbt_updated_at` value on the old row is left untouched. I do understand this is changing something that has been around for 4 years, so there is probably some concern there.
Agreed that it's been implemented this way for 4+ years and it might do more harm than good to change it. At the very least, we'd need to understand all the implications extremely thoroughly before considering making such a change.

What row-level data point would a user want that they can't get right now? Right now, it looks like `coalesce(dbt_valid_to, dbt_valid_from)` gets you that.

If you are always looking for a "system time" (a la SQL:2011), then that is not possible with the current implementation:

* When using `invalidate_hard_deletes` and a key is hard-deleted, ...

I'm leaning heavily towards leaving the implementation as-is and updating the documentation. We have a listing of gotchas related to snapshots listed here, and the situation you are bringing up looks like it would be another good entry for that list.
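As a concrete illustration of that `coalesce` suggestion, a query along these lines (against the hypothetical `my_snapshot` from earlier, so treat the names as assumptions) surfaces the most recent time each row's validity changed without relying on `dbt_updated_at`:

```sql
-- For rows that have been closed out, dbt_valid_to is when they stopped being
-- current; for the still-current row, dbt_valid_from is when it became current.
select
    id,
    dbt_valid_from,
    dbt_valid_to,
    coalesce(dbt_valid_to, dbt_valid_from) as last_validity_change
from {{ ref('my_snapshot') }}
```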
@patkearns10 Thanks again for noticing this behavior and posting an elucidating write-up 🤩 Since there would be more risk than reward to change the behavior, I'm going to close this issue.

**An option in user space**

If a user needs/wants the data to be populated differently, they can override the key macro in their dbt project like this (if they are using Postgres):

```sql
{% macro postgres__snapshot_merge_sql(target, source, insert_cols) -%}
{%- set insert_cols_csv = insert_cols | join(', ') -%}
update {{ target }}
set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
from {{ source }} as DBT_INTERNAL_SOURCE
where DBT_INTERNAL_SOURCE.dbt_scd_id::text = {{ target }}.dbt_scd_id::text
and DBT_INTERNAL_SOURCE.dbt_change_type::text in ('update'::text, 'delete'::text)
and {{ target }}.dbt_valid_to is null;
insert into {{ target }} ({{ insert_cols_csv }})
select {% for column in insert_cols -%}
DBT_INTERNAL_SOURCE.{{ column }} {%- if not loop.last %}, {%- endif %}
{%- endfor %}
from {{ source }} as DBT_INTERNAL_SOURCE
where DBT_INTERNAL_SOURCE.dbt_change_type::text = 'insert'::text;
{% endmacro %}
```

The logic to use would vary by adapter. If an adapter doesn't provide their own override of `snapshot_merge_sql`, then the default implementation (`default__snapshot_merge_sql`, shown earlier) is used.
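If the goal is specifically to refresh `dbt_updated_at` on the row being closed out, the `update` statement in that override could be extended along these lines. This is only a sketch: it assumes the snapshot staging table (`DBT_INTERNAL_SOURCE`) exposes a `dbt_updated_at` column, which may vary by adapter and dbt version, and it is not the built-in behavior:

```sql
update {{ target }}
set dbt_valid_to   = DBT_INTERNAL_SOURCE.dbt_valid_to,
    -- also stamp the closed-out row with the new "updated" time (sketch only)
    dbt_updated_at = DBT_INTERNAL_SOURCE.dbt_updated_at
from {{ source }} as DBT_INTERNAL_SOURCE
where DBT_INTERNAL_SOURCE.dbt_scd_id::text = {{ target }}.dbt_scd_id::text
  and DBT_INTERNAL_SOURCE.dbt_change_type::text in ('update'::text, 'delete'::text)
  and {{ target }}.dbt_valid_to is null;
```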
[Preview](https://deploy-preview-3540--docs-getdbt-com.netlify.app/docs/build/snapshots#snapshot-meta-fields)

## What are you changing in this pull request and why?

The values in the examples for `dbt_updated_at` weren't an accurate portrayal. See dbt-labs/dbt-core#7869 for context. Formatting of some timestamps was fixed as well.

## Checklist

- [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [About versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) so my content adheres to these guidelines.
Hi @patkearns10 and @dbeatty10, thanks for this post. I had the same issue and fixed it that way. However, overriding the `snapshot_merge_sql` macro in the dbt project appears to affect all existing strategies (timestamp, check, and any custom strategy). Is it possible to change the default behaviour of the macro for only one strategy (for example, a custom strategy) and not the others within the dbt project? Thanks in advance.
@ntsteph it sounds like you've created a custom snapshot strategy, and you're wondering if you can change the default behavior of `snapshot_merge_sql` for just that one strategy.

If so, you'd need to override the definition of the snapshot materialization for your adapter (here for dbt-postgres). So although it's possible, there's no "easy" way to override the logic for a single strategy.
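For a rough picture of what strategy-aware logic could look like, here's a sketch of an override that branches on the snapshot's configured strategy and falls back to the default otherwise. Whether `config.get('strategy')` resolves inside this macro can depend on the dbt version, and the strategy name `my_custom_strategy` is purely hypothetical, so treat this as an untested idea rather than the supported route (which is the materialization override described above):

```sql
{% macro postgres__snapshot_merge_sql(target, source, insert_cols) -%}
    {#- Sketch: only the hypothetical custom strategy gets special merge logic;
        every other strategy keeps dbt's default behavior. -#}
    {%- if config.get('strategy') == 'my_custom_strategy' -%}
        {# ... custom merge SQL for the custom strategy goes here ... #}
    {%- else -%}
        {{ default__snapshot_merge_sql(target, source, insert_cols) }}
    {%- endif -%}
{%- endmacro %}
```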
### Is this a new bug in dbt-core?

### Current Behavior

This was brought up by a customer and I am kinda surprised I have never heard this before.

Our docs show two fields in the existing row should be updated when the `updated_at` timestamp is changed, signifying a change of some value: the `dbt_valid_to` and the `dbt_updated_at`.

Looks like only one update is happening and the `dbt_updated_at` does not get updated.

### Expected Behavior

According to the docs, the older row should have two fields updated:

- `dbt_valid_to`
- `dbt_updated_at`
### Steps To Reproduce

1. Snapshot a source that returns `1 as id, '2023-01-01' as updated_at`
2. Change the source to return `1 as id, '2023-01-02' as updated_at` and snapshot again
3. Check whether `dbt_updated_at` has been modified in the original row

### Relevant log output
No response
### Environment

**Which database adapter are you using with dbt?**

No response

### Additional Context

No response