While testing the feature flag invalidate_hard_deletes, I noticed that the column dbt_updated_at has the same value as the column dbt_valid_from, which does not truly represent when the row was last updated.
Steps To Reproduce
1. Execute dbt snapshot.
2. Pick one row and modify the value of a column that you know is included in the snapshot configuration.
3. Execute dbt snapshot.
4. Delete the same row you changed in step 2.
5. Execute dbt snapshot.
Expected behavior
I expect the column dbt_updated_at to represent when dbt last modified the record.
Screenshots and log output
This is the result of our testing
System information
Which database are you using dbt with?
[x] redshift
The output of dbt --version:
installed version: 0.19.0-b1
latest version: 0.18.1
Your version of dbt is ahead of the latest release!
Plugins:
- redshift: 0.19.0b1
- postgres: 0.19.0b1
- bigquery: 0.19.0b1
- snowflake: 0.19.0b1
Traditionally, snapshots record dbt_valid_from and dbt_updated_at when a row is created. When it's time to update rows to mark them as "expired", the snapshot only changes the dbt_valid_to field. We took the same approach for hard deletes: their dbt_valid_to changes to reflect that they're no longer valid, but the value of dbt_updated_at itself does not change.
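To make the behavior above concrete, here is a simplified, hypothetical Python model of one snapshot table, assuming the timestamp strategy with invalidate_hard_deletes enabled. SnapshotRow, run_snapshot, the single value column, and the run_time argument are illustrative names, not dbt APIs; dbt itself implements snapshots in SQL. Replaying the reproduction steps shows why dbt_updated_at always equals dbt_valid_from: both are set once on insert, and both expiry and hard deletion only touch dbt_valid_to.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class SnapshotRow:
    # Hypothetical shape of one snapshot row with a single tracked column.
    id: int
    value: str
    dbt_valid_from: datetime
    dbt_updated_at: datetime
    dbt_valid_to: Optional[datetime] = None


def run_snapshot(
    snapshot: List[SnapshotRow],
    source: Dict[int, Tuple[str, datetime]],
    run_time: datetime,
) -> List[SnapshotRow]:
    """Apply one simulated snapshot run (timestamp strategy, assumed).

    source maps id -> (value, updated_at) for the current source table.
    run_time stands in for the timestamp used to invalidate hard deletes.
    """
    current = {r.id: r for r in snapshot if r.dbt_valid_to is None}
    for key, (value, updated_at) in source.items():
        row = current.get(key)
        if row is None or updated_at > row.dbt_updated_at:
            if row is not None:
                # Expire the old version: only dbt_valid_to changes.
                row.dbt_valid_to = updated_at
            # New version: valid_from and updated_at come from the same source value.
            snapshot.append(SnapshotRow(key, value, updated_at, updated_at))
    for key, row in current.items():
        if key not in source:
            # Hard delete: again only dbt_valid_to changes; dbt_updated_at is untouched.
            row.dbt_valid_to = run_time
    return snapshot


# Replay the reproduction steps: snapshot, modify, snapshot, delete, snapshot.
t1, t2, t3 = datetime(2020, 11, 1), datetime(2020, 11, 2), datetime(2020, 11, 3)
snap: List[SnapshotRow] = []
run_snapshot(snap, {1: ("a", t1)}, run_time=t1)
run_snapshot(snap, {1: ("b", t2)}, run_time=t2)
run_snapshot(snap, {}, run_time=t3)
for r in snap:
    print(r)
```

In this model the deletion is visible only as dbt_valid_to on the last version of the row; its dbt_updated_at still holds the value recorded when that version was inserted.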
In large part, this is because dbt_updated_at is for dbt's use, as a fallback mechanism in case a snapshot has switched strategies. (See this comment and #2350 for details.)
| Field | Meaning | Usage |
|---|---|---|
| dbt_valid_from | The timestamp when this snapshot row was first inserted. | This column can be used to order the different "versions" of a record. |
| dbt_valid_to | The timestamp when this row became invalidated. | The most recent snapshot record will have dbt_valid_to set to null. |
| dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt. |
| dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. |
As such, I'm hesitant to change the behavior here; I'd rather document and make clear that dbt_valid_to, not dbt_updated_at, is the field to use for understanding when a hard-deleted record was marked missing.
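As a sketch of that recommendation, the function below finds hard-deleted records and reports when each was marked missing by reading dbt_valid_to on the latest version of each record, ordering versions by dbt_valid_from. In the warehouse this would normally be a SQL query; this Python version only illustrates the logic on in-memory rows. The dict keys and the deletion_times name are hypothetical, and it assumes invalidate_hard_deletes was enabled so that a closed latest version means the record was deleted from the source.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter
from typing import Dict, List


def deletion_times(rows: List[dict]) -> Dict[int, datetime]:
    """Return {id: time the record was marked missing} for hard-deleted records.

    Each row is assumed to be a dict with 'id', 'dbt_valid_from' and 'dbt_valid_to'.
    """
    deleted: Dict[int, datetime] = {}
    # Order versions of each record by dbt_valid_from, as the docs suggest.
    ordered = sorted(rows, key=itemgetter("id", "dbt_valid_from"))
    for key, versions in groupby(ordered, key=itemgetter("id")):
        latest = list(versions)[-1]
        # A still-current record has dbt_valid_to = null; a closed latest
        # version means the record was invalidated as a hard delete.
        if latest["dbt_valid_to"] is not None:
            deleted[key] = latest["dbt_valid_to"]
    return deleted


rows = [
    {"id": 1, "dbt_valid_from": datetime(2020, 11, 1), "dbt_updated_at": datetime(2020, 11, 1), "dbt_valid_to": datetime(2020, 11, 2)},
    {"id": 1, "dbt_valid_from": datetime(2020, 11, 2), "dbt_updated_at": datetime(2020, 11, 2), "dbt_valid_to": datetime(2020, 11, 3)},
    {"id": 2, "dbt_valid_from": datetime(2020, 11, 1), "dbt_updated_at": datetime(2020, 11, 1), "dbt_valid_to": None},
]
print(deletion_times(rows))  # {1: datetime.datetime(2020, 11, 3, 0, 0)}
```

Note that dbt_updated_at plays no part here: record 1's deletion time (2020-11-03) appears only in dbt_valid_to.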
I'm going to close this issue for now, but I'm open to hearing your disagreement! The code change involved here would be quite straightforward.