dbt Snapshots not working correctly when switching from "check" to "timestamp" strategy #2350
Comments
We think this may be a factor of how Here is the SQL:
Hey @jrandrews - thanks for writing this issue up! It's very complete and very well written! I think the issue here is indeed the null value for

    where snapshotted_data.dbt_unique_key is null
       or (
            snapshotted_data.dbt_unique_key is not null
        and snapshotted_data.dbt_valid_to is null
        and (
            -- This line is probably the problem
            (snapshotted_data.MetaDataLastUpdatedTime < source_data.MetaDataLastUpdatedTime)
        )
    )

You're correct in saying that we'd expect the historical

There is always so much to consider when changing the snapshot code, but I suspect we can simply change this equality from:
to
We can guarantee that
So, I don't believe this change would manifest as a functional change for any snapshot users out there, and it should fix the issue you're seeing. Let me know if you buy all of that, or if you can think of anything I might be missing here.
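To make the failure mode concrete, here is a minimal sketch of how a NULL on the snapshot side of that comparison keeps a row from ever being picked up. It is an illustration only: SQLite (via Python's `sqlite3`) stands in for BigQuery, the two tables and the `LEFT JOIN` are simplified assumptions rather than the SQL dbt actually generates, and the keys `a` and `b` are made up. Only the `where` clause is taken from the snippet above.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshotted_data (
        dbt_unique_key TEXT,
        dbt_valid_to TEXT,
        MetaDataLastUpdatedTime TEXT
    );
    CREATE TABLE source_data (
        dbt_unique_key TEXT,
        MetaDataLastUpdatedTime TEXT
    );
    -- Hypothetical row 'a': current snapshot record with a NULL timestamp.
    INSERT INTO snapshotted_data VALUES ('a', NULL, NULL);
    -- Hypothetical row 'b': current snapshot record with a populated timestamp.
    INSERT INTO snapshotted_data VALUES ('b', NULL, '2020-04-01 00:00:00');
    -- Both keys have newer data in the source.
    INSERT INTO source_data VALUES ('a', '2020-04-20 00:00:00');
    INSERT INTO source_data VALUES ('b', '2020-04-20 00:00:00');
""")

rows = conn.execute("""
    SELECT source_data.dbt_unique_key
    FROM source_data
    LEFT JOIN snapshotted_data
      ON snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    WHERE snapshotted_data.dbt_unique_key IS NULL
       OR (
            snapshotted_data.dbt_unique_key IS NOT NULL
        AND snapshotted_data.dbt_valid_to IS NULL
        -- NULL < <timestamp> evaluates to NULL, which is not true,
        -- so row 'a' never satisfies this predicate.
        AND snapshotted_data.MetaDataLastUpdatedTime < source_data.MetaDataLastUpdatedTime
       )
    ORDER BY source_data.dbt_unique_key
""").fetchall()

changed_keys = [row[0] for row in rows]
print(changed_keys)  # only 'b' is detected as changed
```

Key `a` has newer source data but is silently skipped, which matches the behaviour described in this issue: the snapshot row is neither end-dated nor replaced.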
@drewbanin thanks for the review and thoughts on this. Your approach sounds reasonable. Honestly I haven't read the details of the snapshot code thoroughly and I'll rely on your good judgement as it seems like it's pretty involved code. I don't think this fix would do anything to repair the existing data problem we have, right? We'll need to clean that up ourselves if so.
@jrandrews unfortunately no, you'd need to manually repair the snapshot table yourself
I think we've managed to fix it for now by running this against all of the tables in question (although of course it will recur if we ever switch another table from check to timestamp in the future, so the code fix will definitely be helpful):

    UPDATE snapshot.my_snapshot SET MetaDataLastUpdatedTime = dbt_updated_at WHERE MetaDataLastUpdatedTime IS NULL
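For anyone applying the same workaround, a rough sketch of its effect follows. It again uses SQLite through Python's `sqlite3` rather than BigQuery, drops the `snapshot.` schema prefix, and uses a made-up `id` column and sample timestamps, so treat it as an assumption-laden illustration and not as a tested migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_snapshot (
        id TEXT,                        -- hypothetical key column
        MetaDataLastUpdatedTime TEXT,
        dbt_updated_at TEXT
    );
    -- Hypothetical row 'a': timestamp column is NULL in the snapshot.
    INSERT INTO my_snapshot VALUES ('a', NULL, '2020-03-01 00:00:00');
    -- Hypothetical row 'b': timestamp column already populated.
    INSERT INTO my_snapshot VALUES ('b', '2020-04-01 00:00:00', '2020-04-01 00:00:00');
""")

# Timestamp of a newer incoming source row (made-up value).
incoming = "2020-04-20 00:00:00"
older_than_incoming = (
    "SELECT id FROM my_snapshot WHERE MetaDataLastUpdatedTime < ? ORDER BY id"
)

# Before the repair, the NULL row cannot compare as older than the incoming row.
before = [r[0] for r in conn.execute(older_than_incoming, (incoming,))]

# The workaround from this comment: backfill NULLs from dbt_updated_at.
conn.execute("""
    UPDATE my_snapshot
    SET MetaDataLastUpdatedTime = dbt_updated_at
    WHERE MetaDataLastUpdatedTime IS NULL
""")

after = [r[0] for r in conn.execute(older_than_incoming, (incoming,))]
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM my_snapshot WHERE MetaDataLastUpdatedTime IS NULL"
).fetchone()[0]

print(before, after, remaining_nulls)
```

Once the NULLs are backfilled, both rows compare as older than the incoming timestamp, so a timestamp comparison can see them again.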
closed by #2391
Describe the bug
We have found cases where, using the dbt snapshot command, snapshots are not updating appropriately even though new source data rows have arrived which should cause snapshots to update.
Steps To Reproduce
Expected behavior
I would expect, regardless of whether switching back and forth between "check" and "timestamp" strategies for snapshots, that new data coming in from the source would appropriately and correctly be inserted into the snapshot, and existing rows would be end-dated appropriately.
There may also be some larger issues here around temporality as described in this dbt discourse article and how dbt snapshots uses and tracks timestamps even in the dbt_* columns based on what snapshot strategy is being used: https://discourse.getdbt.com/t/dbt-bitemporality-and-snapshots/1067 (although I am absolutely not asking for full bi-temporal support here.)
Screenshots and log output
See screenshot attached of a) our snapshot table and b) the incoming row which should be inserted in the snapshot but isn't. Employee name blurred out for obvious reasons.
System information
Which database are you using dbt with?
BigQuery
The output of dbt --version:
The operating system you're using:
Mac OS X Mojave 10.14.6
Same issue happens also when running from dbt cloud
The output of python --version:
Python 2.7.16
Additional context
This is a pretty major issue for us, we're losing existing history because of this and looking at also having to throw away a bunch of previously gathered history from when we were using the "check" strategy.