[CT-1362] [Bug] Toggling values when snapshotting a table with a key violation #6089
Comments
Hey @kjstultz - I would also expect Snowflake to raise an error on the merge. That's the default behavior, but if you've set the relevant Snowflake parameter so that nondeterministic merges are allowed instead of erroring, the error would be suppressed. By any chance, do you know if you have that parameter set for your account / user?
Interesting, good point @jtcohen6. Currently we have that value always set to TRUE. After combing through the code, I think the root cause is actually that dbt snapshot doesn't use the base table as the "merging" table: it creates a staging table on top of it, and from examining the generated code, the PK violation in the base table breaks the creation of that staging table. The more I look through the code, the more I think the solution would be to check for a PK violation before the merge statement is initiated.
@kjstultz Ah - so just to confirm, the snapshot query itself is returning duplicate values of the unique_key?
This is a fair thought. Today, we say it's the full responsibility of the end user to guarantee the uniqueness of the unique_key. One way to guarantee this is by creating another model (e.g. an ephemeral model), putting a unique test on it, and snapshotting that model instead. Or: should the snapshot materialization just execute its own "unique test," implicitly, after creating the staging table and before running the merge?
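The first option (a separate model guarded by a unique test) could be sketched like this; the model name `table2_deduped` and the file layout are illustrative, not from the thread:

```yaml
# models/schema.yml (illustrative)
version: 2

models:
  - name: table2_deduped   # hypothetical model wrapping sandbox.public.table2
    columns:
      - name: pk
        tests:
          - unique         # dbt's built-in uniqueness test
          - not_null
```

Running `dbt test --select table2_deduped` would then surface the duplicate key before any snapshot consumes it.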
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Forgot about this one.
Definitely the second one. I'm not a big software-development person, but I'd argue that adding a parameter to the snapshot config that enables the PK-violation check, defaulting to FALSE, would be fine. A lot of use cases won't need this, but I've worked at enough organizations with enough exceptions that having this parameter would be really helpful.
@kjstultz thanks again for creating this issue! Appreciate that you invested the time at Coalesce in NOLA to figure out how to reproduce this tricky situation. Reading the discussion between you and @jtcohen6, it sounds like the feature discussed thus far would check the staging table for duplicate unique_key values before running the merge, and raise an error when any are found (configurable, and off by default).
Although this proposed feature wouldn't fix a snapshot table that already has duplicates, it would prevent duplicates from being added in the first place! Feels like a win to me, especially with it being configurable for those that (somehow) know that their "unique" key is truly unique. |
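As a sketch, the opt-in check discussed above might surface as a snapshot config flag; `check_unique_key` here is a hypothetical name, not an existing dbt config:

```sql
{% snapshot DBT_TEST_SNAP_ALT %}
    {#- check_unique_key is a hypothetical flag, defaulting to false -#}
    {{
        config(
            target_schema='public',
            unique_key='pk',
            strategy='check',
            check_cols=['city'],
            check_unique_key=true
        )
    }}
    select * from sandbox.public.table2
{% endsnapshot %}
```

With the flag off, behavior would be unchanged for users who already guarantee uniqueness upstream.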
Perfect! I think that'd be a great solution.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Is this a new bug in dbt-core?
Current Behavior
If I have two values for the same primary key in two different rows in a given table that is snapshotted, it toggles between the two whenever I run the snapshot. (Snowflake adapter)
Expected Behavior
When I have a primary key error in a table I am expecting to snapshot, I would expect at least a warning (if not a full failure) if there is a primary key violation in the base table.
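Until dbt performs such a check itself, the violation can be detected by hand before snapshotting; assuming `pk` is the snapshot's unique_key, any rows returned here would make the merge nondeterministic:

```sql
-- Manual pre-snapshot check: list unique_key values that appear more than once
select pk, count(*) as row_count
from sandbox.public.table2
group by pk
having count(*) > 1;
```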
Steps To Reproduce
dbt Core 1.0.1
Load test data:
```sql
create or replace table sandbox.public.table2 (pk int, city string);
insert into sandbox.public.table2 values (1, 'New York');
```
Run Snapshotting code:
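A minimal snapshot definition consistent with these steps might look like this (the `check` strategy and column list are assumptions, since the original snapshot SQL isn't included above):

```sql
{% snapshot DBT_TEST_SNAP_ALT %}
    {{
        config(
            target_database='sandbox',
            target_schema='public',
            unique_key='pk',
            strategy='check',
            check_cols=['city']
        )
    }}
    select * from sandbox.public.table2
{% endsnapshot %}
```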
Run:
```shell
dbt snapshot --select DBT_TEST_SNAP_ALT
```
Insert a key violating record into the table:
```sql
insert into sandbox.public.table2 values (1, 'NOLA');
```
Re-run snapshot
Re-run snapshot again
Re-run snapshot again
Resulting table
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
No response