Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

“check” snapshot strategy fails if columns need to be quoted #2975

Closed
1 of 5 tasks
feluelle opened this issue Dec 23, 2020 · 7 comments · Fixed by #5389
Closed
1 of 5 tasks

“check” snapshot strategy fails if columns need to be quoted #2975

feluelle opened this issue Dec 23, 2020 · 7 comments · Fixed by #5389
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors! snapshots Issues related to dbt's snapshot functionality

Comments

@feluelle
Copy link
Contributor

feluelle commented Dec 23, 2020

Describe the bug

If you use the "check" snapshot strategy (default) on data which contains columns with special characters in it and do one initial run and then a subsequent insert run. The second run will fail due to a syntax error (see "Screenshots and log output below").

Steps To Reproduce

{% snapshot fake_snapshot2 %}
{{
    config(
      target_database='analytics',
      target_schema='dbt_claire',
      unique_key='id',
      strategy='check',
      check_cols='all'
    )
}}
select
    1 as id,
    1 as "unnamed: 0",
    current_timestamp() as created_at
{% endsnapshot %}

(stolen from slack discussion: https://getdbt.slack.com/archives/CBSQTAPLG/p1608759471264100?thread_ts=1608758278.258300&cid=CBSQTAPLG)

Expected behavior

I expect the snapshot to not fail on columns containing special characters. Special characters should work as long as you enclose it with ".

Screenshots and log output

2020-12-23 21:00:59.335996 (MainThread):   syntax error at or near ":"
2020-12-23 21:00:59.339215 (MainThread):   LINE 81: ..._data._dbt_id is null) or snapshotted_data.Unnamed: 0 != sou...

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

Plugins:
  - postgres: 0.18.1
  - snowflake: 0.18.1
  - bigquery: 0.18.1
  - redshift: 0.18.1

The operating system you're using: MacOS Catalina Version 10.15.7 with Docker for Mac 3.0.3

The output of python --version: Python 3.8.6

cc @clrcrl

@feluelle feluelle added bug Something isn't working triage labels Dec 23, 2020
@clrcrl
Copy link
Contributor

clrcrl commented Dec 23, 2020

I'm pretty sure it's this line here (or the related function — we don't seem to be quoting the column names)
https://github.com/fishtown-analytics/dbt/blob/dev/kiyoshi-kuromiya/core/dbt/include/global_project/macros/materializations/snapshot/strategies.sql#L167

@jtcohen6 jtcohen6 added good_first_issue Straightforward + self-contained changes, good for new contributors! snapshots Issues related to dbt's snapshot functionality and removed triage labels Dec 31, 2020
@jtcohen6
Copy link
Contributor

Good call, it does seem like this could be a straightforward fix. There are two ways I think it could be done:

  1. Wrap {{ col }} in adapter.quote() similar to how we create comma-separated column lists for the incremental materialization:

https://github.com/fishtown-analytics/dbt/blob/a0eade4fdd091cd06a0d8e143db2d1a7456d3774/core/dbt/include/global_project/macros/materializations/common/merge.sql#L54-L62

  1. Change the default implementation of get_columns_in_query (used when check_cols = 'all') to return the column's quoted name attribute, instead of just its name:
    https://github.com/fishtown-analytics/dbt/blob/a0eade4fdd091cd06a0d8e143db2d1a7456d3774/core/dbt/include/global_project/macros/adapters/common.sql#L14

That fix feels like it could be a good first issue. The tricky bit would be testing, specifically:

  • Validating on Snowflake, which is sensitive to identifier casing when quoted
  • Adding an automated test to ensure we don't regress from this behavior

I've opened another issue (#2986), bigger in scope, to corral the cacophony of inconsistencies with quoting as they exist today.

@sou-joshi
Copy link

I would like to pick this.

@feluelle
Copy link
Contributor Author

feluelle commented Apr 6, 2021

Sure! Please go ahead :)

@tconbeer
Copy link
Contributor

tconbeer commented Oct 5, 2021

@sou-joshi Have you made any progress on this? If not I'm happy to pick it up.

@wbarth11
Copy link

We just encountered this very same issue. We have a lot of columns with spacing in our Snowflake database. Initial snapshot run materializes perfectly fine, but all subsequent runs fail due to columns not being wrapped with quotation marks. Thanks for working through this one!

@jbowlen
Copy link

jbowlen commented May 5, 2022

Unfortunately a problem for me as well.
I was planning to snapshot my sources, which may have white spaces and special characters in its column names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors! snapshots Issues related to dbt's snapshot functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants