Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for column tests not rendering on quoted columns #425

Merged
merged 3 commits into from
Jun 9, 2023

Conversation

drewbanin
Copy link
Contributor

resolves #201

Description

Quoted columns are rendered with explicit quotes in the manifest.json file. These quotes prevent the dbt-docs site from matching a column name from the manifest (attached to a model) with a test node relevant to that column. Here's an example test node in the manifest for a quoted column:

        "test.my_new_project.not_null_quoted_ident__TEST_2_COLUMN_.f762812d7c": {
            "test_metadata": {
                "name": "not_null",
                "kwargs": {
                    "column_name": "\"TEST_2_COLUMN\"",
                    "model": "{{ get_where_subquery(ref('quoted_ident')) }}"
                },
                "namespace": null
            },
            "database": "ANALYTICS",
            ...
            "config": {...},
            "compiled_code": "select \"TEST_2_COLUMN\"\nfrom ANALYTICS.dbt_dbanin.quoted_ident\nwhere \"TEST_2_COLUMN\" is null",
            "column_name": "\"TEST_2_COLUMN\"",
            "file_key_name": "models.quoted_ident",
            "attached_node": "model.my_new_project.quoted_ident"
        }

It is probably worth exploring a change to the dbt-core implementation in the future. I imagine we could instead leave the column_name field unquoted, but include a boolean column_is_quoted flag, or similar. That would make string processing a little bit easier, as well as help us account for different quoting styles across data platforms more readily.

In the meantime, this PR works by stripping quotes from the column_name before performing a case-insensitive match across model node + test node.

Checklist

@cla-bot cla-bot bot added the cla:yes label May 31, 2023
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd that we resolve the quoting before storing the values in the manifest, rather than hanging onto the original (unquoted) column_name and quote: true parameter

// strip quotes from start and end of test column if present in both locations
// this is necessary to attach a test to a column when `quote: true` is set for a column
let test_column_name = test_column;
if (test_column.startsWith('"') && test_column.endsWith('"')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote characters are different on different adapters. E.g. on dbt-bigquery it will be:

  "column_name": "`id`",

It is less common to quote columns on BQ, since it doesn't generally support spaces / special characters — but it does still support reserved keywords (e.g. from) as column names if quoted.

I'm not sure if there's a cleverer regex check we could/should be doing here instead of hard-coding "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed another commit here. I tried to go the regex route, but i think it was a little too complicated. The specific problem is that I wanted to ensure that the start and end quote chars are the same (eg. we shouldn't strip quotes from the string "some_column`). I ended up enumerating different quote chars based on data platform in the javascript. I made the backtick BQ-only, which is not correct, but wondering if it's close enough?

Other thoughts:

  • I am very open to other suggestions/ideas on how to handle this, or just a statement that "this ain't it" and i'll have a harder think about it :)
  • Can you confirm my assumption that the metadata.adapter_type field will read "bigquery" on bigquery?
  • Really reinforcing my desire for some sort of adapter-specific info within the manifest. That could look like including quote chars in the metadata blob to avoid needing to encode this sort of info in js!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark & databricks also use backticks as quotes :)

Agree this isn't really scalable to other adapters! It could make sense to include this kind of "static" adapter-specific info somewhere in the manifest — though if I had to pick between doing that, and including the original unquoted column_name value on the test node, I have a slight preference for the latter

Copy link
Contributor

@jtcohen6 jtcohen6 Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end of the day - I'm fine with this approach for now. It will handle a real edge case for the vast majority of folks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - just added the same logic for spark + databricks and a test too! 🏓

@drewbanin drewbanin requested a review from jtcohen6 June 6, 2023 13:10
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! spun it locally, and it spins just fine :)

press "go" at your leisure, and then trigger this action (unless you'd rather I do it)

@jtcohen6 jtcohen6 merged commit 2f9b8f9 into main Jun 9, 2023
@jtcohen6 jtcohen6 deleted the fix/quoted-columns-not-rendering branch June 9, 2023 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

quote: true for column in .yml properties causes tests not to render in TESTS column in the Column table
2 participants