Fix for column tests not rendering on quoted columns #425
It is odd that we resolve the quoting before storing the values in the manifest, rather than hanging onto the original (unquoted) column_name and the quote: true parameter.
src/app/services/project_service.js
// strip quotes from start and end of test column if present in both locations
// this is necessary to attach a test to a column when `quote: true` is set for a column
let test_column_name = test_column;
if (test_column.startsWith('"') && test_column.endsWith('"')) {
Quote characters are different on different adapters. E.g. on dbt-bigquery it will be:
"column_name": "`id`",
It is less common to quote columns on BQ, since it doesn't generally support spaces / special characters, but it does still support reserved keywords (e.g. from) as column names if quoted.
I'm not sure if there's a cleverer regex check we could/should be doing here instead of hard-coding "
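For illustration, one way a regex check like the one floated above might look is a pattern with a backreference, so the closing quote has to be the same character as the opening one. This is only a sketch, not what the PR does: the function name stripMatchingQuotes and the choice of " and ` as the recognised quote characters are assumptions made for the example.

```javascript
// Hypothetical helper (not from the PR): strip one pair of matching quote chars.
// Assumption: only " and ` are treated as quote characters.
function stripMatchingQuotes(name) {
  // Group 1 captures the opening quote; \1 requires the same char at the end.
  const match = /^(["`])(.*)\1$/.exec(name);
  // Return the inner text when wrapped in a matching pair, else the input unchanged.
  return match ? match[2] : name;
}

console.log(stripMatchingQuotes('"id"'));
console.log(stripMatchingQuotes('`id`'));
console.log(stripMatchingQuotes('"some_column`'));
```

Because of the backreference, a string that opens with one quote character and closes with another is left untouched. Note that this version ignores the adapter entirely, so it would also strip backticks on platforms that do not use them for quoting.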
Just pushed another commit here. I tried to go the regex route, but I think it was a little too complicated. The specific problem is that I wanted to ensure that the start and end quote chars are the same (e.g. we shouldn't strip quotes from the string "some_column`). I ended up enumerating different quote chars based on data platform in the JavaScript. I made the backtick BQ-only, which is not correct, but I'm wondering if it's close enough?
Other thoughts:
- I am very open to other suggestions/ideas on how to handle this, or just a statement that "this ain't it" and I'll have a harder think about it :)
- Can you confirm my assumption that the metadata.adapter_type field will read "bigquery" on bigquery?
- Really reinforcing my desire for some sort of adapter-specific info within the manifest. That could look like including quote chars in the metadata blob to avoid needing to encode this sort of info in js!
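A rough sketch of the enumeration approach described in this comment, for readers following along. It is not the code from the commit: the names DEFAULT_QUOTE_CHARS, EXTRA_QUOTE_CHARS, quoteCharsFor and stripColumnQuotes are hypothetical, and treating " as valid on every adapter while adding ` only for "bigquery" is an assumption based on the description above.

```javascript
// Hypothetical lookup tables (assumed, not taken from the PR).
// Assumption: " is accepted everywhere; ` is added only for bigquery.
const DEFAULT_QUOTE_CHARS = ['"'];
const EXTRA_QUOTE_CHARS = new Map([['bigquery', ['`']]]);

// Return the quote chars to try for a given adapter type.
function quoteCharsFor(adapterType) {
  return DEFAULT_QUOTE_CHARS.concat(EXTRA_QUOTE_CHARS.get(adapterType) || []);
}

// Strip quotes only when the same quote char appears at both ends.
function stripColumnQuotes(columnName, adapterType) {
  for (const quote of quoteCharsFor(adapterType)) {
    const wrapped =
      columnName.length >= 2 &&
      columnName.startsWith(quote) &&
      columnName.endsWith(quote);
    if (wrapped) return columnName.slice(1, -1);
  }
  return columnName;
}

// Usage: adapterType would come from metadata.adapter_type in the manifest.
console.log(stripColumnQuotes('`id`', 'bigquery'));
```

Checking one quote character at a time is what guarantees the start and end chars agree, which was the sticking point with the regex attempt. Supporting another platform means adding an entry to the map.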
spark & databricks also use backticks as quotes :)
Agree this isn't really scalable to other adapters! It could make sense to include this kind of "static" adapter-specific info somewhere in the manifest, though if I had to pick between doing that, and including the original unquoted column_name value on the test node, I have a slight preference for the latter
At the end of the day - I'm fine with this approach for now. It will handle a real edge case for the vast majority of folks
ok - just added the same logic for spark + databricks and a test too! 🏓
lgtm! spun it locally, and it spins just fine :)
press "go" at your leisure, and then trigger this action (unless you'd rather I do it)
resolves #201
Description
Quoted columns are rendered with explicit quotes in the manifest.json file. These quotes prevent the dbt-docs site from matching a column name from the manifest (attached to a model) with a test node relevant to that column. Here's an example test node in the manifest for a quoted column:
It is probably worth exploring a change to the dbt-core implementation in the future. I imagine we could instead leave the column_name field unquoted, but include a boolean column_is_quoted flag, or similar. That would make string processing a little bit easier, as well as help us account for different quoting styles across data platforms more readily.
In the meantime, this PR works by stripping quotes from the column_name before performing a case-insensitive match across model node + test node.
Checklist
- changie new to create a changelog entry
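To make the approach in the Description concrete, here is a minimal sketch of the strip-then-match step, assuming test nodes carry a column_name field as in the manifest excerpt quoted in the review. The function names and the sample nodes are hypothetical, and the quote handling covers only double quotes, mirroring the first version of the diff rather than the final per-adapter logic.

```javascript
// Hypothetical helper: remove surrounding double quotes if present at both ends.
function stripDoubleQuotes(name) {
  const wrapped = name.length >= 2 && name.startsWith('"') && name.endsWith('"');
  return wrapped ? name.slice(1, -1) : name;
}

// Case-insensitive comparison after stripping quotes from the test's column_name.
function columnMatchesTest(modelColumnName, testColumnName) {
  return stripDoubleQuotes(testColumnName).toLowerCase() === modelColumnName.toLowerCase();
}

// Collect the test nodes that apply to one model column.
// Nodes without a column_name (e.g. model-level tests) are skipped.
function testsForColumn(modelColumnName, testNodes) {
  return testNodes.filter(
    (node) => node.column_name && columnMatchesTest(modelColumnName, node.column_name)
  );
}

// Illustrative data only, not a real manifest.
const sampleTests = [{ column_name: '"ID"' }, { column_name: 'other' }, {}];
console.log(testsForColumn('id', sampleTests).length);
```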