-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cardinality equality schema test #35
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you @dwallace0723!
@@ -0,0 +1,28 @@ | |||
{% macro test_cardinality_equality(model, from, to, field) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwallace0723 the from
, to
, and field
args can be named arbitrarily -- we used these for the referential integrity test, but you can pick other names if you'd like!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably just leave them as is. The current naming makes logical sense to me. LMK if you think there are more suitable arg names for this specific test.
|
||
with table_a as ( | ||
select | ||
count(1) as num_rows, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put the dimensions before the aggregates here (eg. group by 1
, also in the table_b
cte)
Any reason to use count(1)
instead of count(*)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: count(1)
instead of count(*)
, this is strictly habitual to keep myself out of the practice of selecting *. I'm relatively positive it makes no performance difference whatsoever. Will gladly change to count(*)
if it better fits the dbt style guidelines.
from ( | ||
select * | ||
from table_a | ||
except |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is groovy! I think the except
set function is asymmetrical -- table_a - table_b
is different than table_b - table_a
. As a result, this test would "pass" if table_b
contained records which were not present in table_a
, ie. if table_b
is a superset of table_a
.
There may be a simpler way to do this, but I've done something like this before:
-- table_a and table_b are equal if this union results in 0 rows
(
select * from table_a
except
select * from table_b
)
union all
(
select * from table_b
except
select * from table_a
)
This is cool (for some value of cool) b/c it's essentially the definition of set equality:
A = B if A is a subset of B, and B is a subset of A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It does feel a little bit strange that there isn't a simpler way to do this than a union all
back to you @drewbanin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple more comments. Pending those, this is looking good to me
|
||
table_b as ( | ||
select | ||
{ field }}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo here
union all | ||
select * | ||
from except_b | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you alias this subquery, then this schema test will also work on Postgres. At present, it errors out with:
subquery in FROM must have an alias
LINE 36: from (
^
HINT: For example, FROM (SELECT ...) [AS] foo.
So just change it to
) as sbq
and you should be good! Or, you could make another CTE for the subquery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. I just created an additional CTE for the subquery.
Cool, changes made @drewbanin 👍 |
Groovy! Thanks for contributing @dwallace0723 -- merging this now :) |
This custom schema test macro checks to see if the cardinality of one field in a model exactly matches the cardinality of a chosen field in a different model.