Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cardinality equality schema test #35

Merged
merged 6 commits into from
Jan 4, 2018

Conversation

dwallace0723
Copy link
Contributor

This custom schema test macro checks to see if the cardinality of one field in a model exactly matches the cardinality of a chosen field in a different model.

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to you @dwallace0723!

@@ -0,0 +1,28 @@
{% macro test_cardinality_equality(model, from, to, field) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dwallace0723 the from, to, and field args can be named arbitrarily -- we used these for the referential integrity test, but you can pick other names if you'd like!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll probably just leave them as is. The current naming makes logical sense to me. LMK if you think there are more suitable arg names for this specific test.


with table_a as (
select
count(1) as num_rows,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put the dimensions before the aggregates here (eg. group by 1, also in the table_b cte)

Any reason to use count(1) instead of count(*) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: count(1) instead of count(*), this is strictly habitual to keep myself out of the practice of selecting *. I'm relatively positive it makes no performance difference whatsoever. Will gladly change to count(*) if it better fits the dbt style guidelines.

from (
select *
from table_a
except
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is groovy! I think the except set function is asymmetrical -- table_a - table_b is different than table_b - table_a. As a result, this test would "pass" if table_b contained records which were not present in table_a, ie. if table_b is a superset of table_a.

There may be a simpler way to do this, but I've done something like this before:

-- table_a and table_b are equal if this union results in 0 rows
(
  select * from table_a
  except
  select * from table_b
)

union all

(
  select * from table_b
  except
  select * from table_a
)

This is cool (for some value of cool) b/c it's essentially the definition of set equality:
A = B if A is a subset of B, and B is a subset of A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It does feel a little bit strange that there isn't a simpler way to do this than a union all

@dwallace0723
Copy link
Contributor Author

back to you @drewbanin

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more comments. Pending those, this is looking good to me


table_b as (
select
{ field }},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo here

union all
select *
from except_b
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you alias this subquery, then this schema test will also work on Postgres. At present, it errors out with:

subquery in FROM must have an alias
  LINE 36: from (
                ^
  HINT:  For example, FROM (SELECT ...) [AS] foo.

So just change it to

) as sbq

and you should be good! Or, you could make another CTE for the subquery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. I just created an additional CTE for the subquery.

@dwallace0723
Copy link
Contributor Author

Cool, changes made @drewbanin 👍

@drewbanin
Copy link
Contributor

Groovy! Thanks for contributing @dwallace0723 -- merging this now :)

@drewbanin drewbanin merged commit af8102f into dbt-labs:master Jan 4, 2018
@dwallace0723 dwallace0723 deleted the add-cardinality-test branch January 4, 2018 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants