Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functional Dependency Test Macro #989

Open
edejong-dbc opened this issue Feb 21, 2025 · 0 comments · May be fixed by #988
Open

Functional Dependency Test Macro #989

edejong-dbc opened this issue Feb 21, 2025 · 0 comments · May be fixed by #988
Labels
enhancement New feature or request triage

Comments

@edejong-dbc
Copy link

Describe the feature

It would be valuable to have a functional_dependency generic test in dbt-utils that ensures one or more "determinant" columns uniquely determine another "dependent" column. The test should fail if any determinant combination maps to more than one distinct dependent value. An optional where clause would allow users to scope the test to a subset of records.

This is also especially helpful for hierarchical relationships. For instance:

  • If region → country, ensuring each region only has one corresponding country.
  • If (country, state) → city, ensuring each (country, state) pair only maps to one city.
  • Ensuring each (parent_id, child_id) pair in a hierarchy leads to exactly one “child name” or other dependent attributes.

Describe alternatives you've considered

  • Manual SQL tests: Users currently write custom SQL queries or macros to check for duplicate mappings, but this adds boilerplate and inconsistency across projects.
  • Existing uniqueness tests: While dbt-utils has unique_combination_of_columns, it doesn’t directly verify that those columns determine the value of a separate column.

Additional context

  • This feature is database-agnostic since it primarily involves standard SQL grouping and counting.
  • The test should align with existing dbt-utils generic test conventions, including adapter dispatch and lower-case SQL syntax.

Who will this benefit?

  • Data modelers needing to validate referential integrity, e.g. ensuring (order_id, product_id) always maps to a unique quantity.
  • Teams maintaining dimensional models who want to confirm a dimension’s natural key always leads to a single record.
  • Organizations checking hierarchical or parent-child relationships, where each parent or parent-child pair must map to only one child or child property.

Are you interested in contributing this feature?

Yes, we already have a working macro and example seeds demonstrating passing and failing scenarios. We’d be happy to raise a PR and contribute the code, along with documentation.

@edejong-dbc edejong-dbc added enhancement New feature or request triage labels Feb 21, 2025
@edejong-dbc edejong-dbc linked a pull request Feb 21, 2025 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant