[Feature] Should dbt_metrics support null-safe joins? In other words, is null = null? #164

callum-mcdata · 2022-10-28T17:41:59Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

As shown in some slack threads and some Github Issues, some customers are running into issues where they expect their NULL values to be correctly represented in the output. This is not currently supported in dbt_metrics because we don't use NULL-SAFE operators for joins. Should we support this as a config option or macro parameter?

Describe alternatives you've considered

The easy alternative to this is to just state that null values need to be overwritten for metric calculations.

Who will this benefit?

Anyone who has null values in their dimensions

Are you interested in contributing this feature?

Maybe

Anything else?

No response

josepfranquetf · 2023-01-04T10:38:51Z

Hi @callum-mcdata, I think it would make sense to support null-safe joins. A sensible case would be when you have a derived metric which is a division between two metrics. If you have only one dimension which has a null value (not strange), the metric could not be calculated. I really think that being worried of nulls can create friction to the user.

callum-mcdata · 2023-01-05T17:32:09Z

Hey @josepfranquetf. Do you mind laying out an example with data for the use case that you mentioned above? I'm not sure I fully understand the implications on derived metrics. I've got some opinions in this area after testing of my own but want to make sure I hear from users before settling on a path.

josepfranquetf · 2023-01-09T09:57:42Z

Hey @callum-mcdata. Imagine that you are running a metrics.calculate macro with one dimension (marketing_source) and a derived metric (cac), which is calculated as investment/number_of_customers. Additionally, marketing_source takes 3 different values: Facebook, Google and null.
In this case, cac would be null when marketing_source is null because null-safe joins are not supported (Although, investment and number_of_customers have positive values when marketing_source is null).

opalh · 2023-01-09T16:39:06Z

Hey @callum-mcdata, I was struggling with this issue before managed to understand what was happening. From my point of view, since metrics are created to have multiple dimensions it is problematic that you lose rows if one of the dimensions is null and for me, it doesn't seem reasonable to replace all nulls in my columns to be able to use metrics with dimensions.
Also, it was quite confusing to understand where the difference is coming from since the null rows were there but they had zero in the metric.

callum-mcdata · 2023-01-09T22:44:56Z

No commitment that we'll be implementing this but I wanted to dig into the method that we would use to resolve this issue.

Using A IS [ NOT ] DISTINCT FROM B

All righty, let's do some digging here for all our supported adapters:

Snowflake: Documentation here. An example of this can be found in this StackOverflow.
Redshift: Unable to find any documentation on a native function to accomplish this.
BigQuery: Documentation here.
Databricks: Documentation here.
Postgres: Documentation here.

I suspect that this is the best syntax to use that is consistent across all the supported adapters. We could potentially need to figure out an alternative solution for Redshift but that is fine.

callum-mcdata added the enhancement New feature or request label Oct 28, 2022

callum-mcdata added the awaiting_response Waiting on a response label Jan 5, 2023

github-actions bot added triage and removed awaiting_response Waiting on a response labels Jan 5, 2023

callum-mcdata mentioned this issue Feb 23, 2023

[CT-2137] [Feature] Cross-database implementation of is_distinct_from dbt-labs/dbt-core#6997

Open

3 tasks

callum-mcdata closed this as completed Apr 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Should dbt_metrics support null-safe joins? In other words, is null = null? #164

[Feature] Should dbt_metrics support null-safe joins? In other words, is null = null? #164

callum-mcdata commented Oct 28, 2022

josepfranquetf commented Jan 4, 2023 •

edited

Loading

callum-mcdata commented Jan 5, 2023

josepfranquetf commented Jan 9, 2023

opalh commented Jan 9, 2023 •

edited

Loading

callum-mcdata commented Jan 9, 2023 •

edited

Loading

[Feature] Should dbt_metrics support null-safe joins? In other words, is null = null? #164

[Feature] Should dbt_metrics support null-safe joins? In other words, is null = null? #164

Comments

callum-mcdata commented Oct 28, 2022

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

josepfranquetf commented Jan 4, 2023 • edited Loading

callum-mcdata commented Jan 5, 2023

josepfranquetf commented Jan 9, 2023

opalh commented Jan 9, 2023 • edited Loading

callum-mcdata commented Jan 9, 2023 • edited Loading

Using A IS [ NOT ] DISTINCT FROM B

josepfranquetf commented Jan 4, 2023 •

edited

Loading

opalh commented Jan 9, 2023 •

edited

Loading

callum-mcdata commented Jan 9, 2023 •

edited

Loading