Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recency test to work properly with day datepart #708

Closed
wants to merge 6 commits into from

Conversation

yuanminglee
Copy link

resolves #414

This is a:

  • documentation update
  • bug fix with no breaking changes
  • [] new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

  • When datepart is day or greater, the recency test should compare against the current date instead of current timestamp
  • The test wasn't working with timestamp columns in BigQuery before (since you can't compare a timestamp with datetime). This PR fixes that by converting timestamp to datetime first.
  • Note: this PR was done in collaboration with @rodrigoaps

Checklist

  • This code is associated with an Issue which has been triaged and accepted for development.
  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt.type_* macros instead of explicit datatypes (e.g. dbt.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

deanna-minnick and others added 3 commits October 10, 2022 14:29
* add safe_divide documentation

* add safe_divide macro

* add integration test for safe_divide macro

* moved macro and documentation to new SQL generator section

Co-authored-by: Grace Goheen <[email protected]>
@joellabes joellabes changed the base branch from main to utils-v1 October 27, 2022 02:59
Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great @yuanminglee! A couple of comments and potential changes, let me know what you think!

I'm aiming to cut the release candidate for 1.0 at the start of next week, so please also let me know if you won't be able to get to this before then.

macros/generic_tests/recency.sql Outdated Show resolved Hide resolved
macros/generic_tests/recency.sql Outdated Show resolved Hide resolved
macros/generic_tests/recency.sql Show resolved Hide resolved
@yuanminglee yuanminglee requested a review from joellabes October 28, 2022 02:54
@yuanminglee
Copy link
Author

This is looking great @yuanminglee! A couple of comments and potential changes, let me know what you think!

I'm aiming to cut the release candidate for 1.0 at the start of next week, so please also let me know if you won't be able to get to this before then.

Thanks @joellabes! I've made the suggested change, and removed the unused line 😃

commit 04536a1
Author: Joel Labes <[email protected]>
Date:   Wed Nov 23 08:30:34 2022 +1300

    Merge main into utils-v1 (dbt-labs#726)

    * Feature/safe divide (dbt-labs#697)

    * add safe_divide documentation

    * add safe_divide macro

    * add integration test for safe_divide macro

    * moved macro and documentation to new SQL generator section

    Co-authored-by: Grace Goheen <[email protected]>

    * Revert "Feature/safe divide (dbt-labs#697)" (dbt-labs#702)

    This reverts commit f368cec.

    * Quick nitpicks (dbt-labs#718)

    I was doing some studying on these and spotted some stuff. One verb conjugation and a consistency in macro description

    Co-authored-by: deanna-minnick <[email protected]>
    Co-authored-by: Grace Goheen <[email protected]>
    Co-authored-by: ian-fahey-dbt <[email protected]>

commit 2703459
Author: fivetran-catfritz <[email protected]>
Date:   Tue Nov 1 20:26:17 2022 -0500

    Slugify for snowflake (dbt-labs#707)

commit 17b017d
Merge: 31c503d b08f7bb
Author: Joel Labes <[email protected]>
Date:   Thu Oct 27 16:51:56 2022 +1300

    Merge branch 'heads/0.9.3' into utils-v1

commit b08f7bb
Author: Joel Labes <[email protected]>
Date:   Thu Oct 27 15:28:55 2022 +1300

    Wrap xdb warnings in if execute block

commit 586f278
Author: Joel Labes <[email protected]>
Date:   Thu Oct 27 15:27:57 2022 +1300

    Change deprecation resolution advice

commit 4fae6c5
Author: Joel Labes <[email protected]>
Date:   Thu Oct 27 15:19:27 2022 +1300

    Switch to dbt.escape_single_quotes
Comment on lines +13 to +14
{#- Use a date-based threshold if column type is DATE or datepart is day or greater #}
{%- if column_type == 'DATE' and datepart in ['year', 'quarter', 'month', 'week', 'day'] %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment and the code disagree here (one says and, the other says or) - which one did you intend to implement?

Regardless, can you say more about why the checking of the datepart is even necessary? The original issue (#414) only talks about issues caused by the datatype of the column being date-only. Dropping the time components for all thresholds will return inaccurate results. Consider this situation:

      - dbt_utils.recency:
          datepart: day
          field: my_datetime_column
          interval: 1

As I write this, it's 2022-11-23T04:57:19Z. By dropping the time component even when it's available, the threshold will be 2022-11-22T00:00:00Z instead of T04:57:19Z, so the validity window is 5 hours larger, which would lead to false negatives. For example a record created at 1am would be considered valid even though it's actually 4 hours too old.

Suggested change
{#- Use a date-based threshold if column type is DATE or datepart is day or greater #}
{%- if column_type == 'DATE' and datepart in ['year', 'quarter', 'month', 'week', 'day'] %}
{#- Use a date-based threshold if column type is DATE #}
{%- if column_type == 'DATE' %}

Let me know if I'm missing something!

@joellabes
Copy link
Contributor

@yuanminglee I spent some more time noodling on this today, and have changed my mind from what I put on the original issue, where I said:

We could ask the user to specify the granularity to which we truncate threshold, but it would be nicer for them if we got the column and used its data_type inside the macro instead.

This falls down in a couple of places, most significantly the fact that it doesn't work for ephemeral models where we can't get the column type. We could just use the old behaviour for CTEs, but a core tenet of dbt is that you can change materializations without any impacts to other parts of the experience. If someone changed to/from an ephemeral model, their tests would behave differently which I don't like.

Alternatively we could refuse to test ephemeral models, but there's no good justification for that.

Soooo instead I think we should just add an optional setting on the test (called something like ignore_time_component) which determines whether it should truncate to the day level or not. When that's set to true, I think we should truncate both columns - it would be weird for an ignore_time_component command to only impact one of the columns. It should basically never come up because I imagine people will only use this when their column is a date only, but I couldn't think of a concise way to express that so it'll range from a no-op to "consistent with what it says on the tin" if you do use it on a datetime column.

@joellabes joellabes mentioned this pull request Nov 25, 2022
17 tasks
@joellabes
Copy link
Contributor

Closing in favour of #731 - sorry for all the hassle @yuanminglee!

@joellabes joellabes closed this Nov 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recency does not work as expected for datepart day
3 participants