Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-895] [Spike] Consolidate current_timestamp & associates #5521

Closed
jtcohen6 opened this issue Jun 1, 2022 · 12 comments · Fixed by #5838
Closed

[CT-895] [Spike] Consolidate current_timestamp & associates #5521

jtcohen6 opened this issue Jun 1, 2022 · 12 comments · Fixed by #5838
Assignees
Labels
spike Team:Adapters Issues designated for the adapter area of the code tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality utils Cross-database building blocks
Milestone

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 1, 2022

Split out from dbt-labs/dbt-utils#577 + dbt-labs/dbt-utils#597, resolves dbt-labs/dbt-utils#339

Describe the feature

Rewrite current_timestamp + current_timestamp_utc to be built out of core/plugin functionality (e.g. date_function) and a convert_timezone macro.

Currently in dbt-core, there are three methods/macros all purporting to do the same thing. I've linked the examples from Postgres.

Then, there are the two versions currently living in dbt-utils:

Things we need to figure out

Ideal outcome

When I close my eyes and imagine how this could be really good:

  1. We have one method/macro, defined in dbt-core and implemented by each adapter plugin, that provides the SQL for getting the current timestamp. Whenever we need the current timestamp in a different data type, we use that current_timestamp macro plus a data type conversion function (see Use built-in adapter functionality for datatypes dbt-utils#586).
  2. We have a convert_timezone macro, defined in dbt-core and implemented by each adapter plugin. @clausherther has already written one for the dbt-date package—Claus, would you have any interest in kicking it upstream? :)
  3. It's trivial to write a current_timestamp_in_utc macro that builds on top of 1+2. Whether that lives in dbt-core, dbt-utils, or somewhere else doesn't matter so much to me. Wherever it lives, it should look a lot like dbt_date.now(tz=None). We'd leave the dbt_utils one in place for backwards compatibility.

Questions

Describe alternatives you've considered

Leaving all of it where it is :)

Additional context

Timezones are no fun

This stuff is pretty tricky to test. We're not really testing it currently. I had an idea in dbt-labs/dbt-utils#577 (comment), but it's not a great one. We are indirectly testing our use of snapshot_get_time in dbt-core functional tests (specifically test_hard_delete_snapshot.

Who will this benefit?

  • Users who want to call well-named macros, and expect them to do the thing they're meant to
  • Claus, as maintainer of the dbt-date package (I hope!)
@clausherther
Copy link
Contributor

I have no idea what this means in terms of actual work, but I'm in. 👍

@clausherther
Copy link
Contributor

Couple of unordered thoughts on convert_timezone from dbt_date:

  • Timezone conversion at a minimum needs a target timezone. In dbt_date, we currently require a default project timezone via a global var, e.g. "dbt_date:time_zone": "America/Los_Angeles". That way something like dbt_date.now() by default always returns a "local" time.
  • Some platforms also allow you to specify a source timezone. So, for example, you can convert from something other than UTC to some other non-UTC timezone. So, convert_timezone has both target_tz and source_tz parameters, both optional.
  • In dbt-date, we effectively strip off the tz bundle on Snowflake by casting to a timestamp_ntz. I've found that to be the most sensible, since it IMO it better preserves the converted time.

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Jun 2, 2022

Converting to UTC and then casting to timestampntz (datetime in BQ) generally seems like the Way To Go. As analytical practice, it makes sense to pull out the timezone and store as a separate column (local_timezone). Relevant: dbt-labs/docs.getdbt.com#1504

dbeatty10 referenced this issue in dbt-labs/dbt-utils Jun 17, 2022
* Initialize lift + shift, dateadd + datediff

* Install branches of all plugins

* Use dispatch for passthrough behavior

* Only support dbt-core >= 1.2

* Switch to alternative dev branches

* Lift and shift cross-database macros, fixtures, and tests into dbt-core and adapters

* Test `current_timestamp()` adapter definition vs. original dbt-utils

* Split out the `current_timestamp` work into its own separate effort (#599)

* Reinstate original `current_timestamp` implementation

* Reinstate original `type_*` implementations

* Reinstate original `current_timestamp_in_utc` implementation

* Kick out the `current_timestamp` + `type_*` tests

* Preserve `dbt_utils` as one more step along the dispatch train

* Remove branch identifiers

Co-authored-by: Jeremy Cohen <[email protected]>
@jtcohen6
Copy link
Contributor Author

I'm going to transfer this to dbt-core. I think the real blocker there is around understanding + consolidating the different options there. I also want to keep the follow-on work from the cross-db migration organized in one place.

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-utils Jul 25, 2022
@github-actions github-actions bot changed the title Consolidate current_timestamp & associates [CT-895] Consolidate current_timestamp & associates Jul 25, 2022
@jtcohen6 jtcohen6 added tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality Team:Adapters Issues designated for the adapter area of the code utils Cross-database building blocks and removed enhancement New feature or request labels Jul 25, 2022
@leahwicz leahwicz changed the title [CT-895] Consolidate current_timestamp & associates [CT-895] [Spike] Consolidate current_timestamp & associates Jul 26, 2022
@leahwicz leahwicz added the spike label Jul 26, 2022
@clausherther
Copy link
Contributor

clausherther commented Jul 29, 2022

Just learned a couple of things on Snowflake:

  • The 3 parameter version of convert_timezone that allows you to specify a source timezone does not accept a timestamp_tz column (any more?). So, one has to convert to timestamp_ntz before passing that in.
  • If your SF instance/account/session/... is to set to timezone X and you're trying to convert what you think is a timestamp in timezone Z to timezone X, nothing will actually get converted, b/c SF assumes source and target are the same (and you just stripped off the tz 😞 )

@dbeatty10
Copy link
Contributor

dbeatty10 commented Aug 30, 2022

Added a comparison of dbt-core+adapters implementation vs. dbt-utils in this comment.

TLDR: they don't appear to yield the same data types.

@dbeatty10
Copy link
Contributor

@Fleid ☝️ this is the ticket that we were discussing earlier today about inconsistent timestamp data types across adapters.

@jtcohen6
Copy link
Contributor Author

@dbeatty10 Amazing work pulling together all the threads from your research!

I think there are two potential scopes for this work:

  1. Do just the work that's needed to remove dbt_utils.current_timestamp ahead of cutting dbt-utils v1.0
  2. Actually reconcile the divergent implementations of getting current timestamp in dbt-core + adapters (also including date_function + snapshot_get_time)

The table you created (and included as a screenshot in your comment) really clinches it for me. We probably don't have the time, capacity, or ability to totally rationalize our approach to current_timestamp without risking major breaking changes — we might need to wait for dbt v2 for the really good version of this. I'm going to call option 2 "perfect," and option 1 "better."

GOAL: Find a path that can unlock "better" (unblocking dbt-utils v1.0), even though it will not get us to "perfect" (total consistency within dbt-core + adapters) — without breaking people's projects, and without adding a ton of additional tech debt (more inconsistency) to dbt-core + adapters in the process.

Here's what I think we COULD do now, to unblock step 1:

  • Move all the "active" and adapter-specific code from dbt-utils into dbt-core. Just leave dispatching macros from dbt-utilsdbt-core for v0.9, and then remove for v1.0.
  • Find the least-bad and least-breaking way of organizing that code in dbt-core. If it requires a current_timestamp_backcompat macro, just for Snowflake/Postgres (where dbt_utils.current_timestamp != dbt.current_timestamp), so be it — I think it's better to have all the inconsistent code in one place, and clearly documented.
  • We have multiple implementations hang out in dbt-core for some time, BUT we develop some crystal-clear guidance on what users should actually do when they want a cross-db compatible current_timestamp. That should be with documentation and guidance (We should create an article or guide explaining datetime and timestamp types, specifically time zones docs.getdbt.com#1504), and also {# DEPRECATED: DO NOT USE IN NEW PROJECTS #} inline in the code.

I'm poking around with the code a bit to see if that's possible. I'm not sure yet — I'll open a branch if I get there.

@jtcohen6
Copy link
Contributor Author

I haven't had a chance to actually test any of this yet — just sharing the links here to share how I'm thinking about the narrower path I outlined above:

Motivation: If there's going to be inconsistency — feels unavoidable unless we break things — better to have it all in one place.

@dbeatty10
Copy link
Contributor

Awesome to see your thinking on this @jtcohen6 !

Agreed that inconsistency is unavoidable unless we break things. Also agreed that it is better to preserve that inconsistency rather than introducing breaking changes before dbt v2.

What's the advantage of adding current_timestamp_backcompat to dbt-core?

Is it so that users of dbt_utils >= 1.0.0 can update their references from {{ dbt_utils.current_timestamp() }} to {{ current_timestamp_backcompat() }} (assuming the end user has dbt Core >= 1.3.0 installed)?

If that is the case, how is that better for users rather than leaving {{ dbt_utils.current_timestamp() }} as-is?

@jtcohen6
Copy link
Contributor Author

Is it so that users of dbt_utils >= 1.0.0 can update their references from {{ dbt_utils.current_timestamp() }} to {{ current_timestamp_backcompat() }} (assuming the end user has dbt Core >= 1.3.0 installed)?

Exactly my thinking.

If that is the case, how is that better for users rather than leaving {{ dbt_utils.current_timestamp() }} as-is?

If there's going to be inconsistency, I'd rather have it all in one place!

Right now, it's spread across dbt-core, adapter plugins, and dbt-utils. By moving from dbt-utils into dbt-core / adapters, entropy may not actually be reduced, but it is more concentrated.

And we can make clear that the backcompat is just there for that reason — backward compatibility — and shouldn't be used by anyone new. That doesn't feel clear if we keep dbt_utils.current_timestamp around. This feels directionally correct toward having a clear + concise recommendation for what to do going forward, even if we need to keep some other crufty stuff around.

@colin-rogers-dbt
Copy link
Contributor

Here is where I ended up at for dbt-core changes: https://github.com/dbt-labs/dbt-core/pull/5838/files#diff-4ce397ee2492fb1078425ea420483cf3b55645b5bbcee3139ae696ad07c6c1bc

Still need to polish/test everything but it consolidates the dbt-core logic into a single file. Can reconcile our two versions (I don't have the backwards compat).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spike Team:Adapters Issues designated for the adapter area of the code tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality utils Cross-database building blocks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants