
[Feature] Support "relative" config for begin on microbatch models #11270

Open
3 tasks done
Tracked by #11292
bdewilde opened this issue Feb 4, 2025 · 1 comment
Labels: awaiting_response, enhancement (New feature or request), microbatch (Issues related to the microbatch incremental strategy)

Comments

bdewilde commented Feb 4, 2025

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Currently, the begin config on microbatch incremental models is a fixed timestamp value that indicates the earliest point in time from which the data is needed or relevant. It's currently required, though there's declared interest in making it optional in the future. I'd like to propose a third case: specifying begin as a relative time (e.g. "INTERVAL '1 year'") whose value is computed dynamically when the model is run.

This is useful because some models are only relevant over a rolling window of time, defined by a lookback (not to be confused with the batching config of the same name) relative to a reference time (typically "now"). On a full refresh, it would be convenient for the model to start from the desired timestamp automatically, rather than having to change the config manually every time.

Describe alternatives you've considered

The simplest alternative is to manually update the begin config before each full refresh of a microbatch incremental model. In my microbatch models, I've also added a condition to the query's WHERE clause that filters out records whose event time is earlier than a dynamically computed lookback timestamp, which is always more recent than the model's configured begin. That produces the data I want, but iterating over many batches that return zero rows is inefficient and seems a bit pointless. Finally, one could use a different (non-microbatch) incremental strategy, though that gives up all the benefits of the new strategy.
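A rough illustration of the wasted work, assuming day-sized batches, a hypothetical fixed begin of 2020-01-01, and a one-year (365-day) rolling lookback applied in the WHERE clause. The helpers daily_batches and count_empty_batches are illustrative only, not dbt APIs, and this simplified model does not reproduce dbt's actual batching logic.

```python
from datetime import datetime, timedelta, timezone


def daily_batches(begin: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [begin, end) into consecutive one-day windows (a simplified stand-in for batch_size='day')."""
    batches = []
    start = begin
    while start < end:
        stop = min(start + timedelta(days=1), end)
        batches.append((start, stop))
        start = stop
    return batches


def count_empty_batches(batches: list[tuple[datetime, datetime]], cutoff: datetime) -> int:
    """Count batches that end on or before the WHERE-clause cutoff; these can only return zero rows."""
    return sum(1 for _, stop in batches if stop <= cutoff)


# Hypothetical values for illustration.
fixed_begin = datetime(2020, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 2, 4, tzinfo=timezone.utc)
cutoff = now - timedelta(days=365)  # rolling one-year lookback

batches = daily_batches(fixed_begin, now)
print(len(batches), count_empty_batches(batches, cutoff))  # prints: 1861 1496
```

Under these assumptions, roughly four out of five batches on a full refresh do no useful work, which a relative begin would avoid.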

Who will this benefit?

Folks who have large time-based models that only need to be populated over a rolling window of time (I have many!) and who'd like microbatch to make running these models even easier.

Are you interested in contributing this feature?

No response

Anything else?

No response

bdewilde added the enhancement (New feature or request) and triage labels on Feb 4, 2025
graciegoheen (Contributor) commented:

Hi! Thanks so much for opening this feature request. If you want a relative date for begin, you should be able to calculate it in the config with modules.datetime and modules.pytz (see the dbt docs on these modules). Let me know if that works for you!
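The suggestion above can be sketched in plain Python, since dbt's modules.datetime exposes Python's datetime module. In a model config the same idea would roughly be an inline expression such as modules.datetime.datetime.now(modules.pytz.utc) minus modules.datetime.timedelta(days=365), formatted as a date string. The helper rolling_begin is hypothetical, and this sketch uses the standard library's timezone.utc in place of pytz so it runs without extra packages.

```python
from datetime import datetime, timedelta, timezone


def rolling_begin(now: datetime, lookback_days: int = 365) -> str:
    """Return a 'YYYY-MM-DD' begin value lookback_days before a timezone-aware now, in UTC."""
    start = now.astimezone(timezone.utc) - timedelta(days=lookback_days)
    return start.strftime("%Y-%m-%d")


# In a real run, now would be datetime.now(timezone.utc); a fixed value keeps the example reproducible.
print(rolling_begin(datetime(2025, 2, 10, 12, 0, tzinfo=timezone.utc)))  # prints: 2024-02-11
```

Note that 365 days is not always one calendar year: the window above crosses Feb 29, 2024, so the result lands on Feb 11 rather than Feb 10.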

graciegoheen added the awaiting_response and microbatch (Issues related to the microbatch incremental strategy) labels and removed the triage label on Feb 10, 2025