-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace environment variable with a project flag to gate microbatch functionality #10799
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #10799 +/- ##
==========================================
- Coverage 89.12% 89.07% -0.06%
==========================================
Files 183 183
Lines 23592 23626 +34
==========================================
+ Hits 21027 21044 +17
- Misses 2565 2582 +17
Flags with carried forward coverage won't be shown. Click here to find out more.
|
…yDefault` as it was prior to a9df50f
…with behavior flag changes This is needed to get the tests to pass. This is only necessary until the changes in dbt-adapters are merged. Said another way: we need to merge the changes to dbt-adapters first, and then revert these dependency changes before merging these changes.
dev-requirements.txt
Outdated
git+https://github.com/dbt-labs/dbt-adapters.git@main | ||
git+https://github.com/dbt-labs/dbt-adapters.git@main#subdirectory=dbt-tests-adapter | ||
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag | ||
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag#subdirectory=dbt-tests-adapter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to get the tests to pass. This is only necessary until the changes in dbt-adapters are merged. Said another way: we need to merge the changes to dbt-adapters first, and then revert these dependency changes before merging this PR's changes.
…o `find_materialization_macro_candidate_by_name` This is necessary because in the following commit we're going to add a function to get whether or not a manifest should be run using the new microbatch functionality. Getting the MacroCandidate is necessary because we need to know the locality of the macro.
…ity should be used The new microbatch functionality is, unfortunately, potentially dangerous. That is it adds a new materalization strategy `microbatch` which an end user could have defined as a custom strategy previously. Additionally we added config keys to nodes, and as `config` is just a Dict[str, Any], it could contain anything, thus meaning people could already be using the configs we're adding for different purposes. Thus we need some intellegent gating. Specifically something that adheres to the following: cms = Custom Microbatch Strategy abms = Adapter Builtin Microbatch Strategy bf = Behavior flag umb = Use Microbatch Batching t/f/e = True/False/Error | cms | abms | bf | umb | | t | t | t | t | | f | t | t | t | | t | f | t | t | | f | f | t | e | | t | t | f | f | | f | t | f | t | | t | f | f | f | | f | f | f | e | (The above table assumes that there is a microbatch model present in the project) In order to achieve this we need to check that either the microbatch behavior flag is set to true OR microbatch materializaion being used is the _root_ microbatch materialization (i.e. not custom). The function we added in this commit, `use_microbatch_batches`, does just that.
…tom_microbatch_strategy`
9b3c122
to
543e024
Compare
In 0349968 I had done this for the function `find_materialization_macro_by_name`, but that wasn't the right function to do it to, and will be reverted shortly. `find_materialization_macro_by_name` is used for finding the general materialization macro, whereas `find_macro_by_name` is more general. For the work we're doing, we need to find the microbatch macro, which is not a materialization macro.
…_name` to `find_materialization_macro_candidate_by_name`" This reverts commit 0349968.
…tead of `root` Previously were were checking for a locality of `root`. However, a locality of `root` means it was provided by a `package`. We wnt to check for locality of `core` which basically means `builtin via dbt-core/adapters`. There is another locality `imported` which I beleive means it comes from another package.
…in boolean checks The method `use_microbatch_batches` is always invoked to evaluate an `if` statement. In most instances, it is part of a logic chain (i.e. there are multiple things being evaluated in the `if` statement). In `if` statements where there are multiple things being evaulated, `use_microbatch_batches` should come _last_ (or as late as possible). This is because it is likely the most costly thing to evaluate in the logic chain, and thus any shortcuts cuts via other evaluations in the if statement failing (and thus avoiding invoking `use_microbatch_batches) is desirable.
dev-requirements.txt
Outdated
git+https://github.com/dbt-labs/dbt-adapters.git@main | ||
git+https://github.com/dbt-labs/dbt-adapters.git@main#subdirectory=dbt-tests-adapter | ||
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag | ||
git+https://github.com/dbt-labs/dbt-adapters.git@microbatch-behavior-flag#subdirectory=dbt-tests-adapter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder: let's make sure we bump the lower bound pin of dbt-adapters
in setup.py as part of this PR once the dbt-adapters branches are merged.
return "D020" | ||
|
||
def message(self) -> str: | ||
description = "The use of a custom microbatch macro outside of batched execution is deprecated. To use it with batched execution, set `flags.require_batched_execution_for_custom_microbatch_strategy` to `True` in `dbt_project.yml`. In the future this will be the default behavior." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit, just because this is user-facing:
"In the future this will be the default behavior." -> "In the future batched execution will be the default behaviour."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also mention the name of the custom macro that exists (e.g. get_incremental_microbatch_sql
) as part of the message in case the user that's seeing this doesn't know about the custom strategy macro naming offhand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Can't approve a PR I "authored" but here is my psuedo-approval!
Thank you for taking this across the finish line @QMalcolm, clean + precise work ✨
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self approving because I didn't open the PR 🙈
… branch with behavior flag changes" This reverts commit 5b9d7f7.
hey @QMalcolm @MichelleArk , the docs team received this github issue. i was reviewing this dbt-adapters PR and #10799 -- am i correct in thinking that the following needs to be added to the behavior flag doc:
|
@mirnawong1 It's actually a little bit different than that. The flag |
hey @QMalcolm oh ok i see! so so does this also mean we have another flag: |
@mirnawong1 There is no other flag. I recognize that as slightly confusing. For context, the only reason we gated microbatch in the first place was the fear of breaking people's projects who already had a custom microbatch macro. What we've done here is make it so one only needs to set a flag if they have a custom microbatch macro, instead of making everyone set a flag because someone else's project might have a microbatch macro. For a person setting up a microbatch model for the first time, there is no longer a flag or environment variable they need to set (assuming they don't have a custom microbatch macro) |
oh ok got it! so i can update the callout here and replace it with content informing users of the behavior flag should they have a custom microbatch macro only. all other users that are setting up incremental microbatch strategy for the first time w no existing custom macro don't need to do anything apart from their config. thank you so much for clarifying this! |
Resolves #10798
Problem
We had gated the new microbatch feature behind an environment variable in the initial implementation of microbatch (#10594). However, for a better experience, we want people to be setting a project flag (AKA behavior flag).
Solution
Gate microbatch functionality behind a project flag (AKA behavior flag) instead of an environment variable.
Checklist