Initial refactoring of incremental materialization #5359
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

I haven't yet incorporated the most recent feedback, so we don't have `adapter.dispatch` and default macros for everything yet.
This change looks good to me. Love the fact that we are breaking things down into smaller parts for folks to implement and adding more guardrails.
With this kind of change, do we think we can actually try to have smaller tests focusing on certain macros vs having one functional test testing the whole function?
Great work @gshank! Thanks for throwing up the draft PR, to push forward the conversation. I'm excited by the progress here. I left a bunch of comments, some of which are just "oh I get this now," others being open questions that we should look to answer before merging.
@jtcohen6 There's a difference in how `config.get('...', default='...')` works depending on whether the config key exists (built-in or "extra"). The materialization code calls `config.get('incremental_strategy', default='merge')`, but that only sets `incremental_strategy` to `'merge'` if the config key (and the extra key) doesn't exist. Because we now have a built-in `incremental_strategy` config attribute, the key always exists, so the defaults passed to the macros' `config.get` calls never actually take effect.

I almost want to check for a `None` value to set a default, but that's a pretty big change, so I'm guessing that's not a good idea. We could change the lines to:

```jinja
{%- set strategy = config.get("incremental_strategy") or "merge" %}
```

In the new code in `get_incremental_strategy_macro` I set the strategy to `'default'` if it's `None`, but if we're only doing the minimal changes to the adapters (not including that bit) we'll have to do something else.
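The distinction can be illustrated with plain Python dictionaries (a minimal sketch, not dbt's actual config implementation): a `get` default only applies when the key is absent, not when its value is `None`.

```python
# Sketch of the two lookup behaviors discussed above. The dicts are
# stand-ins for dbt's config object, not the real implementation.
config_without_key = {}
config_with_none = {"incremental_strategy": None}

# .get(key, default) falls back only when the key is missing entirely.
print(config_without_key.get("incremental_strategy", "merge"))  # merge
print(config_with_none.get("incremental_strategy", "merge"))    # None

# `get(...) or default` also covers an existing-but-None (or falsy) value.
print(config_with_none.get("incremental_strategy") or "merge")  # merge
```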
@gshank I see what you mean. I'd be in favor of making a few small changes in each adapter to get this PR merge-able, if it enables us to have a better implementation here, and avoids contorting ourselves into a harder-to-reason-about logical flow. Following the design / conversation in #5245, I think all that will require us to do, in each adapter, is define a macro:

```jinja
{# This part goes in dbt-core #}
{% macro get_default_incremental_sql(arg_dict) %}
  {{ return(adapter.dispatch('get_default_incremental_sql', 'dbt')(arg_dict)) }}
{% endmacro %}

{#
-- The first 'default' here means for the default adapter,
-- the second 'default' means for the default incremental strategy (if not otherwise specified)
#}
{% macro default__get_default_incremental_sql(arg_dict) %}
  {{ return(get_append_incremental_sql(arg_dict)) }}
{% endmacro %}

{# This macro goes in dbt-postgres #}
{% macro postgres__get_default_incremental_sql(arg_dict) %}
  {{ return(get_delete_insert_incremental_sql(arg_dict)) }}
{% endmacro %}

{# This macro goes in dbt-snowflake #}
{% macro snowflake__get_default_incremental_sql(arg_dict) %}
  {{ return(get_merge_incremental_sql(arg_dict)) }}
{% endmacro %}

{# This macro goes in dbt-bigquery #}
{% macro bigquery__get_default_incremental_sql(arg_dict) %}
  {{ return(get_merge_incremental_sql(arg_dict)) }}
{% endmacro %}
```
I'm wondering if putting the args in the `strategy_arg_dict` actually serves any real purpose? It's not clear to me what it gives us that simply passing in the args doesn't. If users have a custom materialization, unless they copy-and-paste the whole incremental materialization, I assume they will be pulling in any new args from config in their custom code, right?

It's still not clear to me why we want the `sql_header` in BigQuery and only in BigQuery. As far as I can tell, none of the other adapters include the `sql_header`, since `include_sql_header` is set to false everywhere except that one place in BigQuery.

I think that in order to use the `default__get_incremental_default_sql` from the previous comment, we'd have to pretty much implement all the changes in this pull request. Redshift works fine with no changes on top of dbt-core. Snowflake works with just the tweak to the `config.get` of the `incremental_strategy`. Spark works with the tweak to the `config.get` call. BigQuery works, except that for testing purposes I removed the `sql_header` bit, and everything else works fine except for the test that explicitly checks for the sql header.
Our thinking was that each of these strategy macros takes a slightly different set of arguments today, and will likely continue to take different arguments in the future. We want a single place in the incremental materialization where we call the strategy macro, passing all possible arguments in.
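The calling convention being described can be sketched in plain Python (a hypothetical stand-in for the Jinja macros; the strategy names and argument keys are invented for illustration): packing every possible argument into one dict means the call site never changes when a strategy needs a new argument.

```python
# Hypothetical sketch of the single-dict calling convention; not dbt's real API.
def append_strategy(arg_dict):
    # Uses only the arguments it cares about.
    return ("append", arg_dict["target_relation"])

def merge_strategy(arg_dict):
    # A different strategy can require extra arguments (e.g. unique_key)
    # without the call site ever changing.
    return ("merge", arg_dict["target_relation"], arg_dict["unique_key"])

STRATEGIES = {"append": append_strategy, "merge": merge_strategy}

def run_strategy(name, arg_dict):
    # Single call site with a single signature for every strategy.
    return STRATEGIES[name](arg_dict)

arg_dict = {"target_relation": "analytics.events", "unique_key": "event_id"}
print(run_strategy("merge", arg_dict))
```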
Yes, this sounds right to me.
Every other adapter (and sometimes BigQuery) includes

Is that mainly because of the
For BigQuery, maybe we could set an extra config? It would be nice to not have to pass around an extra parameter just for that one case.

As far as using `default__get_incremental_default_sql` goes, the code just isn't set up such that we can just insert that one bit. I looked at putting it in and it just doesn't fit. We'd need to pull in almost all of the new macros. I can try doing that... What's the timeframe here?
Feels like we're very close! One blocking/functional comment, and a few nitpicks around the edges. In all, though, I'm excited to get this foundation laid, so we can reap the fruits of it in adapter repos. Thanks as well for pushing that work a bit further in dbt-snowflake, as demonstration.

Looking forward to talking through it all in a few minutes.
```jinja
{% macro postgres__get_incremental_default_sql(arg_dict) %}
  {% if arg_dict["unique_key"] %}
    {% do return(get_incremental_delete_insert_sql(arg_dict)) %}
  {% else %}
    {% do return(get_incremental_append_sql(arg_dict)) %}
  {% endif %}
{% endmacro %}
```
Love to see this!!
```jinja
{% macro default__get_incremental_default_sql(arg_dict) %}
  {% do return(get_incremental_append_sql(arg_dict)) %}
{% endmacro %}
```
This represents an implicit breaking change for maintainers of existing adapter plugins who use the default incremental materialization:

- Previously, the (implicit) default was `delete+insert`
- Now, the explicit default is `append`

I think it's a good change! It's just one we'll want to document very clearly.

(cc @dataders)
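The behavioral difference behind this default change can be sketched in Python (the SQL strings and names below are illustrative only; the real strategies are adapter-specific Jinja macros):

```python
# Illustrative sketch of what the two default strategies roughly produce
# for an incremental run; table and column names are invented.
def append_sql(target, source):
    # 'append' just inserts the new rows; re-processed rows become duplicates.
    return f"insert into {target} select * from {source}"

def delete_insert_sql(target, source, unique_key):
    # 'delete+insert' first removes rows that will be replaced, so rerunning
    # with overlapping data does not create duplicates.
    return (
        f"delete from {target} where {unique_key} in "
        f"(select {unique_key} from {source}); "
        f"insert into {target} select * from {source}"
    )

print(append_sql("analytics.events", "events__dbt_tmp"))
print(delete_insert_sql("analytics.events", "events__dbt_tmp", "event_id"))
```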
Opened a docs issue: dbt-labs/docs.getdbt.com#1761
The changes in `incremental.sql` made me wonder the same thing about default behavior.
Nice work resolving that dbt-snowflake failure! And thanks for addressing most of my outstanding comments on the PR.

I see there's one remaining failure in dbt-bigquery tests, I think related to the matter of `include_sql_header`. Pending that resolution, this is good to go from a functional perspective / my point of view. Another engineer should still take a look for code review.
I didn't find any good way to get `include_sql_header` without a param, so I put the param back. You can't set config values at runtime...
LGTM! There's no impact on the incremental behavior of Python models, as we just reuse the same logic here.
Nice work wrapping this up!
### Description Avoids using `default` as a temporary fix for dbt-labs/dbt-core#5359. This is a temporary fix and dbt-labs/dbt-spark#394 should be ported later.
### Description Applies "Initial refactoring of incremental materialization" (dbt-labs/dbt-core#5359). Now it uses `adapter.get_incremental_strategy_macro` instead of dbt-spark's `dbt_spark_get_incremental_sql` macro to dispatch the incremental strategy macro. The overwritten `dbt_spark_get_incremental_sql` macro will not work anymore. Co-authored-by: allisonwang-db <[email protected]>
```diff
@@ -433,6 +433,7 @@ class NodeConfig(NodeAndTestConfig):
     # Note: if any new fields are added with MergeBehavior, also update the
     # 'mergebehavior' dictionary
     materialized: str = "view"
+    incremental_strategy: Optional[str] = None
```
@gshank @nathaniel-may does this now mean that the value of `incremental_strategy` is now in the manifest.json when it wasn't before? Or does this add it to the Python context such that it is accessible as an attribute of `model`, as in `model.incremental_strategy`?
It means that it should be in the manifest.json. It's accessible like other config keys, but the behavior is a bit different for built-in attributes than for ad-hoc attributes, in that setting defaults doesn't work the same way.
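As a rough illustration (the manifest snippet below is invented, not copied from a real `manifest.json`), the new built-in key would appear under each node's `config` entry:

```python
import json

# Invented, minimal stand-in for a node entry in manifest.json.
manifest_json = """
{
  "nodes": {
    "model.my_project.my_incremental_model": {
      "config": {
        "materialized": "incremental",
        "incremental_strategy": "merge"
      }
    }
  }
}
"""

manifest = json.loads(manifest_json)
node = manifest["nodes"]["model.my_project.my_incremental_model"]
# A built-in config key is present even when unset (it would then be null/None).
print(node["config"]["incremental_strategy"])  # merge
```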
…s/dbt-core#5359). Now it uses `adapter.get_incremental_strategy_macro` instead of dbt-spark's `dbt_spark_get_incremental_sql` macro to dispatch the incremental strategy macro. Add tests that check for the new error string
resolves #5245
Description
This is not ready for final review, but I'm putting it out here for discussion and commenting.
Checklist

- `changie new` to create a changelog entry