QMalcolm changed the title from "Update default make_temp_relation macro to incorporate config.__dbt_internal_event_time_start if available, microbatch temp relations" to "[Microbatch] Update default make_temp_relation macro to incorporate a batch specific identifier if available" on Nov 21, 2024
What a shame that I didn't find this issue before learning how the source code works and discovering that my problem could be solved by #361. I had been starting a similar PR (if I copy/paste the proposed solution, it breaks the macro because of a BQ conflict; dbt-core 1.8.8 / dbt-adapters 1.7.0 / dbt-bigquery 1.8.3).
At least I've learned a lot, and I'm waiting for this release.
However, let me give you a little more context, because this solves my issue even though it's not related to microbatches. I'm currently implementing dbt with Airflow, and we are used to having multiple runs ingest data into different partitions of a BigQuery table.
Thus, with a dbt model model_A materialized as incremental with the insert_overwrite strategy, launching multiple runs for different partitions, say D-2 and D-1, will create a conflict on the model_A__dbt_tmp table. And dbt-bigquery doesn't handle this situation; it relies on make_temp_relation from dbt-adapters. So it's the exact same situation as microbatch, just at a less granular scope. A similar situation is also shared in #222.
You probably know and understand this better than I do, but I wanted to share my situation.
Desired Improvement
Microbatch batches have been improved in core such that they may be run concurrently. When a batch is executed, the necessary data is first moved into a temp_relation in the data warehouse. The path for this temp_relation is <resource_identifier>__dbt_tmp. As it currently stands, the <resource_identifier>__dbt_tmp will be the same for each batch of a given microbatch model. If the batches are run concurrently, they may end up clobbering the temp_relation destination, which may lead to some wonkiness. As such, we need a way to ensure that each batch for a microbatch model gets a unique temp_relation path.

Helpful Prior Art
Similar to dbt-postgres: https://github.com/dbt-labs/dbt-postgres/blob/ae48e67dae6c1b00cda37ee9bdc61d3330506638/dbt/include/postgres/macros/adapters.sql#L149-L152
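One possible shape for this, modeled loosely on the linked dbt-postgres macro and on the existing `default__make_temp_relation` in dbt-adapters, is to append a batch-specific suffix to the temp identifier when batch context is available. This is only a sketch: the `model.get('batch')` lookup and its `id` field are hypothetical stand-ins for whatever batch context dbt-core actually exposes to the Jinja environment.

```sql
{# Sketch only: make the temp relation identifier unique per batch.
   `model.get('batch')` / `batch['id']` are assumed names, not a
   confirmed dbt-core API. #}
{% macro default__make_temp_relation(base_relation, suffix='__dbt_tmp') %}
    {% if model.get('batch') %}
        {# e.g. model_a__dbt_tmp_20241121 instead of model_a__dbt_tmp #}
        {% set suffix = suffix ~ '_' ~ model['batch']['id'] %}
    {% endif %}
    {% set tmp_identifier = base_relation.identifier ~ suffix %}
    {% set tmp_relation = base_relation.incorporate(
        path={"identifier": tmp_identifier}) %}
    {% do return(tmp_relation) %}
{% endmacro %}
```

With something like this, concurrently running batches would each write to a distinct `__dbt_tmp` relation instead of clobbering a shared one.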