[Bug] MicrobatchBuilder generates invalid table names with spaces for hourly batches #11165
Comments
I am experiencing the same issue!
+1
Thanks for reporting this @pei0804 (and to @yanithx and @syakesaba for confirming)! I was able to reproduce this issue -- see "reprex" below for details.

Potential fix

@pei0804 very nice research on the relevant areas of code 🤩. Making the following modifications to this code worked when I tested it out -- although our final fix might look a bit different:

    @staticmethod
    def batch_id(start_time: datetime, batch_size: BatchSize) -> str:
        return MicrobatchBuilder.format_batch_start(start_time, batch_size)

    @staticmethod
    def format_batch_start(batch_start: datetime, batch_size: BatchSize) -> str:
        # If we want a date only
        if batch_size != BatchSize.hour:
            return batch_start.strftime('%Y%m%d')  # e.g. "20241218"
        # If we want date + time
        return batch_start.strftime('%Y%m%dT%H%M%SZ')  # e.g. "20241218T000000Z"

This changed the SQL from this:

    create or replace temporary table analytics_dev.dbt_dbeatty.my_microbatch_model__dbt_tmp_20250110 14:00:00+00:00

to this instead:

    create or replace temporary table analytics_dev.dbt_dbeatty.my_microbatch_model__dbt_tmp_20250110T140000Z
Reprex

Create this file:
{{
config(
materialized="incremental",
incremental_strategy="microbatch",
begin="2025-01-10T00:00:00",
batch_size="hour",
event_time="_partition_hourly",
unique_key="_partition_hourly",
lookback=1,
)
}}
select 1 as id, {{ dbt.current_timestamp() }} as _partition_hourly
Run this command:

    dbt run -s my_microbatch_model

Get this error:

I got the same error for hourly batch size.
Model config
Error message

Is this a new bug in dbt-core?
Current Behavior
When using incremental_strategy="microbatch" with batch_size="hour", MicrobatchBuilder generates a batch ID that contains spaces and special characters. This leads to invalid temporary table names and SQL syntax errors in database adapters.
Example error from Snowflake adapter:
The error occurs because the generated temporary table name contains spaces and timezone information:
Expected Behavior
MicrobatchBuilder should generate a valid batch ID without spaces or special characters for hourly batches, similar to how it handles daily batches.
For example:
create or replace temporary table [...].model_name__dbt_tmp_20241218T000000Z
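
The expectation above can be turned into a quick check. The following is a minimal sketch, not part of dbt-core: the helper name is_safe_identifier_suffix and the allowed character set (ASCII letters, digits, underscore) are assumptions chosen to be conservative for unquoted identifiers, and individual adapters may accept more characters.

```python
import re

# Assumption: only ASCII letters, digits and underscores are treated as safe
# in an unquoted table-name suffix. Real adapters may allow more characters.
_SAFE_SUFFIX = re.compile(r"[A-Za-z0-9_]+")


def is_safe_identifier_suffix(batch_id: str) -> bool:
    """Return True if batch_id can be appended to a table name without quoting."""
    return _SAFE_SUFFIX.fullmatch(batch_id) is not None


if __name__ == "__main__":
    # Batch ID currently generated for an hourly batch (contains a space, ':' and '+').
    print(is_safe_identifier_suffix("20241218 00:00:00+00:00"))
    # Batch ID in the expected format.
    print(is_safe_identifier_suffix("20241218T000000Z"))
```

A check like this could sit in a unit test so that any future batch ID format containing spaces or punctuation is caught before it reaches the adapter.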
Steps To Reproduce
dbt run --select hoge --event-time-start "2024-12-18T00:00:00" --event-time-end "2024-12-18T01:00:00"
Relevant log output
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
The issue occurs in several parts of the codebase:
core/dbt/task/run.py: https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/task/run.py#L352-L356
make_temp_relation macro to create unique temporary table names: https://github.com/dbt-labs/dbt-adapters/blob/5407391c5cef22a5c0431daa469d6a8295c026d8/dbt/include/global_project/macros/adapters/relation.sql#L9-L16
MicrobatchBuilder class: dbt-core/core/dbt/materializations/incremental/microbatch.py, lines 195 to 203 in a175793

For hourly batches (batch_size="hour"), format_batch_start returns str(batch_start), which generates a datetime string like "2024-12-18 00:00:00+00:00". While batch_id removes hyphens, it does not handle spaces and timezone information, resulting in an invalid table name.

For non-hourly batches (day/month/year), it correctly uses batch_start.date(), which produces a clean format like "2024-12-18", and after removing hyphens becomes "20241218".

The issue stems from the fact that the batch ID flows from run.py through the Jinja templating system and into SQL table names without proper sanitization for hourly batches. The fix would likely involve modifying the format_batch_start method to ensure hourly timestamps use a database-friendly format (e.g., ISO format "20241218T000000Z"), similar to how it handles daily batches.
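
To make the difference concrete, here is a minimal standalone sketch of the behavior described above next to the suggested format. It is reconstructed from the description in this issue, not copied from dbt-core: the function names current_batch_id and proposed_batch_id are hypothetical, batch_size is passed as a plain string instead of the BatchSize enum, and the literal "Z" suffix assumes batch_start is always in UTC.

```python
from datetime import datetime, timezone


def current_batch_id(batch_start: datetime, batch_size: str) -> str:
    """Mimic the described behavior: str() for hourly batches, date() otherwise."""
    if batch_size == "hour":
        formatted = str(batch_start)          # e.g. "2024-12-18 00:00:00+00:00"
    else:
        formatted = str(batch_start.date())   # e.g. "2024-12-18"
    # Only hyphens are removed, so spaces, colons and "+" survive for hourly batches.
    return formatted.replace("-", "")


def proposed_batch_id(batch_start: datetime, batch_size: str) -> str:
    """Suggested format: compact strftime output with no spaces or punctuation."""
    if batch_size == "hour":
        # Assumption: batch_start is in UTC, so a literal "Z" suffix is accurate.
        return batch_start.strftime("%Y%m%dT%H%M%SZ")
    return batch_start.strftime("%Y%m%d")


if __name__ == "__main__":
    start = datetime(2024, 12, 18, 0, 0, 0, tzinfo=timezone.utc)
    print(current_batch_id(start, "hour"))   # 20241218 00:00:00+00:00
    print(current_batch_id(start, "day"))    # 20241218
    print(proposed_batch_id(start, "hour"))  # 20241218T000000Z
```

The daily output is identical in both versions, so only hourly batch IDs would change under this format.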