[Bug] Cannot --defer from --state for microbatch incremental strategy #11128

Open
2 tasks done
vvaneecloo opened this issue Dec 11, 2024 · 3 comments
Labels
awaiting_response bug Something isn't working microbatch Issues related to the microbatch incremental strategy state Stateful selection (state:modified, defer)

Comments

@vvaneecloo

vvaneecloo commented Dec 11, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Using the --defer flag with the microbatch incremental strategy does not work, even though the --defer flag works with other models.

Expected Behavior

I would expect the model to work like any other model with the --defer flag.

Steps To Reproduce

  1. Working with:
    • dbt-core=1.9.0 (latest)
    • dbt-databricks=1.9.0 (latest)
  2. When I do:

dbt run -s fact_weekly_sales_and_stock --event-time-start "2024-12-01" --event-time-end "2024-12-08" --defer --state prod_manifests

i.e. I defer from prod.

Here's the highlighted error:

09:34:53  Unhandled error while executing 
Exception on worker thread. Database Error
  [TABLE_OR_VIEW_NOT_FOUND] The table or view `hive_metastore`.`dev_datalake_insight_analytics_supply_chain`.`fact_weekly_sales` cannot be found. Verify the spelling and correctness of the schema and catalog.

I think that even though the SQL is generated, the reference resolves to the dev path instead of the prod path.
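For reference, dbt's `--defer` documentation describes how a `ref()` is resolved: it points at the relation from the `--state` manifest only when the referenced node is not selected in the current run and does not exist as a database object in the current target (`--favor-state` drops the existence check). Below is a minimal Python sketch of that rule. `Relation`, `resolve_ref`, and the prod schema name are hypothetical illustrations, and this is not dbt's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a database relation (catalog.schema.identifier)."""
    catalog: str
    schema: str
    identifier: str

    def render(self) -> str:
        return f"`{self.catalog}`.`{self.schema}`.`{self.identifier}`"


def resolve_ref(
    current: Relation,
    deferred: Optional[Relation],
    selected: bool,
    exists_in_current: bool,
    favor_state: bool = False,
) -> Relation:
    """Sketch of the documented --defer rule for one ref(); not dbt's code."""
    if deferred is None or selected:
        # No node in the --state manifest, or the node is built in this run:
        # keep the current target's relation.
        return current
    if favor_state or not exists_in_current:
        # Unselected and absent from the current target (or --favor-state given):
        # use the relation recorded in the --state manifest.
        return deferred
    # Unselected but already built in the current target: keep it.
    return current


# The dev relation comes from the error above; "prod_schema" is a placeholder,
# not the real prod schema from this issue.
dev = Relation("hive_metastore", "dev_datalake_insight_analytics_supply_chain", "fact_weekly_sales")
prod = Relation("hive_metastore", "prod_schema", "fact_weekly_sales")
print(resolve_ref(dev, prod, selected=False, exists_in_current=False).render())
# prints `hive_metastore`.`prod_schema`.`fact_weekly_sales`
```

Under this rule, an unselected upstream model that is missing from dev would be expected to resolve to prod, whereas the error above shows the dev relation being queried.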

Relevant log output

dbt run -s fact_weekly_sales_and_stock --event-time-start "2024-12-01" --event-time-end "2024-12-08" --defer --state prod_manifests        
09:32:33  Running with dbt=1.9.0
09:32:34  Registered adapter: databricks=1.9.0
09:32:35  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 3 unused configuration paths:
- models.dbt_artifacts
- models.insight_supply_chain.staging.codegen
- models.insight_supply_chain.activation.cockpit
09:32:36  Found 717 models, 171 data tests, 12 seeds, 1 operation, 456 sources, 15 exposures, 1097 macros
09:32:36  
09:32:36  Concurrency: 1 threads (target='exploration-dev')
09:32:36  
09:34:52  1 of 1 START sql microbatch model dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [RUN]
09:34:52  Batch 1 of 7 START batch 2024-12-01 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [RUN]
09:34:53  Unhandled error while executing 
Exception on worker thread. Database Error
  [TABLE_OR_VIEW_NOT_FOUND] The table or view `hive_metastore`.`dev_datalake_insight_analytics_supply_chain`.`fact_weekly_sales` cannot be found. Verify the spelling and correctness of the schema and catalog.
  If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
  To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 39 pos 5
09:34:53  Batch 1 of 7 ERROR creating batch 2024-12-01 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [ERROR in 1.20s]
09:34:53  Batch 2 of 7 SKIP batch 2024-12-02 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [SKIPPED in 0.00s]
09:34:53  Batch 3 of 7 SKIP batch 2024-12-03 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [SKIPPED in 0.00s]
09:34:53  Batch 4 of 7 SKIP batch 2024-12-04 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [SKIPPED in 0.00s]
09:34:53  Batch 5 of 7 SKIP batch 2024-12-05 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [SKIPPED in 0.00s]
09:34:53  Batch 6 of 7 SKIP batch 2024-12-06 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [SKIPPED in 0.00s]
09:34:53  Batch 7 of 7 SKIP batch 2024-12-07 of dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [SKIPPED in 0.00s]
09:34:53  1 of 1 ERROR creating sql microbatch model dev_datalake_insight_analytics_supply_chain.fact_weekly_sales_and_stock  [ERROR in 1.23s]
09:34:53  
09:34:53  1 of 1 START hook: insight_all_sources.on-run-end.0 ............................ [RUN]
09:34:53  1 of 1 OK hook: insight_all_sources.on-run-end.0 ............................... [OK in 0.01s]
09:34:53  
09:34:53  Finished running 1 incremental model, 1 project hook in 0 hours 2 minutes and 17.12 seconds (137.12s).
09:34:54  
09:34:54  Completed with 1 error, 0 partial successes, and 0 warnings:
09:34:54  
09:34:54    ERROR
09:34:54  

Environment

- OS: macOS Sequoia Version 15.1.1
- Python: 3.10.12
- dbt: 1.9.0

Which database adapter are you using with dbt?

spark // databricks

Additional Context

No response

@vvaneecloo vvaneecloo added bug Something isn't working triage labels Dec 11, 2024
@vvaneecloo vvaneecloo changed the title [Bug] Microbatch incremental strategy - Cannot defer from --state [Bug] Microbatch incremental strategy - Cannot --defer from --state Dec 11, 2024
@vvaneecloo
Author

Here's the log output from a model which is not configured as microbatch:

dbt run -s dim_country_organization --defer --state prod_manifests
10:34:38  Running with dbt=1.9.0
10:34:39  Registered adapter: databricks=1.9.0
10:34:40  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 3 unused configuration paths:
- models.insight_supply_chain.staging.codegen
- models.dbt_artifacts
- models.insight_supply_chain.activation.cockpit
10:34:42  Found 717 models, 171 data tests, 12 seeds, 1 operation, 456 sources, 15 exposures, 1097 macros
10:34:42  
10:34:42  Concurrency: 1 threads (target='exploration-dev')
10:34:42  
10:36:53  1 of 1 START sql table model dev_datalake_insight_analytics_supply_chain.dim_country_organization  [RUN]
10:38:06  1 of 1 OK created sql table model dev_datalake_insight_analytics_supply_chain.dim_country_organization  [OK in 73.74s]
10:38:06  
10:38:07  1 of 1 START hook: insight_all_sources.on-run-end.0 ............................ [RUN]
10:38:07  1 of 1 OK hook: insight_all_sources.on-run-end.0 ............................... [OK in 0.01s]
10:38:07  
10:38:07  Finished running 1 project hook, 1 table model in 0 hours 3 minutes and 25.10 seconds (205.10s).
10:38:07  
10:38:07  Completed successfully
10:38:07  
10:38:07  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

@dbeatty10 dbeatty10 added state Stateful selection (state:modified, defer) microbatch Issues related to the microbatch incremental strategy labels Dec 11, 2024
@dbeatty10 dbeatty10 changed the title [Bug] Microbatch incremental strategy - Cannot --defer from --state [Bug] Cannot --defer from --state for microbatch incremental strategy Dec 11, 2024
@dbeatty10
Contributor

Thanks for reaching out @vvaneecloo!

I wasn't able to reproduce this when I tried it out in dbt-postgres and dbt-databricks. Could you try out the files and commands below and see if you can tweak them to get the error that you reported?

Project files and commands

Create these files:

models/events.sql

{{ config(materialized="view", event_time="event_occured_at") }}

{# -- Create 4 days of data (range(1, 5) yields i = 1, 2, 3, 4) #}
{% for i in range(1, 5) %}

select
    {{ dbt.dateadd(
        datepart="day",
        interval=(-1 * i),
        from_date_or_timestamp=dbt.current_timestamp()
    ) }} as event_occured_at,
    {{ i % 2 }} as id  {# -- Alternate between two ids: 0 and 1 #}

{% if not loop.last %} union all{% endif %}

{% endfor %}

models/my_microbatch_model.sql

{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        unique_key='id',
        event_time='event_occured_at',
        batch_size='day',
        lookback=3,
        begin='9999-12-30',
        full_refresh=false,
    )
}}

select *
from {{ ref('events') }}

Run these commands:

dbt build --target prod --select events
dbt parse --target prod --target-path prod_manifests
dbt show --inline "select * from {{ ref('events') }}"
dbt run -s my_microbatch_model --event-time-start "2024-12-01" --event-time-end "2024-12-08" --defer --state prod_manifests

(The dbt show command should give an error because the events model hasn't been built in the default target. This is what we want, so that the run will defer to the prod events model instead.)

I got this output with everything working as expected:

$ dbt run -s my_microbatch_model --event-time-start "2024-12-01" --event-time-end "2024-12-08" --defer --state prod_manifests

00:58:32  Running with dbt=1.9.0
00:58:34  Registered adapter: databricks=1.9.0
00:58:34  Found 2 models, 604 macros
00:58:34  
00:58:34  Concurrency: 10 threads (target='databricks')
00:58:34  
00:58:37  1 of 1 START sql microbatch model dbt_dbeatty_dev.my_microbatch_model .......... [RUN]
00:58:37  Batch 1 of 7 START batch 2024-12-01 of dbt_dbeatty_dev.my_microbatch_model ........... [RUN]
00:58:41  Batch 1 of 7 OK created batch 2024-12-01 of dbt_dbeatty_dev.my_microbatch_model ...... [OK in 3.93s]
...
00:58:58  Batch 7 of 7 START batch 2024-12-07 of dbt_dbeatty_dev.my_microbatch_model ........... [RUN]
00:59:01  Batch 7 of 7 OK created batch 2024-12-07 of dbt_dbeatty_dev.my_microbatch_model ...... [OK in 3.58s]
00:59:02  1 of 1 OK created sql microbatch model dbt_dbeatty_dev.my_microbatch_model ..... [SUCCESS in 24.54s]
00:59:02  
00:59:02  Finished running 1 incremental model in 0 hours 0 minutes and 27.51 seconds (27.51s).
00:59:02  
00:59:02  Completed successfully
00:59:02  
00:59:02  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
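Both microbatch runs in this thread split `--event-time-start "2024-12-01" --event-time-end "2024-12-08"` into 7 daily batches, 2024-12-01 through 2024-12-07, which is consistent with an exclusive end bound. Below is a minimal sketch of that splitting for `batch_size='day'`, assuming half-open `[start, end)` batches. `daily_batches` is a hypothetical helper, and the printed filter shape is an assumption, not dbt's implementation:

```python
from datetime import date, timedelta
from typing import List, Tuple


def daily_batches(start: date, end: date) -> List[Tuple[date, date]]:
    """Split [start, end) into one-day, half-open batches (hypothetical helper)."""
    batches = []
    current = start
    while current < end:
        nxt = current + timedelta(days=1)
        batches.append((current, nxt))
        current = nxt
    return batches


batches = daily_batches(date(2024, 12, 1), date(2024, 12, 8))
for n, (lo, hi) in enumerate(batches, start=1):
    # Assumed filter shape per batch: lower bound inclusive, upper bound exclusive.
    print(f"Batch {n} of {len(batches)}: {lo} <= event_time < {hi}")
```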

@vvaneecloo
Author

Hey @dbeatty10, thanks for your comprehensive answer!

Will definitely test this week & get back to you as soon as possible :)
