Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError while using deferred flag and model filters #2875

Closed
1 of 5 tasks
nave91 opened this issue Nov 9, 2020 · 3 comments · Fixed by #2958
Closed
1 of 5 tasks

KeyError while using deferred flag and model filters #2875

nave91 opened this issue Nov 9, 2020 · 3 comments · Fixed by #2958
Labels
bug Something isn't working state Stateful selection (state:modified, defer)

Comments

@nave91
Copy link

nave91 commented Nov 9, 2020

Describe the bug

When the current run isn't aware of a source that exists in deferring run and when we specify a model filter, the dbt run command fails with a KeyError: <unaware_source_id>. I think dbt run should successfully complete or if its a configuration error it should give a more descriptive error statement.

Steps To Reproduce

  1. Checkout branch: https://github.com/nave91/jaffle_shop/tree/bug/defer-with-filters
  2. Run dbt -d run --models config.materialized:table,state:modified+ --defer --state ./deferring_target which fails with an error.
  3. Run dbt -d run --models state:modified+ --defer --state ./deferring_target passes without any errors.

To recreate ./deferring_target:

  1. Add any source to project with a model depending on it.
  2. Run dbt run and copy the target directory.

Expected behavior

The run should complete without any errors or a more descriptive error when we specify model filters like config.materialized:table.

Screenshots and log output

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

Plugins:
  - bigquery: 0.18.1
  - snowflake: 0.18.1
  - redshift: 0.18.1
  - postgres: 0.18.1

The operating system you're using:
MacOs Catalina 10.15.7
The output of python --version:
Python 3.8.2

Additional context

~/git/jaffle_shop(bug/defer-with-filters) » dbt -d run --models config.materialized:table,state:modified+ --defer --state ./deferring_target                                 2 ↵ naveen@Naveens-MacBook-Pro
2020-11-09 22:41:10.593567 (MainThread): Running with dbt=0.18.1
2020-11-09 22:41:12.097193 (MainThread): running dbt with arguments Namespace(cls=<class 'dbt.task.run.RunTask'>, debug=True, defer=True, exclude=None, fail_fast=False, full_refresh=False, log_cache_events=False, log_format='default', models=['config.materialized:table,state:modified+'], partial_parse=None, profile=None, profiles_dir='/Users/naveen/.dbt', project_dir=None, record_timing_info=None, rpc_method='run', selector_name=None, single_threaded=False, state=PosixPath('deferring_target'), strict=False, target=None, test_new_parser=False, threads=None, use_cache=True, use_colors=None, vars='{}', version_check=True, warn_error=False, which='run', write_json=True)
2020-11-09 22:41:12.098467 (MainThread): Tracking: tracking
2020-11-09 22:41:12.112634 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0d4a60>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0daf10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0da400>]}
2020-11-09 22:41:12.144638 (MainThread): Partial parsing not enabled
2020-11-09 22:41:12.146188 (MainThread): Parsing macros/catalog.sql
2020-11-09 22:41:12.151603 (MainThread): Parsing macros/relations.sql
2020-11-09 22:41:12.154273 (MainThread): Parsing macros/adapters.sql
2020-11-09 22:41:12.183946 (MainThread): Parsing macros/materializations/snapshot_merge.sql
2020-11-09 22:41:12.188553 (MainThread): Parsing macros/core.sql
2020-11-09 22:41:12.193896 (MainThread): Parsing macros/materializations/helpers.sql
2020-11-09 22:41:12.205495 (MainThread): Parsing macros/materializations/snapshot/snapshot_merge.sql
2020-11-09 22:41:12.208681 (MainThread): Parsing macros/materializations/snapshot/strategies.sql
2020-11-09 22:41:12.228941 (MainThread): Parsing macros/materializations/snapshot/snapshot.sql
2020-11-09 22:41:12.271125 (MainThread): Parsing macros/materializations/seed/seed.sql
2020-11-09 22:41:12.299178 (MainThread): Parsing macros/materializations/incremental/helpers.sql
2020-11-09 22:41:12.301574 (MainThread): Parsing macros/materializations/incremental/incremental.sql
2020-11-09 22:41:12.318498 (MainThread): Parsing macros/materializations/common/merge.sql
2020-11-09 22:41:12.351718 (MainThread): Parsing macros/materializations/table/table.sql
2020-11-09 22:41:12.360024 (MainThread): Parsing macros/materializations/view/view.sql
2020-11-09 22:41:12.368574 (MainThread): Parsing macros/materializations/view/create_or_replace_view.sql
2020-11-09 22:41:12.375325 (MainThread): Parsing macros/etc/get_custom_alias.sql
2020-11-09 22:41:12.376571 (MainThread): Parsing macros/etc/query.sql
2020-11-09 22:41:12.378245 (MainThread): Parsing macros/etc/is_incremental.sql
2020-11-09 22:41:12.381212 (MainThread): Parsing macros/etc/datetime.sql
2020-11-09 22:41:12.393173 (MainThread): Parsing macros/etc/get_custom_schema.sql
2020-11-09 22:41:12.395858 (MainThread): Parsing macros/etc/get_custom_database.sql
2020-11-09 22:41:12.398614 (MainThread): Parsing macros/adapters/common.sql
2020-11-09 22:41:12.478857 (MainThread): Parsing macros/schema_tests/relationships.sql
2020-11-09 22:41:12.481771 (MainThread): Parsing macros/schema_tests/not_null.sql
2020-11-09 22:41:12.483674 (MainThread): Parsing macros/schema_tests/unique.sql
2020-11-09 22:41:12.487213 (MainThread): Parsing macros/schema_tests/accepted_values.sql
2020-11-09 22:41:12.498646 (MainThread): * Deprecation Warning: dbt v0.17.0 introduces a new config format for the
dbt_project.yml file. Support for the existing version 1 format will be removed
in a future release of dbt. The following packages are currently configured with
config version 1:
 - jaffle_shop

For upgrading instructions, consult the documentation:
  https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-0-17-0

2020-11-09 22:41:12.498840 (MainThread): Sending event: {'category': 'dbt', 'action': 'deprecation', 'label': '909baabc-7612-4869-8c50-a18d9cfa843b', 'property_': 'warn', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f108b80>]}
2020-11-09 22:41:12.499374 (MainThread): Partial parsing not enabled
2020-11-09 22:41:12.530387 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_customers".
2020-11-09 22:41:12.549148 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_payments".
2020-11-09 22:41:12.559049 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_orders".
2020-11-09 22:41:12.570599 (MainThread): Acquiring new postgres connection "model.jaffle_shop.dim_customers".
2020-11-09 22:41:12.584907 (MainThread): Acquiring new postgres connection "model.jaffle_shop.fct_orders".
2020-11-09 22:41:12.594309 (MainThread): Acquiring new postgres connection "model.jaffle_shop.order_payments".
2020-11-09 22:41:12.604708 (MainThread): Acquiring new postgres connection "model.jaffle_shop.customer_payments".
2020-11-09 22:41:12.613672 (MainThread): Acquiring new postgres connection "model.jaffle_shop.customer_orders".
2020-11-09 22:41:13.141491 (MainThread): Found 8 models, 20 tests, 0 snapshots, 0 analyses, 138 macros, 0 operations, 3 seed files, 0 sources
2020-11-09 22:41:13.144965 (MainThread):
2020-11-09 22:41:13.145323 (MainThread): Merged 6 items from state (sample: ['seed.jaffle_shop.raw_orders', 'model.jaffle_shop.stg_payments', 'seed.jaffle_shop.raw_payments', 'model.jaffle_shop.stg_orders', 'seed.jaffle_shop.raw_customers'])
2020-11-09 22:41:13.145489 (MainThread): Connection 'model.jaffle_shop.customer_orders' was properly closed.
2020-11-09 22:41:13.145797 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20f100>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20ffd0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20ff70>]}
2020-11-09 22:41:13.146030 (MainThread): Flushing usage events
2020-11-09 22:41:13.338030 (MainThread): Encountered an error:
2020-11-09 22:41:13.338333 (MainThread): 'source.jaffle_shop.public.source1'
2020-11-09 22:41:13.343158 (MainThread): Traceback (most recent call last):
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 124, in main
    results, succeeded = handle_and_check(args)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 202, in handle_and_check
    task, res = run_from_args(parsed)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 255, in run_from_args
    results = task.run()
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 419, in run
    result = self.execute_with_hooks(selected_uids)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 379, in execute_with_hooks
    self.before_run(adapter, selected_uids)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/run.py", line 392, in before_run
    self.defer_to_manifest(selected_uids)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/run.py", line 389, in defer_to_manifest
    self.write_manifest()
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 62, in write_manifest
    self.manifest.write(path)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 762, in write
    self.writable_manifest().write(path)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 741, in writable_manifest
    forward_edges, backward_edges = build_edges(edge_members)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 231, in build_edges
    forward_edges[unique_id].append(node.unique_id)
KeyError: 'source.jaffle_shop.public.source1'
@nave91 nave91 added bug Something isn't working triage labels Nov 9, 2020
@jtcohen6 jtcohen6 added state Stateful selection (state:modified, defer) and removed triage labels Nov 10, 2020
@jtcohen6
Copy link
Contributor

Oh boy. Thanks for the repro instructions @nave91, I've been able to replicate this error locally in an even simpler project.

I've confirmed that the error results from dbt run only when:

  • defer is turned on (--defer or DBT_DEFER_TO_STATE=true)
  • the selection criteria does not include at least one non-ephemeral model that directly depended on a now-missing source

The error crops up here, where dbt tries to infer the backward edges of the deferred model and then trips over its missing source key:
https://github.com/fishtown-analytics/dbt/blob/0ce2f41db47e7e538f727f0ef57f89fffdab8e25/core/dbt/contracts/graph/manifest.py#L231-L237

What are our options here? I think it's either one of the following:

  • "Defer sources." Instead of looking in the current project for a source matching that key, look in the state manifest's sources. (It has to be there, otherwise the state manifest couldn't have compiled.)
  • Catch this exception and... do nothing with it. At the end of the day, there is no functional difference between a run where an upstream source, beyond the veil of deferral, does exist in the current project, vs. one where it does not.

This error is rare in the "Slim CI" use case because it only crops up if you exclude modified models from selection criteria. It's much more common to run all state:modified, which would successfully capture models whose source() dependencies have changed.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 3, 2020

Update: this error also crops up for ephemeral models that have been deleted from the project, but which are understood to have edges within the comparison manifest. I think our best bet here is to handle the missing key as gracefully as possible, and move on.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 15, 2020

Captain's log, 12/14/20, 9:48 pm: Per #2954, KeyError also appears when a schema test which uses its arguments to construct its unique_id (e.g. relationships) changes one of its arguments. When it comes time to run the new/modified version of the test, the old test rears its head, presumably from the child_map of the deferred manifest:

Encountered an error:
'test.testy.relationships_model_a_old__args__ref_model_b_'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working state Stateful selection (state:modified, defer)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants