Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-950] [Bug] dbt show doesn't work with json structs #972

Closed
2 tasks done
jeremyyeo opened this issue Oct 18, 2023 · 2 comments · Fixed by #974
Closed
2 tasks done

[ADAP-950] [Bug] dbt show doesn't work with json structs #972

jeremyyeo opened this issue Oct 18, 2023 · 2 comments · Fixed by #974
Labels
bug Something isn't working

Comments

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Oct 18, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

If we have a struct with a json datatype, previewing the data via dbt show will result in an exception.

Expected Behavior

No errors.

Steps To Reproduce

  1. Project setup.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: 1.0

models:
  my_dbt_project:
    +materialized: table
-- models/foo.sql
select 
  struct<
    k array<
        struct<c1 int64, c2 json>
      >
  >(
    [
      struct(
        1 as c1, 
        to_json(struct(1 as a)) as c2
      )
    ]
  ) 
  as v
  1. Try building the model.
$ dbt build

00:51:34  Running with dbt=1.6.6
00:51:36  Registered adapter: bigquery=1.6.7
00:51:37  Found 1 model, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
00:51:37  
00:52:18  Concurrency: 1 threads (target='bq')
00:52:18  
00:52:18  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
00:52:24  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [CREATE TABLE (1.0 rows, 0 processed) in 6.01s]
00:52:24  
00:52:24  Finished running 1 table model in 0 hours 0 minutes and 47.48 seconds (47.48s).
00:52:24  
00:52:24  Completed successfully
00:52:24  
00:52:24  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
  1. Try previewing the data via show.
$ dbt show --select foo

00:52:56  Running with dbt=1.6.6
00:52:58  Registered adapter: bigquery=1.6.7
00:52:59  Found 1 model, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
00:52:59  
00:53:38  Concurrency: 1 threads (target='bq')
00:53:38  
00:53:41  Unhandled error while executing 
'JSON'
00:53:41  Encountered an error:
Runtime Error
  'JSON'

Relevant log output

�[0m13:52:56.815020 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1047f6640>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107423df0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107460400>]}


============================== 13:52:56.824630 | ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a ==============================
�[0m13:52:56.824630 [info ] [MainThread]: Running with dbt=1.6.6
�[0m13:52:56.826671 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'False', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'False', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt show --select foo', 'target_path': 'None', 'introspect': 'True', 'send_anonymous_usage_stats': 'True'}
�[0m13:52:58.767139 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e724b50>]}
�[0m13:52:58.779563 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e758c10>]}
�[0m13:52:58.781569 [info ] [MainThread]: Registered adapter: bigquery=1.6.7
�[0m13:52:58.809470 [debug] [MainThread]: checksum: 546b81fb56652c304d87abd676e84d4737d8a0c6b62160f4a6e79dcddbc842bb, vars: {}, profile: , target: , version: 1.6.6
�[0m13:52:58.810906 [debug] [MainThread]: Partial parsing not enabled
�[0m13:52:59.797102 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e90e0d0>]}
�[0m13:52:59.810463 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e90ec40>]}
�[0m13:52:59.811174 [info ] [MainThread]: Found 1 model, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
�[0m13:52:59.811862 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e7780d0>]}
�[0m13:52:59.813453 [info ] [MainThread]: 
�[0m13:52:59.814843 [debug] [MainThread]: Acquiring new bigquery connection 'master'
�[0m13:52:59.816353 [debug] [ThreadPool]: Acquiring new bigquery connection 'list_cse-sandbox-319708_dbt_jyeo'
�[0m13:52:59.817247 [debug] [ThreadPool]: Opening a new connection, currently in state init
�[0m13:53:38.182314 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'ed8c7ffd-a790-4e71-bce1-e0b393b1dd6a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e90ea90>]}
�[0m13:53:38.183569 [info ] [MainThread]: Concurrency: 1 threads (target='bq')
�[0m13:53:38.184176 [info ] [MainThread]: 
�[0m13:53:38.186768 [debug] [Thread-1  ]: Began running node model.my_dbt_project.foo
�[0m13:53:38.187680 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly list_cse-sandbox-319708_dbt_jyeo, now model.my_dbt_project.foo)
�[0m13:53:38.188213 [debug] [Thread-1  ]: Began compiling node model.my_dbt_project.foo
�[0m13:53:38.198170 [debug] [Thread-1  ]: Writing injected SQL for node "model.my_dbt_project.foo"
�[0m13:53:38.199540 [debug] [Thread-1  ]: Timing info for model.my_dbt_project.foo (compile): 13:53:38.188599 => 13:53:38.199182
�[0m13:53:38.200299 [debug] [Thread-1  ]: Began executing node model.my_dbt_project.foo
�[0m13:53:38.209907 [debug] [Thread-1  ]: Opening a new connection, currently in state closed
�[0m13:53:38.255157 [debug] [Thread-1  ]: On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.6.6", "profile_name": "all", "target_name": "bq", "node_id": "model.my_dbt_project.foo"} */

  
    select *
    from (
        select 
  struct<
    k array<
        struct<c1 int64, c2 json>
      >
  >(
    [
      struct(
        1 as c1, 
        to_json(struct(1 as a)) as c2
      )
    ]
  ) 
  as v
    ) as model_limit_subq
    limit 5


�[0m13:53:40.336402 [debug] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:046f9eb0-2b0f-4375-85a9-710a7598e7c5&page=queryresults
�[0m13:53:41.094942 [debug] [Thread-1  ]: Timing info for model.my_dbt_project.foo (execute): 13:53:38.201010 => 13:53:41.094509
�[0m13:53:41.096067 [error] [Thread-1  ]: �[31mUnhandled error while executing �[0m
'JSON'
�[0m13:53:41.111366 [debug] [Thread-1  ]: Traceback (most recent call last):
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/task/base.py", line 372, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/task/base.py", line 323, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/task/base.py", line 419, in run
    return self.execute(compiled_node, manifest)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/task/show.py", line 38, in execute
    adapter_response, execute_result = self.adapter.execute(
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 290, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/adapters/bigquery/connections.py", line 501, in execute
    table = self.get_table_from_response(iterator)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/adapters/bigquery/connections.py", line 429, in get_table_from_response
    return agate_helper.table_from_data_flat(resp, column_names)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/dbt/clients/agate_helper.py", line 121, in table_from_data_flat
    for _row in data:
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/api_core/page_iterator.py", line 209, in _items_iter
    for item in page:
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/api_core/page_iterator.py", line 131, in __next__
    result = self._item_to_value(self._parent, item)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/cloud/bigquery/table.py", line 2887, in _item_to_row
    _helpers._row_tuple_from_json(resource, iterator.schema),
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 435, in _row_tuple_from_json
    row_data.append(_field_from_json(cell["v"], field))
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 411, in _field_from_json
    return converter(resource, field)
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 370, in _record_from_json
    value = [converter(item["v"], subfield) for item in cell["v"]]
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 370, in <listcomp>
    value = [converter(item["v"], subfield) for item in cell["v"]]
  File "/Users/jeremy/src/dbt-basic/venv_dbt_1.6.latest/lib/python3.9/site-packages/google/cloud/bigquery/_helpers.py", line 368, in _record_from_json
    converter = _CELLDATA_FROM_JSON[subfield.field_type]
KeyError: 'JSON'

�[0m13:53:41.112646 [debug] [Thread-1  ]: Finished running node model.my_dbt_project.foo
�[0m13:53:41.113983 [debug] [MainThread]: Connection 'master' was properly closed.
�[0m13:53:41.115045 [debug] [MainThread]: Connection 'model.my_dbt_project.foo' was properly closed.
�[0m13:53:41.115889 [error] [MainThread]: Encountered an error:
Runtime Error
  'JSON'
�[0m13:53:41.117232 [debug] [MainThread]: Command `dbt show` failed at 13:53:41.117080 after 44.34 seconds
�[0m13:53:41.117807 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1047f6640>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11e766ee0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x126d7f340>]}
�[0m13:53:41.118344 [debug] [MainThread]: Flushing usage events

Environment

- OS: macOS
- Python: 3.9.13
- dbt-core: 1.6.6
- dbt-bigquery: 1.6.7

Additional Context

And here's the output in BQ for the same select:

image

And of course - previewing in the Cloud IDE has an unsurprising outcome:

image

@jeremyyeo jeremyyeo added bug Something isn't working triage labels Oct 18, 2023
@github-actions github-actions bot changed the title [Bug] dbt show doesn't work with json structs [ADAP-950] [Bug] dbt show doesn't work with json structs Oct 18, 2023
@dbeatty10
Copy link
Contributor

Thanks for another great write-up @jeremyyeo 🙌

This looks like it is a bug within the Python Client for Google BigQuery. Specifically: googleapis/python-bigquery#1500

One option, in the meantime

Here is a monkey patch that one user suggested:

from google.cloud import bigquery

import json

def _json_from_json(value, _):
    """NOOP string -> string coercion"""
    return json.loads(value)


bigquery._helpers._CELLDATA_FROM_JSON['JSON'] = _json_from_json

End users of dbt-bigquery couldn't utilize this monkey patch on their own, but we could choose to add it to dbt-bigquery until googleapis/python-bigquery#1500 is resolved.

@dbeatty10 dbeatty10 removed the triage label Oct 18, 2023
@matt-winkler
Copy link
Contributor

@dbeatty10 nice find - seems to work:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants