Cannot run dbt docs generate with JSON logs #115

Closed
robertf-b opened this issue Jun 17, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@robertf-b

Describe the bug

When running dbt docs generate with JSON logs enabled I receive an error: Encountered an error while generating catalog: Object of type DatabricksRelation is not JSON serializable.
This occurs with dbt-databricks 1.1.0 in all three environments I tried: locally (Windows), in Docker (Linux), and in the preview Databricks dbt task type.
It does not occur in earlier versions.
It does not occur with the default log format.

Steps To Reproduce

  1. Run dbt --log-format json docs generate

Expected behavior

Doc site generates correctly, with JSON logs.

Screenshots and log output

{"code": "E044", "data": {}, "invocation_id": "4240ec3d-ef2f-4772-8540-d4d522a8c717", "level": "info", "log_version": 2, "msg": "Building catalog", "pid": 1628, "thread_name": "MainThread", "ts": "2022-05-25T14:22:49.779873Z", "type": "log_line"}
{"code": "Z046", "data": {"log_fmt": null, "msg": "Encountered an error while generating catalog: Object of type DatabricksRelation is not JSON serializable"}, "invocation_id": "4240ec3d-ef2f-4772-8540-d4d522a8c717", "level": "warn", "log_version": 2, "msg": "Encountered an error while generating catalog: Object of type DatabricksRelation is not JSON serializable", "pid": 1628, "thread_name": "MainThread", "ts": "2022-05-25T14:22:49.783320Z", "type": "log_line"}
{"code": "E041", "data": {"num_exceptions": 1}, "invocation_id": "4240ec3d-ef2f-4772-8540-d4d522a8c717", "level": "error", "log_version": 2, "msg": "dbt encountered 1 failure while writing the catalog", "pid": 1628, "thread_name": "MainThread", "ts": "2022-05-25T14:22:49.792754Z", "type": "log_line"}

System information

Windows:

Core:
  - installed: 1.1.0
  - latest:    1.1.1 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.1.0 - Up to date!
  - spark:      1.1.0 - Up to date!

The operating system you're using:
Windows/Linux/Databricks

The output of python --version:
Windows:
Python 3.9.0

Additional context

@robertf-b robertf-b added the bug Something isn't working label Jun 17, 2022
@bilalaslamseattle
Collaborator

@ueshin @allisonwang-db I can repro this on my laptop, too.

@ueshin
Collaborator

ueshin commented Jun 17, 2022

@jtcohen6 Could you help take a look at this issue?

It seems that SparkRelation, and even BaseRelation, are not JSON serializable.

>>> json.dumps(SparkRelation.create(schema='a', identifier='b'))
...
TypeError: Object of type SparkRelation is not JSON serializable

>>> json.dumps(BaseRelation.create(schema='a', identifier='b'))
Traceback (most recent call last):
...
TypeError: Object of type BaseRelation is not JSON serializable

Thanks.

@ueshin
Collaborator

ueshin commented Jun 17, 2022

It seems 1.1 tries to log more than 1.0 did, and that is what breaks the command.

@jtcohen6
Contributor

jtcohen6 commented Jun 17, 2022

I'm able to reproduce this locally with the latest dbt-databricks + dbt-core. I'm trying to figure out where in the methods called by docs generate a DatabricksRelation is being passed directly into a log message, and then formatted into JSON, without any intermediate serialization steps.
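
To illustrate the failure mode described above, here is a minimal sketch that does not use dbt's actual logging code. `FakeRelation`, `format_text`, and `format_json` are hypothetical stand-ins. The point is only that a text formatter calls `str()` on whatever it is given, while a JSON formatter needs every value in the payload to be JSON serializable, which would explain why the default log format is unaffected.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRelation:
    """Hypothetical stand-in for a relation object; not dbt's real class."""
    schema: str
    identifier: str

    def __str__(self) -> str:
        return f"{self.schema}.{self.identifier}"


def format_text(msg: str, data: dict) -> str:
    # Text formatting only needs str(), which every object supports.
    details = ", ".join(f"{key}={value}" for key, value in data.items())
    return f"{msg} ({details})"


def format_json(msg: str, data: dict) -> str:
    # JSON formatting requires every value in `data` to be serializable.
    return json.dumps({"msg": msg, "data": data})


# An event payload that carries the object itself rather than a dict or string.
event_data = {"relation": FakeRelation(schema="a", identifier="b")}

text_line = format_text("Building catalog", event_data)

try:
    format_json("Building catalog", event_data)
    json_error = None
except TypeError as exc:
    # json.dumps rejects objects it has no encoding for.
    json_error = str(exc)

print(text_line)
print(json_error)
```

The text path succeeds and the JSON path raises `TypeError`, mirroring the difference between the two log formats reported in this issue.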

BaseRelation (and thereby SparkRelation + DatabricksRelation) are not directly JSON serializable, but they can be converted to dictionaries that are, via the to_dict() method of the parent class dbtClassMixin:

>>> import json
>>> from dbt.adapters.base import BaseRelation
>>> json.dumps(BaseRelation.create(schema='a', identifier='b').to_dict())
'{"path": {"database": null, "schema": "a", "identifier": "b"}, "type": null, "quote_character": "\\"", "include_policy": {"database": true, "schema": true, "identifier": true}, "quote_policy": {"database": true, "schema": true, "identifier": true}, "dbt_created": false}'
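
One way to apply this conversion at the point of serialization is the `default` hook of `json.dumps`, which is called for any object the encoder cannot handle. The sketch below is an assumption about how such a fallback could look, not the fix that was eventually merged. `FakeRelation` and `to_jsonable` are hypothetical names, and `FakeRelation` only mimics the `to_dict()` method so the example runs without dbt installed.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class FakeRelation:
    """Hypothetical stand-in exposing to_dict(), like the real relation classes."""
    schema: str
    identifier: str

    def to_dict(self) -> dict:
        return asdict(self)


def to_jsonable(obj: Any) -> Any:
    """Fallback for json.dumps: use to_dict() when present, else str()."""
    to_dict = getattr(obj, "to_dict", None)
    if callable(to_dict):
        return to_dict()
    return str(obj)


payload = {
    "msg": "Building catalog",
    "data": {"relation": FakeRelation(schema="a", identifier="b")},
}

# `default` is only invoked for values the encoder cannot serialize itself.
line = json.dumps(payload, default=to_jsonable)
print(line)
```

Falling back to `str()` for objects without `to_dict()` is a design choice made here so that logging never raises; a stricter version could re-raise `TypeError` instead.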

@jtcohen6
Contributor

I found a simple fix for this, but I'd be curious to get @nathaniel-may's take on it before merging

@bilalaslamseattle
Collaborator

@jtcohen6 any update on this? I'd love to close this out in the next point release.

@jtcohen6
Contributor

jtcohen6 commented Jul 22, 2022

@bilalaslamseattle Thanks for flagging this again.

We've seen multiple issues in this category, across multiple adapters, and we think there exists a general-purpose solution that will be the right move longer-term: dbt-labs/dbt-core#5436

The work for that is definitely on our radar. If it appears that the general-purpose resolution will be too complex, we can put a one-off patch for this in dbt-spark, to unblock the user here.

(cc @nathaniel-may)

@VersusFacit
Contributor

VersusFacit commented Oct 20, 2022

Hiya people on the thread.

Per this core PR, this JSON serialization bug should be solved across all adapters. There are a lot of layers of indirection in the logger call stack, so finding the root cause of this error took us some concerted time. I also threw on our PRs backlog tags, so in theory, you should be able to "seamlessly" integrate the fix into your env on the new release. (It's also live in main.)

I'd love to close this if (🤞) people report things working here.


5 participants