Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-738] [Enhancement] dbt debug command not outputting logs correctly #5353

Closed
1 task done
UselessAlias opened this issue Jun 9, 2022 · 6 comments · Fixed by #6541
Closed
1 task done

[CT-738] [Enhancement] dbt debug command not outputting logs correctly #5353

UselessAlias opened this issue Jun 9, 2022 · 6 comments · Fixed by #6541
Labels
enhancement New feature or request logging

Comments

@UselessAlias
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When running the command dbt debug the logs are not output correctly. When the command is run with json log formatting turned on the log entries are still printed to the console in the same textual way. It also appears they are not output to the logs/dbt.log file either. When --debug is turned on alongside the json formatter then only the debug level logs are output using the json format.

Expected Behavior

It would be good to have the loglines of the debug command output using the json formatter for all levels of logs. We use dbt debug as an pre run test to check for any connections issues. We would like to be able to parse the logs for any lines with 'level': 'error' across our entire process, but this connection test doesn't get captured because of the lack of json logging.

Steps To Reproduce

Simply run dbt --log-format=json debug in any situation. With --debug included the difference becomes even more apparent.

Relevant log output

I've removed some pieces/lines which contain sensitive information.


{"code": "A001", "data": {"v": "=1.1.0"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "info", "log_version": 2, "msg": "Running with dbt=1.1.0", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.708619Z", "type": "log_line"}
{"code": "A003", "data": {"user_state": "tracking"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Tracking: tracking", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.711659Z", "type": "log_line"}
{"code": "Z040", "data": {"kwargs": "{'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d827e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d827e80>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d8f1fa0>]}"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d827e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d827e80>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d8f1fa0>]}", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.713623Z", "type": "log_line"}
dbt version: 1.1.0
python version: 3.8.13
python path: /usr/local/bin/python
os info: Linux-5.10.104-linuxkit-x86_64-with-glibc2.2.5
Using profiles.yml file at /root/.dbt/profiles.yml
Using dbt_project.yml file at /usr/app/dbt_dir/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
{"code": "Z006", "data": {"cmd": ["git", "--help"]}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Executing \"git --help\"", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.827315Z", "type": "log_line"}
{"code": "Z008", "data": {"bmsg": ""}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "STDERR: \"b''\"", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.836976Z", "type": "log_line"}
 - git [OK found]

Connection:
  host: 
  port: 443
  cluster: 
  endpoint: None
  schema: 
  organization: 0
{"code": "E005", "data": {"conn_name": "debug", "conn_type": "spark"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Acquiring new spark connection \"debug\"", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.845000Z", "type": "log_line"}
{"code": "E015", "data": {"conn_name": "debug", "conn_type": "spark"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Using spark connection \"debug\"", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.846550Z", "type": "log_line"}
{"code": "E016", "data": {"conn_name": "debug", "sql": "select 1 as id"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "On debug: select 1 as id", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.847964Z", "type": "log_line"}
{"code": "E037", "data": {"connection_state": "init"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Opening a new connection, currently in state init", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T13:58:50.849569Z", "type": "log_line"}
{"code": "E001", "data": {"args": [], "base_msg": "Error while running:\nselect 1 as id", "name": "Spark"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Spark adapter: Error while running:\nselect 1 as id", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T14:00:54.908987Z", "type": "log_line"}
  Connection test: [ERROR]

1 check failed:
dbt was unable to connect to the specified database.
The database returned the following error:

  >Object of type FailedToConnectException is not JSON serializable

Check your database credentials and try again. For more information, visit:
https://docs.getdbt.com/docs/configure-your-profile


{"code": "Z040", "data": {"kwargs": "{'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d6462b0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d62da60>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d62dd90>]}"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d6462b0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d62da60>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb85d62dd90>]}", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T14:00:54.913884Z", "type": "log_line"}
{"code": "Z042", "data": {}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Flushing usage events", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T14:00:54.914749Z", "type": "log_line"}
{"code": "E008", "data": {"conn_name": "debug"}, "invocation_id": "27f7d084-0279-4455-8a5e-b937004b5930", "level": "debug", "log_version": 2, "msg": "Connection 'debug' was properly closed.", "pid": 13, "thread_name": "MainThread", "ts": "2022-06-09T14:00:55.472184Z", "type": "log_line"}

Environment

- OS:Linux-5.10.104-linuxkit-x86_64-with-glibc2.2.5
- Python: 3.8.13
- dbt:1.1.0

What database are you using dbt with?

other (mention it in "Additional Context")

Additional Context

databricks

@UselessAlias UselessAlias added bug Something isn't working triage labels Jun 9, 2022
@github-actions github-actions bot changed the title [Bug] dbt debug command not outputting logs correctly [CT-738] [Bug] dbt debug command not outputting logs correctly Jun 9, 2022
@jtcohen6 jtcohen6 added enhancement New feature or request Team:Language logging and removed bug Something isn't working triage labels Jun 10, 2022
@jtcohen6
Copy link
Contributor

@UselessAlias Thanks for opening, you're right!

The debug task does a lot of printing today, rather than real logging. That's why it shows up the same regardless of --log-format, since these aren't actually events / logs.

print("dbt version: {}".format(version))
print("python version: {}".format(sys.version.split()[0]))
print("python path: {}".format(sys.executable))
print("os info: {}".format(platform.platform()))
print("Using profiles.yml file at {}".format(self.profile_path))
print("Using dbt_project.yml file at {}".format(self.project_path))

(there's more, that's just a taste)

I agree these should be real events (and thereby real logs). I see at least four, though maybe it should be more to better differentiate between success and error:

  • DebugEnvironmentDetails
  • DebugDependenciesResult
  • DebugConfigurationResult
  • DebugConnectionResult

I'm going to relabel this as an enhancement rather than a bug. I agree that it's really a prerequisite for making the debug task usable by machines, in addition to human beings trying to debug their local CLI.

@leahwicz leahwicz changed the title [CT-738] [Bug] dbt debug command not outputting logs correctly [CT-738] [Enhancement] dbt debug command not outputting logs correctly Jun 14, 2022
@leahwicz
Copy link
Contributor

AC: Go through and check for all print statements and change them structured log

@emmyoop
Copy link
Member

emmyoop commented Sep 22, 2022

Can also remove the method in this test and use run_dbt_and_capture instead.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 6, 2022

AC: Go through and check for all print statements and change them structured log

We may or may not want to leave print in place for dbt ls here:

print(result)

@VersusFacit
Copy link
Contributor

VersusFacit commented Jan 7, 2023

A PR for this is up :>

Also, I snuck in Emily's suggestion as best I could (required a tiny bit of editing on the "back end") of our test framework.

@VersusFacit
Copy link
Contributor

VersusFacit commented Jan 8, 2023

@jtcohen6 Seeing your comment, I went through the code of list and did some refactoring for filtering out messages which nets us the following behavior. First command was dbt list and the second dbt --log-format json list | jq '.msg'.
Screen Shot 2023-01-07 at 5 34 01 PM

The structured json logs are info type; just lists the name, no extra messages ahead of time. Achieving this particular behavior requires refactoring some event logger mutations.

Is this a good final behavior or would you like to see a different design implemented?

=======

per Leah's comment, there remains only the following prints in all of core/dbt/task/:

  1. the list for basic printing of file names, which I don't see utility in trying to alter
  2. the error message for when there are no available adapters

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request logging
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants