You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Presently, logging in the rpc server is a mixed bag:
some logs make it into stdout on the server, or the logs/dbt.log file, or the server response, or some combination thereof
some logs are structured in nature while others are not
If the /run endpoint is queried, the table response key is logged to stdout, the logs/dbt.log file, and the server response
I made some recommendations below, but there's really a whole lot to account for here! The guiding principles are:
Ensure that the stdout logs from the rpc server contain enough information to inform the
a user with access to stdout what's happening in real-ish time
We don't want to wait until the request is complete to show in-progress log information
Make the stdout logs machine-readable so that they can be parsed and processed by an external process
Make it possible to correlate log lines with an original request
This will help with debugging from the logs
This will help filter the logs relevant to a given request
Limit the amount of request/response data we log to stdout. We want to record logs, not data
Recommendations:
All logging to stdout for the rpc server should be "structured" in nature
The logs/dbt.log file and the stdout logs for the rpc server should be equivalent (this may already be the case)
Compilation warnings should be surfaced in the /status endpoint if they occur during recompilation or server startup
dbt should return all of the logs that are recorded during startup/recompilation in the /status endpoint
The full request and response body should not be printed to stdout or logs/dbt.log - we can instead log a subset of this information
Request: I think just the path and method are appropriate, but we can be less restrictive if it would be helpful
Response: dbt should log the logs key in the Response object to stdout and logs/dbt.log as those log lines happen. We'll need to add some sort of identifier (the request id?) to the log handler to make sure that we can correlate log lines back to an original request. The status code should be logged in the Response log line.
Scoping
There is a lot of work to do to make dbt's internal stdout/ui printing more conducive to structured logging. That's going to be its own project, and we should do it, but we should do it in a separate PR.
A vignette of how I think this could look is provided below.
Before:
WARNING: Test 'test.my_new_package.unique_nope_id' (models/schema.yml) depends on model 'nope' which was not found
WARNING: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.debug
Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources
13:30:13 | Concurrency: 8 threads (target='dev')
13:30:13 |
13:30:14 | Done.
Received request (<Request 'http://localhost:8580/jsonrpc/' [POST]>) from 127.0.0.1, data=b'{"jsonrpc": "2.0", "method": "run", "id": 1, "params": {"timeout": null, "sql": "c2VsZWN0IDEgYXMgaWQ=", "name": 'foo'}}'
Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources, 1 None
sending response (<Response 2476 bytes [200 OK]>) to 127.0.0.1, data={'result': {'raw_sql': 'select 1 as id', 'compiled_sql': 'select 1 as id', 'node': {'raw_sql': 'select 1 as id', 'database': 'analytics', 'schema': 'dev', 'fqn': ['my_new_package', 'rpc', 'foo'], 'unique_id': 'rpc.my_new_package.foo', 'package_name': 'my_new_package', 'root_path': '/Users/drew/fishtown/clients/debug', 'path': 'rpc/foo', 'original_file_path': 'from remote system', 'name': 'foo', 'resource_type': 'rpc', 'alias': 'drew_demo_schema_foo', 'config': {'enabled': True, 'materialized': 'view', 'persist_docs': {}, 'post-hook': [], 'pre-hook': [], 'vars': {}, 'quoting': {}, 'column_types': {}, 'tags': []}, 'tags': [], 'refs': [], 'sources': [], 'depends_on': {'nodes': [], 'macros': []}, 'docrefs': [], 'description': '', 'columns': {}, 'compiled': True, 'compiled_sql': 'select 1 as id', 'extra_ctes_injected': True, 'extra_ctes': [], 'injected_sql': 'select 1 as id', 'wrapped_sql': 'None'}, 'timing': [{'name': 'compile', 'started_at': '2019-08-27T17:33:12.276822Z', 'completed_at': '2019-08-27T17:33:12.295340Z'}, {'name': 'execute', 'started_at': '2019-08-27T17:33:12.295359Z', 'completed_at': '2019-08-27T17:33:12.299549Z'}], 'table': {'column_names': ['id'], 'rows': [[1.0]]}, 'logs': [{'message': 'Acquiring new postgres connection 'foo'.', 'timestamp': '2019-08-27 13:33:12,083', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Opening a new connection, currently in state init', 'timestamp': '2019-08-27 13:33:12,084', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources, 1 None', 'timestamp': '2019-08-27 13:33:12,269', 'levelname': 'NOTICE', 'level': 15}, {'message': 'Acquiring new postgres connection 'foo'.', 'timestamp': '2019-08-27 13:33:12,270', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Opening a new connection, currently in state init', 'timestamp': '2019-08-27 13:33:12,271', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Compiling rpc.my_new_package.foo', 'timestamp': '2019-08-27 13:33:12,276', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Using postgres connection 'foo'.', 'timestamp': '2019-08-27 13:33:12,295', 'levelname': 'DEBUG', 'level': 10}, {'message': 'On foo: select 1 as id', 'timestamp': '2019-08-27 13:33:12,296', 'levelname': 'DEBUG', 'level': 10}, {'message': 'SQL status: SELECT 1 in 0.00 seconds', 'timestamp': '2019-08-27 13:33:12,297', 'levelname': 'DEBUG', 'level': 10}]}, 'id': 1, 'jsonrpc': '2.0'}
See also #1237
Presently, logging in the rpc server is a mixed bag:
logs/dbt.log
file, or the server response, or some combination thereof/run
endpoint is queried, thetable
response key is logged to stdout, the logs/dbt.log file, and the server responseI made some recommendations below, but there's really a whole lot to account for here! The guiding principles are:
a user with access to stdout what's happening in real-ish time
Recommendations:
logs/dbt.log
file and the stdout logs for the rpc server should be equivalent (this may already be the case)/status
endpoint if they occur during recompilation or server startup/status
endpointlogs/dbt.log
- we can instead log a subset of this informationlogs
key in the Response object to stdout andlogs/dbt.log
as those log lines happen. We'll need to add some sort of identifier (the requestid
?) to the log handler to make sure that we can correlate log lines back to an original request. The status code should be logged in the Response log line.Scoping
There is a lot of work to do to make dbt's internal stdout/ui printing more conducive to structured logging. That's going to be its own project, and we should do it, but we should do it in a separate PR.
A vignette of how I think this could look is provided below.
Before:
After:
The text was updated successfully, but these errors were encountered: