Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a JSON logger to the project #1237

Closed
thisiscab opened this issue Jan 11, 2019 · 6 comments · Fixed by #1791
Closed

Add a JSON logger to the project #1237

thisiscab opened this issue Jan 11, 2019 · 6 comments · Fixed by #1791
Labels
enhancement New feature or request

Comments

@thisiscab
Copy link

thisiscab commented Jan 11, 2019

Feature description

I recently posted on the Slack channel a concern I had about the current logging functionality.

We're investing some time to see if dbt would resolve some of the problems we're faced with our current pipeline running Luigi. Thus far it's looking great ;)

In our previous pipeline, we had all the logs aggregated to a platform hosted online named Datadog. This allows us to better monitor our platform, save our logs for a longer period of time, keep the logs between deploy, add alerts on log errors and more.

The current logging format of dbt works very well for a human. It's well structured and it's easy on the eye to look for stuff, meanwhile, it gets a bit more complex to parse and make sense of it from the point of view of the machine.

Ideally, the logs would be formatted in a way that it can be ingested and acted on without much thinking, following a particular structure. This is where JSON come into play. Having the logs in a JSON format facilitates the integration to any log consumers such as Logstash and others.

I've played with the idea some bit and I realize that it's easier said than done, but hear me out. I've played with the idea some more in order to know how much work would be required to do so. I came up with a small prototype, it's messy, but you get the idea.

This prototype adds a dependency on json-log-formatter which would allow us to format the log into a JSON format without much trouble. Then, we would create a different handle that the log would report to and that handle would be able to Formatting and Filtering the logs to what is relevant for the machine. The implementation in the gist created a new file (to be renamed) and adds one line per log and that line is JSON formatted.

Here is an example output of the prototype:

{"dbt_table": "redacted_retail_editors", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.relationships_dim_cashier_redacted_retail_editor_id__id__ref_redacted_retail_editors_' (models/dims/dims.yml) depends on model 'redacted_retail_editors' which was not found", "lineno": 451, "created": 1547235404.175718, "processName": "MainProcess", "relativeCreated": 11235.170364379883, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 175.7180690765381, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,175", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "stg_redacted_user", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.not_null_stg_redacted_user_created_at' (models/staging/redacted/redacted_stg.yml) depends on model 'stg_redacted_user' which was not found", "lineno": 451, "created": 1547235404.1786635, "processName": "MainProcess", "relativeCreated": 11238.115787506104, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 178.6634922027588, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,178", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "stg_redacted_user", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.not_null_stg_redacted_user_sha1_email' (models/staging/redacted/redacted_stg.yml) depends on model 'stg_redacted_user' which was not found", "lineno": 451, "created": 1547235404.1820805, "processName": "MainProcess", "relativeCreated": 11241.532802581787, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 182.08050727844238, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,182", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "stg_redacted_user", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.not_null_stg_redacted_user_md5_email' (models/staging/redacted/redacted_stg.yml) depends on model 'stg_redacted_user' which was not found", "lineno": 451, "created": 1547235404.18593, "processName": "MainProcess", "relativeCreated": 11245.382308959961, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 185.9300136566162, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,185", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "stg_redacted_user", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.not_null_stg_redacted_user_updated_at' (models/staging/redacted/redacted_stg.yml) depends on model 'stg_redacted_user' which was not found", "lineno": 451, "created": 1547235404.1874862, "processName": "MainProcess", "relativeCreated": 11246.938467025757, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 187.4861717224121, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,187", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "stg_redacted_user", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.not_null_stg_redacted_user_last_name' (models/staging/redacted/redacted_stg.yml) depends on model 'stg_redacted_user' which was not found", "lineno": 451, "created": 1547235404.1899438, "processName": "MainProcess", "relativeCreated": 11249.396085739136, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 189.94379043579102, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,189", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "redacted_users", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.relationships_dim_customer_redacted_user_id__id__ref_redacted_users_' (models/dims/dims.yml) depends on model 'redacted_users' which was not found", "lineno": 451, "created": 1547235404.1973617, "processName": "MainProcess", "relativeCreated": 11256.814002990723, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 197.36170768737793, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,197", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "stg_redacted_user", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.not_null_stg_redacted_user_first_name' (models/staging/redacted/redacted_stg.yml) depends on model 'stg_redacted_user' which was not found", "lineno": 451, "created": 1547235404.2026784, "processName": "MainProcess", "relativeCreated": 11262.130737304688, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 202.67844200134277, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,202", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "redacted_users", "process": 244, "filename": "utils.py", "dbt_project": "redacted_pipeline", "levelname": "WARNING", "message": "WARNING: Test 'test.redacted_pipeline.relationships_dim_cashier_redacted_user_id__id__ref_redacted_users_' (models/dims/dims.yml) depends on model 'redacted_users' which was not found", "lineno": 451, "created": 1547235404.2049177, "processName": "MainProcess", "relativeCreated": 11264.36996459961, "funcName": "invalid_ref_fail_unless_test", "threadName": "MainThread", "module": "utils", "name": "dbt", "msecs": 204.91766929626465, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/utils.py", "asctime": "2019-01-11 19:36:44,204", "thread": 140487128296232, "levelno": 30}
{"dbt_table": "com", "process": 244, "filename": "deprecations.py", "dbt_project": "getdbt", "levelname": "INFO", "message": "* Deprecation Warning: The `sql_where` option for incremental models is deprecated and will be\n  removed in a future release. Check the docs for more information\n\n  https://docs.getdbt.com/docs/configuring-incremental-models\n  \n", "lineno": 12, "created": 1547235404.2738883, "processName": "MainProcess", "relativeCreated": 11333.340644836426, "funcName": "show", "threadName": "MainThread", "module": "deprecations", "name": "dbt", "msecs": 273.88834953308105, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/deprecations.py", "asctime": "2019-01-11 19:36:44,273", "thread": 140487128296232, "levelno": 20}
{"process": 244, "filename": "compilation.py", "levelname": "INFO", "message": "Found 155 macros, 13 archives, 161 tests, 5 seed files, 0 analyses, 23 models, 0 operations", "lineno": 48, "created": 1547235404.342669, "processName": "MainProcess", "relativeCreated": 11402.121305465698, "funcName": "print_compile_stats", "threadName": "MainThread", "module": "compilation", "name": "dbt", "msecs": 342.6690101623535, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/compilation.py", "asctime": "2019-01-11 19:36:44,342", "thread": 140487128296232, "levelno": 20}
{"process": 244, "filename": "runner.py", "levelname": "INFO", "message": "", "lineno": 235, "created": 1547235404.3844714, "processName": "MainProcess", "relativeCreated": 11443.923711776733, "funcName": "run", "threadName": "MainThread", "module": "runner", "name": "dbt", "msecs": 384.4714164733887, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/runner.py", "asctime": "2019-01-11 19:36:44,384", "thread": 140487128296232, "levelno": 20}
{"process": 244, "filename": "main.py", "levelname": "INFO", "message": "ctrl-c", "lineno": 84, "created": 1547235418.0521727, "processName": "MainProcess", "relativeCreated": 25111.62495613098, "funcName": "main", "threadName": "MainThread", "module": "main", "name": "dbt", "msecs": 52.17266082763672, "pathname": "/usr/lib/python3.5/site-packages/dbt_core-0.13.0a1-py3.5.egg/dbt/main.py", "asctime": "2019-01-11 19:36:58,052", "thread": 140487128296232, "levelno": 20}

It's noisy but contains a lot of goodies to use. Then that log can be ingested by any kind of tool and follows a structure that can be easily consumed by whatever!

When it comes to additional properties other than the default accessible via the record entry of the LogReport is currently less than ideal. Modifying how the current logs are formatted could be a tedious process. What I suggest, which is kind of hacky, is to parse the message via regular expressions and create the relevant key/value when necessary. This would add additional context to the log, this isn't perfect but it would be a good first step in my opinion.

Here is an example of the prototype where the logs are ingested in Datadog: https://cl.ly/8e681624e123 which makes me very happy. Keep in mind that this would solve for pretty much any log consumer tool, not only for Datadog. I'm basing this example on Datadog because that's what we use.

What do you folks think about the idea?
Let's have a discussion, thanks!

@drewbanin
Copy link
Contributor

Hey @cabouffard - thanks for the making this issue! I love the end result, but I wonder if there isn't a more structured way for dbt to generate these json logs.

I had a chat in the dbt slack maybe a month ago about making dbt publish events over statsd to applications like DataDog. My reservation then, as now, is just that the universe of things that folks may want to send to their monitoring tool is unbounded in nature. I think many people will be happy if we send some sane defaults, but I can imagine other contextual attributes like env vars creating challenges for some use cases.

The conclusion that I came to is that I don't think dbt should do something like this natively. Instead, I suggested that users who want to instrument rich event logging can wrap dbt, and fire precisely the events they need to their monitoring tool of choice.

After reading this issue (and seeing the attached screenshot 😍) I think I'm now leaning more in the direction of structured logging. I'm hopeful that there's a way to do this without the regex hackery present in your example, but this is a topic that's maybe a little outside of my personal wheelhouse.

So, thanks for getting the discussion started! I'll catch up with some of the Fishtown engineers and we'll get back to you here :)

@Aylr
Copy link

Aylr commented May 15, 2019

Structured logging would be incredible. I'm happy to contribute to this effort once decisions are made.

@schrockn
Copy link

+1 would love this feature

@Aylr
Copy link

Aylr commented Aug 10, 2019

@drewbanin I'm going to take a first pass at this.

@schrockn
Copy link

schrockn commented Aug 10, 2019

Also a command line option to dbt run and friends that would stream this to stdout instead of the freeform text would be fantastic

schrockn added a commit to dagster-io/dagster that referenced this issue Aug 13, 2019
Summary:
Finally was able to make some time for this! This is a
prototype-quality dbt integration, but it demonstrates what the shape of
this would look like.

I copied the example from https://github.com/fishtown-analytics/jaffle_shop/

This shells out to dbt itself and runs against the database in the
examples docker container. (I had to manually create the database). It
just parses stdout with regex's, which is quite fragile.

It would be better if dbt emitted some sort of structured log so that
this could be parsed more reliably. Taylor is excited enough about this
possibility that he might tackle it. See dbt-labs/dbt-core#1237

This emits materializations for each view or table created in the
example. This has not been thoroughly tested.

Next Steps:
  - Parse materializations out of the dbt project and render them as
  outputs
  - Also create a type-per-model and render metadata within dagit.
  - Consume dbt tests during a run and emit Expectations.
  - Model "seeds" as inputs into the dbt node.

Test Plan: Run jaffle example in dagit. Buildkite

Reviewers: #ft, natekupp

Reviewed By: #ft, natekupp

Subscribers: natekupp

Differential Revision: https://dagster.phacility.com/D795
@drewbanin drewbanin added this to the Louisa May Alcott milestone Aug 26, 2019
@drewbanin
Copy link
Contributor

Hey @cabouffard (cc @Aylr) - I just added this to the Louisa May Alcott (0.15.0) milestone!

@cabouffard can you tell me more about how you're sending these log lines to DataDog? I played around with their API last week and wasn't able to get a good handle on how teams use the different parts of the app in production settings. Do you send Events? Or do you plug into statsd? Or something else?

We're planning on adding a log handler which emits structured (json) logs to a file. Do you have a process which wraps dbt and scoops up these log lines currently?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants