-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a JSON logger to the project #1237
Comments
Hey @cabouffard - thanks for the making this issue! I love the end result, but I wonder if there isn't a more structured way for dbt to generate these json logs. I had a chat in the dbt slack maybe a month ago about making dbt publish events over statsd to applications like DataDog. My reservation then, as now, is just that the universe of things that folks may want to send to their monitoring tool is unbounded in nature. I think many people will be happy if we send some sane defaults, but I can imagine other contextual attributes like env vars creating challenges for some use cases. The conclusion that I came to is that I don't think dbt should do something like this natively. Instead, I suggested that users who want to instrument rich event logging can wrap dbt, and fire precisely the events they need to their monitoring tool of choice. After reading this issue (and seeing the attached screenshot 😍) I think I'm now leaning more in the direction of structured logging. I'm hopeful that there's a way to do this without the regex hackery present in your example, but this is a topic that's maybe a little outside of my personal wheelhouse. So, thanks for getting the discussion started! I'll catch up with some of the Fishtown engineers and we'll get back to you here :) |
Structured logging would be incredible. I'm happy to contribute to this effort once decisions are made. |
+1 would love this feature |
@drewbanin I'm going to take a first pass at this. |
Also a command line option to dbt run and friends that would stream this to stdout instead of the freeform text would be fantastic |
Summary: Finally was able to make some time for this! This is a prototype-quality dbt integration, but it demonstrates what the shape of this would look like. I copied the example from https://github.com/fishtown-analytics/jaffle_shop/ This shells out to dbt itself and runs against the database in the examples docker container. (I had to manually create the database). It just parses stdout with regex's, which is quite fragile. It would be better if dbt emitted some sort of structured log so that this could be parsed more reliably. Taylor is excited enough about this possibility that he might tackle it. See dbt-labs/dbt-core#1237 This emits materializations for each view or table created in the example. This has not been thoroughly tested. Next Steps: - Parse materializations out of the dbt project and render them as outputs - Also create a type-per-model and render metadata within dagit. - Consume dbt tests during a run and emit Expectations. - Model "seeds" as inputs into the dbt node. Test Plan: Run jaffle example in dagit. Buildkite Reviewers: #ft, natekupp Reviewed By: #ft, natekupp Subscribers: natekupp Differential Revision: https://dagster.phacility.com/D795
Hey @cabouffard (cc @Aylr) - I just added this to the Louisa May Alcott (0.15.0) milestone! @cabouffard can you tell me more about how you're sending these log lines to DataDog? I played around with their API last week and wasn't able to get a good handle on how teams use the different parts of the app in production settings. Do you send Events? Or do you plug into statsd? Or something else? We're planning on adding a log handler which emits structured (json) logs to a file. Do you have a process which wraps dbt and scoops up these log lines currently? |
…arameter add log-format flag (#1237)
Feature description
I recently posted on the Slack channel a concern I had about the current logging functionality.
We're investing some time to see if dbt would resolve some of the problems we're faced with our current pipeline running Luigi. Thus far it's looking great ;)
In our previous pipeline, we had all the logs aggregated to a platform hosted online named Datadog. This allows us to better monitor our platform, save our logs for a longer period of time, keep the logs between deploy, add alerts on log errors and more.
The current logging format of dbt works very well for a human. It's well structured and it's easy on the eye to look for stuff, meanwhile, it gets a bit more complex to parse and make sense of it from the point of view of the machine.
Ideally, the logs would be formatted in a way that it can be ingested and acted on without much thinking, following a particular structure. This is where JSON come into play. Having the logs in a JSON format facilitates the integration to any log consumers such as Logstash and others.
I've played with the idea some bit and I realize that it's easier said than done, but hear me out. I've played with the idea some more in order to know how much work would be required to do so. I came up with a small prototype, it's messy, but you get the idea.
This prototype adds a dependency on json-log-formatter which would allow us to format the log into a JSON format without much trouble. Then, we would create a different handle that the log would report to and that handle would be able to Formatting and Filtering the logs to what is relevant for the machine. The implementation in the gist created a new file (to be renamed) and adds one line per log and that line is JSON formatted.
Here is an example output of the prototype:
It's noisy but contains a lot of goodies to use. Then that log can be ingested by any kind of tool and follows a structure that can be easily consumed by whatever!
When it comes to additional properties other than the default accessible via the record entry of the LogReport is currently less than ideal. Modifying how the current logs are formatted could be a tedious process. What I suggest, which is kind of hacky, is to parse the message via regular expressions and create the relevant key/value when necessary. This would add additional context to the log, this isn't perfect but it would be a good first step in my opinion.
Here is an example of the prototype where the logs are ingested in Datadog: https://cl.ly/8e681624e123 which makes me very happy. Keep in mind that this would solve for pretty much any log consumer tool, not only for Datadog. I'm basing this example on Datadog because that's what we use.
What do you folks think about the idea?
Let's have a discussion, thanks!
The text was updated successfully, but these errors were encountered: