
Log test node metadata: depends_on_nodes #2316

Closed
dcereijodo opened this issue Apr 13, 2020 · 7 comments · Fixed by #2341
Labels: enhancement (New feature or request)

Comments

@dcereijodo
Contributor

Describe the feature

Currently, the extra field of the structured log record provides unique_id and run_state for test nodes:

{"timestamp": "2020-04-13T08:01:21.445113Z", "message": "10:01:21 | 2 of 385 START test assert_nnull_refid............................... [RUN]", "channel": "dbt", "level": 11, "levelname": "INFO", "thread_name": "Thread-6", "process": 13649, "extra": {"unique_id": "test.pmt_curated.assert_nnull_refid", "run_state": "running"}}

Include the additional metadata field depends_on_nodes from the test node:

{"timestamp": "2020-04-13T08:01:21.445113Z", "message": "10:01:21 | 2 of 385 START test assert_nnull_refid............................... [RUN]", "channel": "dbt", "level": 11, "levelname": "INFO", "thread_name": "Thread-6", "process": 13649, "extra": {"unique_id": "test.pmt_curated.assert_nnull_refid", "run_state": "running", "depends_on_models": ["modelA", "modelB"]}}

Who will this benefit?

People analyzing the logs to derive pipeline metrics can relate a failed test to the model or set of models it depends on.

Also asked in Slack.
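For instance, a log consumer could then join a failed test to its upstream models directly from the structured records. A minimal sketch in Python, assuming the proposed depends_on field, an illustrative log file path, and that test failures surface at ERROR level:

import json

with open('logs/dbt.log') as f:  # illustrative path to the JSON-formatted log
    for line in f:
        record = json.loads(line)
        extra = record.get('extra', {})
        # Relate a failing test back to the models it depends on
        if record.get('levelname') == 'ERROR' and 'depends_on' in extra:
            print(extra.get('unique_id'), '<-', extra['depends_on'])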

@dcereijodo added the enhancement and triage labels on Apr 13, 2020
@dcereijodo
Contributor Author

I'd be happy to help with this if you think it's worth it.

@drewbanin removed the triage label on Apr 14, 2020
@drewbanin
Contributor

Sure, I think this is a good idea! I would just call it depends_on instead of depends_on_models, since the test could conceivably be run against a seed, source, or snapshot too.

We'd love a PR for this one - let us know if there's anything we can help out with when you get to implementing :)

@dcereijodo
Contributor Author

dcereijodo commented Apr 18, 2020

Hey! I saw that the depends_on attribute can be logged with a minimal change (see #2341). This logs the dependencies at the Began running node stage, but not in the test report line as pictured in the issue description:

{"timestamp":"2020-04-18T07:54:45.832149Z","message":"Began running node test.pmt_curated.unique_request_ref_id","channel":"dbt","level":10,"levelname":"DEBUG","thread_name":"Thread-1","process":58947,"extra":{"node_alias":"unique_request_ref_id","node_schema":"None","node_database":"dwh","node_path":"models/marts/pmt_analytics/request/request.yml","node_name":"unique_request_ref_id","resource_type":"test","depends_on":["model.pmt_curated.request"],"node_index":28,"node_materialized":"view","node_started_at":"2020-04-18T07:54:45.832192","unique_id":"test.pmt_curated.unique_request_ref_id","run_state":"running"}}

I am wondering whether I should leave it like this or add depends_on to the test report line as well. If the latter, could you point me to the level at which you think this should be resolved? I was thinking somewhere in the call_runner function
https://github.com/fishtown-analytics/dbt/blob/c05b45b241b6394222675e63b98dfb19d4633023/core/dbt/task/runnable.py#L155
or in the TestRunner
https://github.com/fishtown-analytics/dbt/blob/c05b45b241b6394222675e63b98dfb19d4633023/core/dbt/node_runners.py#L544
but I can definitely use some guidance here.

@drewbanin
Contributor

I think we'd probably want to make a new log context manager (similar to this one) called DependsOn (or something like it). This handler would need to implement a mapping_keys method containing something like:

('depends_on_nodes', 'depends_on'),

So, pretty similar to what you've implemented in #2341.
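For concreteness, a minimal sketch of what such a handler could look like, assuming dbt's logbook-based logging; the shape mirrors the existing metadata processors, and the base-class details and node attribute name may differ in the actual codebase:

import logbook

class DependsOn(logbook.Processor):
    def __init__(self, node):
        self.node = node
        super().__init__(self._inject)

    def mapping_keys(self):
        # (attribute on the node, key exposed under record.extra)
        return [('depends_on_nodes', 'depends_on')]

    def _inject(self, record):
        # Copy each mapped node attribute onto the log record's extra dict
        for attr, key in self.mapping_keys():
            value = getattr(self.node, attr, None)
            if value is not None:
                record.extra[key] = value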

Once the DependsOn handler exists, you'd want to use it as a context manager in the GraphRunnableTask wrapping the end-of-test debug log line.
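Hypothetically, that wrapping could look something like the following; the call site and logger name are illustrative, not the actual GraphRunnableTask code:

# somewhere around the end-of-test debug log line
with DependsOn(node).applicationbound():
    logger.debug('Finished running node {}'.format(node.unique_id))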

I think we could alternatively just use extended_metadata as an additional context manager here in conjunction with the change you've already made in #2341.

@beckjake do you have any further thoughts / preferences / ideas on this one?

@beckjake
Contributor

I don't really have strong feelings about it. That ModelMetadata is only used once, at the beginning of running a model (rather than the end), which seems acceptable to me. I was pretty satisfied with #2341. Is the concern around logging this information when we're not running a test? Because that doesn't seem so awful to me.

@drewbanin
Contributor

I think the goal here is to get depends_on into the same log line as the test failure. Right now, the node metadata shows up in the log line for the node start, but not in the log line for the node end. I imagine pushing the depends_on list into the same log line as the ultimate status info is the desired goal here.

@dcereijodo
Contributor Author

If it helps with the discussion around scope, logging at the beginning of running a model already solves my use case. If logging in the test report line as well is preferable, I have no problem adding the change there too. Using extended_metadata sounds good to me in this case.
