
Log test node metadata: depends_on_nodes #2316

Closed
dcereijodo opened this issue Apr 13, 2020 · 7 comments · Fixed by #2341
Labels: enhancement (New feature or request)

Comments

@dcereijodo
Contributor

Describe the feature

Currently, the extra field of the structured log record provides unique_id and run_state for test nodes:

{"timestamp": "2020-04-13T08:01:21.445113Z", "message": "10:01:21 | 2 of 385 START test assert_nnull_refid............................... [RUN]", "channel": "dbt", "level": 11, "levelname": "INFO", "thread_name": "Thread-6", "process": 13649, "extra": {"unique_id": "test.pmt_curated.assert_nnull_refid", "run_state": "running"}}

Include the additional metadata field depends_on_nodes from the test node:

{"timestamp": "2020-04-13T08:01:21.445113Z", "message": "10:01:21 | 2 of 385 START test assert_nnull_refid............................... [RUN]", "channel": "dbt", "level": 11, "levelname": "INFO", "thread_name": "Thread-6", "process": 13649, "extra": {"unique_id": "test.pmt_curated.assert_nnull_refid", "run_state": "running", "depends_on_models": ["modelA", "modelB"]}}

Who will this benefit?

People analyzing the logs to derive pipeline metrics can relate a failed test to the model or set of models it depends on.

Also asked in Slack.
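For instance, a log consumer could then join a failed test to its upstream models directly from the structured records. A minimal sketch in Python, assuming the proposed depends_on field, an illustrative log file path, and that test failures surface at ERROR level:

import json

with open('logs/dbt.log') as f:  # illustrative path to the JSON-formatted log
    for line in f:
        record = json.loads(line)
        extra = record.get('extra', {})
        # Relate a failing test back to the models it depends on
        if record.get('levelname') == 'ERROR' and 'depends_on' in extra:
            print(extra.get('unique_id'), '<-', extra['depends_on'])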

@dcereijodo added the enhancement and triage labels on Apr 13, 2020
@dcereijodo
Contributor Author

I'd be happy to help with this if you think it's worth it.

@drewbanin removed the triage label on Apr 14, 2020
@drewbanin
Contributor

Sure, I think this is a good idea! I would just call it depends_on instead of depends_on_models, since the test could conceivably be run against a seed, source, or snapshot too.

We'd love a PR for this one - let us know if there's anything we can help out with when you get to implementing :)

@dcereijodo
Contributor Author

dcereijodo commented Apr 18, 2020

Hey! I saw that the depends_on attribute can be logged with a minimal change (see #2341). This logs the dependencies at the Began running node stage, but not in the test report line as pictured in the issue description:

{"timestamp":"2020-04-18T07:54:45.832149Z","message":"Began running node test.pmt_curated.unique_request_ref_id","channel":"dbt","level":10,"levelname":"DEBUG","thread_name":"Thread-1","process":58947,"extra":{"node_alias":"unique_request_ref_id","node_schema":"None","node_database":"dwh","node_path":"models/marts/pmt_analytics/request/request.yml","node_name":"unique_request_ref_id","resource_type":"test","depends_on":["model.pmt_curated.request"],"node_index":28,"node_materialized":"view","node_started_at":"2020-04-18T07:54:45.832192","unique_id":"test.pmt_curated.unique_request_ref_id","run_state":"running"}}

I am wondering whether I should leave it like this or add depends_on to the test report line as well. If the latter, could you point me to the level at which you think this should be resolved? I was thinking somewhere in the call_runner function
https://github.com/fishtown-analytics/dbt/blob/c05b45b241b6394222675e63b98dfb19d4633023/core/dbt/task/runnable.py#L155
or in the TestRunner
https://github.com/fishtown-analytics/dbt/blob/c05b45b241b6394222675e63b98dfb19d4633023/core/dbt/node_runners.py#L544
but I can definitely use some guidance here.

@drewbanin
Contributor

I think we'd probably want to make a new log context manager (similar to this one) called DependsOn (or something like it). This handler would need to implement a mapping_keys method containing something like:

('depends_on_nodes', 'depends_on'),

So, pretty similar to what you've implemented in #2341.
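For concreteness, a minimal sketch of what such a handler could look like, assuming dbt's logbook-based logging; the shape mirrors the existing metadata processors, and the base-class details and node attribute name may differ in the actual codebase:

import logbook

class DependsOn(logbook.Processor):
    def __init__(self, node):
        self.node = node
        super().__init__(self._inject)

    def mapping_keys(self):
        # (attribute on the node, key exposed under record.extra)
        return [('depends_on_nodes', 'depends_on')]

    def _inject(self, record):
        # Copy each mapped node attribute onto the log record's extra dict
        for attr, key in self.mapping_keys():
            value = getattr(self.node, attr, None)
            if value is not None:
                record.extra[key] = value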

Once the DependsOn handler exists, you'd want to use it as a context manager in the GraphRunnableTask wrapping the end-of-test debug log line.
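Hypothetically, that wrapping could look something like the following; the call site and logger name are illustrative, not the actual GraphRunnableTask code:

# somewhere around the end-of-test debug log line
with DependsOn(node).applicationbound():
    logger.debug('Finished running node {}'.format(node.unique_id))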

I think we could alternatively just use extended_metadata as an additional context manager here in conjunction with the change you've already made in #2341.

@beckjake do you have any further thoughts / preferences / ideas on this one?

@beckjake
Contributor

I don't really have strong feelings about it. That ModelMetadata is only used once, at the beginning of running a model (rather than the end), which seems acceptable to me. I was pretty satisfied with #2341. Is the concern around logging this information when we're not running a test? Because that doesn't seem so awful to me.

@drewbanin
Contributor

I think the goal here is to get depends_on into the same log line as the test failure. Right now, the node metadata shows up in the log line for the node start, but not in the log line for the node end. I imagine pushing the depends_on list into the same log line as the ultimate status info is the desired goal here.

@dcereijodo
Contributor Author

If it helps with the discussion around scope, logging at the beginning of running a model already solves my use case. If logging in the test report line as well is preferable, I have no problem adding the change there too. Using extended_metadata sounds good to me in this case.
