run results reporting is a mess #2493
Comments
@beckjake is this intended to apply a schema to the current run results format, or is this a breaking change? as a consumer of this data, testing for the existence of a key to detect the result type is a bit painful -- it requires lots of conditionals in the consumer to do the right thing with the data. I'd prefer to standardize the result schema a bit more on the core fields (success/error/warn, status message, etc) and extend it for types that have more data. IMO this would also be easier to extend with new fields in the future. what do you think about something like this?
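The snippet originally attached to this comment isn't captured in this copy. As a rough sketch of the "core fields plus per-type extensions" idea (the class and field names below are assumptions for illustration, not the actual proposal):

```python
# Hypothetical sketch only -- not the schema proposed in the comment above.
# Field names (status, message, execution_time, extra) are assumptions.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class BaseResult:
    """Core fields shared by every result type."""
    unique_id: str
    status: str             # e.g. "success" | "error" | "warn"
    message: Optional[str]  # human-readable status message, if any
    execution_time: float   # seconds


@dataclass
class TestResult(BaseResult):
    """Extends the core fields with test-specific data."""
    rows_failed: int = 0


@dataclass
class ModelResult(BaseResult):
    """Model runs keep adapter/db-specific details in an extension dict."""
    extra: Dict[str, Any] = field(default_factory=dict)
```

New fields can then be added to a single result type without touching the shared core, which is what makes this shape easier to extend later.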
@cmcarthur This is a breaking change. We already have a schema for the results output, it's just not a useful one, and consumers are confused by it. The problem here, which I don't believe the above structure solves, is that dbt does a poor job of differentiating between "There was an error in execution" (an exception raised) and "There was an error-severity test failure". Maybe we could add …
I updated and added some examples for a proposed spec. One thing we should keep in mind (and which we should maybe just fix/change permanently?) is that currently "docs generate" and "source freshness" treat their output artifacts as if they were run results internally. I'm not sure that's the best path going forward; they're really different things.
@beckjake these look totally reasonable to me. the exception is: why do we need two different versions of source freshness results?

> a single entry in "results" for a snapshot-freshness that resulted in a warning (API)

wherever possible I'd push to unify these specs. I know there's a practical reason for the top-level structure of sources.json, but it feels like individual nodes could be unified.
@cmcarthur I think that's fair. I would like to avoid writing node information to freshness results; would it be reasonable for the RPC server to basically return the …
@beckjake like this?
that sounds perfect. it does make me wonder how we'll map this back to nodes though...
I think we can just add a `unique_id`:

```json
{
    "max_loaded_at": "2019-01-02T01:02:03+00:00",
    "snapshotted_at": "2020-07-13T20:09:14.857209+00:00",
    "max_loaded_at_time_ago_in_s": 48280031.857209,
    "status": "warn",
    "criteria": {
        "warn_after": {"count": 100, "period": "day"},
        "error_after": {"count": 1000, "period": "day"}
    },
    "unique_id": "source.<project_name>.<source_name>.<table_name>"
}
```
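For the "map this back to nodes" question above, a consumer could join freshness entries to the manifest on `unique_id`. A minimal sketch, assuming the freshness artifact exposes a `results` list of entries shaped like the one above and that `manifest.json` keys sources by `unique_id` (paths and layout are assumptions here):

```python
# Sketch: join freshness results back to source nodes via unique_id.
# The artifact layout assumed here may not match the final spec.
import json

with open("target/manifest.json") as f:
    manifest = json.load(f)
with open("target/sources.json") as f:
    freshness = json.load(f)

sources_by_id = manifest.get("sources", {})

for result in freshness.get("results", []):
    node = sources_by_id.get(result["unique_id"])
    if node is None:
        continue  # unique_id not present in this manifest
    print(result["unique_id"], result["status"], node.get("loaded_at_field"))
```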
Quick note about field naming - the …
An additional results dict to support adapter-specific info, like in #2747.
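As a hedged illustration of what such a per-result extension dict might hold (the key name `adapter_info` and its fields are hypothetical, not the shape discussed in #2747):

```python
# Hypothetical example of adapter-specific info attached to one result.
result = {
    "unique_id": "model.my_project.my_model",
    "status": "success",
    "execution_time": 1.23,
    "adapter_info": {
        # e.g. a warehouse might report bytes processed or a query id
        "bytes_processed": 1048576,
        "query_id": "01a2b3c4-0000-0000-0000-000000000000",
    },
}
```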
Describe the feature
Right now, run results are an ugly overloaded mess:

- `status` comes from the `get_status` method on SQL adapters, which can basically be anything the db/underlying db api want
- on errors, `status` is `'ERROR'`
- on test failures, `fail` is true but `status` is still the number of failed rows
- Runs set `error` and `status` (but they could set `fail` and `warn` if they wanted, those attributes exist!)
- Tests set `error`, `fail`, `warn`, and `status`. `status` is defined as `Union[None, bool, int, str]`.

There's no way we need all of these for everyone! The problem is complicated a bit by the RPC server, which has its own set of responses, etc, but I believe this is solvable anyway.

I propose picking one unified key that supports a union of objects. If it's hard to give tests their own from a type system perspective, we can still make `fail`/`warn`/`rows_failed` available in `run` output and have them be None in that case.

Here's a revised proposal, with some sample type specifications in Python and sample outputs:
JSON examples
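The collapsed examples from the original issue aren't reproduced in this copy. As one way to picture the "union of objects" idea, including the execution-error vs. error-severity-failure distinction raised earlier in the thread, a sketch might look like this (names are assumptions, not the actual spec):

```python
# Sketch only: distinct result types with distinct status enums, unioned
# under one key, instead of one struct whose fields mean different things
# depending on the node type.
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class RunStatus(str, Enum):
    SUCCESS = "success"
    ERROR = "error"      # an exception was raised during execution
    SKIPPED = "skipped"


class TestStatus(str, Enum):
    PASS = "pass"
    WARN = "warn"        # warn-severity failure
    FAIL = "fail"        # error-severity failure, not an execution error
    ERROR = "error"      # the test query itself raised an exception


@dataclass
class RunModelResult:
    unique_id: str
    status: RunStatus
    message: Optional[str]


@dataclass
class TestResult:
    unique_id: str
    status: TestStatus
    rows_failed: int


AnyResult = Union[RunModelResult, TestResult]
```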
It might be nice to include a node unique ID explicitly everywhere this spec accepts a node, with an eye to deprecating the "node" field. We can do that separately, as it'll require us to either write out the manifest a second time at run end (my preference) or move the manifest-writing to the end of the run.
This will require some coordination with cloud; I assume it must use this value for displaying run results.
I don't think this issue is imminently urgent, but I would consider it a blocker for 1.0.
Describe alternatives you've considered
We can always not do this; it's just a cosmetic/display kind of issue... the information is there.
Who will this benefit?
developers and consumers of the output JSON