
[CT-376] add ability to calculate metrics based off of other metrics #4884

Closed
mary-hallewell opened this issue Mar 16, 2022 · 5 comments · Fixed by #5027
Labels
enhancement New feature or request
Milestone
v1.2
Comments

@mary-hallewell

In order to calculate any rate-type metrics, there has to be a supported mechanism to calculate metrics based off of other metrics.

This could be implemented by expanding the inputs allowed in a model: instead of just referencing a materialized model, you could allow referencing a metric dynamically and passing the time aggregation info from your current metric as a variable argument.

You could also enable this functionality by natively supporting more aggregation types, such as division, and then allowing dynamic metric arguments to be passed in that type field. I believe that can mostly be accomplished by overriding the type functions, but it still leaves a gap where you can't pass other metrics to be used in that type agg.
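
To make the idea concrete, here is a rough sketch of what a rate metric defined on top of two other metrics might look like. Everything in this YAML is illustrative: the `expression` type, the `metric()` reference syntax, and the metric names are assumptions about a possible design, not an existing dbt spec:

```yaml
# hypothetical schema.yml: a metric calculated from two parent metrics
metrics:
  - name: pct_active_users
    label: "% active users"
    type: expression   # assumed type for metrics derived from other metrics
    # assumed syntax: reference parent metrics much like models use ref()
    sql: "{{ metric('activated_users') }} / nullif({{ metric('total_users') }}, 0)"
    timestamp: created_at
    time_grains: [day, week, month]
```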

@joellabes
Contributor

Thanks Mary! Moving this over to the dbt Core repo - I wasn't clear in Slack, my bad 😅

@joellabes joellabes transferred this issue from dbt-labs/dbt_metrics Mar 16, 2022
@github-actions github-actions bot changed the title from "add ability to calculate metrics based off of other metrics" to "[CT-376] add ability to calculate metrics based off of other metrics" Mar 16, 2022
@joellabes joellabes added the enhancement, triage, and metrics labels Mar 16, 2022
@JC-Lightfold

It's absolutely critical that metrics can be inputs to other metrics. If that's achieved by intermediary models initially, okay, but ultimately metrics need to be able to work like models - they can and will chain into each other. It's perfectly fine for them not to be recursive, though.

@jtcohen6 jtcohen6 self-assigned this Mar 17, 2022
@jtcohen6
Contributor

Agreed, this feels super important! There are two related questions on my mind:

  • Metrics that power other metrics. The primary use case is a ratio. My main questions are all around validation (see the sketch just below this list): does the composite child metric inherit all properties of its parent metrics? What if those properties disagree (e.g. different time scales)? Should it be allowed to override/reconcile them? What about metric types that should not be allowed to power another metric type, e.g. a distinct count of a distinct count?
  • Metrics that power models. I'm already hearing about folks who are creating "summary" models that union together a number of metrics. The problems: metrics cannot be DAG parents, and the metrics package does some creative things with lineage currently to work in all runtime environments. We need to figure out how to preserve lineage when a model depends on a metric (and thereby another model): [CT-344] New context + RefResolver for compile_sql + execute_sql methods #4851 (comment)
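
To illustrate the validation question in the first bullet: imagine two parent metrics that declare different time grains, and a composite metric that asks for a grain only one parent supports. This is hypothetical YAML, using the same assumed expression-style syntax as above (neither the type nor the `metric()` reference is an existing spec):

```yaml
# hypothetical schema.yml: parents disagree on supported time grains
metrics:
  - name: activated_users
    type: count_distinct
    time_grains: [day, week, month]

  - name: total_users
    type: count_distinct
    time_grains: [month]   # coarser than the other parent

  - name: pct_active_users
    type: expression       # assumed "metric built on metrics" type
    sql: "{{ metric('activated_users') }} / {{ metric('total_users') }}"
    time_grains: [day]     # unsupported by total_users: parse error? override?
```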

We're continuing to collect feedback on the initial implementation of metrics, with plans to batch up larger-scale enhancements (like this one), along with more tactical improvements, for a minor version later this year. In the meantime, let's keep the thread going!

@jtcohen6 jtcohen6 removed the triage label Mar 23, 2022
@jtcohen6 jtcohen6 removed their assignment Mar 23, 2022
@mary-hallewell
Author

Thanks for the additional considerations @jtcohen6 👋
Some responses below:

"Metrics that power other metrics"
I agree that the primary use case is a ratio, but there could be others (e.g. a calculation where you need to multiply a value by a metric, with the metric acting as a coefficient). The most flexible solution here would be to allow SQL/macros to be called in metric calculations, with arguments being passed in.
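
For instance, under the same assumed expression-style syntax sketched earlier in the thread, a coefficient calculation might be as simple as (hypothetical names and syntax):

```yaml
  - name: weighted_pipeline
    type: expression   # assumed type, as above
    # a parent metric acting as a coefficient on another metric's value
    sql: "{{ metric('avg_win_rate') }} * {{ metric('open_pipeline_value') }}"
```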

Re validation, I'm not sure I understand the issue here. Child metrics would have to call the parent metric and pass any required properties in that call.
For example, if I want to calculate "% active users" and need activated count / user count, then in the model where I call the "% active users" metric I would need to specify the args for the "activated" and "user" metrics, so that's where I would specify the time grain. I wouldn't be calling already-specified metrics, just the framework of that metric definition.
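
In other words, the grain gets bound where the metric is consumed rather than where it is defined. Roughly, in the style of the dbt_metrics package's query macro (the macro name and signature here are illustrative and have varied across package versions):

```sql
-- hypothetical model that consumes the composite metric at a chosen grain
select *
from {{ metrics.metric(
    metric_name='pct_active_users',
    grain='week'
) }}
```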

Re metric types that should not be sourced in downstream metric calculations: it seems like this could be tested, e.g.:

    if source.args.metric is not None:  # testing that my metric does source other metrics
        if source.args.metric.type == local.metric.type:  # currently just saying both can't have the same agg type
            # return an error message
            raise ValueError(
                f"Error: the metric you've used to calculate your current metric has the same "
                f"type ({source.args.metric.type}), which would result in an avg of an avg, "
                f"a count of a count, etc., and is not supported. Please revisit the logic "
                f"and recompile."
            )

"Metrics that power models"
Not sure I understand the issue here where you say metrics can't be DAG parents/how that would mess with lineage.

@jtcohen6
Contributor

Thanks for the responses @mary-hallewell!

Re validation, I'm not sure I understand the issue here. Child metrics would have to call the parent metric and pass any required properties in that call... I wouldn't be calling already-specified metrics, just the framework of that metric definition.

I was thinking in terms of a "child" metric that defines dimensions or time grains that are not supported by one of its "parent" metrics. Should that be allowed? Should an error be raised at parse time? Imagine: I'm building a metric off someone else's metric... maybe even a metric defined in a different package... should the person/team who defined the parent metric be able to "govern" my use of their metric?

It feels important to get these validation rules right.

Not sure I understand the issue here where you say metrics can't be DAG parents/how that would mess with lineage.

I'm taking the proposal in this issue to be: Metrics should be reffable (DAG parents) for other metrics.

Should we also enable metrics to be reffable within (DAG parents for) models, too? I'm already seeing some use cases that call for this (a model to union together multiple metrics, a snapshot to capture historical metric values, ...). I'm starting to think this would make sense, and I'm sensing that @dave-connors-3 and @matt-winkler might agree :)

Conceptually, the lineage is simple: the model depends on the metric that depends on the model. From a technical perspective, we'll need to figure out how to capture that lineage at parse time (sorta like how ephemeral models work? maybe?)
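
As a concrete (and hypothetical) sketch of that lineage: the summary model below depends on two metrics, and each metric in turn depends on its own upstream model, so dbt would need to resolve the metric -> model edges at parse time. The macro call style is illustrative, as above:

```sql
-- models/metric_summary.sql: union together multiple metrics (hypothetical)
select 'activated_users' as metric_name, *
from {{ metrics.metric(metric_name='activated_users', grain='month') }}

union all

select 'total_users' as metric_name, *
from {{ metrics.metric(metric_name='total_users', grain='month') }}
```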

@jtcohen6 jtcohen6 added this to the v1.2 milestone Apr 12, 2022
emmyoop added a commit that referenced this issue Jul 6, 2022
* wip
* More support for ratio metrics
* Formatting and linting
* Fix unit tests
* Support disabling metrics
* mypy
* address all TODOs
* make pypy happy
* wip
* checkpoint
* refactor, remove ratio_terms
* flake8 and unit tests
* remove debugger
* quickfix for filters
* Experiment with functional testing for 'expression' metrics
* reformatting slightly
* make file and mypy fix
* remove config from metrics - wip
* add metrics back to context
* adding test changes
* fixing test metrics
* revert name audit
* pre-commit fixes
* add changelog
* Bumping manifest version to v6 (#5430)
* Bumping manifest version to v6
* Adding manifest file for tests
* Reverting unneeded changes
* Updating v6
* Updating test to add metrics field
* Adding changelog
* add v5 to backwards compatibility
* Clean up test_previous_version_state, update for v6 (#5440)
* Update test_previous_version_state for v6. Cleanup
* Regenerate, rm breakpoint
* Code checks
* Add assertion that will fail when we bump manifest version
* update tests to automatically tests all previous versions

Co-authored-by: Emily Rockman <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Callum McCann <[email protected]>
Co-authored-by: Emily Rockman <[email protected]>
Co-authored-by: leahwicz <[email protected]>
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this issue Sep 16, 2022