Add support for custom evaluation metrics #190
Conversation
plugins/evaluation/__init__.py (Outdated)

@@ -212,6 +230,56 @@ def _get_evaluation_type(view, pred_field):
    return label_type, eval_type, methods


def _add_custom_metrics(ctx, inputs, eval_type, method):
    supported_metrics = []
    for operator in foo.list_operators(type="operator"):
This really should just be baked in... having the contract defined here is not maintainable. Something like `foo.list_operators_with_metadata(type="operator", eval_type=eval_type, method=method)`.
Although the logic here should be colocated with the custom metric implementation, that part is fine for now.
Yes, I agree; I was just getting something working to start.

More generally, we should have a think about the concept of "operator classes" like `EvaluationMetric(Operator)` introduced here and how we want that to work holistically. For example, should the concept of tags be in `fiftyone.yml` rather than `OperatorConfig`, so that we can access information about the type of operators without having to instantiate them?
I would prefer to keep an `EvaluationMetric` operator agnostic to `eval_type` and `method` here, since certain metrics may not fall under any of the eval methods or may be applicable to more than one eval method.

To start with, users should be free to choose whichever custom metrics they want, and if a metric is not applicable to the eval method under consideration, it fails gracefully (which is already the case).
The implementation here does allow for `eval_type` and `method` to be optional.

If the `EvaluationMetric` defines them, then the model evaluation operator will only show the metric as an option in the dropdown if the user is executing the matching type/method evaluation. If the `EvaluationMetric` does not define them, then the metric will always be included as an available metric in the dropdown.
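For illustration, a minimal sketch of those two cases, assuming the `EvaluationMetric` base class introduced in this PR and hypothetical `eval_types` config kwargs (the exact keyword names were still under discussion at this point):

```python
import fiftyone.operators as foo


class RegressionOnlyMetric(foo.EvaluationMetric):
    """Only offered in the metrics dropdown for matching evaluations."""

    @property
    def config(self):
        return foo.OperatorConfig(
            name="regression_only_metric",
            label="Regression-only metric",
            # hypothetical kwarg restricting where this metric is offered
            eval_types=["regression"],
        )


class AlwaysAvailableMetric(foo.EvaluationMetric):
    """Declares no eval type/method, so it is always offered."""

    @property
    def config(self):
        return foo.OperatorConfig(
            name="always_available_metric",
            label="Always-available metric",
        )
```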
Your design works fine for what we currently have as evaluation methods. I was thinking about evaluation methods that don't fall under the ones currently available. For example, if I am evaluating a neural rendering model, I may want to reuse `SquaredErrorMetric`, which is of type "regression", and also "SSIM", which is not of type "regression".
Ah, so you want a way to declare that an evaluation metric should be available to multiple evaluation types, i.e. `type=["regression", "neural-rendering"]`?
That's certainly possible.
An alternative model would be that authors of metrics encode the semantics of the metric as tags:

class MyEvaluationMetric(EvaluationMetric):
    @property
    def config(self):
        return foo.OperatorConfig(
            ...
            tags=["metric", "regression"],
        )

and then when new evaluation protocols are added, they have a way to declare that they support any custom metrics with certain tag(s):

def evaluate_neural_rendering(...):
    supported_metric_tags = ["regression", ...]
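For concreteness, the evaluation-side matching could look roughly like the sketch below; how tags are surfaced on an operator's config and the helper name are assumptions, not a final API:

```python
import fiftyone.operators as foo


def _get_supported_custom_metrics(supported_tags):
    """Returns custom metric operators whose declared tags intersect the
    tags that the calling evaluation protocol supports (sketch)."""
    supported_tags = set(supported_tags)

    metrics = []
    for operator in foo.list_operators(type="operator"):
        # assumes tags are exposed as an attribute on the operator's config
        tags = set(getattr(operator.config, "tags", None) or [])
        if "metric" in tags and tags & supported_tags:
            metrics.append(operator)

    return metrics


# eg a hypothetical neural rendering evaluation that accepts any
# regression-style metric
supported_metrics = _get_supported_custom_metrics(["regression"])
```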
The alternative model looks good. I am assuming we will have a lot more custom metrics than evaluation methods, so my preference is to let the evaluation method choose the metrics it supports.
A few comments for the final implementation:

- We should require `EvaluationMetric` and `isinstance(operator, EvaluationMetric)`
- We should define the `EvaluationMetric`-specific configuration options in an `EvaluationMetricConfig` class
- Instead of "tags", name the param `metric_protocols` to keep it specific and simple
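Putting those three points together, the filtering might end up looking roughly like the sketch below; the shape of `EvaluationMetricConfig` and what exactly a "protocol" maps to (eval type vs. method) are assumptions drawn from the bullets above:

```python
import fiftyone.operators as foo


class EvaluationMetricConfig(foo.OperatorConfig):
    """Sketch: configuration class for evaluation metric operators."""

    def __init__(self, name, metric_protocols=None, **kwargs):
        super().__init__(name, **kwargs)

        # the evaluation protocols (eg "regression", "detection") that this
        # metric supports; None means the metric is always available
        self.metric_protocols = metric_protocols


class EvaluationMetric(foo.Operator):
    """Sketch: base class that all custom evaluation metrics must subclass."""

    pass


def _add_custom_metrics(ctx, inputs, eval_type, method):
    supported_metrics = []
    for operator in foo.list_operators(type="operator"):
        # require the EvaluationMetric base class (point 1 above)
        if not isinstance(operator, EvaluationMetric):
            continue

        # offer the metric if it declares no protocols, or if it declares one
        # matching the current evaluation; how `method` factors in is left open
        protocols = operator.config.metric_protocols
        if protocols is None or eval_type in protocols:
            supported_metrics.append(operator)

    # ...then populate the metrics dropdown in `inputs` from supported_metrics
    return supported_metrics
```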
Add absolute error and MAE as custom metrics
@ritch okay, I've implemented what you recommended here in these commits.
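For context, an absolute error metric along these lines might look roughly like the sketch below, assuming a `compute()` hook that receives the evaluated samples and regression results plus a `get_fields()` hook that declares the populated fields; this is an illustration, not necessarily the exact code in those commits:

```python
import numpy as np

import fiftyone.operators as foo


class AbsoluteErrorMetric(foo.EvaluationMetric):
    """Per-sample absolute error for regression-style evaluations."""

    @property
    def config(self):
        return foo.OperatorConfig(
            name="absolute_error",
            label="Absolute error",
            description="Per-sample absolute error",
            unlisted=True,
        )

    def compute(self, samples, results, **kwargs):
        eval_key = results.key
        metric_field = f"{eval_key}_absolute_error"

        # per-sample absolute errors from the regression results
        errors = np.abs(np.asarray(results.ypred) - np.asarray(results.ytrue))

        # store the per-sample values on the evaluated samples
        values = dict(zip(results.ids, errors.tolist()))
        samples.set_values(metric_field, values, key_field="id")

        # return the aggregate (MAE) to report alongside the evaluation
        return float(errors.mean())

    def get_fields(self, samples, config, eval_key):
        # declare the field this metric populates so it can be cleaned up
        return [f"{eval_key}_absolute_error"]
```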
@@ -21,6 +21,8 @@ def config(self):
            name="example_metric",
            label="Example metric",
            description="An example evaluation metric",
            aggregate_key="example",
            unlisted=True,
What's `unlisted` for?
Operators that are `unlisted=True` will not appear in the Operator Browser. All metrics should be marked as unlisted because they aren't intended for direct execution like that (which requires implementing the `resolve_input()` and `execute()` methods).
@ritch and I discussed wanting to have a way to automatically mark all `EvaluationMetric` operators as unlisted, but that's not implemented yet.
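One possible way that could eventually work, sketched below purely as an illustration (nothing like this exists in the PR, and the `foo.EvaluationMetric` import path is itself an assumption), is for the operator listing to treat every `EvaluationMetric` as unlisted:

```python
import fiftyone.operators as foo


def _is_listed(operator):
    """Sketch: hide every EvaluationMetric from the Operator Browser,
    regardless of what its own config declares."""
    if isinstance(operator, foo.EvaluationMetric):
        return False

    return not operator.config.unlisted


def list_browser_operators():
    # hypothetical helper; foo.list_operators() is the same entry point
    # used elsewhere in this thread
    return [op for op in foo.list_operators() if _is_listed(op)]
```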
Setup

- `custom-metrics` branch of your `fiftyone` install
- `fiftyone-plugins` installed in editable mode

Example usage

- `example_metric`
- `absolute_error` and `squared_error`
- Image dataset
- Video dataset
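For reference, a minimal end-to-end usage sketch; it assumes a `fiftyone` build from the `custom-metrics` branch that exposes a `custom_metrics` kwarg on the evaluation methods, and the metric URI shown is illustrative:

```python
import fiftyone.zoo as foz

# small sample dataset with ground truth and predicted detections
dataset = foz.load_zoo_dataset("quickstart")

# run a detection evaluation that also computes a custom metric
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    custom_metrics=["@voxel51/evaluation/example_metric"],  # illustrative URI
)

results.print_report()
```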