Add support for custom evaluation metrics #5279
Conversation
Walkthrough

This pull request introduces a comprehensive enhancement to the evaluation metrics framework across multiple files in the FiftyOne project. The changes primarily focus on adding support for custom metrics in various evaluation contexts, including classification, detection, segmentation, and regression. A new base infrastructure is established to handle custom metrics, with modifications to configuration, evaluation, and results classes to accommodate this functionality. The changes provide more flexibility for users to define and compute their own metrics during model evaluation.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant EvaluationConfig
    participant EvaluationMethod
    participant Results
    User->>EvaluationConfig: Specify custom metrics
    EvaluationConfig->>EvaluationMethod: Pass custom metrics
    EvaluationMethod->>EvaluationMethod: Compute standard metrics
    EvaluationMethod->>EvaluationMethod: Compute custom metrics
    EvaluationMethod->>Results: Generate results with custom metrics
    Results-->>User: Return evaluation results
```
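The flow above can be sketched with a minimal stand-in for the interface discussed in this PR. The class and field names (`EvaluationMetric`, `compute()`, `value`/`label`/`lower_is_better`) mirror the review comments below, but the simplified base classes here are an assumption for illustration, not FiftyOne's actual implementation:

```python
class EvaluationMetricConfig:
    """Minimal stand-in for the config carried by a custom metric operator."""

    def __init__(self, name, label=None, lower_is_better=True, **kwargs):
        self.name = name
        self.label = label or name
        self.lower_is_better = lower_is_better
        self.kwargs = kwargs


class EvaluationMetric:
    """Minimal stand-in: subclasses implement compute()."""

    def __init__(self, config):
        self.config = config

    def compute(self, samples, results, **kwargs):
        raise NotImplementedError("subclass must implement compute()")


class MaxErrorMetric(EvaluationMetric):
    """Example custom metric: the largest absolute error over the samples."""

    def compute(self, samples, results, **kwargs):
        return max(abs(p - g) for p, g in samples)


# Standard metrics are computed first; the custom metric's output is then
# attached to the results in the dict shape described later in this thread
samples = [(0.9, 1.0), (0.4, 0.0), (0.5, 0.5)]  # (prediction, ground truth)
metric = MaxErrorMetric(EvaluationMetricConfig("max_error", label="Max error"))

custom_metrics = {
    "max_error": {
        "value": metric.compute(samples, results=None),
        "label": metric.config.label,
        "lower_is_better": metric.config.lower_is_better,
    }
}
print(custom_metrics["max_error"]["value"])  # 0.4
```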
Force-pushed: c4883ea → 34030d4, 34030d4 → 6513247, ea26a26 → 4fa72a0, 8a3a97b → 9047d07
Actionable comments posted: 0
🧹 Nitpick comments (9)
fiftyone/operators/evaluation_metric.py (1)

43-43: Fix the typo in the docstring.

There's a typo in the word "evaluation" in the docstring of the `get_parameters` method. Apply this diff to correct the typo:

```diff
-    """Defines any necessary properties to collect this evaluaiton metric's
+    """Defines any necessary properties to collect this evaluation metric's
```

fiftyone/utils/eval/regression.py (1)

587-587: Simplify the usage of `dict.get`.

The default value `None` in `kwargs.get("custom_metrics", None)` is unnecessary, as `None` is the default return value of `dict.get` when the key is missing. You can simplify the code by removing it. Apply this diff:

```diff
 def _parse_config(pred_field, gt_field, method, **kwargs):
     if method is None:
         method = fo.evaluation_config.default_regression_backend

-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
     if etau.is_str(custom_metrics):
         kwargs["custom_metrics"] = [custom_metrics]
```

🧰 Tools
🪛 Ruff (0.8.2)
587-587: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)
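The SIM910 point generalizes beyond this PR: `dict.get(key)` already returns `None` for a missing key, so the explicit `None` default is redundant. A quick self-contained illustration (generic dict, not the PR's actual `kwargs`):

```python
kwargs = {"eval_key": "eval"}

# The two forms are equivalent when the default is None
assert kwargs.get("custom_metrics") is None
assert kwargs.get("custom_metrics") == kwargs.get("custom_metrics", None)

# The second argument only matters for a non-None default
assert kwargs.get("custom_metrics", []) == []

# Present keys are unaffected either way
assert kwargs.get("eval_key") == "eval"
```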
fiftyone/utils/eval/base.py (2)

66-68: Access `lower_is_better` directly from the config.

Instead of accessing `lower_is_better` through `operator.config.kwargs.get(...)`, you should access it directly as an attribute of `operator.config`, since it's already defined in the `EvaluationMetricConfig`. Apply this diff:

```diff
 results.custom_metrics[operator.uri] = {
     "value": value,
     "label": operator.config.label,
-    "lower_is_better": operator.config.kwargs.get(
-        "lower_is_better", True
-    ),
+    "lower_is_better": operator.config.lower_is_better,
 }
```

80-80: Simplify iteration over dictionary keys.

When iterating over a dictionary, you can iterate directly without calling `.keys()`, as this is the default behavior in Python. This makes the code more concise and Pythonic. Apply these diffs:

At line 80:
```diff
 def get_custom_metric_fields(self, samples, eval_key):
     fields = []
-    for metric in self._get_custom_metrics().keys():
+    for metric in self._get_custom_metrics():
```

At line 96:
```diff
 def rename_custom_metrics(self, samples, eval_key, new_eval_key):
-    for metric in self._get_custom_metrics().keys():
+    for metric in self._get_custom_metrics():
```

At line 108:
```diff
 def cleanup_custom_metrics(self, samples, eval_key):
-    for metric in self._get_custom_metrics().keys():
+    for metric in self._get_custom_metrics():
```

Also applies to: 96-96, 108-108

🧰 Tools
🪛 Ruff (0.8.2)
80-80: Use `key in dict` instead of `key in dict.keys()` — remove `.keys()` (SIM118)
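For reference, iterating a dict yields its keys directly, so `.keys()` adds nothing, and `.items()` is the form to reach when values are also needed. A generic illustration (the dict contents here are made up, not the PR's data):

```python
metrics = {"metric_a": {"value": 1}, "metric_b": {"value": 2}}

# Iterating the dict itself yields keys, same as .keys()
assert list(metrics) == list(metrics.keys()) == ["metric_a", "metric_b"]

# Membership tests likewise work on the dict directly
assert "metric_a" in metrics

# Use .items() when both key and value are needed
pairs = [(k, v["value"]) for k, v in metrics.items()]
print(pairs)  # [('metric_a', 1), ('metric_b', 2)]
```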
fiftyone/utils/eval/detection.py (1)

641-641: Simplify usage of `kwargs.get()`.

You can remove the default value `None`, since `kwargs.get("custom_metrics")` returns `None` by default. Apply this diff to simplify the code:

```diff
-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
```

🧰 Tools
🪛 Ruff (0.8.2)
641-641: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)
fiftyone/utils/eval/classification.py (1)

926-926: Simplify usage of `kwargs.get()`.

You can remove the default value `None`, since `kwargs.get("custom_metrics")` returns `None` by default. Apply this diff to simplify the code:

```diff
-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
```

🧰 Tools
🪛 Ruff (0.8.2)
926-926: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

1-1: Use optional chaining instead of lodash `get` to simplify code.

Optional chaining and nullish coalescing operators can replace `_.get`, reducing dependencies and simplifying the code. Apply this diff to refactor the code:

```diff
-import _ from "lodash";
+// lodash import is no longer needed
...
-const customMetrics = _.get(
-  evaluationMetrics,
-  "custom_metrics",
-  {}
-) as CustomMetrics;
+const customMetrics = (evaluationMetrics.custom_metrics ?? {}) as CustomMetrics;
...
-const compareValue = _.get(
-  comparisonMetrics,
-  `custom_metrics.${operatorUri}.value`,
-  null
-);
+const compareValue = comparisonMetrics?.custom_metrics?.[operatorUri]?.value ?? null;
```

Also applies to: 1785-1795
fiftyone/operators/__init__.py (1)

15-15: Add imported classes to `__all__` to resolve unused-import warnings.

The imported classes `EvaluationMetricConfig` and `EvaluationMetric` should be added to `__all__`, since they are intended to be part of the public API.

```diff
-from .evaluation_metric import EvaluationMetricConfig, EvaluationMetric
+from .evaluation_metric import EvaluationMetricConfig, EvaluationMetric
+
+__all__ = [k for k, v in globals().items() if not k.startswith("_")] + [
+    "EvaluationMetricConfig",
+    "EvaluationMetric",
+]
```

🧰 Tools
🪛 Ruff (0.8.2)
15-15: `.evaluation_metric.EvaluationMetricConfig` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
15-15: `.evaluation_metric.EvaluationMetric` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
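For background on the F401 fix: `__all__` controls what `from package import *` exposes, which is why adding the names there both documents the public API and silences the unused-import warning. A self-contained sketch using a synthetic module (the module and function names here are hypothetical):

```python
import sys
import types

# Build a throwaway module that defines three names but exports only one
mod = types.ModuleType("fake_pkg")
exec(
    "def public(): return 1\n"
    "def _private(): return 2\n"
    "helper = 3\n"
    "__all__ = ['public']\n",
    mod.__dict__,
)
sys.modules["fake_pkg"] = mod

# `import *` honors __all__: only 'public' lands in the namespace
ns = {}
exec("from fake_pkg import *", ns)
assert "public" in ns
assert "helper" not in ns and "_private" not in ns
```

Without `__all__`, star-import would also pull in `helper` (any name not starting with an underscore), so declaring `__all__` makes the exported surface explicit.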
fiftyone/utils/eval/segmentation.py (1)

564-567: Optimize the `kwargs.get()` call.

The `None` default value in `kwargs.get("custom_metrics", None)` is redundant, as it's the default return value of `dict.get()`.

```diff
-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
```

🧰 Tools
🪛 Ruff (0.8.2)
564-564: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (13)
- app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (3 hunks)
- fiftyone/operators/__init__.py (1 hunks)
- fiftyone/operators/evaluation_metric.py (1 hunks)
- fiftyone/operators/registry.py (4 hunks)
- fiftyone/utils/eval/activitynet.py (5 hunks)
- fiftyone/utils/eval/base.py (6 hunks)
- fiftyone/utils/eval/classification.py (20 hunks)
- fiftyone/utils/eval/coco.py (5 hunks)
- fiftyone/utils/eval/detection.py (16 hunks)
- fiftyone/utils/eval/openimages.py (5 hunks)
- fiftyone/utils/eval/regression.py (18 hunks)
- fiftyone/utils/eval/segmentation.py (17 hunks)
- plugins/panels/model_evaluation/__init__.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)
Pattern `**/*.{ts,tsx}`: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🪛 Ruff (0.8.2)
fiftyone/operators/__init__.py
- 15-15: `.evaluation_metric.EvaluationMetricConfig` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
- 15-15: `.evaluation_metric.EvaluationMetric` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)

fiftyone/utils/eval/detection.py
- 641-641: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)

fiftyone/utils/eval/segmentation.py
- 564-564: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)

fiftyone/utils/eval/base.py
- 80-80: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 96-96: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 108-108: Use `key in dict` instead of `key in dict.keys()` (SIM118)

fiftyone/utils/eval/classification.py
- 926-926: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)

fiftyone/utils/eval/regression.py
- 587-587: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (11)

fiftyone/operators/registry.py (1)
9-10: LGTM! The changes enhance operator filtering capabilities. The implementation correctly handles both string and class type parameters, with proper type conversion and error handling. Also applies to: 46-47, 91-92, 111-114

plugins/panels/model_evaluation/__init__.py (1)
149-154: LGTM! The custom metrics integration is well implemented. The implementation includes proper error handling and integrates seamlessly with the existing evaluation data structure. Also applies to: 373-373

fiftyone/utils/eval/activitynet.py (1)
43-44: LGTM! Custom metrics support is properly integrated. The implementation includes clear documentation and correctly propagates custom metrics to the parent class. Also applies to: 55-55, 59-64

fiftyone/utils/eval/segmentation.py (4)
41-42: LGTM! The `custom_metrics` parameter is well placed and properly documented.
145-146: LGTM! The documentation for the `custom_metrics` parameter is clear and follows the established format.
149-161: LGTM! The `custom_metrics` parameter is properly integrated into the constructor and correctly initialized through the superclass.
514-520: LGTM! The addition of `custom_metrics` to the attributes list is appropriate and maintains consistency.

fiftyone/utils/eval/coco.py (2)
70-71: LGTM! The documentation for the `custom_metrics` parameter maintains consistency with other evaluation configs.
92-97: LGTM! The `custom_metrics` parameter is correctly passed to the superclass constructor.

fiftyone/utils/eval/openimages.py (2)
72-73: LGTM! The documentation for the `custom_metrics` parameter maintains consistency with other evaluation configs.
97-102: LGTM! The `custom_metrics` parameter is correctly passed to the superclass constructor.
Force-pushed: e98cea0 → c656853
Actionable comments posted: 1
🧹 Nitpick comments (3)
fiftyone/operators/evaluation_metric.py (3)
25-37: Add type hints to improve code maintainability.

Consider adding type hints to the configuration parameters for better code maintainability and IDE support.

```diff
 def __init__(
     self,
-    name,
-    label=None,
-    description=None,
-    eval_types=None,
-    lower_is_better=True,
+    name: str,
+    label: str | None = None,
+    description: str | None = None,
+    eval_types: list[str] | None = None,
+    lower_is_better: bool = True,
     **kwargs,
 ):
```

43-43: Fix typo in docstring.

There's a typo, "evaluaiton", in the docstring.

```diff
-    """Defines any necessary properties to collect this evaluaiton metric's
+    """Defines any necessary properties to collect this evaluation metric's
```

62-73: Consider using `@abstractmethod` for required implementations.

Since `compute` must be implemented by subclasses, consider using the `@abstractmethod` decorator to enforce this at the type level.

```diff
+from abc import abstractmethod

+@abstractmethod
 def compute(self, samples, results, **kwargs):
```
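For context on the suggestion above: `@abstractmethod` only enforces implementation when the class's metaclass is `ABCMeta` (e.g. by inheriting `abc.ABC`), and the failure then happens at construction time rather than on the first call. A generic sketch, not the PR's actual class:

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    @abstractmethod
    def compute(self, samples, results, **kwargs):
        """Subclasses must implement this."""


class MeanError(Metric):
    def compute(self, samples, results, **kwargs):
        # Mean absolute error over (prediction, ground truth) pairs
        return sum(abs(p - g) for p, g in samples) / len(samples)


# Instantiating the abstract base fails immediately...
try:
    Metric()
except TypeError as e:
    print("TypeError:", e)

# ...while a concrete subclass works as usual
assert MeanError().compute([(1.0, 0.0), (3.0, 2.0)], None) == 1.0
```

Without `ABC`, the decorator is purely advisory and the subclass would only fail when `compute()` is actually invoked.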
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (8)
- app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (3 hunks)
- fiftyone/operators/__init__.py (1 hunks)
- fiftyone/operators/evaluation_metric.py (1 hunks)
- fiftyone/operators/registry.py (4 hunks)
- fiftyone/utils/eval/base.py (6 hunks)
- fiftyone/utils/eval/regression.py (18 hunks)
- plugins/panels/model_evaluation/__init__.py (2 hunks)
- tests/unittests/operators/registry_tests.py (2 hunks)
👮 Files not reviewed due to content moderation or server errors (4)
- fiftyone/utils/eval/regression.py
- plugins/panels/model_evaluation/__init__.py
- fiftyone/utils/eval/base.py
- app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx
🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)
Pattern `**/*.{ts,tsx}`: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🪛 Ruff (0.8.2)
fiftyone/utils/eval/regression.py
- 587-587: Use `kwargs.get("custom_metrics")` instead of `kwargs.get("custom_metrics", None)` (SIM910)

fiftyone/utils/eval/base.py
- 80-80: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 96-96: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 108-108: Use `key in dict` instead of `key in dict.keys()` (SIM118)

fiftyone/operators/__init__.py
- 15-15: `.evaluation_metric.EvaluationMetricConfig` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
- 15-15: `.evaluation_metric.EvaluationMetric` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (5)

tests/unittests/operators/registry_tests.py (2)
12-23: LGTM! Improved readability with better indentation. The restructured mock setup is now more readable and maintainable.
55-56: LGTM! Consistent string quotes. The change to double quotes maintains consistency with the project's style.

fiftyone/operators/evaluation_metric.py (1)
89-109: LGTM! Robust field-renaming implementation. The `rename` method handles both sample and frame fields correctly, with proper error handling.

fiftyone/operators/registry.py (2)
111-114: LGTM! Robust type handling implementation. The implementation properly handles both string-based type names and actual class types, making the API more flexible.
46-47: LGTM! Improved documentation clarity. The docstring updates clearly explain the enhanced type-filtering capabilities. Also applies to: 91-92
```diff
@@ -146,6 +146,12 @@ def get_mar(self, results):
         except Exception as e:
             return None

+    def get_custom_metrics(self, results):
+        try:
+            return results.custom_metrics
```
We are still missing metric_property that @ritch suggested: #5389 (comment)
results.custom_metrics should return a list of CustomMetric objects. Working on adding those on top of these changes.
I don't see value in further modifications to this PR. `results.custom_metrics` is currently JSON in roughly the same shape as what @ritch suggested in #5389 (comment), so it achieves the same thing, just without the class interface. That seems fine to me, as `results.custom_metrics` is not intended for direct usage by SDK users. It's just an internal data structure to support the Model Evaluation panel.

Note that I refactored so that custom metrics are included in `results.metrics()` and `results.print_metrics()`. So I would argue those are the only things that SDK users should need to care about.

Also note that introducing a class might be complex, as `results` need to be serializable.
Deferring to @ritch for the app side changes.
Looks good to me on the SDK side. I am okay with merging this without the modifications proposed for the app.
> I don't see value in further modifications to this PR. results.custom_metrics is currently JSON in roughly the same shape as what @ritch suggested in #5389 (comment), so it achieves the same thing, just without the class interface.

If devs end up using this feature, we can add the ability to define a `fiftyone.operator.types.Property` and use that to render the value. -1 for an ad hoc way to define labels, tooltips, type info, min/max values, max number of digits to display, etc. That should all be defined via an operator property. That type system already supports a lot of these and is where we should continue to implement this sort of thing (metadata for UI).
LGTM. Ran `evaluate_detections` (app + SDK) and `evaluate_regression` (SDK) with and without custom metrics.
Actionable comments posted: 0
🧹 Nitpick comments (3)
fiftyone/utils/eval/base.py (3)

54-81: Consider moving the initialization of `custom_metrics` outside the loop.

While the implementation is solid, initializing `results.custom_metrics` inside the loop for each successful computation is inefficient. Consider initializing it once before the loop if any custom metrics are configured.

```diff
 def compute_custom_metrics(self, samples, eval_key, results):
+    if self._get_custom_metrics():
+        results.custom_metrics = {}
     for metric, kwargs in self._get_custom_metrics().items():
         try:
             operator = foo.get_operator(metric)
             value = operator.compute(samples, results, **kwargs or {})
             if value is not None:
-                if results.custom_metrics is None:
-                    results.custom_metrics = {}
                 key = operator.config.aggregate_key
```

85-85: Optimize dictionary key lookup operations.

Replace `.keys()` with direct iteration over the dictionary.

```diff
-for metric in self._get_custom_metrics().keys():
+for metric in self._get_custom_metrics():
```

Also applies to: 101-101, 113-113

🧰 Tools
🪛 Ruff (0.8.2)
85-85: Use `key in dict` instead of `key in dict.keys()` (SIM118)

179-180: Rename unused loop variable for better code clarity.

The loop variable `uri` is not used within the loop body.

```diff
-for uri, metric in self.custom_metrics.items():
+for _, metric in self.custom_metrics.items():
```

🧰 Tools
🪛 Ruff (0.8.2)
179-179: Loop control variable `uri` not used within loop body; rename unused `uri` to `_uri` (B007)
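A side note on the B007 fix: when the key truly isn't needed, iterating `dict.values()` avoids the unused loop variable altogether. A generic sketch with made-up metric entries, not the PR's code:

```python
custom_metrics = {
    "@voxel51/metric-a": {"value": 0.9, "label": "Metric A"},
    "@voxel51/metric-b": {"value": 0.4, "label": "Metric B"},
}

# Iterate values directly instead of unpacking a key that is never used
labels = [metric["label"] for metric in custom_metrics.values()]
print(labels)  # ['Metric A', 'Metric B']
```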
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (3)
- app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (3 hunks)
- fiftyone/operators/evaluation_metric.py (1 hunks)
- fiftyone/utils/eval/base.py (6 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/operators/evaluation_metric.py

🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)
Pattern `**/*.{ts,tsx}`: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🪛 Ruff (0.8.2)
fiftyone/utils/eval/base.py
- 85-85: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 101-101: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 113-113: Use `key in dict` instead of `key in dict.keys()` (SIM118)
- 179-179: Loop control variable `uri` not used within loop body; rename unused `uri` to `_uri` (B007)
🔇 Additional comments (5)

fiftyone/utils/eval/base.py (3)
45-53: LGTM! Clean implementation of custom metrics retrieval. The method handles both list and dict formats elegantly, with safe access to config attributes.
82-99: LGTM! Well-structured field collection with proper error handling. The method efficiently collects fields from all operators while gracefully handling any errors that occur during field retrieval.
100-111: LGTM! Consistent error handling in the rename operation. The implementation follows the established pattern of safe operator access and error handling.

app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (2)
1762-1782: LGTM! Well-structured TypeScript type definitions. The types are clearly defined with appropriate properties and good use of TypeScript features.
1784-1811: LGTM! Robust implementation of custom metrics formatting. The function safely handles null values, uses proper type annotations, and follows best practices for object property access using lodash.
See voxel51/fiftyone-plugins#190