-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated evaluate_detections with bulk get and set values #5418
base: develop
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces refactoring across several evaluation utility files in the FiftyOne library. The changes primarily focus on creating new private Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
fiftyone/utils/eval/detection.py (3)
43-46
: Simplifyif
-else
block using a ternary operatorYou can replace the
if
-else
block with a ternary operator for conciseness.Apply this diff to simplify the code:
-if processing_frames: - docs = sample.frames.values() -else: - docs = [sample] +docs = sample.frames.values() if processing_frames else [sample]🧰 Tools
🪛 Ruff (0.8.2)
43-46: Use ternary operator
docs = sample.frames.values() if processing_frames else [sample]
instead ofif
-else
-blockReplace
if
-else
-block withdocs = sample.frames.values() if processing_frames else [sample]
(SIM108)
77-77
: ReplaceUsing
Apply this diff:
- print("Retrieving values from collection") + logger.info("Retrieving values from collection")- print("Finished retrieving values from collection") + logger.info("Finished retrieving values from collection")Also applies to: 89-89
339-342
: Simplifyif
-else
block using a ternary operatorYou can replace the
if
-else
block with a ternary operator for conciseness.Apply this diff to simplify the code:
-if processing_frames: - docs = sample.frames.values() -else: - docs = [sample] +docs = sample.frames.values() if processing_frames else [sample]🧰 Tools
🪛 Ruff (0.8.2)
339-342: Use ternary operator
docs = sample.frames.values() if processing_frames else [sample]
instead ofif
-else
-blockReplace
if
-else
-block withdocs = sample.frames.values() if processing_frames else [sample]
(SIM108)
tests/unittests/evaluation_tests.py (1)
9-9
: Remove unused importpytest
The
pytest
module is imported but not used in the file. You can remove it to clean up the code.Apply this diff:
-import pytest
🧰 Tools
🪛 Ruff (0.8.2)
9-9:
pytest
imported but unusedRemove unused import:
pytest
(F401)
fiftyone/utils/eval/openimages.py (1)
158-159
: Add docstring and type hints to improve code maintainability.Consider adding a docstring and type hints to document the purpose of this placeholder method and improve code maintainability.
Apply this diff to add docstring and type hints:
- def _evaluate(self, gts, preds, eval_key=None): - raise NotImplementedError("Use `evaluate` method instead") + def _evaluate( + self, + gts: "fiftyone.core.labels.Label", + preds: "fiftyone.core.labels.Label", + eval_key: str | None = None, + ) -> None: + """Private evaluation method placeholder. + + This method is not implemented for OpenImages evaluation. Use the public + `evaluate` method instead. + + Args: + gts: ground truth labels + preds: predicted labels + eval_key: optional evaluation key + + Raises: + NotImplementedError: This method is not implemented + """ + raise NotImplementedError("Use `evaluate` method instead")fiftyone/utils/eval/activitynet.py (1)
92-101
: Add docstring and type hints to improve code maintainability.Consider adding a docstring and type hints to document the purpose of this method and improve code maintainability.
Apply this diff to add docstring and type hints:
- def _evaluate(self, gts, preds, eval_key=None): + def _evaluate( + self, + gts: "fiftyone.core.labels.Label", + preds: "fiftyone.core.labels.Label", + eval_key: str | None = None, + ) -> list[tuple]: + """Performs ActivityNet-style evaluation on the given labels. + + Args: + gts: ground truth labels + preds: predicted labels + eval_key: optional evaluation key + + Returns: + a list of matched (gt_label, pred_label, iou, pred_confidence, + gt_id, pred_id) tuples + """ if eval_key is None: # Don't save results on user's data eval_key = "eval" gts = _copy_labels(gts) preds = _copy_labels(preds) return _activitynet_evaluation_single_iou( gts, preds, eval_key, self.config )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
fiftyone/utils/eval/activitynet.py
(2 hunks)fiftyone/utils/eval/coco.py
(2 hunks)fiftyone/utils/eval/detection.py
(4 hunks)fiftyone/utils/eval/openimages.py
(1 hunks)tests/unittests/evaluation_tests.py
(8 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/utils/eval/detection.py
43-46: Use ternary operator docs = sample.frames.values() if processing_frames else [sample]
instead of if
-else
-block
Replace if
-else
-block with docs = sample.frames.values() if processing_frames else [sample]
(SIM108)
339-342: Use ternary operator docs = sample.frames.values() if processing_frames else [sample]
instead of if
-else
-block
Replace if
-else
-block with docs = sample.frames.values() if processing_frames else [sample]
(SIM108)
tests/unittests/evaluation_tests.py
9-9: pytest
imported but unused
Remove unused import: pytest
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: test / test-app
- GitHub Check: lint / eslint
- GitHub Check: build / build
🔇 Additional comments (2)
fiftyone/utils/eval/coco.py (1)
135-143
: Well-structured encapsulation of evaluation logicThe new
_evaluate
method effectively encapsulates the evaluation logic, improving code organization and readability.fiftyone/utils/eval/activitynet.py (1)
126-126
: LGTM! Clean refactoring of the evaluate method.The evaluation logic has been cleanly encapsulated in the
_evaluate
method, making the code more maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
fiftyone/utils/eval/detection.py (2)
30-32
: Add progress callback support for bulk operations.The bulk evaluation function should support progress callbacks like the standard evaluation path to maintain consistency and provide feedback for large datasets.
def _evaluate_detections_bulk( - _samples, gt_field, pred_field, processing_frames, eval_method, eval_key + _samples, gt_field, pred_field, processing_frames, eval_method, eval_key, progress=None ):
298-301
: Consider using a ternary operator for conciseness.The if-else block could be simplified using a ternary operator, though the current form is also clear and readable.
- if processing_frames: - docs = sample.frames.values() - else: - docs = [sample] + docs = sample.frames.values() if processing_frames else [sample]🧰 Tools
🪛 Ruff (0.8.2)
298-301: Use ternary operator
docs = sample.frames.values() if processing_frames else [sample]
instead ofif
-else
-blockReplace
if
-else
-block withdocs = sample.frames.values() if processing_frames else [sample]
(SIM108)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/eval/detection.py
(4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/utils/eval/detection.py
298-301: Use ternary operator docs = sample.frames.values() if processing_frames else [sample]
instead of if
-else
-block
Replace if
-else
-block with docs = sample.frames.values() if processing_frames else [sample]
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: test / test-app
- GitHub Check: build / build
🔇 Additional comments (2)
fiftyone/utils/eval/detection.py (2)
106-106
: LGTM! Clean parameter addition.The new
bulk
parameter is well-documented and follows the function's existing parameter pattern.
286-286
: Review commented out field addition code.The commented out line
samples._dataset.add_field(field_name, fof.IntField)
might be important for field initialization. Please clarify if this line should be removed or uncommented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
fiftyone/utils/eval/detection.py (3)
33-64
: Enhance thread safety and error handling in the spinner decorator.The spinner implementation could be improved in the following areas:
- Add error handling in the spinner thread
- Use threading.Event for thread control
- Ensure proper cleanup in case of exceptions
def spinner_decorator(enabled=True): def decorator(func): def wrapper(*args, **kwargs): if not enabled: return func(*args, **kwargs) - spinner = itertools.cycle( - ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"] - ) - stop_spinner = False + spinner_event = threading.Event() + spinner_chars = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"] def spin(): - while not stop_spinner: - sys.stdout.write(next(spinner)) - sys.stdout.flush() - time.sleep(0.1) - sys.stdout.write("\b") + try: + while not spinner_event.is_set(): + for char in spinner_chars: + if spinner_event.is_set(): + break + sys.stdout.write(char) + sys.stdout.flush() + time.sleep(0.1) + sys.stdout.write("\b") + except Exception as e: + logger.error(f"Spinner thread error: {e}") + finally: + sys.stdout.write(" \b") + sys.stdout.flush() spinner_thread = threading.Thread(target=spin) spinner_thread.daemon = True spinner_thread.start() - result = func(*args, **kwargs) - stop_spinner = True - spinner_thread.join() - - return result + try: + return func(*args, **kwargs) + finally: + spinner_event.set() + spinner_thread.join(timeout=1.0)
66-108
: Consider batch processing for memory optimization.The bulk evaluation function loads all data into memory at once, which could be problematic for large datasets. Consider implementing batch processing to manage memory usage more efficiently.
Would you like me to help implement a batched version of the bulk evaluation function that processes data in chunks?
325-328
: Simplify code using ternary operator.Replace the if-else block with a more concise ternary expression.
- if processing_frames: - docs = sample.frames.values() - else: - docs = [sample] + docs = sample.frames.values() if processing_frames else [sample]🧰 Tools
🪛 Ruff (0.8.2)
325-328: Use ternary operator
docs = sample.frames.values() if processing_frames else [sample]
instead ofif
-else
-blockReplace
if
-else
-block withdocs = sample.frames.values() if processing_frames else [sample]
(SIM108)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/eval/detection.py
(5 hunks)
🧰 Additional context used
📓 Learnings (1)
fiftyone/utils/eval/detection.py (1)
Learnt from: minhtuev
PR: voxel51/fiftyone#5418
File: fiftyone/utils/eval/detection.py:263-265
Timestamp: 2025-01-23T05:30:24.943Z
Learning: FiftyOne automatically creates fields if they don't exist when using set_values(), making explicit field existence validation unnecessary.
🪛 Ruff (0.8.2)
fiftyone/utils/eval/detection.py
325-328: Use ternary operator docs = sample.frames.values() if processing_frames else [sample]
instead of if
-else
-block
Replace if
-else
-block with docs = sample.frames.values() if processing_frames else [sample]
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: test / test-app
- GitHub Check: lint / eslint
- GitHub Check: build / build
- GitHub Check: build
🔇 Additional comments (1)
fiftyone/utils/eval/detection.py (1)
219-220
: LGTM! Well-documented bulk parameter addition.The bulk parameter is properly documented with clear explanation of its purpose and default value.
Sorry what's this about? Also please post the results for larger/varying image datasets (no video or group yet right?) that you got. |
@swheaton : haha I got a little bit restless looking at the terminal without seeing anything since fetching from collections can take a while, so I added some animation :) we can remove it from the final PR. I ran the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
fiftyone/utils/eval/detection.py (3)
312-312
: Remove commented code.The commented line
# samples._dataset.add_field(field_name, fof.IntField)
appears to be unnecessary as per the retrieved learning that FiftyOne automatically creates fields.- # samples._dataset.add_field(field_name, fof.IntField)
324-327
: Simplify conditional assignment using ternary operator.The if-else block can be simplified using a ternary operator.
- if processing_frames: - docs = sample.frames.values() - else: - docs = [sample] + docs = sample.frames.values() if processing_frames else [sample]🧰 Tools
🪛 Ruff (0.8.2)
324-327: Use ternary operator
docs = sample.frames.values() if processing_frames else [sample]
instead ofif
-else
-blockReplace
if
-else
-block withdocs = sample.frames.values() if processing_frames else [sample]
(SIM108)
297-302
: Consolidate progress parameter handling.The progress parameter handling is redundant with the spinner decorator's enabled parameter.
- if progress is None: - progress = fo.config.show_progress_bars - - decorated_set_values = spinner_decorator(enabled=progress)( + decorated_set_values = spinner_decorator(enabled=progress if progress is not None else fo.config.show_progress_bars)( samples.set_values )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/eval/detection.py
(5 hunks)
🧰 Additional context used
📓 Learnings (1)
fiftyone/utils/eval/detection.py (1)
Learnt from: minhtuev
PR: voxel51/fiftyone#5418
File: fiftyone/utils/eval/detection.py:263-265
Timestamp: 2025-01-23T05:30:24.943Z
Learning: FiftyOne automatically creates fields if they don't exist when using set_values(), making explicit field existence validation unnecessary.
🪛 Ruff (0.8.2)
fiftyone/utils/eval/detection.py
324-327: Use ternary operator docs = sample.frames.values() if processing_frames else [sample]
instead of if
-else
-block
Replace if
-else
-block with docs = sample.frames.values() if processing_frames else [sample]
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: build / build
- GitHub Check: lint / eslint
- GitHub Check: e2e / test-e2e
- GitHub Check: test / test-app
- GitHub Check: build
🔇 Additional comments (1)
fiftyone/utils/eval/detection.py (1)
66-106
: Consider memory optimization and error handling in bulk evaluation.The function loads all samples into memory at once, which could be problematic for large datasets. Additionally, error handling could be improved.
Consider implementing batched processing:
def _evaluate_detections_bulk( _samples, gt_field, pred_field, eval_method, eval_key, progress=True ): + BATCH_SIZE = 1000 # Configurable batch size matches = [] id_field = "id" + total_samples = _samples.count() + sample_updates = {"sample_tp": {}, "sample_fp": {}, "sample_fn": {}} - logger.info( - f"Retrieving values for {id_field}, {gt_field}, {pred_field} from collection" - ) + for offset in range(0, total_samples, BATCH_SIZE): + batch = _samples.skip(offset).limit(BATCH_SIZE) + logger.info( + f"Processing batch {offset//BATCH_SIZE + 1}/{(total_samples + BATCH_SIZE - 1)//BATCH_SIZE}" + )
# Probably move this into the util library | ||
def spinner_decorator(enabled=True): | ||
def decorator(func): | ||
def wrapper(*args, **kwargs): | ||
if not enabled: | ||
return func(*args, **kwargs) | ||
|
||
spinner = itertools.cycle( | ||
["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"] | ||
) | ||
stop_spinner = False | ||
|
||
def spin(): | ||
while not stop_spinner: | ||
sys.stdout.write(next(spinner)) | ||
sys.stdout.flush() | ||
time.sleep(0.1) | ||
sys.stdout.write("\b") | ||
|
||
spinner_thread = threading.Thread(target=spin) | ||
spinner_thread.start() | ||
|
||
result = func(*args, **kwargs) | ||
stop_spinner = True | ||
spinner_thread.join() | ||
|
||
return result | ||
|
||
return wrapper | ||
|
||
return decorator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider error handling and cleanup in the spinner decorator.
The spinner thread should be properly cleaned up even when the decorated function raises an exception.
def wrapper(*args, **kwargs):
if not enabled:
return func(*args, **kwargs)
spinner = itertools.cycle(
["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]
)
stop_spinner = False
def spin():
while not stop_spinner:
sys.stdout.write(next(spinner))
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write("\b")
spinner_thread = threading.Thread(target=spin)
spinner_thread.start()
- result = func(*args, **kwargs)
- stop_spinner = True
- spinner_thread.join()
-
- return result
+ try:
+ result = func(*args, **kwargs)
+ return result
+ finally:
+ stop_spinner = True
+ spinner_thread.join()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Probably move this into the util library | |
def spinner_decorator(enabled=True): | |
def decorator(func): | |
def wrapper(*args, **kwargs): | |
if not enabled: | |
return func(*args, **kwargs) | |
spinner = itertools.cycle( | |
["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"] | |
) | |
stop_spinner = False | |
def spin(): | |
while not stop_spinner: | |
sys.stdout.write(next(spinner)) | |
sys.stdout.flush() | |
time.sleep(0.1) | |
sys.stdout.write("\b") | |
spinner_thread = threading.Thread(target=spin) | |
spinner_thread.start() | |
result = func(*args, **kwargs) | |
stop_spinner = True | |
spinner_thread.join() | |
return result | |
return wrapper | |
return decorator | |
# Probably move this into the util library | |
def spinner_decorator(enabled=True): | |
def decorator(func): | |
def wrapper(*args, **kwargs): | |
if not enabled: | |
return func(*args, **kwargs) | |
spinner = itertools.cycle( | |
["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"] | |
) | |
stop_spinner = False | |
def spin(): | |
while not stop_spinner: | |
sys.stdout.write(next(spinner)) | |
sys.stdout.flush() | |
time.sleep(0.1) | |
sys.stdout.write("\b") | |
spinner_thread = threading.Thread(target=spin) | |
spinner_thread.start() | |
try: | |
result = func(*args, **kwargs) | |
return result | |
finally: | |
stop_spinner = True | |
spinner_thread.join() | |
return wrapper | |
return decorator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
fiftyone/utils/eval/detection.py (2)
66-114
: Consider implementing batched processing for large datasets.Loading the entire collection at once could lead to memory issues with large datasets. The comment on line 84 correctly suggests implementing batched processing.
I can help implement a batched version of this function that processes the data in chunks. Would you like me to open an issue to track this enhancement?
333-336
: Simplify conditional assignment using ternary operator.The code can be more concise by using a ternary operator.
Apply this diff to simplify the code:
- if processing_frames: - docs = sample.frames.values() - else: - docs = [sample] + docs = sample.frames.values() if processing_frames else [sample]🧰 Tools
🪛 Ruff (0.8.2)
333-336: Use ternary operator
docs = sample.frames.values() if processing_frames else [sample]
instead ofif
-else
-blockReplace
if
-else
-block withdocs = sample.frames.values() if processing_frames else [sample]
(SIM108)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/eval/detection.py
(5 hunks)
🧰 Additional context used
📓 Learnings (1)
fiftyone/utils/eval/detection.py (1)
Learnt from: minhtuev
PR: voxel51/fiftyone#5418
File: fiftyone/utils/eval/detection.py:263-265
Timestamp: 2025-01-23T05:30:24.943Z
Learning: FiftyOne automatically creates fields if they don't exist when using set_values(), making explicit field existence validation unnecessary.
🪛 Ruff (0.8.2)
fiftyone/utils/eval/detection.py
333-336: Use ternary operator docs = sample.frames.values() if processing_frames else [sample]
instead of if
-else
-block
Replace if
-else
-block with docs = sample.frames.values() if processing_frames else [sample]
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: build / build
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: e2e / test-e2e
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: build
🔇 Additional comments (1)
fiftyone/utils/eval/detection.py (1)
33-63
:⚠️ Potential issueAdd proper cleanup for spinner thread.
The spinner thread is not properly cleaned up when the decorated function raises an exception. This could lead to zombie threads and incorrect console output.
Apply this diff to ensure proper cleanup:
def wrapper(*args, **kwargs): if not enabled: return func(*args, **kwargs) spinner = itertools.cycle( ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"] ) stop_spinner = False def spin(): while not stop_spinner: sys.stdout.write(next(spinner)) sys.stdout.flush() time.sleep(0.1) sys.stdout.write("\b") spinner_thread = threading.Thread(target=spin) spinner_thread.start() - result = func(*args, **kwargs) - stop_spinner = True - spinner_thread.join() - - return result + try: + return func(*args, **kwargs) + finally: + stop_spinner = True + spinner_thread.join()Likely invalid or redundant comment.
What changes are proposed in this pull request?
Updated evaluate_detections with bulk get and set values
How is this patch tested? If it is not, please explain why.
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
notes for FiftyOne users.
(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)
What areas of FiftyOne does this PR affect?
fiftyone
Python library changesSummary by CodeRabbit
Release Notes
New Features
Improvements
Testing
The changes focus on improving the internal evaluation framework with more robust and flexible processing methods.