-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding min() and max() aggregations #5029
Conversation
WalkthroughThe pull request introduces new aggregation methods for computing minimum and maximum values in the FiftyOne library. Specifically, it adds Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
docs/source/user_guide/using_aggregations.rst (1)
399-418
: Consider adding performance comparison note.While the documentation is well-structured and clear, consider adding a note about the performance advantage of using
max()
overbounds()[1]
when only the maximum value is needed. This aligns with the PR objectives of highlighting efficiency benefits.Add a note after line 407:
compute the maximum of the (non-``None``) values of a field in a collection: + +.. note:: + + Using ``max()`` is more efficient than ``bounds()[1]`` when you only need + the maximum value, as it avoids computing both bounds.tests/unittests/aggregation_tests.py (1)
236-278
: Add test cases for edge scenarios intest_min
.While the test covers the basic functionality well, consider adding the following test cases to improve robustness:
- Empty dataset scenario
- Fields containing None/null values
- Fields containing NaN/Inf values
- Invalid field paths to verify error handling
Example test cases:
def test_min(self): # Existing tests... # Test empty dataset empty_dataset = fo.Dataset() self.assertIsNone(empty_dataset.min("number")) # Test None values d = fo.Dataset() s = fo.Sample(filepath="image.jpeg", number=None) d.add_sample(s) self.assertIsNone(d.min("number")) # Test NaN/Inf d = fo.Dataset() s = fo.Sample(filepath="image.jpeg", number=float('inf')) d.add_sample(s) self.assertTrue(math.isinf(d.min("number"))) # Test invalid field with self.assertRaises(ValueError): d.min("nonexistent_field")fiftyone/core/aggregations.py (1)
1597-1600
: Consider using ternary operators for more concise code.The if-else blocks for setting the
value
variable can be simplified using ternary operators:- if id_to_str: - value = {"$toString": "$" + path} - else: - value = "$" + path + value = {"$toString": "$" + path} if id_to_str else "$" + pathThis change would make the code more concise while maintaining readability.
Also applies to: 1719-1722
🧰 Tools
🪛 Ruff
1597-1600: Use ternary operator
value = {"$toString": "$" + path} if id_to_str else "$" + path
instead ofif
-else
-blockReplace
if
-else
-block withvalue = {"$toString": "$" + path} if id_to_str else "$" + path
(SIM108)
fiftyone/core/collections.py (2)
8087-8089
: Consider definingmake
as a regular function instead of a lambda.Using a regular
def
instead of assigning a lambda expression to a variable can improve readability.Would you like me to rewrite this as a regular function definition for you?
🧰 Tools
🪛 Ruff
8087-8089: Do not assign a
lambda
expression, use adef
Rewrite
make
as adef
(E731)
8171-8173
: Consider definingmake
as a regular function instead of a lambda.Using a regular
def
instead of assigning a lambda expression to a variable can improve readability.I can rewrite this as a regular function definition if you'd like. Let me know if you want me to submit a pull request with the change.
🧰 Tools
🪛 Ruff
8171-8173: Do not assign a
lambda
expression, use adef
Rewrite
make
as adef
(E731)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (7)
docs/source/user_guide/using_aggregations.rst
(1 hunks)fiftyone/__public__.py
(1 hunks)fiftyone/core/aggregations.py
(2 hunks)fiftyone/core/collections.py
(4 hunks)fiftyone/core/dataset.py
(1 hunks)tests/unittests/aggregation_tests.py
(1 hunks)tests/unittests/dataset_tests.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/unittests/dataset_tests.py
🧰 Additional context used
🪛 Ruff
fiftyone/__public__.py
24-24: .core.aggregations.Min
imported but unused
Remove unused import
(F401)
25-25: .core.aggregations.Max
imported but unused
Remove unused import
(F401)
fiftyone/core/aggregations.py
1597-1600: Use ternary operator value = {"$toString": "$" + path} if id_to_str else "$" + path
instead of if
-else
-block
Replace if
-else
-block with value = {"$toString": "$" + path} if id_to_str else "$" + path
(SIM108)
1719-1722: Use ternary operator value = {"$toString": "$" + path} if id_to_str else "$" + path
instead of if
-else
-block
Replace if
-else
-block with value = {"$toString": "$" + path} if id_to_str else "$" + path
(SIM108)
fiftyone/core/collections.py
8087-8089: Do not assign a lambda
expression, use a def
Rewrite make
as a def
(E731)
8171-8173: Do not assign a lambda
expression, use a def
Rewrite make
as a def
(E731)
🔇 Additional comments (6)
fiftyone/__public__.py (1)
24-25
: LGTM! The new aggregation classes are properly exposed.
The addition of Min
and Max
to the public interface aligns with the PR objectives and follows the existing pattern of exposing aggregation classes.
Let's verify the implementation of these classes:
✅ Verification successful
The Min and Max aggregations are properly implemented and integrated
The verification confirms:
- Both
Min
andMax
classes are fully implemented incore/aggregations.py
with proper inheritance fromAggregation
- They include comprehensive documentation, examples, and handle edge cases like
None
values - The classes are integrated into the collections interface with corresponding
min()
andmax()
methods - The implementation follows MongoDB aggregation patterns using
$min
and$max
operators
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the implementation of Min and Max aggregation classes
# Expected: Find class definitions with proper inheritance and required methods
# Test 1: Verify class definitions
ast-grep --pattern 'class Min(Aggregation):
$$$
'
ast-grep --pattern 'class Max(Aggregation):
$$$
'
# Test 2: Verify integration with SampleCollection
rg -A 5 'def (min|max)\(' core/collections.py
Length of output: 15793
Script:
#!/bin/bash
# Let's verify the integration with SampleCollection by searching in the correct path
fd -t f collections.py --exec rg -A 5 'def (min|max)\(' {}
# Also verify if there are any tests for these aggregations
fd -t f test_aggregations.py --exec rg -A 5 '(test_min|test_max)' {}
Length of output: 634
🧰 Tools
🪛 Ruff
24-24: .core.aggregations.Min
imported but unused
Remove unused import
(F401)
25-25: .core.aggregations.Max
imported but unused
Remove unused import
(F401)
docs/source/user_guide/using_aggregations.rst (1)
379-398
: Documentation for min() aggregation looks good!
The section is well-structured, follows the established documentation pattern, and provides a clear, practical example that aligns with the existing bounds()
method documentation.
fiftyone/core/aggregations.py (2)
1485-1604
: LGTM! Well-implemented Min aggregation class.
The implementation is clean and follows the established patterns:
- Proper inheritance from Aggregation base class
- Comprehensive documentation with clear examples
- Correct handling of field type conversion and None values
- Efficient MongoDB aggregation pipeline generation
🧰 Tools
🪛 Ruff
1597-1600: Use ternary operator value = {"$toString": "$" + path} if id_to_str else "$" + path
instead of if
-else
-block
Replace if
-else
-block with value = {"$toString": "$" + path} if id_to_str else "$" + path
(SIM108)
1607-1726
: LGTM! Well-implemented Max aggregation class.
The implementation is clean and follows the established patterns:
- Proper inheritance from Aggregation base class
- Comprehensive documentation with clear examples
- Correct handling of field type conversion and None values
- Efficient MongoDB aggregation pipeline generation
🧰 Tools
🪛 Ruff
1719-1722: Use ternary operator value = {"$toString": "$" + path} if id_to_str else "$" + path
instead of if
-else
-block
Replace if
-else
-block with value = {"$toString": "$" + path} if id_to_str else "$" + path
(SIM108)
fiftyone/core/dataset.py (2)
2086-2087
: LGTM! Verify the new method implementation.
The change to use _max()
for getting the maximum value of "frames.last_modified_at" looks good and improves code clarity.
#!/bin/bash
# Verify that the _max() method exists and is properly implemented
ast-grep --pattern $'class Dataset {
$$$
_max($field) {
$$$
}
$$$
}'
2094-2094
: LGTM! Verify consistent usage.
The change to use _max()
for getting the maximum value of "last_modified_at" is consistent with the previous change.
✅ Verification successful
The change to use _max()
is consistent and verified
Based on the extensive search results, the usage of _max()
for getting the maximum value of "last_modified_at" is consistent throughout the codebase. The method is used in multiple places for the same purpose, particularly in sync_last_modified_at()
functionality. No instances of the old _get_last_modified_at()
method were found, indicating a complete migration to the new approach.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify consistent usage of _max() vs _get_last_modified_at()
rg -A 2 "_get_last_modified_at"
Length of output: 362
Script:
#!/bin/bash
# Search for _max method usage
rg -A 2 "_max\("
# Search for last_modified_at field usage
rg -A 2 "last_modified_at"
# Search for method definitions using AST
ast-grep --pattern $'def _max($_) {
$$$
}'
Length of output: 124009
Script:
#!/bin/bash
# Search for _max method implementation
ast-grep --pattern 'def _max(self, path):
$$$
}'
# Search for _get_last_modified_at method
ast-grep --pattern 'def _get_last_modified_at(self, $_):
$$$
}'
Length of output: 120
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just recommend your review comment be in code for future readers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
fiftyone/core/collections.py (2)
8023-8104
: Consider replacing lambda with a regular function definitionThe implementation looks good, but using a lambda for the
make
function could be improved for better readability.- make = lambda field_or_expr: foa.Min( - field_or_expr, expr=expr, safe=safe - ) + def make(field_or_expr): + return foa.Min(field_or_expr, expr=expr, safe=safe)🧰 Tools
🪛 Ruff
8101-8103: Do not assign a
lambda
expression, use adef
Rewrite
make
as adef
(E731)
8107-8188
: Consider replacing lambda with a regular function definitionSimilar to the min() method, using a lambda for the
make
function could be improved for better readability.- make = lambda field_or_expr: foa.Max( - field_or_expr, expr=expr, safe=safe - ) + def make(field_or_expr): + return foa.Max(field_or_expr, expr=expr, safe=safe)🧰 Tools
🪛 Ruff
8185-8187: Do not assign a
lambda
expression, use adef
Rewrite
make
as adef
(E731)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
fiftyone/core/collections.py
(4 hunks)
🧰 Additional context used
🪛 Ruff
fiftyone/core/collections.py
8101-8103: Do not assign a lambda
expression, use a def
Rewrite make
as a def
(E731)
8185-8187: Do not assign a lambda
expression, use a def
Rewrite make
as a def
(E731)
🔇 Additional comments (1)
fiftyone/core/collections.py (1)
661-676
: LGTM! Well-optimized implementation with clear documentation
The _get_extremum()
implementation is well thought out, with clear documentation explaining the optimization rationale. The method efficiently handles both sample and frame fields using MongoDB's $sort + $limit 1
approach which is more performant than $group _id: None
when the field is indexed.
Adds
min()
andmax()
aggregations.These can be computed via
bounds()[0]
andbounds()[1]
today, but it's nice to have dedicated methods because:Summary by CodeRabbit
New Features
min()
) and maximum (max()
) values in datasets.Min
andMax
, to enhance aggregation functionalities.Bug Fixes
Tests
min
andmax
functionalities to validate their performance and correctness.