ExecutionStore + MongoDB integration tests and fixes #5095
Conversation
Caution: Review failed. The pull request is closed.

Walkthrough
The changes in this pull request involve modifications to several classes and methods within the FiftyOne library, primarily focusing on enhancing the functionality of execution stores. Key updates include the addition of parameters to methods, renaming of existing methods, and the introduction of new methods for managing stores and keys. Additionally, unit tests have been updated for clarity and consistency, ensuring comprehensive coverage of the new functionalities.

Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Force-pushed da025a6 to 4f8ea26
Force-pushed 4f8ea26 to b727d91
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (11)
fiftyone/operators/store/models.py (1)

54-58: Consider adding type assertion for defensive programming.
The implementation looks good and provides a safe way to access the store's metadata. However, for additional safety, consider adding a type assertion to ensure `self.value` is a dictionary when it's not None.

```diff
 @property
 def metadata(self) -> dict[str, Any]:
     """The metadata associated with the store."""
-    return self.value or {}
+    if self.value is not None:
+        assert isinstance(self.value, dict), "Store value must be a dictionary"
+        return self.value
+    return {}
```

fiftyone/factory/repo_factory.py (1)

66-72: Consider implementing cache management
The repository caching logic works correctly, but the `repos` dictionary could grow unbounded as new dataset_ids are added. Consider implementing a cache eviction strategy or size limit.

Example implementation:

```python
class RepositoryFactory(object):
    MAX_CACHE_SIZE = 1000  # Adjust based on your needs
    repos = {}

    @staticmethod
    def _evict_cache_if_needed():
        if len(RepositoryFactory.repos) >= RepositoryFactory.MAX_CACHE_SIZE:
            # LRU, FIFO, or other eviction strategy
            RepositoryFactory.repos.pop(next(iter(RepositoryFactory.repos)))
```

tests/unittests/decorators.py (1)

48-49: Enhance docstring with pattern parameter details
The docstring should document the `pattern` parameter and its usage.

```diff
-    """Decorator that drops all stores from the database before running a test."""
+    """Decorator that drops stores from the database before running a test.
+
+    Args:
+        func: The test function to wrap
+        pattern (str): Unix shell-style wildcard pattern to filter store names.
+            Defaults to "*" which matches all stores.
+    """
```

tests/unittests/execution_store_service_tests.py (5)

1-7: Enhance module docstring with more details
Consider expanding the docstring to include:
- Purpose and scope of these tests
- Brief description of ExecutionStoreService
- Example usage if applicable

```diff
 """
-FiftyOne execution store related unit tests.
+Unit tests for FiftyOne's ExecutionStoreService.
+
+These tests verify the functionality of execution stores, including:
+- Store creation and management
+- Key-value operations
+- Dataset-scoped stores
+- Metadata handling

 | Copyright 2017-2024, Voxel51, Inc.
 | `voxel51.com <https://voxel51.com/>`_
```

17-27: Add docstrings to fixtures
The fixtures would benefit from docstrings explaining their purpose and return values.

```diff
 @pytest.fixture
 def svc():
+    """
+    Returns a clean ExecutionStoreService instance for global store testing.
+
+    Returns:
+        ExecutionStoreService: A service instance without dataset context
+    """
     return ExecutionStoreService()

 @pytest.fixture
 def svc_with_dataset():
+    """
+    Returns an ExecutionStoreService instance with a test dataset context.
+
+    Yields:
+        ExecutionStoreService: A service instance with dataset context
+    """
     dataset = fo.Dataset(name="test_dataset")
     dataset.save()
     yield ExecutionStoreService(dataset_id=dataset._doc.id)
```

94-113: Split complex test into smaller, focused tests
The `test_list_global_stores` function tests multiple scenarios. Consider splitting it into separate test functions for better clarity and maintenance.

```diff
-def test_list_global_stores(svc, svc_with_dataset):
+def test_list_global_stores_with_dataset_less_store(svc):
     NO_DATASET_STORE_NAME = "dataset_less_store"
-    DATASET_STORE_NAME = "dataset_store"
-    KEY_ONLY_STORE_NAME = "key_only_store"
     svc.create_store(NO_DATASET_STORE_NAME)
+    global_list = svc.list_stores_global()
+    assert len(global_list) == 1
+    assert global_list[0].store_name == NO_DATASET_STORE_NAME
+    assert global_list[0].dataset_id is None
+
+def test_list_global_stores_with_dataset_store(svc_with_dataset):
+    DATASET_STORE_NAME = "dataset_store"
     svc_with_dataset.create_store(DATASET_STORE_NAME)
     svc_with_dataset.set_key(DATASET_STORE_NAME, "key", "value")
-    svc_with_dataset.set_key(KEY_ONLY_STORE_NAME, "key", "value")
     global_list = svc.list_stores_global()
-    store_names = [store.store_name for store in global_list]
-    dataset_ids = [store.dataset_id for store in global_list]
-    assert len(global_list) == 3
-    assert NO_DATASET_STORE_NAME in store_names
-    assert DATASET_STORE_NAME in store_names
-    assert KEY_ONLY_STORE_NAME in store_names
-    assert None in dataset_ids
-    assert svc_with_dataset._dataset_id in dataset_ids
+    assert len(global_list) == 1
+    assert global_list[0].store_name == DATASET_STORE_NAME
+    assert global_list[0].dataset_id == svc_with_dataset._dataset_id
```

31-42: Define test constants at module level
Common test values like store names and metadata could be defined as module-level constants to improve maintainability.

```diff
+# Test constants
+TEST_STORE_NAME = "test_store"
+TEST_METADATA = {"test": "value"}
+
 @drop_stores
 @drop_datasets
 def test_store_creation(svc):
-    NAME = "test_store"
-    created_store = svc.create_store(NAME)
+    created_store = svc.create_store(TEST_STORE_NAME)
     assert (
-        created_store.store_name == NAME
+        created_store.store_name == TEST_STORE_NAME
     ), "Store name should match the given name"
```

Also applies to: 61-74

84-89: Improve assertion messages
Some assertions could benefit from more descriptive error messages that explain the expected behavior in more detail.

```diff
 assert (
     svc.count_keys(NAME) == 1
-), "Store should have 1 key after setting it"
+), f"Store '{NAME}' should have exactly 1 key after setting '{KEY}'"
 assert (
     svc.get_key(NAME, KEY).value == VALUE
-), "Retrieved value should match the set value"
+), f"Retrieved value for key '{KEY}' should be '{VALUE}' but got '{svc.get_key(NAME, KEY).value}'"
```

Also applies to: 132-134

fiftyone/operators/store/service.py (2)

Line range hint 53-64: Update docstring to include the metadata parameter
The docstring should be updated to document the new `metadata` parameter.

Apply this diff to update the documentation:

```diff
 def create_store(
     self, store_name: str, metadata: Optional[dict[str, Any]] = None
 ) -> StoreDocument:
     """Creates a new store with the specified name.

     Args:
         store_name: the name of the store
+        metadata: optional metadata to associate with the store

     Returns:
         a :class:`fiftyone.store.models.StoreDocument`
     """
```

132-139: Add return type to the docstring
The docstring should document the return type of the method.

Apply this diff to complete the documentation:

```diff
 def has_key(self, store_name: str, key: str) -> bool:
     """Determines whether the specified key exists in the specified store.

     Args:
         store_name: the name of the store
         key: the key to check
+
+    Returns:
+        True if the key exists, False otherwise
     """
```

fiftyone/factory/repos/execution_store.py (1)

228-249: Ensure aggregation pipeline handles large datasets efficiently
The aggregation pipeline in `list_stores_global` could potentially process a large number of documents. Consider adding indexes on `store_name` and `dataset_id` to optimize the grouping operation.

If not already indexed, apply this diff to create indexes:

```diff
 def _create_indexes(self):
     indices = [idx["name"] for idx in self._collection.list_indexes()]
     ...
+    # Ensure indexes for efficient aggregation
+    self._collection.create_index(
+        [("store_name", 1), ("dataset_id", 1)], name="store_dataset_index"
+    )
```
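To make the suggested optimization concrete, here is a minimal standalone sketch using pymongo; the database and collection names are placeholders, and the pipeline mirrors the grouping described in this review rather than the actual module source.

```python
from pymongo import ASCENDING, MongoClient

# Placeholder names; the real repo holds its collection on self._collection
coll = MongoClient()["fiftyone_example"]["execution_store_demo"]

# Compound index covering the fields the $group stage reads, so the planner
# can avoid a full collection scan on large collections
coll.create_index(
    [("store_name", ASCENDING), ("dataset_id", ASCENDING)],
    name="store_dataset_index",
)

# The grouping that list_stores_global performs, per the review discussion
pipeline = [
    {
        "$group": {
            "_id": {
                "store_name": "$store_name",
                "dataset_id": "$dataset_id",
            }
        }
    },
]
unique_stores = list(coll.aggregate(pipeline))
```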
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (8)
- fiftyone/factory/repo_factory.py (1 hunks)
- fiftyone/factory/repos/execution_store.py (4 hunks)
- fiftyone/operators/store/models.py (2 hunks)
- fiftyone/operators/store/service.py (4 hunks)
- fiftyone/operators/store/store.py (1 hunks)
- tests/unittests/decorators.py (2 hunks)
- tests/unittests/execution_store_service_tests.py (1 hunks)
- tests/unittests/execution_store_unit_tests.py (0 hunks)
💤 Files with no reviewable changes (1)
- tests/unittests/execution_store_unit_tests.py
🧰 Additional context used
📓 Learnings (1)
fiftyone/factory/repo_factory.py (1)
Learnt from: brimoor
PR: voxel51/fiftyone#5086
File: fiftyone/factory/repos/execution_store.py:141-143
Timestamp: 2024-11-12T12:36:39.324Z
Learning: In methods like `get_key` in `ExecutionStoreRepo` (file `fiftyone/factory/repos/execution_store.py`), when `self._dataset_id` is `None`, it's important to include `{"dataset_id": None}` in the MongoDB query, as omitting `dataset_id` is not equivalent to including it with `None`.
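A small pymongo sketch of this learning, with invented database and collection names; it shows why the two query shapes are not interchangeable.

```python
from pymongo import MongoClient

coll = MongoClient()["fiftyone_example"]["execution_store_demo"]
coll.insert_many(
    [
        {"store_name": "widgets", "key": "k", "dataset_id": None},
        {"store_name": "widgets", "key": "k", "dataset_id": "ds1"},
    ]
)

# Omitting dataset_id matches BOTH documents, so either may be returned
any_doc = coll.find_one({"store_name": "widgets", "key": "k"})

# Including dataset_id=None matches only the global document, because
# MongoDB treats {"field": None} as "null or missing", not "ignore field"
global_doc = coll.find_one(
    {"store_name": "widgets", "key": "k", "dataset_id": None}
)
assert global_doc["dataset_id"] is None
```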
🔇 Additional comments (10)
fiftyone/operators/store/models.py (1)
44-44: LGTM! The added empty line improves code readability by properly separating methods, following PEP 8 guidelines.
fiftyone/factory/repo_factory.py (1)
60-72: Implementation looks good! The changes effectively fix the bug with dataset_ids in cached repos while maintaining clean and maintainable code. The dynamic key generation and repository caching logic are well-implemented.
tests/unittests/decorators.py (2)
11-11: LGTM: Clean and purposeful imports. The new imports are well-chosen and properly used in the new decorator implementation.
Also applies to: 14-14
48-62: Verify decorator usage patterns. Let's verify how this decorator integrates with existing test patterns.
✅ Verification successful
Decorator usage patterns verified
No decorator order conflicts detected. The `@drop_*` decorators are consistently integrated with the test suite.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check usage patterns of test decorators and potential conflicts

# Find similar decorators in test files
echo "=== Similar decorators ==="
rg -l "def.*drop_.*\(.*\):" tests/

# Find potential usage patterns
echo -e "\n=== Potential usage patterns ==="
rg "@drop_" tests/

# Check for potential decorator order conflicts
echo -e "\n=== Potential decorator conflicts ==="
rg "^@.*\n@drop_|^@drop_\n@.*" tests/
```
Length of output: 23626
Script:
```shell
#!/bin/bash
# Description: Check for potential decorator order conflicts with multiline support

# Check for potential decorator order conflicts with multiline regex
echo "=== Potential decorator conflicts ==="
rg -U "^@.*\n@drop_|^@drop_\n@.*" tests/
```
Length of output: 1387
fiftyone/operators/store/store.py (1)
113-116: LGTM! Improved error handling and code reuse
The updated implementation is more robust and consistent with the rest of the class by:
- Reusing the `get_key` method, following DRY principles
- Providing better error handling by explicitly checking for non-existent keys
- Maintaining consistency with other methods like `get()`
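A runnable toy version of the pattern being praised, with a hypothetical in-memory backend standing in for the real repo; only the get/delete shape is meant to match the discussion, not the actual store.py source.

```python
from typing import Any, Optional


class KeyDoc:
    def __init__(self, value: Any):
        self.value = value


class ExecutionStoreSketch:
    """Hypothetical stand-in for the class under review."""

    def __init__(self):
        self._data: dict[str, KeyDoc] = {}

    def get_key(self, key: str) -> Optional[KeyDoc]:
        return self._data.get(key)

    def get(self, key: str) -> Optional[Any]:
        key_doc = self.get_key(key)
        return None if key_doc is None else key_doc.value

    def delete(self, key: str) -> bool:
        # Reuses get_key (DRY) and explicitly handles non-existent keys
        if self.get_key(key) is None:
            return False
        del self._data[key]
        return True
```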
tests/unittests/execution_store_service_tests.py (1)
29-181: LGTM! Comprehensive test coverage
The test suite provides excellent coverage of ExecutionStoreService functionality, including:
- Store creation and management
- Key-value operations
- Dataset scoping
- Metadata handling
- Error cases
The tests are well-structured and use appropriate fixtures and cleanup decorators.
fiftyone/operators/store/service.py (3)
50-50: LGTM: Proper initialization of dataset context
The addition of the `_dataset_id` instance variable correctly maintains the dataset context as described in the class documentation.
66-75: LGTM: Well-documented store retrieval method
The new `get_store` method is properly implemented with complete documentation and correct typing.
234-244: LGTM: Well-implemented global store deletion
The new `delete_store_global` method provides a clean way to manage stores across all contexts with proper documentation and typing.
fiftyone/factory/repos/execution_store.py (1)
60-62: Initialize `StoreDocument` with potential missing fields
When `raw_store_doc` is `None` and `self.has_store(store_name)` returns `True`, the newly created `StoreDocument` may be missing expected fields like `value`. Ensure that all necessary fields are properly initialized to avoid potential attribute errors downstream. Check whether `StoreDocument` initialization covers all required fields.
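One way to make such initialization robust is to give every field a default, sketched below with a dataclass; the field names are taken from the insert_one calls quoted later in this review, and the class name is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class StoreDocumentSketch:
    store_name: str
    key: str = "__store__"
    value: Optional[dict[str, Any]] = None
    dataset_id: Optional[str] = None
    created_at: datetime = field(default_factory=datetime.utcnow)
    updated_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None


# Constructing from a partial document can no longer drop expected fields
doc = StoreDocumentSketch(store_name="widgets")
assert doc.value is None and doc.key == "__store__"
```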
fiftyone/factory/repo_factory.py (Outdated)

```python
repo_key = (
    f"{MongoExecutionStoreRepo.COLLECTION_NAME}_{dataset_id}"
    if dataset_id
    else MongoExecutionStoreRepo.COLLECTION_NAME
)
```
💡 Codebase verification
Dataset ID Formatting Inconsistency Detected
The tests convert `dataset_id` to string, whereas the repository expects an ObjectId. This inconsistency may lead to issues with key generation and data retrieval.
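A two-line illustration of why the mismatch matters, using the bson package that ships with pymongo:

```python
from bson import ObjectId

oid = ObjectId()
assert oid != str(oid)  # an ObjectId never compares equal to its string form

# So a filter of {"dataset_id": str(oid)} cannot match documents whose
# dataset_id was stored as the ObjectId itself, and vice versa
```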
🔗 Analysis chain
Verify dataset_id format for key generation
The dynamic key generation looks good, but we should ensure that the `dataset_id` string representation doesn't contain characters that could create invalid or conflicting keys.
Let's check the dataset_id format in the codebase:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check dataset_id format usage across the codebase
# Expected: dataset_id should be ObjectId which ensures safe string representation

# Check for dataset_id type hints and usage
ast-grep --pattern 'dataset_id: ObjectId'

# Check for dataset_id string operations
rg -A 2 'dataset_id.*str\('
```
Length of output: 5901
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/unittests/execution_store_service_tests.py (1)
247-247: Remove unused variable `KEY_A`
The variable `KEY_A` is declared but never used in the `test_cleanup` function.

```diff
-    KEY_A = "key_a"
```
🧰 Tools
🪛 Ruff
247-247: Local variable `KEY_A` is assigned to but never used. Remove assignment to unused variable `KEY_A` (F841)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
- tests/unittests/execution_store_service_tests.py (1 hunks)
🧰 Additional context used
🪛 Ruff
tests/unittests/execution_store_service_tests.py
247-247: Local variable `KEY_A` is assigned to but never used. Remove assignment to unused variable `KEY_A` (F841)
🔇 Additional comments (3)
tests/unittests/execution_store_service_tests.py (3)
1-16: LGTM! Well-structured imports and header
The file header is properly documented with copyright notice, and all necessary imports are included.
17-27: LGTM! Well-designed test fixtures
The fixtures provide good test isolation with both dataset-less and dataset-bound service instances. The cleanup is handled by the `@drop_datasets` decorator.
29-240: LGTM! Comprehensive test coverage with clear assertions
The test suite provides excellent coverage of ExecutionStoreService functionality including:
- Store lifecycle management
- Key operations
- Scoping behavior
- TTL handling
- Edge cases
The tests follow good practices with clear arrange-act-assert patterns and descriptive assertions.
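For reference, a minimal example of the arrange-act-assert shape described here; the store and key names are illustrative, not copied from the test file.

```python
def test_set_and_get_key(svc):
    # Arrange: create an empty store
    svc.create_store("example_store")

    # Act: write a single key
    svc.set_key("example_store", "greeting", "hello")

    # Assert: the value round-trips
    assert svc.get_key("example_store", "greeting").value == "hello"
```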
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
- fiftyone/factory/repos/execution_store.py (5 hunks)
- fiftyone/operators/store/models.py (2 hunks)
- tests/unittests/execution_store_service_tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- fiftyone/operators/store/models.py
- tests/unittests/execution_store_service_tests.py
🔇 Additional comments (6)
fiftyone/factory/repos/execution_store.py (6)
89-107: LGTM! Efficient implementation using aggregation pipeline
The change from a simple count to an aggregation pipeline is a good optimization, especially for large datasets. The implementation correctly handles empty results and properly groups by both store name and dataset ID.
170-175: LGTM! Clean and efficient implementation
The method is well-implemented with proper boolean conversion and an efficient database query.
241-262: LGTM! Well-structured aggregation pipeline
The change to return `StoreDocument` instances instead of strings provides more functionality while maintaining efficiency through the aggregation pipeline.
265-279: LGTM! Consistent and efficient implementation
The implementation maintains consistency with `count_stores` while properly handling global scope and empty results.
281-286: LGTM! Clean implementation of global store deletion
The method correctly implements global store deletion without dataset filtering and provides meaningful feedback through the deletion count.
50-65: 🛠️ Refactor suggestion
Add `Optional` type hint for nullable return value
The method can return `None`, but the return type annotation doesn't reflect this. Add an `Optional` type hint for better type safety.

Apply this diff:

```diff
+from typing import Any, Dict, Optional

-    def get_store(self, store_name) -> StoreDocument:
+    def get_store(self, store_name) -> Optional[StoreDocument]:
```

Likely invalid or redundant comment.
```diff
@@ -34,23 +35,42 @@ def __init__(self, collection: Collection, dataset_id: ObjectId = None):
         self._collection = collection
         self._dataset_id = dataset_id

-    def create_store(self, store_name) -> StoreDocument:
+    def create_store(
```
I saw a PR where @imanjra plans to change this to `def store()`, so just a heads up in case a conflict arises
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
tests/unittests/decorators.py (2)
48-49: Enhance docstring with usage examples
The docstring would be more helpful with examples showing pattern usage:

```diff
-    """Decorator that drops all stores from the database before running a test."""
+    """Decorator that drops all stores from the database before running a test.
+
+    Args:
+        func: The test function to wrap
+        pattern (str, optional): Unix shell-style wildcard pattern for matching store names.
+            Defaults to "*" which matches all stores.
+
+    Example:
+        @drop_stores
+        def test_basic():
+            # All stores are dropped before this test
+            pass
+
+        @drop_stores(pattern="test_*")
+        def test_specific():
+            # Only stores starting with "test_" are dropped
+            pass
+    """
```

51-64: Add debug logging for better test observability
Consider adding debug logging to help diagnose test cleanup issues:

```diff
 @wraps(func)
 def wrapper(*args, **kwargs):
     svc = foos.ExecutionStoreService()
+    logger = fo.get_logger()
+    logger.debug("Dropping stores matching pattern: %s", pattern)
     stores = svc.list_stores_global()
     for store in stores:
         store_name = store.store_name
         if fnmatch.fnmatch(store_name, pattern):
+            logger.debug("Dropping store: %s", store_name)
             try:
                 svc.delete_store_global(store_name)
             except Exception as e:
+                logger.error("Failed to drop store: %s", store_name, exc_info=True)
                 raise RuntimeError(
                     f"Failed to delete store '{store_name}'"
                 ) from e
```

fiftyone/factory/repos/execution_store.py (5)

38-45: Add docstring for the metadata parameter
The `metadata` parameter should be documented to explain its purpose and expected format.

```diff
 def create_store(
     self, store_name, metadata: Dict[str, Any] = None
 ) -> StoreDocument:
-    """Creates a store associated with the current context."""
+    """Creates a store associated with the current context.
+
+    Args:
+        store_name: Name of the store to create
+        metadata: Optional metadata to associate with the store
+
+    Returns:
+        StoreDocument: The created store document
+    """
```

🧰 Tools
🪛 Ruff
39-39: Undefined name `Dict` (F821)

50-65: Consider simplifying store retrieval logic
Based on the previous comment, the `key="__store__"` check in the query might be unnecessary since a store can exist with any key. Consider simplifying the retrieval logic.

```diff
 def get_store(self, store_name) -> StoreDocument:
     """Gets a store associated with the current context."""
     raw_store_doc = self._collection.find_one(
         dict(
             store_name=store_name,
-            key="__store__",
             dataset_id=self._dataset_id,
         )
     )
```

89-107: Add error handling for aggregation pipeline
The aggregation pipeline could fail due to various MongoDB errors. Consider adding error handling.

```diff
-    result = list(self._collection.aggregate(pipeline))
-    return result[0]["total_stores"] if result else 0
+    try:
+        result = list(self._collection.aggregate(pipeline))
+        return result[0]["total_stores"] if result else 0
+    except Exception as e:
+        logger.error(f"Failed to count stores: {e}")
+        return 0
```

170-175: Optimize key existence check
The current implementation retrieves the entire document. Consider using projection to improve performance.

```diff
 def has_key(self, store_name, key) -> bool:
     """Determines whether a key exists in the specified store."""
     result = self._collection.find_one(
-        dict(store_name=store_name, key=key, dataset_id=self._dataset_id)
+        dict(store_name=store_name, key=key, dataset_id=self._dataset_id),
+        projection={"_id": 1},
     )
     return bool(result)
```

241-262: Consider adding pagination to global store listing
For large datasets, retrieving all stores at once could be memory-intensive. Consider adding pagination support.

Example implementation:

```python
def list_stores_global(self, skip: int = 0, limit: int = 100) -> list[StoreDocument]:
    """Lists stores across all datasets and the global context.

    Args:
        skip: Number of documents to skip
        limit: Maximum number of documents to return
    """
    pipeline = [
        # ... existing pipeline stages ...
        {"$skip": skip},
        {"$limit": limit},
    ]
    result = self._collection.aggregate(pipeline)
    return [StoreDocument(**d) for d in result]
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
- fiftyone/factory/repos/execution_store.py (5 hunks)
- tests/unittests/decorators.py (2 hunks)
🧰 Additional context used
🪛 Ruff
fiftyone/factory/repos/execution_store.py
39-39: Undefined name `Dict` (F821)
🔇 Additional comments (2)
tests/unittests/decorators.py (1)
11-11: LGTM: Import statements are well-organized
The new imports are properly grouped (standard library followed by local imports) and are necessary for the new functionality.
Also applies to: 14-14
fiftyone/factory/repos/execution_store.py (1)
13-13: Add missing `Dict` import from `typing` module
The type annotation `Dict` is used but not imported. Add it to the import statement:

```diff
-from typing import Any
+from typing import Any, Dict
```
Likely invalid or redundant comment.
```diff
         self._repo: ExecutionStoreRepo = repo

-    def create_store(self, store_name: str) -> StoreDocument:
+    def create_store(
```
Same `create_store` comment as before, I don't care which method name is used - I added a try/except for both just in case in DQ panel
LGTM 👍🏽 only noted the naming of `create_store`
Thanks for taking this on 💪
I think there's an important change needed to `@drop_stores` before this can be merged!
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (6)
tests/unittests/decorators.py (1)
51-66: Add error handling and logging to the collection drop operations.
The decorator correctly uses a `finally` block to ensure cleanup, but it should:
- Handle errors during collection drops
- Log collection operations for debugging

Consider this enhanced implementation:

```diff
 def drop_collection(collection_name):
     """Decorator that drops a collection from the database before and after running a test."""
+    import logging
+    logger = logging.getLogger(__name__)

     def decorator(func):
         @wraps(func)
         def wrapper(*args, **kwargs):
             db = foo.get_db_conn()
-            db.drop_collection(collection_name)
+            try:
+                db.drop_collection(collection_name)
+                logger.debug(f"Dropped collection '{collection_name}' before test")
+            except Exception as e:
+                logger.warning(f"Failed to drop collection '{collection_name}': {e}")
             try:
                 return func(*args, **kwargs)
             finally:
-                db.drop_collection(collection_name)
+                try:
+                    db.drop_collection(collection_name)
+                    logger.debug(f"Dropped collection '{collection_name}' after test")
+                except Exception as e:
+                    logger.warning(f"Failed to drop collection '{collection_name}': {e}")

         return wrapper

     return decorator
```

fiftyone/operators/store/service.py (2)
Line range hint 55-67: Document the metadata parameter in the docstring
The docstring is missing documentation for the new `metadata` parameter.

Add the metadata parameter to the docstring:

```diff
 def create_store(
     self, store_name: str, metadata: Optional[dict[str, Any]] = None
 ) -> StoreDocument:
     """Creates a new store with the specified name.

     Args:
         store_name: the name of the store
+        metadata: optional metadata to associate with the store
```

134-141: Complete the docstring for has_key method
The docstring is missing the Returns section and return type documentation.

Update the docstring:

```diff
 def has_key(self, store_name: str, key: str) -> bool:
     """Determines whether the specified key exists in the specified store.

     Args:
         store_name: the name of the store
         key: the key to check
+
+    Returns:
+        True if the key exists, False otherwise
     """
```

fiftyone/factory/repos/execution_store.py (1)

241-262: Consider adding error handling for aggregation failures
The implementation looks good, but consider adding error handling for potential aggregation failures, especially since this is a global operation that could process large amounts of data.

```diff
 def list_stores_global(self) -> list[StoreDocument]:
     """Lists stores across all datasets and the global context."""
-    pipeline = [
-        {
-            "$group": {
-                "_id": {
-                    "store_name": "$store_name",
-                    "dataset_id": "$dataset_id",
-                }
-            }
-        },
-        {
-            "$project": {
-                "_id": 0,
-                "store_name": "$_id.store_name",
-                "dataset_id": "$_id.dataset_id",
-            }
-        },
-    ]
-
-    result = self._collection.aggregate(pipeline)
-    return [StoreDocument(**d) for d in result]
+    try:
+        pipeline = [
+            {
+                "$group": {
+                    "_id": {
+                        "store_name": "$store_name",
+                        "dataset_id": "$dataset_id",
+                    }
+                }
+            },
+            {
+                "$project": {
+                    "_id": 0,
+                    "store_name": "$_id.store_name",
+                    "dataset_id": "$_id.dataset_id",
+                }
+            },
+        ]
+
+        result = self._collection.aggregate(pipeline)
+        return [StoreDocument(**d) for d in result]
+    except Exception as e:
+        logger.error(f"Failed to list global stores: {e}")
+        return []
```

fiftyone/operators/types.py (2)
1348-1348: Clarify the `operator` parameter description
In the `Button` class, the description for the `operator` parameter mentions "the uri of the operator to execute when the button is clicked." If the `operator` parameter accepts both URIs and callables, it's helpful to clarify this in the description to avoid confusion.

Consider updating the parameter description as follows:

```diff
-    operator (None): the uri of the operator to execute when the button is clicked
+    operator (None): the operator to execute when the button is clicked, specified by its URI or as a callable
```

1373-1414: Consider inheriting from `Button` to promote code reuse
The `OperatorExecutionButtonView` class shares similar properties and functionality with the `Button` class. To enhance maintainability and reduce code duplication, consider having `OperatorExecutionButtonView` inherit from `Button` instead of directly from `View`.

Modify the class definition as follows:

```diff
-class OperatorExecutionButtonView(View):
+class OperatorExecutionButtonView(Button):
```

This change allows `OperatorExecutionButtonView` to inherit existing attributes like `operator`, `params`, and `prompt`, and focus on extending or customizing the behavior specific to this class.

🧰 Tools
🪛 Ruff
1401-1401: Use `kwargs.get("operator")` instead of `kwargs.get("operator", None)` (SIM910)
1403-1403: Use `kwargs.get("params")` instead of `kwargs.get("params", None)` (SIM910)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- fiftyone/factory/repo_factory.py (1 hunks)
- fiftyone/factory/repos/execution_store.py (5 hunks)
- fiftyone/operators/store/service.py (5 hunks)
- fiftyone/operators/types.py (2 hunks)
- tests/unittests/decorators.py (2 hunks)
- tests/unittests/execution_store_service_tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unittests/execution_store_service_tests.py
🧰 Additional context used
🪛 Ruff
fiftyone/operators/types.py
1401-1401: Use `kwargs.get("operator")` instead of `kwargs.get("operator", None)` (SIM910)
1403-1403: Use `kwargs.get("params")` instead of `kwargs.get("params", None)` (SIM910)
tests/unittests/decorators.py
11-11: `fnmatch` imported but unused. Remove unused import: `fnmatch` (F401)
48-48: Redefinition of unused `wraps` from line 8. Remove definition: `wraps` (F811)
🔇 Additional comments (10)
fiftyone/factory/repo_factory.py (2)
59-63: LGTM: Dynamic collection name handling looks good
The addition of the optional `collection_name` parameter and its handling provides good flexibility while maintaining backward compatibility through the default fallback to `MongoExecutionStoreRepo.COLLECTION_NAME`.
64-67: Removal of repo caching fixes the global state issue
The removal of repo caching and direct instantiation of `MongoExecutionStoreRepo` is a good change as it fixes the bug where repos were incorrectly considered global in the cache, leading to incorrect `dataset_id` references.
Let's verify there are no remaining global state issues:
✅ Verification successful
Let me gather more context about the repo factory implementation to ensure there are no remaining global state issues.
Global state is properly handled after the changes
The removal of repo caching for `execution_store_repo` is indeed the correct fix. The code shows that while there is still a global cache (`repos`) being used for `delegated_operation_repo`, this is safe because that repo doesn't depend on dataset-specific state like `dataset_id`. The `execution_store_repo` now correctly creates a new instance for each call, avoiding any global state issues with dataset references.
- `delegated_operation_repo`: Uses caching but has no dataset-specific state
- `execution_store_repo`: No longer cached, creates fresh instances with correct `dataset_id`
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check for potential global state issues with repos
# Expected: No other global caching of repos with dataset_id

# Check for other repo caches that might have similar issues
rg -A 3 'repos\s*=\s*\{'

# Check for static repo storage
ast-grep --pattern 'class $_ {
  $$$
  repos = {}
  $$$
}'
```
Length of output: 315
Script:
```shell
#!/bin/bash
# Check the full implementation of repo_factory.py to understand the context
cat fiftyone/factory/repo_factory.py

# Look for any other potential static/class variables that could cause global state
ast-grep --pattern 'class $_ {
  $$$
  $_: Dict = {}
  $$$
}'

# Search for any singleton patterns or global instances
rg -l "singleton|instance\s*=\s*None"
```
Length of output: 2473
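The bug class being verified here can be distilled into a few lines; the classes below are hypothetical stand-ins, not the real factory code.

```python
class Repo:
    def __init__(self, dataset_id=None):
        self.dataset_id = dataset_id


_cache = {}


def cached_repo(dataset_id=None):
    key = "execution_store"  # BUG: cache key ignores dataset_id
    if key not in _cache:
        _cache[key] = Repo(dataset_id)
    return _cache[key]


assert cached_repo(dataset_id="ds1").dataset_id == "ds1"
# The second caller asks for ds2 but silently receives the ds1-bound repo
assert cached_repo(dataset_id="ds2").dataset_id == "ds1"
```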
tests/unittests/decorators.py (1)
51-66: Verify test isolation strategy.
Based on the previous review discussion, this decorator implements the agreed-upon approach of using a temporary collection for test isolation. This is a good solution that prevents tests from affecting persistent user data.
Let's verify that this decorator is being used correctly in the test files:
✅ Verification successful
Let me check the value of `TEST_COLLECTION_NAME` to ensure it follows test isolation best practices.
Test isolation strategy is properly implemented
The decorator is consistently used with a test-specific collection named "execution_store_test_collection" across all test methods in `execution_store_service_tests.py`. The collection name clearly indicates its test purpose, and the decorator ensures proper cleanup before and after each test by dropping the collection, maintaining test isolation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check if the decorator is used with test-specific collection names
# Expected: Collection names should contain 'test' or be temporary

# Search for usage of the decorator
rg -A 2 '@drop_collection'
```
Length of output: 4249
Script:
```shell
#!/bin/bash
# Find the definition of TEST_COLLECTION_NAME
rg "TEST_COLLECTION_NAME =" tests/
```
Length of output: 142
fiftyone/operators/store/service.py (3)
39-39: LGTM: Constructor changes maintain backward compatibility
The addition of the optional `collection_name` parameter enhances flexibility while maintaining backward compatibility.
Also applies to: 49-50
68-77: LGTM: Well-documented get_store method
The new `get_store` method provides a clean interface for retrieving store documents with proper documentation and type hints.
236-246: LGTM: Comprehensive global store deletion
The new method properly implements global store deletion with complete documentation and follows the established pattern of global operations.
fiftyone/factory/repos/execution_store.py (4)
13-13: LGTM: Type annotations and metadata handling improvements
The changes properly implement type hints using `Dict` from `typing` and correctly handle the optional metadata parameter in `create_store`.
Also applies to: 38-45
170-175: LGTM: Clean implementation of has_key method
The method efficiently checks for key existence using find_one and properly returns a boolean result.
50-65: 🛠️ Refactor suggestion
Add `Optional` type annotation for `get_store` method
The method can return `None`, but the return type annotation doesn't reflect this. Consider updating the return type to `Optional[StoreDocument]`.

```diff
-    def get_store(self, store_name) -> StoreDocument:
+    def get_store(self, store_name) -> Optional[StoreDocument]:
```

Likely invalid or redundant comment.
89-107: Verify performance impact of aggregation pipeline
The new implementation uses a more complex aggregation pipeline instead of a simple count. While this provides more accurate results by properly grouping stores, it might impact performance for large datasets.
✅ Verification successful
Based on the analysis of the codebase, I can now provide a definitive response about the performance impact of the aggregation pipeline:
Aggregation pipeline change is justified and should not impact performance
The new implementation using an aggregation pipeline with `$group` is actually more accurate, as it properly handles duplicate store names within the same dataset. The performance impact should be negligible because:
- The collection already has an index on `dataset_id`, as shown by the index creation code in `fiftyone/factory/repos/execution_store.py`
- The aggregation pipeline is simple and only involves a match on an indexed field followed by a group operation
- The result set is minimal - just returning a count
- The execution store is not expected to have a massive number of records per dataset
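If one did want to confirm index usage directly, MongoDB's explain command is the standard tool. A hedged sketch follows, with placeholder database and collection names; the exact output structure varies by server version.

```python
from pymongo import MongoClient

db = MongoClient()["fiftyone_example"]
plan = db.command(
    {
        "explain": {
            "aggregate": "execution_store_demo",
            "pipeline": [
                {"$match": {"dataset_id": None}},
                {
                    "$group": {
                        "_id": {
                            "store_name": "$store_name",
                            "dataset_id": "$dataset_id",
                        }
                    }
                },
                {"$count": "total_stores"},
            ],
            "cursor": {},
        }
    }
)
# Inspect the winning plan for IXSCAN (index scan) vs COLLSCAN stages
print(plan)
```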
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Compare execution time of old vs new implementation
# Note: This is a simplified comparison that doesn't account for all factors

echo "Checking collection size..."
mongo --eval 'db.execution_store.count()'

echo "Checking index usage for the aggregation pipeline..."
mongo --eval 'db.execution_store.aggregate([
  {$match: {dataset_id: null}},
  {$group: {_id: {store_name: "$store_name", dataset_id: "$dataset_id"}}},
  {$count: "total_stores"}
], {explain: true})'
```
Length of output: 571
Script:
```shell
#!/bin/bash
# Let's analyze the codebase to understand the context better

# First, check if there are any existing indexes defined
rg -A 5 "create_index|createIndex|ensure_index|ensureIndex"

# Look for any performance-related tests or benchmarks
fd -e py -e yaml test
rg -l "performance|benchmark"

# Check if there are any comments or documentation about performance considerations
rg -i "performance|optimization"

# Look for similar aggregation patterns in the codebase
ast-grep --pattern 'aggregate($$$)'
```
Length of output: 109379
Force-pushed 4cf20d3 to 1206538
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
tests/unittests/decorators.py (1)

13-13: Consider using a more descriptive alias for the ODM import
The alias 'foo' is not descriptive of the module's purpose. Consider using a more meaningful alias like 'odm' or 'db_odm' to improve code readability.

```diff
-import fiftyone.core.odm as foo
+import fiftyone.core.odm as odm
```

tests/unittests/execution_store_unit_tests.py (1)

Line range hint 347-365: Critical: Fix incorrect test method nesting
The `test_create_store_with_dataset_id` method is incorrectly nested inside `test_delete_key_with_dataset_id`. This will prevent the test runner from discovering and executing this test.

Fix the indentation to align with other test methods:

```diff
-        def test_create_store_with_dataset_id(self):
-            self.store_service.create_store("widgets")
-            self.mock_collection.insert_one.assert_called_once()
-            self.mock_collection.insert_one.assert_called_with(
-                {
-                    "store_name": "widgets",
-                    "key": "__store__",  # Include this in your expected call
-                    "value": None,
-                    "dataset_id": self.dataset_id,
-                    "created_at": IsDateTime(),
-                    "updated_at": None,
-                    "expires_at": None,
-                }
-            )
+    def test_create_store_with_dataset_id(self):
+        self.store_service.create_store("widgets")
+        self.mock_collection.insert_one.assert_called_once()
+        self.mock_collection.insert_one.assert_called_with(
+            {
+                "store_name": "widgets",
+                "key": "__store__",  # Include this in your expected call
+                "value": None,
+                "dataset_id": self.dataset_id,
+                "created_at": IsDateTime(),
+                "updated_at": None,
+                "expires_at": None,
+            }
+        )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- fiftyone/factory/repo_factory.py (1 hunks)
- fiftyone/factory/repos/execution_store.py (5 hunks)
- fiftyone/operators/store/service.py (5 hunks)
- tests/unittests/decorators.py (2 hunks)
- tests/unittests/execution_store_service_tests.py (1 hunks)
- tests/unittests/execution_store_unit_tests.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- fiftyone/factory/repo_factory.py
- fiftyone/operators/store/service.py
- tests/unittests/execution_store_service_tests.py
🔇 Additional comments (10)
tests/unittests/decorators.py (1)
47-62: Verify decorator behavior with test collections
Let's ensure the decorator works correctly with test collections and properly cleans up.
✅ Verification successful
Decorator implementation is working as expected
The `@drop_collection` decorator is actively used in `execution_store_service_tests.py` with proper test collection cleanup. The implementation is verified through:
- Consistent usage across 17 test methods with `TEST_COLLECTION_NAME`
- Proper combination with other decorators (`@drop_datasets`)
- Collection cleanup both before and after test execution
- Clear test collection naming convention (`execution_store_test_collection`)
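Putting those pieces together, usage presumably looks something like the sketch below; the import paths are assumptions based on the file names in this PR, not verified against the repository.

```python
import fiftyone.operators.store as foos

# Assumed import path for the helper added in tests/unittests/decorators.py
from decorators import drop_collection

TEST_COLLECTION_NAME = "execution_store_test_collection"


@drop_collection(TEST_COLLECTION_NAME)
def test_store_creation():
    # The decorator drops this collection before and after the test body
    svc = foos.ExecutionStoreService(collection_name=TEST_COLLECTION_NAME)
    svc.create_store("widgets")
    assert svc.count_stores() == 1
```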
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check for existing test implementations using the decorator

# Look for usage patterns of the decorator
rg -l "drop_collection.*__test__"

# Check for any remaining test collections after test runs
ast-grep --pattern 'drop_collection($collection)'
```
Length of output: 87
Script:
```shell
#!/bin/bash
# Search for any usage of drop_collection decorator in test files
rg "@drop_collection" -A 3 tests/

# Search for imports of drop_collection
rg "from.*decorators.*drop_collection" tests/

# Look for any test collection patterns in the codebase
rg "test_collection|collection_name.*test" tests/

# Find all test files that might use database collections
fd "test.*\.py$" tests/ --exec grep -l "collection"
```
Length of output: 6339
fiftyone/factory/repos/execution_store.py (7)
38-46: LGTM! The metadata parameter is properly integrated
The addition of the optional metadata parameter enhances the store creation functionality while maintaining backward compatibility.
50-66: Well-structured implementation of store retrieval logic
The implementation correctly handles both cases where a store exists with or without the `__store__` key, aligning with the updated data model requirements.
89-107: Improved counting logic using aggregation pipeline
The change to use an aggregation pipeline provides a more efficient way to count unique stores while respecting the dataset context.
170-175: LGTM! Clean implementation of key existence check
The method provides a straightforward way to check for key existence while properly handling the dataset context.
241-262: Enhanced global store listing with proper document construction
The change to return StoreDocument instances provides a more structured and type-safe approach while maintaining efficiency through aggregation.
265-279: LGTM! Consistent implementation of global store counting
The method follows the same efficient pattern as other counting methods while properly handling edge cases.
281-286: Consider adding validation before global deletion
Referring to the previous review comment about adding safety checks and audit logging for global store deletion.
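One possible shape for such a guard, sketched against the repo method discussed above; the `confirm` flag is hypothetical and not part of this PR.

```python
def delete_store_global(self, store_name: str, confirm: bool = False) -> int:
    """Deletes the specified store across all datasets and the global context."""
    if not confirm:
        # Hypothetical safety check: force callers to opt in explicitly
        raise ValueError(
            f"Refusing to globally delete store '{store_name}' without confirm=True"
        )
    result = self._collection.delete_many({"store_name": store_name})
    return result.deleted_count
```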
tests/unittests/execution_store_unit_tests.py (2)
79-79: LGTM: Consistent dataset_id handling
The addition of `dataset_id: None` in the test assertion is correct and maintains consistency with the service layer changes.
116-116: LGTM: Proper dataset_id assertion
The addition of `dataset_id: None` in the test assertion is correct and consistent with the service layer changes.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
fiftyone/factory/repos/execution_store.py (2)
89-107: Add error handling for aggregation pipeline
While the aggregation pipeline correctly counts unique stores, consider adding error handling for potential MongoDB errors.

```diff
 def count_stores(self) -> int:
     pipeline = [
         {
             "$match": {
                 "dataset_id": self._dataset_id,
             }
         },
         {
             "$group": {
                 "_id": {
                     "store_name": "$store_name",
                     "dataset_id": "$dataset_id",
                 }
             }
         },
         {"$count": "total_stores"},
     ]
-    result = list(self._collection.aggregate(pipeline))
-    return result[0]["total_stores"] if result else 0
+    try:
+        result = list(self._collection.aggregate(pipeline))
+        return result[0]["total_stores"] if result else 0
+    except Exception as e:
+        logger.error(f"Error counting stores: {e}")
+        return 0
```

265-279: Add consistent error handling
For consistency with the suggested changes to `count_stores`, consider adding similar error handling here.

```diff
 def count_stores_global(self) -> int:
     pipeline = [
         {
             "$group": {
                 "_id": {
                     "store_name": "$store_name",
                     "dataset_id": "$dataset_id",
                 }
             }
         },
         {"$count": "total_stores"},
     ]
-    result = list(self._collection.aggregate(pipeline))
-    return result[0]["total_stores"] if result else 0
+    try:
+        result = list(self._collection.aggregate(pipeline))
+        return result[0]["total_stores"] if result else 0
+    except Exception as e:
+        logger.error(f"Error counting global stores: {e}")
+        return 0
```

tests/unittests/execution_store_unit_tests.py (2)
Line range hint 341-357: Fix indentation of test_create_store_with_dataset_id method
The method is incorrectly indented, making it a nested function within `test_delete_store_with_dataset_id`. This prevents the test from being discovered and executed by the test runner.

Fix the indentation:

```diff
-        def test_create_store_with_dataset_id(self):
-            self.store_service.create_store("widgets")
-            self.mock_collection.insert_one.assert_called_once()
-            self.mock_collection.insert_one.assert_called_with(
-                {
-                    "store_name": "widgets",
-                    "key": "__store__",
-                    "value": None,
-                    "dataset_id": self.dataset_id,
-                    "created_at": IsDateTime(),
-                    "updated_at": None,
-                    "expires_at": None,
-                }
-            )
+    def test_create_store_with_dataset_id(self):
+        self.store_service.create_store("widgets")
+        self.mock_collection.insert_one.assert_called_once()
+        self.mock_collection.insert_one.assert_called_with(
+            {
+                "store_name": "widgets",
+                "key": "__store__",
+                "value": None,
+                "dataset_id": self.dataset_id,
+                "created_at": IsDateTime(),
+                "updated_at": None,
+                "expires_at": None,
+            }
+        )
```

Line range hint 256-357: Consider adding negative test cases for dataset_id operations
The test suite thoroughly covers the happy path for dataset_id integration, but could benefit from additional test cases:
- Attempting to access a store with incorrect dataset_id
- Attempting to modify a store with mismatched dataset_id
- Edge cases for dataset_id validation
Would you like me to provide example test cases for these scenarios?
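For example, one such negative test might look like the following, reusing the fixtures and decorators from this suite; the expected behavior (dataset-scoped keys invisible to the global service) is inferred from the scoping discussion above, and the store/key names are invented.

```python
@drop_stores
@drop_datasets
def test_dataset_scoped_key_not_visible_globally(svc, svc_with_dataset):
    # Arrange: write a key under the dataset-scoped service
    svc_with_dataset.create_store("scoped_store")
    svc_with_dataset.set_key("scoped_store", "key", "value")

    # Act + Assert: the dataset-less service should not see the key
    assert not svc.has_key("scoped_store", "key")
```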
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- fiftyone/factory/repo_factory.py (1 hunks)
- fiftyone/factory/repos/execution_store.py (5 hunks)
- fiftyone/operators/store/service.py (5 hunks)
- tests/unittests/decorators.py (2 hunks)
- tests/unittests/execution_store_service_tests.py (1 hunks)
- tests/unittests/execution_store_unit_tests.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- fiftyone/factory/repo_factory.py
- fiftyone/operators/store/service.py
- tests/unittests/decorators.py
- tests/unittests/execution_store_service_tests.py
🔇 Additional comments (6)
fiftyone/factory/repos/execution_store.py (5)
13-13: LGTM: Metadata support added to store creation
The addition of metadata support to `create_store` is well-implemented and properly typed.
Also applies to: 38-46
50-66: LGTM: Robust store retrieval implementation
The implementation correctly handles the case where a store exists without a `__store__` key by creating a new `StoreDocument`. This aligns with the data model, where a store can exist through any key with a matching store_name.
170-176: LGTM: Clean key existence check implementation
The `has_key` implementation is straightforward and consistent with other existence checks in the codebase.
241-262: LGTM: Enhanced store listing with full document return
Good improvement to return complete `StoreDocument` instances instead of just store names, providing more information to callers.
281-286: LGTM with existing safety recommendations
Implementation looks correct. Note that there's an existing review comment suggesting safety checks for this method.
tests/unittests/execution_store_unit_tests.py (1)
79-79: LGTM: Proper dataset_id integration in test assertions
The addition of `dataset_id: None` in the $setOnInsert block correctly validates the immutable nature of the dataset_id field.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Force-pushed 1206538 to f0d4db3
LGTM
Summary by CodeRabbit
New Features
Bug Fixes
Tests
- Updated tests covering ExecutionStoreService functionalities.