-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: log search queries and results #166
Conversation
Important Review skippedReview was skipped as selected files did not have any reviewable changes. 💤 Files selected but had no reviewable changes (1)
You can disable this status message by setting the WalkthroughThe pull request introduces several significant changes across multiple files, primarily focused on enhancing search functionality and managing search history. A new function Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 24
🧹 Outside diff range and nitpick comments (23)
cognee-frontend/src/modules/chat/getHistory.ts (1)
3-8
: Add JSDoc documentation.Adding documentation will help other developers understand the purpose and usage of this function.
Add this documentation above the function:
+/** + * Fetches the search history from the API. + * @returns Promise containing the search history items + * @throws {Error} If the network request fails or returns non-OK status + */ export default function getHistory() {cognee/api/v1/search/get_search_history.py (1)
5-5
: Add comprehensive docstring with return type details.The function lacks documentation about its purpose and the structure of the returned list.
Add a docstring:
async def get_search_history(user: User = None) -> list: + """Retrieve search history for a given user or the default user. + + Args: + user (User, optional): The user whose search history to retrieve. + If None, the default user's history will be fetched. + + Returns: + list: A list of search history entries, where each entry is a dict containing + search query details (specify the actual structure here). + + Raises: + ValueError: If unable to retrieve the default user or search history. + """cognee/modules/search/operations/log_result.py (1)
5-15
: Consider architectural improvements for scalability.As this function handles search logging, consider these architectural improvements:
- For high-volume scenarios, consider implementing batch processing of logs
- Add metrics collection for monitoring (e.g., logging latency, success rate)
- Implement a data retention policy for search logs
- Consider adding an index on (user_id, query_id) for efficient querying
Would you like assistance in implementing any of these improvements?
cognee/modules/search/operations/log_query.py (1)
5-19
: Consider adding query result logging.Since this is a search query logging system, it might be valuable to also log the number of results returned or other relevant metrics about the search operation.
Consider extending the Query model and this function to include:
- Number of results returned
- Search execution time
- Filters applied
- Sort criteria
This would provide valuable analytics data for improving search functionality.cognee/modules/search/models/Result.py (3)
6-7
: Add a docstring to document the model's purpose.Adding a docstring would improve code documentation and help other developers understand the purpose and usage of this model.
class Result(Base): + """ + Represents a search result in the database. + + This model stores individual search results along with their associated + query and user information, enabling search history tracking and analysis. + """ __tablename__ = "results"
9-9
: Fix PEP 8 spacing around operators.Remove spaces around the
=
operator in keyword arguments to follow PEP 8 style guidelines.- id = Column(UUID, primary_key = True, default = uuid4) + id = Column(UUID, primary_key=True, default=uuid4)
11-13
: Add foreign key constraints and indexes for better data integrity and performance.Consider the following improvements:
- Add foreign key constraints for
query_id
anduser_id
to ensure referential integrity- Add indexes for fields that will be frequently queried
- query_id = Column(UUID) - user_id = Column(UUID) + query_id = Column(UUID, ForeignKey("queries.id"), index=True) + user_id = Column(UUID, ForeignKey("users.id"), index=True) + + # Add relationships for easier querying + query = relationship("Query", back_populates="results") + user = relationship("User", back_populates="search_results")Don't forget to import the necessary SQLAlchemy components:
from sqlalchemy import ForeignKey from sqlalchemy.orm import relationshipcognee/modules/search/models/Query.py (3)
6-7
: Consider using a more specific table name.The table name "queries" is quite generic and could potentially conflict with other tables in different schemas. Consider using a more specific name like "search_queries" or defining a specific schema for this table.
class Query(Base): - __tablename__ = "queries" + __tablename__ = "search_queries"
9-9
: Fix spacing around operators to match PEP 8.While the UUID primary key implementation is good, the spacing around operators should be consistent with PEP 8.
- id = Column(UUID, primary_key = True, default = uuid4) + id = Column(UUID, primary_key=True, default=uuid4)
15-16
: Simplify timestamp implementations and fix spacing.The timestamp columns can be simplified by removing the lambda functions, and the spacing should be consistent with PEP 8.
- created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc)) - updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc)) + created_at = Column(DateTime(timezone=True), default=datetime.now(timezone.utc)) + updated_at = Column(DateTime(timezone=True), onupdate=datetime.now(timezone.utc))cognee/modules/search/operations/get_queries.py (1)
6-17
: Consider performance optimizations for scalability.
- Consider implementing cursor-based pagination for better performance with large datasets.
- Select only necessary fields instead of retrieving all columns, especially if the Query model contains large fields.
Example optimization:
select(Query.id, Query.text, Query.created_at) # Select specific fieldsAlso, consider adding an index on (user_id, created_at) to optimize the query performance.
cognee/modules/search/operations/get_results.py (1)
9-15
: Consider adding pagination support.While the
limit
parameter helps control the result set size, implementing proper pagination would be beneficial for handling large datasets efficiently.Consider adding
offset
orcursor
-based pagination to allow clients to fetch results in chunks. This would improve performance and resource utilization when dealing with large result sets.cognee/api/v1/search/search_v2.py (1)
44-44
: Keep parameter naming consistent across functions.For consistency with the main
search
function, consider renaming thequery
parameter toquery_text
.-async def specific_search(query_type: SearchType, query: str, user) -> list: +async def specific_search(query_type: SearchType, query_text: str, user) -> list:cognee/tests/test_library.py (1)
38-43
: Enhance test assertions for INSIGHTS search.While the basic non-empty check is good, consider adding more specific assertions:
- Validate the structure and content of search results
- Remove or guard print statements behind a debug flag
Example improvement:
search_results = await cognee.search(SearchType.INSIGHTS, query_text = random_node_name) - assert len(search_results) != 0, "The search results list is empty." - print("\n\nExtracted sentences are:\n") - for result in search_results: - print(f"{result}\n") + assert len(search_results) > 0, "The search results list is empty." + for result in search_results: + assert isinstance(result, str), "Search result should be a string" + assert len(result.strip()) > 0, "Search result should not be empty" + if logging.getLogger().isEnabledFor(logging.DEBUG): + print(f"Found insight: {result}")cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
Line range hint
1-92
: Consider architectural improvements for better maintainabilityThe component currently handles multiple responsibilities. Consider these improvements:
- Extract message management logic into a custom hook (e.g.,
useMessages
)- Move API calls to a separate service layer
- Implement request debouncing/throttling for search submissions
- Add retry mechanism for failed API calls
Example structure:
// hooks/useMessages.ts export function useMessages() { const [messages, setMessages] = useState<Message[]>([]); const addMessage = useCallback((message: Message) => { setMessages(prev => [...prev, message]); }, []); return { messages, addMessage }; } // services/searchService.ts export class SearchService { static async search(query: string, type: string) { return fetch('/v1/search', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ query, searchType: type }), }); } }cognee/tests/test_neo4j.py (1)
54-58
: Fix typo in print statement.There's a typo in the print statement:
"\n\Extracted"
should be"\n\nExtracted"
for consistency with other print statements.- print("\n\Extracted summaries are:\n") + print("\n\nExtracted summaries are:\n")cognee/tests/test_weaviate.py (1)
41-41
: LGTM! Consider enhancing test assertions.The parameter rename from
query
toquery_text
improves clarity. While the basic non-empty assertions are good, consider adding more specific assertions to verify:
- Result content/format matches expected search type
- Result relevance to the search query
- Result ordering/scoring if applicable
Example enhancement for INSIGHTS search:
def test_insight_result_format(result): assert isinstance(result, str), "Insight should be a string" assert len(result.split()) >= 5, "Insight should be a complete sentence" assert random_node_name.lower() in result.lower(), "Result should contain search term" for result in search_results: test_insight_result_format(result)Also applies to: 47-47, 53-53
cognee/tests/test_pgvector.py (4)
71-71
: Fix spacing around the assignment operator.The parameter rename from
query
toquery_text
improves clarity, but the spacing around=
should be fixed to comply with PEP 8.- search_results = await cognee.search(SearchType.INSIGHTS, query_text = random_node_name) + search_results = await cognee.search(SearchType.INSIGHTS, query_text=random_node_name)
77-77
: Fix spacing around the assignment operator.The parameter rename improves clarity, but the spacing needs to be fixed.
- search_results = await cognee.search(SearchType.CHUNKS, query_text = random_node_name) + search_results = await cognee.search(SearchType.CHUNKS, query_text=random_node_name)
83-83
: Fix spacing around the assignment operator.The parameter rename improves clarity, but the spacing needs to be fixed.
- search_results = await cognee.search(SearchType.SUMMARIES, query_text = random_node_name) + search_results = await cognee.search(SearchType.SUMMARIES, query_text=random_node_name)
Line range hint
10-91
: Consider restructuring tests using pytest fixtures and separate test functions.The current structure combines multiple test scenarios into a single function, making it harder to maintain and debug. Consider breaking this into separate test functions using pytest fixtures for common setup.
Example structure:
import pytest @pytest.fixture async def setup_cognee(): # Current setup code (config, data directory, etc.) await cognee.prune.prune_data() await cognee.prune.prune_system(metadata=True) # Add test data yield # Cleanup if needed @pytest.mark.asyncio async def test_insights_search(setup_cognee): # Test INSIGHTS search @pytest.mark.asyncio async def test_chunks_search(setup_cognee): # Test CHUNKS search @pytest.mark.asyncio async def test_summaries_search(setup_cognee): # Test SUMMARIES search @pytest.mark.asyncio async def test_search_history(setup_cognee): # Test search historycognee/api/client.py (1)
Line range hint
326-353
: Add logging to the search endpoint.To fulfill the PR objective of logging search queries and results, the search endpoint should also include logging.
Here's the suggested implementation:
@app.post("/api/v1/search", response_model = list) async def search(payload: SearchPayloadDTO, user: User = Depends(get_authenticated_user)): """ This endpoint is responsible for searching for nodes in the graph.""" from cognee.api.v1.search import search as cognee_search + logger.info(f"Processing search request - Type: {payload.search_type}, Query: {payload.query}, User: {user.id}") try: results = await cognee_search(payload.search_type, payload.query, user) + logger.info(f"Search completed - Found {len(results)} results") return results except Exception as error: + logger.error(f"Search failed - Error: {str(error)}") return JSONResponse( status_code = 409, content = {"error": str(error)} )🧰 Tools
🪛 Ruff
372-372: Module level import not at top of file
(E402)
notebooks/cognee_demo.ipynb (1)
Line range hint
803-848
: Add example for viewing search historySince this PR introduces search logging functionality, it would be valuable to include an example demonstrating how to retrieve and display the search history.
Consider adding a new cell with something like:
# Demonstrate how to access search history history = await cognee.getHistory() print("\nSearch History:\n") for entry in history: print(f"Query: {entry.query}") print(f"Type: {entry.search_type}") print(f"Timestamp: {entry.timestamp}\n")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (21)
- cognee-frontend/src/modules/chat/getHistory.ts (1 hunks)
- cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (3 hunks)
- cognee/init.py (1 hunks)
- cognee/api/client.py (2 hunks)
- cognee/api/v1/search/init.py (1 hunks)
- cognee/api/v1/search/get_search_history.py (1 hunks)
- cognee/api/v1/search/search_v2.py (3 hunks)
- cognee/modules/search/models/Query.py (1 hunks)
- cognee/modules/search/models/Result.py (1 hunks)
- cognee/modules/search/operations/init.py (1 hunks)
- cognee/modules/search/operations/get_history.py (1 hunks)
- cognee/modules/search/operations/get_queries.py (1 hunks)
- cognee/modules/search/operations/get_results.py (1 hunks)
- cognee/modules/search/operations/log_query.py (1 hunks)
- cognee/modules/search/operations/log_result.py (1 hunks)
- cognee/tests/test_library.py (1 hunks)
- cognee/tests/test_neo4j.py (1 hunks)
- cognee/tests/test_pgvector.py (1 hunks)
- cognee/tests/test_qdrant.py (1 hunks)
- cognee/tests/test_weaviate.py (1 hunks)
- notebooks/cognee_demo.ipynb (3 hunks)
🔥 Files not summarized due to errors (1)
- notebooks/cognee_demo.ipynb: Error: Server error: no LLM provider could handle the message
✅ Files skipped from review due to trivial changes (1)
- cognee/modules/search/operations/init.py
🔇 Additional comments (26)
cognee/api/v1/search/__init__.py (1)
2-2
: LGTM! Clean import that follows best practices.The import statement is well-organized and makes the new search history functionality available at the package level, which aligns with the PR's objective of logging search queries and results.
cognee-frontend/src/modules/chat/getHistory.ts (1)
4-6
: Verify authentication requirements.The endpoint might require authentication. Let's verify if authentication headers are needed.
✅ Verification successful
Authentication headers are automatically added by the fetch utility
The custom
fetch
utility inutils/fetch.ts
automatically injects the Authorization header with a Bearer token for all API calls. No explicit authentication headers are needed in thegetHistory.ts
implementation as they are handled by the utility wrapper.
- The fetch utility adds
Authorization: Bearer ${localStorage.getItem('access_token')}
to all requests- All API calls are prefixed with
http://127.0.0.1:8000/api
base URL- Server errors are automatically handled via
handleServerErrors
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if other API calls in the codebase use authentication headers rg -l "fetch.*headers.*Authorization" --type tsLength of output: 49
Script:
#!/bin/bash # Let's try a broader search for fetch calls with headers rg "fetch.*headers" --type ts -A 3 # Also search for any authentication-related code rg "Authorization.*Bearer" --type ts # Check if there's any auth middleware or interceptors fd -e ts -e js authLength of output: 459
Script:
#!/bin/bash # Let's check the fetch utility implementation since we found it cat cognee-frontend/src/utils/fetch.ts # Also check if the search endpoint is used elsewhere with headers rg "/v1/search" --type ts -A 3Length of output: 1050
cognee/__init__.py (1)
5-5
: LGTM! Clean addition of the search history function import.The import is well-organized, grouped logically with related search functionality, and follows the existing import patterns.
Let's verify that the imported function exists in the source module:
✅ Verification successful
Import verified and correctly implemented
The
get_search_history
function is properly defined incognee/api/v1/search/get_search_history.py
and correctly exported throughcognee/api/v1/search/__init__.py
. The function is also actively used across multiple test files, confirming its integration into the codebase.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the existence of get_search_history function in the search module # Expected: The function should be defined in the search module ast-grep --pattern 'def get_search_history'Length of output: 45
Script:
#!/bin/bash # Let's try a different approach to locate the function definition # First, let's find the search.py file fd "search.py" --type f # Then let's examine its content to find the function rg "get_search_history" -A 3Length of output: 2032
cognee/api/v1/search/get_search_history.py (1)
1-3
: LGTM! Well-organized imports.The imports are properly structured and all are utilized in the implementation.
cognee/modules/search/operations/log_result.py (1)
1-3
: LGTM! Clean and well-organized imports.The imports are properly structured with standard library imports first, followed by internal infrastructure and models.
cognee/modules/search/operations/log_query.py (2)
1-3
: LGTM! Imports are well-organized.The imports follow the standard pattern of stdlib first, followed by local imports, and all imports are utilized in the code.
5-19
: Verify database index for query lookups.Ensure that appropriate indexes exist for efficient query retrieval, especially if you plan to analyze search patterns or implement search history features.
Let's check if indexes exist for the Query table:
cognee/modules/search/models/Result.py (1)
1-4
: LGTM! All necessary imports are present.cognee/modules/search/models/Query.py (2)
1-4
: LGTM! Imports are well-organized and specific.The imports are correctly structured and include all necessary dependencies for the model.
1-16
: Verify no duplicate query logging implementations exist.Let's ensure this new query logging implementation doesn't duplicate existing functionality.
✅ Verification successful
Let me check the content of the query logging related files to understand their implementation.
Let me check one final thing to see if there are any other tables or models that might be storing queries in a different way.
Based on the comprehensive analysis of the codebase, I can now provide the final response:
No duplicate query logging implementations found - implementation is well organized
The current Query model works together with the Result model in a clean architecture where:
Query
model stores the search queriesResult
model stores the corresponding resultslog_query.py
andlog_result.py
provide the logging operationsThe implementation is properly isolated in the search module without any duplicate functionality elsewhere in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for other potential query logging implementations echo "Searching for other query-related models..." ast-grep --pattern 'class $_(Base): $$$ query = Column($$$) $$$' echo "Searching for query logging related code..." rg -l 'log.*query|query.*log' --type pyLength of output: 656
Script:
#!/bin/bash # Check the implementation of query logging operations echo "Content of log_query.py:" cat cognee/modules/search/operations/log_query.py echo -e "\nContent of log_result.py:" cat cognee/modules/search/operations/log_result.py # Search for any other Query model definitions echo -e "\nSearching for Query class definitions:" rg "class Query" -A 5Length of output: 1697
Script:
#!/bin/bash # Look for any other database tables or models that might store queries echo "Searching for table definitions:" rg "__tablename__.*=.*" -B2 -A2 # Look for any other query-related columns in models echo -e "\nSearching for query-related columns:" ast-grep --pattern "Column($$$query$$$)"Length of output: 7986
cognee/modules/search/operations/get_queries.py (1)
1-4
: LGTM! Imports are well-organized.The imports follow a logical order and include all necessary dependencies.
cognee/modules/search/operations/get_results.py (1)
1-4
: LGTM: Imports are well organized.The imports follow a good organization pattern (stdlib -> third-party -> local) and include all necessary dependencies.
cognee/modules/search/operations/get_history.py (3)
1-6
: LGTM! Imports are well-organized and complete.The imports cover all necessary dependencies for database operations and model access.
10-16
: LGTM! Query construction is secure and well-structured.The query properly selects necessary fields and includes source identification through the "user" label.
18-24
: LGTM! Results query matches queries structure for proper union.The field mapping and filtering are correctly implemented for union compatibility.
cognee/api/v1/search/search_v2.py (2)
19-19
: LGTM! Parameter names are more descriptive now.The renaming of parameters from
search_type
toquery_type
andquery
toquery_text
improves code clarity by better reflecting their purpose.
40-40
: Verify JSON serialization of search results.The
json.dumps(filtered_search_results)
call might fail if the results contain non-serializable objects (e.g., datetime objects, custom classes).Let's check the search results structure:
✅ Verification successful
JSON serialization of search results is safe
Based on the code analysis:
- The
filtered_search_results
is constructed fromsearch_results
which only contains simple dictionary objects with basic types:
document_id
is converted to UUID or None- Other fields are accessed through dictionary access
- The query functions return simple data structures:
query_summaries
returns a list of payloadsquery_chunks
returns a list of dictionariesquery_graph_connections
returns a list of string tuples- No complex objects (like datetime) are included in the search results structure
The JSON serialization is safe as the data structure contains only basic types that are JSON-serializable.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for potential non-serializable types in search results # Look for return type hints or docstrings in the query functions echo "Checking query functions return types..." rg -A 5 "def query_(summaries|graph_connections|chunks)" # Look for actual result construction echo "Checking result construction in query functions..." rg "return.*{.*}" --type pyLength of output: 1728
Script:
#!/bin/bash # Let's check the actual construction of filtered_search_results and the search results # Check the filtered_search_results construction echo "Checking filtered_search_results construction..." rg -B 5 -A 2 "filtered_search_results" # Check the actual return values from query functions echo "Checking return values from query functions..." rg -B 3 "return" cognee/tasks/summarization/query_summaries.py cognee/tasks/graph/query_graph_connections.py cognee/tasks/chunking/query_chunks.py # Check if there are any datetime or complex object usages echo "Checking for datetime and complex object usage..." rg "datetime|date|time" --type pyLength of output: 24297
cognee/tests/test_library.py (1)
38-38
: Verify consistent parameter naming across codebase.The parameter rename from
query
toquery_text
should be consistently applied across all search function calls in the codebase.Also applies to: 44-44, 50-50
✅ Verification successful
Let me verify the implementation details of the search function to ensure the parameter naming is consistent.
Let me try a different approach to check the search function implementation.
Parameter naming is consistent across the codebase
The search implementation in
cognee/api/v1/search/search_v2.py
and all test files correctly usequery_text
parameter. The oldquery
parameter insearch.legacy.py
is part of the legacy implementation and doesn't affect the current codebase. All vector database adapters and search modules consistently usequery_text
when needed.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining instances of the old parameter name echo "Checking for old parameter usage..." rg -l 'cognee\.search\([^)]*query\s*=' --type py echo "Verifying new parameter usage..." rg -l 'cognee\.search\([^)]*query_text\s*=' --type pyLength of output: 406
Script:
#!/bin/bash # Check the search function implementation and its parameter name ast-grep --pattern 'async def search($$$) {$$$}'Length of output: 50
Script:
#!/bin/bash # Search for the search function definition and its usage echo "Checking search function implementation..." rg -p "async def search" --type py -A 5 echo "Checking all search function calls..." rg "\.search\(" --type py -A 1Length of output: 10831
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
9-9
: LGTM: Clean import additionThe new import for getHistory is properly structured and follows the project's module organization.
cognee/tests/test_qdrant.py (2)
43-43
: LGTM! Parameter renaming improves API clarity.The consistent renaming of
query
toquery_text
across all search types makes the API more explicit about the expected input type.Also applies to: 49-49, 55-55
61-63
:⚠️ Potential issueImprove test robustness and documentation for search history assertion.
The assertion uses a magic number (6) without explaining why this exact number is expected. This makes the test brittle and harder to maintain.
Consider these improvements:
history = await cognee.get_search_history() - assert len(history) == 6, "Search history is not correct." + # We expect 2 entries per search operation (query + results) + expected_history_length = 2 * 3 # 3 search operations + assert len(history) == expected_history_length, f"Expected {expected_history_length} history entries from 3 search operations, but got {len(history)}"Additionally, consider adding test cases to verify:
- The content/structure of history entries
- How history behaves with failed searches
- History entry order (newest first/last)
Let's check if other test files have similar assertions:
cognee/tests/test_neo4j.py (3)
42-46
: LGTM! Parameter rename improves clarity.The change from
query
toquery_text
makes the parameter's purpose more explicit.
48-52
: LGTM! Consistent with previous parameter rename.
60-62
: 🛠️ Refactor suggestionImprove search history verification robustness.
The current test has several potential issues:
- Using a magic number (6) makes the test brittle and unclear
- Only verifying the count without checking the actual history content
- No cleanup of search history between test runs
Let's verify the search history implementation:
Consider improving the test:
history = await cognee.get_search_history() - assert len(history) == 6, "Search history is not correct." + # Verify history length matches the number of searches performed + expected_entries = 3 # Number of search calls in this test + assert len(history) == expected_entries, f"Expected {expected_entries} history entries, got {len(history)}" + + # Verify history content + assert any(entry["query"] == random_node_name for entry in history), "Search query not found in history" + assert all(entry["search_type"] in [SearchType.INSIGHTS, SearchType.CHUNKS, SearchType.SUMMARIES] + for entry in history), "Invalid search type in history"cognee/tests/test_weaviate.py (1)
59-59
: 🛠️ Refactor suggestionVerify search history entry structure.
Since this PR's objective is to log search queries and results, we should verify that each history entry contains all required logging fields.
Consider adding structure verification:
def verify_history_entry(entry): required_fields = {"type", "query", "timestamp", "results"} assert all(field in entry for field in required_fields), \ f"History entry missing required fields. Expected {required_fields}, got {entry.keys()}" assert isinstance(entry["timestamp"], (int, float)), "Timestamp should be numeric" assert isinstance(entry["results"], list), "Results should be a list" for entry in history: verify_history_entry(entry)cognee/api/client.py (1)
18-18
: LGTM!The import statement is correctly placed and follows Python naming conventions.
return fetch( | ||
'/v1/search', | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding query parameters for pagination and filtering.
Search history could grow large. Consider implementing pagination and filtering capabilities.
Here's a suggested implementation:
+interface HistoryParams {
+ page?: number;
+ limit?: number;
+ startDate?: string;
+ endDate?: string;
+}
+
-export default function getHistory() {
+export default function getHistory(params?: HistoryParams) {
+ const queryParams = new URLSearchParams();
+ if (params?.page) queryParams.append('page', params.page.toString());
+ if (params?.limit) queryParams.append('limit', params.limit.toString());
+ if (params?.startDate) queryParams.append('startDate', params.startDate);
+ if (params?.endDate) queryParams.append('endDate', params.endDate);
+
return fetch(
- '/v1/search',
+ `/v1/search${queryParams.toString() ? `?${queryParams.toString()}` : ''}`,
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return fetch( | |
'/v1/search', | |
) | |
interface HistoryParams { | |
page?: number; | |
limit?: number; | |
startDate?: string; | |
endDate?: string; | |
} | |
export default function getHistory(params?: HistoryParams) { | |
const queryParams = new URLSearchParams(); | |
if (params?.page) queryParams.append('page', params.page.toString()); | |
if (params?.limit) queryParams.append('limit', params.limit.toString()); | |
if (params?.startDate) queryParams.append('startDate', params.startDate); | |
if (params?.endDate) queryParams.append('endDate', params.endDate); | |
return fetch( | |
`/v1/search${queryParams.toString() ? `?${queryParams.toString()}` : ''}`, | |
) |
async def log_result(query_id: UUID, result: str, user_id: UUID): | ||
db_engine = get_relational_engine() | ||
|
||
async with db_engine.get_async_session() as session: | ||
session.add(Result( | ||
value = result, | ||
query_id = query_id, | ||
user_id = user_id, | ||
)) | ||
|
||
await session.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling and input validation.
While the basic implementation works, several important aspects need to be addressed:
- Add error handling for database operations
- Validate the result string (e.g., check for empty strings, max length)
- Add proper transaction handling with rollback support
- Include logging for operation status and errors
Here's a suggested implementation:
from uuid import UUID
+from typing import Optional
+import logging
from cognee.infrastructure.databases.relational import get_relational_engine
from ..models.Result import Result
+from sqlalchemy.exc import SQLAlchemyError
+logger = logging.getLogger(__name__)
+
+async def log_result(query_id: UUID, result: str, user_id: UUID) -> Optional[UUID]:
+ if not result or not result.strip():
+ raise ValueError("Result cannot be empty")
+
+ if len(result) > 10000: # adjust max length as needed
+ raise ValueError("Result exceeds maximum length")
-async def log_result(query_id: UUID, result: str, user_id: UUID):
db_engine = get_relational_engine()
async with db_engine.get_async_session() as session:
- session.add(Result(
- value = result,
- query_id = query_id,
- user_id = user_id,
- ))
+ async with session.begin():
+ try:
+ new_result = Result(
+ value=result.strip(),
+ query_id=query_id,
+ user_id=user_id,
+ )
+ session.add(new_result)
+ await session.commit()
+ logger.info(f"Successfully logged result for query {query_id}")
+ return new_result.id
+ except SQLAlchemyError as e:
+ await session.rollback()
+ logger.error(f"Failed to log result for query {query_id}: {str(e)}")
+ raise
The changes include:
- Input validation for the result string
- Proper error handling with rollback support
- Logging for operation tracking
- Return value for operation confirmation
- Transaction management using
session.begin()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def log_result(query_id: UUID, result: str, user_id: UUID): | |
db_engine = get_relational_engine() | |
async with db_engine.get_async_session() as session: | |
session.add(Result( | |
value = result, | |
query_id = query_id, | |
user_id = user_id, | |
)) | |
await session.commit() | |
from uuid import UUID | |
from typing import Optional | |
import logging | |
from cognee.infrastructure.databases.relational import get_relational_engine | |
from ..models.Result import Result | |
from sqlalchemy.exc import SQLAlchemyError | |
logger = logging.getLogger(__name__) | |
async def log_result(query_id: UUID, result: str, user_id: UUID) -> Optional[UUID]: | |
if not result or not result.strip(): | |
raise ValueError("Result cannot be empty") | |
if len(result) > 10000: # adjust max length as needed | |
raise ValueError("Result exceeds maximum length") | |
db_engine = get_relational_engine() | |
async with db_engine.get_async_session() as session: | |
async with session.begin(): | |
try: | |
new_result = Result( | |
value=result.strip(), | |
query_id=query_id, | |
user_id=user_id, | |
) | |
session.add(new_result) | |
await session.commit() | |
logger.info(f"Successfully logged result for query {query_id}") | |
return new_result.id | |
except SQLAlchemyError as e: | |
await session.rollback() | |
logger.error(f"Failed to log result for query {query_id}: {str(e)}") | |
raise |
history = await cognee.get_search_history() | ||
|
||
assert len(history) == 6, "Search history is not correct." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve the search history verification test.
The current test has several potential issues:
- The magic number
6
is not explained (3 searches × 2?) - The assertion message could be more descriptive
- The exact count check might be brittle
Consider making the test more robust:
history = await cognee.get_search_history()
- assert len(history) == 6, "Search history is not correct."
+ # We expect history entries for INSIGHTS, CHUNKS, and SUMMARIES searches
+ expected_searches = 3
+ assert len(history) >= expected_searches, f"Search history should contain at least {expected_searches} entries (one per search type), but found {len(history)}"
+
+ # Verify that all search types are present in the history
+ search_types = {entry['type'] for entry in history}
+ assert search_types.issuperset({SearchType.INSIGHTS, SearchType.CHUNKS, SearchType.SUMMARIES}), "Not all search types were recorded in history"
Committable suggestion was skipped due to low confidence.
cognee/api/client.py
Outdated
@app.get("/api/v1/search", response_model = list[SearchHistoryItem]) | ||
async def get_search_history(user: User = Depends(get_authenticated_user)): | ||
try: | ||
history = await get_history(user.id) | ||
|
||
return history | ||
except Exception as error: | ||
return JSONResponse( | ||
status_code = 500, | ||
content = {"error": str(error)} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhance error handling and add logging.
Several improvements could be made to align with the codebase patterns and PR objectives:
- Error response structure should match other endpoints (e.g., see
/api/v1/datasets
). - Add logging to track API usage and errors.
- Consider using more specific error handling.
Here's the suggested implementation:
@app.get("/api/v1/search", response_model = list[SearchHistoryItem])
async def get_search_history(user: User = Depends(get_authenticated_user)):
+ logger.info(f"Retrieving search history for user: {user.id}")
try:
history = await get_history(user.id)
+ logger.info(f"Retrieved {len(history)} search history items")
return history
except Exception as error:
+ logger.error(f"Error retrieving search history: {str(error)}")
- return JSONResponse(
- status_code = 500,
- content = {"error": str(error)}
- )
+ raise HTTPException(
+ status_code = 500,
+ detail = f"Error retrieving search history: {str(error)}"
+ ) from error
Committable suggestion was skipped due to low confidence.
notebooks/cognee_demo.ipynb
Outdated
"search_results = await cognee.search(SearchType.INSIGHTS, query_text = node_name)\n", | ||
"print(\"\\n\\nExtracted sentences are:\\n\")\n", | ||
"for result in search_results:\n", | ||
" print(f\"{result}\\n\")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix inconsistent terminology in print message
The print message says "Extracted sentences" while the search type is INSIGHTS. This could be confusing for users following the demo.
- "print(\"\\n\\nExtracted sentences are:\\n\")\n",
+ "print(\"\\n\\nExtracted insights are:\\n\")\n",
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"search_results = await cognee.search(SearchType.INSIGHTS, query_text = node_name)\n", | |
"print(\"\\n\\nExtracted sentences are:\\n\")\n", | |
"for result in search_results:\n", | |
" print(f\"{result}\\n\")" | |
"search_results = await cognee.search(SearchType.INSIGHTS, query_text = node_name)\n", | |
"print(\"\\n\\nExtracted insights are:\\n\")\n", | |
"for result in search_results:\n", | |
" print(f\"{result}\\n\")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
cognee/api/v1/search/routers/get_search_router.py (4)
1-8
: Consider reorganizing imports for better readabilityConsider organizing imports into standard library, third-party, and local application imports with a blank line between each group:
# Standard library from uuid import UUID from datetime import datetime # Third-party from fastapi import Depends, APIRouter from fastapi.responses import JSONResponse # Local application from cognee.api.v1.search import SearchType from cognee.api.DTO import InDTO, OutDTO from cognee.modules.users.models import User from cognee.modules.search.operations import get_history from cognee.modules.users.methods import get_authenticated_user
19-24
: Add field constraints and documentationConsider adding:
- Field constraints (e.g., max length for text)
- Class docstring describing the purpose
- Field descriptions using Field(..., description="...")
class SearchHistoryItem(OutDTO): + """Data transfer object representing a search history item.""" id: UUID - text: str + text: str = Field(..., max_length=1000, description="The search query text") user: str created_at: datetime
Line range hint
37-51
: Standardize error handling across endpointsThe search endpoint uses status code 409 while the history endpoint uses 500. Consider standardizing error handling:
- Use consistent status codes for similar error types
- Implement a common error response structure
from enum import Enum from pydantic import BaseModel class ErrorCode(Enum): SEARCH_ERROR = "search_error" HISTORY_ERROR = "history_error" class ErrorResponse(BaseModel): code: ErrorCode message: str details: dict | None = None
Line range hint
25-51
: Add rate limiting to protect endpointsBoth search and history endpoints could benefit from rate limiting to prevent abuse. Consider implementing rate limiting using FastAPI's dependencies:
from fastapi import Request from fastapi.middleware.throttling import ThrottlingMiddleware async def rate_limit(request: Request): # Implement rate limiting logic pass @router.get("/", dependencies=[Depends(rate_limit)]) async def get_search_history(...): ... @router.post("/", dependencies=[Depends(rate_limit)]) async def search(...): ...notebooks/cognee_demo.ipynb (2)
794-794
: Consider implementing search result cachingSince the same
node_name
is being used across multiple search types, implementing a caching mechanism could improve performance and reduce redundant searches.Example implementation:
from functools import lru_cache from typing import List, Any class SearchCache: @lru_cache(maxsize=1000) async def get_search_results(self, search_type: SearchType, query: str) -> List[Any]: return await self._perform_search(search_type, query)Also applies to: 815-815, 836-836
794-794
: Add structured logging with metadataTo make search analytics more valuable, consider adding structured logging with additional metadata such as timestamp, user context, and search performance metrics.
Example log structure:
{ "timestamp": "2024-11-20T10:00:00Z", "search_type": "SUMMARIES", "query": "node_name", "results_count": len(results), "execution_time_ms": execution_time, "user_id": user_context.id }Also applies to: 815-815, 836-836
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
cognee/api/v1/search/routers/get_search_router.py
(3 hunks)cognee/api/v1/search/search_v2.py
(3 hunks)cognee/infrastructure/pipeline/models/Operation.py
(2 hunks)cognee/modules/search/operations/get_queries.py
(1 hunks)cognee/modules/search/operations/get_results.py
(1 hunks)cognee/tests/test_library.py
(1 hunks)cognee/tests/test_neo4j.py
(1 hunks)cognee/tests/test_pgvector.py
(1 hunks)cognee/tests/test_qdrant.py
(1 hunks)cognee/tests/test_weaviate.py
(1 hunks)notebooks/cognee_demo.ipynb
(3 hunks)
🔥 Files not summarized due to errors (1)
- notebooks/cognee_demo.ipynb: Error: Server error: no LLM provider could handle the message
🚧 Files skipped from review as they are similar to previous changes (8)
- cognee/api/v1/search/search_v2.py
- cognee/modules/search/operations/get_queries.py
- cognee/modules/search/operations/get_results.py
- cognee/tests/test_library.py
- cognee/tests/test_neo4j.py
- cognee/tests/test_pgvector.py
- cognee/tests/test_qdrant.py
- cognee/tests/test_weaviate.py
🔇 Additional comments (3)
cognee/infrastructure/pipeline/models/Operation.py (2)
1-1
: LGTM! Good practice adding explicit timezone import.
The addition of the timezone import supports proper timezone handling in the datetime operations.
27-27
: Verify handling of timezone-aware timestamps in dependent code.
The change from datetime.utcnow
to datetime.now(timezone.utc)
is a good improvement as it provides timezone-aware timestamps. However, this change requires verification of dependent code that might assume naive datetimes.
Let's verify the usage of this timestamp in queries and dependent code:
notebooks/cognee_demo.ipynb (1)
794-794
: Verify error handling for search operations
The search operations should include proper error handling to ensure failed searches are also logged appropriately.
@router.get("/", response_model = list[SearchHistoryItem]) | ||
async def get_search_history(user: User = Depends(get_authenticated_user)): | ||
try: | ||
history = await get_history(user.id) | ||
|
||
return history | ||
except Exception as error: | ||
return JSONResponse( | ||
status_code = 500, | ||
content = {"error": str(error)} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance error handling and add pagination
- The generic exception handling could mask specific errors. Consider handling specific exceptions:
try:
history = await get_history(user.id)
return history
- except Exception as error:
+ except ValueError as error:
return JSONResponse(
status_code = 500,
content = {"error": str(error)}
)
+ except Exception as error:
+ # Log the unexpected error
+ return JSONResponse(
+ status_code = 500,
+ content = {"error": "Internal server error"}
+ )
- Consider adding pagination to handle large result sets:
@router.get("/", response_model=Page[SearchHistoryItem])
async def get_search_history(
user: User = Depends(get_authenticated_user),
page: int = Query(1, ge=1),
size: int = Query(20, ge=1, le=100)
):
@@ -791,7 +791,7 @@ | |||
"node = (await vector_engine.search(\"Entity_name\", \"[email protected]\"))[0]\n", | |||
"node_name = node.payload[\"text\"]\n", | |||
"\n", | |||
"search_results = await cognee.search(SearchType.SUMMARIES, query = node_name)\n", | |||
"search_results = await cognee.search(SearchType.SUMMARIES, query_text = node_name)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider implementing search result logging middleware
Given that this PR aims to log search queries and results, and there are multiple search types (SUMMARIES, CHUNKS, INSIGHTS), consider implementing a middleware or decorator to handle logging consistently across all search operations.
Example implementation:
from functools import wraps
from typing import Any, Callable
import logging
def log_search(func: Callable) -> Callable:
@wraps(func)
async def wrapper(self, search_type: SearchType, query_text: str, *args, **kwargs) -> Any:
try:
results = await func(self, search_type, query_text, *args, **kwargs)
await self._log_search_operation(
search_type=search_type,
query=query_text,
results=results,
status="success"
)
return results
except Exception as e:
await self._log_search_operation(
search_type=search_type,
query=query_text,
error=str(e),
status="error"
)
raise
return wrapper
# Usage:
# @log_search
# async def search(self, search_type: SearchType, query_text: str) -> List[Any]:
Also applies to: 815-815, 836-836
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
cognee/modules/search/models/Result.py (1)
15-16
: Fix PEP 8 spacing while maintaining correct timezone handling.The timestamp fields correctly use timezone-aware DateTime with UTC, which is great! However, there are spacing issues to fix:
- created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc)) - updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc)) + created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)) + updated_at = Column(DateTime(timezone=True), onupdate=lambda: datetime.now(timezone.utc))cognee/modules/users/models/User.py (1)
Line range hint
25-34
: Consider enhancing schema documentationThe comment "Keep these schemas in sync with User model" could be more specific about what needs to be synchronized. Consider documenting:
- Which fields from the User model should be reflected in each schema
- When synchronization is required (e.g., after adding new User fields)
-# Keep these schemas in sync with User model +# These Pydantic schemas must be updated when adding/modifying fields in the User model: +# - UserRead: Fields exposed in API responses +# - UserCreate: Fields required when creating a new user +# - UserUpdate: Fields that can be modified after user creationcognee/modules/users/models/ACL.py (1)
Line range hint
23-28
: Consider updating other relationship type hints for consistency.Good update using
Mapped
type hint for the resources relationship. For consistency, consider updating the other relationships (principal
andpermission
) to also use theMapped
type hint:- principal = relationship("Principal") - permission = relationship("Permission") + principal: Mapped["Principal"] = relationship("Principal") + permission: Mapped["Permission"] = relationship("Permission")This follows SQLAlchemy 2.0 style guidelines and improves type safety.
cognee/modules/data/models/Data.py (1)
Line range hint
34-34
: Consider cleaning up commented code in to_json method.The commented line for datasets serialization should either be implemented or removed to maintain code cleanliness.
- # "datasets": [dataset.to_json() for dataset in self.datasets]
If you need the datasets in the JSON representation, consider implementing it with proper lazy loading control:
"datasets": [dataset.to_json() for dataset in self.datasets] if self.datasets.loaded else None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (19)
cognee/api/v1/search/search_v2.py
(3 hunks)cognee/infrastructure/databases/relational/__init__.py
(0 hunks)cognee/infrastructure/databases/relational/data_types/UUID.py
(0 hunks)cognee/modules/data/models/Data.py
(1 hunks)cognee/modules/data/models/Dataset.py
(1 hunks)cognee/modules/data/models/DatasetData.py
(1 hunks)cognee/modules/pipelines/models/Pipeline.py
(1 hunks)cognee/modules/pipelines/models/PipelineRun.py
(1 hunks)cognee/modules/pipelines/models/PipelineTask.py
(1 hunks)cognee/modules/search/models/Query.py
(1 hunks)cognee/modules/search/models/Result.py
(1 hunks)cognee/modules/users/models/ACL.py
(1 hunks)cognee/modules/users/models/ACLResources.py
(1 hunks)cognee/modules/users/models/Group.py
(1 hunks)cognee/modules/users/models/Permission.py
(1 hunks)cognee/modules/users/models/Principal.py
(1 hunks)cognee/modules/users/models/Resource.py
(1 hunks)cognee/modules/users/models/User.py
(1 hunks)cognee/modules/users/models/UserGroup.py
(1 hunks)
💤 Files with no reviewable changes (2)
- cognee/infrastructure/databases/relational/init.py
- cognee/infrastructure/databases/relational/data_types/UUID.py
✅ Files skipped from review due to trivial changes (9)
- cognee/modules/data/models/DatasetData.py
- cognee/modules/pipelines/models/PipelineRun.py
- cognee/modules/pipelines/models/PipelineTask.py
- cognee/modules/users/models/ACLResources.py
- cognee/modules/users/models/Group.py
- cognee/modules/users/models/Permission.py
- cognee/modules/users/models/Principal.py
- cognee/modules/users/models/Resource.py
- cognee/modules/users/models/UserGroup.py
🚧 Files skipped from review as they are similar to previous changes (2)
- cognee/api/v1/search/search_v2.py
- cognee/modules/search/models/Query.py
🔇 Additional comments (11)
cognee/modules/search/models/Result.py (3)
1-4
: LGTM! Imports are well-organized and complete.
6-7
: LGTM! Class definition follows SQLAlchemy conventions.
9-9
: Fix PEP 8 spacing violations.
Remove spaces around the =
operator in keyword arguments.
- id = Column(UUID, primary_key = True, default = uuid4)
+ id = Column(UUID, primary_key=True, default=uuid4)
cognee/modules/pipelines/models/Pipeline.py (3)
3-5
: LGTM! Good improvement in import organization
The switch to using SQLAlchemy's UUID type directly instead of a custom implementation improves standardization and reduces infrastructure dependencies. The imports are also well-organized following the standard pattern (stdlib → third-party → local).
Line range hint 13-13
: LGTM! Well-structured primary key definition
The UUID primary key with automatic uuid4 generation is a good practice for distributed systems and data portability.
Line range hint 21-24
: Verify the bidirectional relationship configuration
The relationship with Task model is well-defined using modern SQLAlchemy 2.0 style with proper type hints. Let's verify the corresponding relationship in the Task model and the association table structure.
✅ Verification successful
Let me try a different search pattern to find the relationship configuration.
Based on the search results, I can now generate the final response as I have found all the necessary relationship configurations.
The bidirectional relationship is correctly configured
The relationship configuration is properly set up between Pipeline and Task models through the PipelineTask association table:
- Pipeline model has
tasks
relationship withback_populates="pipeline"
- Task model has
datasets
relationship withback_populates="task"
- PipelineTask association table has correct foreign keys:
pipeline_id = Column("pipeline", UUID, ForeignKey("pipeline.id"))
task_id = Column("task", UUID, ForeignKey("task.id"))
However, there's a minor inconsistency in the Task model where the relationship is named datasets
instead of pipelines
, which might be confusing but doesn't affect functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify relationship configuration across related models
# Check Task model for corresponding relationship
echo "Checking Task model for 'pipeline' relationship..."
ast-grep --pattern 'class Task($_):
$$$
pipeline = $$$relationship($$$)
$$$'
# Check PipelineTask association table structure
echo "Checking PipelineTask association table definition..."
ast-grep --pattern 'class PipelineTask($_):
$$$
pipeline_id = Column(UUID, ForeignKey($$$))
task_id = Column(UUID, ForeignKey($$$))
$$$'
Length of output: 465
Script:
#!/bin/bash
# Search for Task and PipelineTask class definitions with more relaxed patterns
echo "Searching for Task class definition..."
rg "class Task" -A 20
echo "\nSearching for PipelineTask class definition..."
rg "class PipelineTask" -A 20
# Also search for any relationship definitions involving Pipeline or Task
echo "\nSearching for relationship definitions..."
rg "relationship\(" -A 2 -B 2
Length of output: 8492
cognee/modules/users/models/User.py (3)
1-8
: LGTM! Clear separation of UUID types
The imports correctly distinguish between uuid.UUID
for schema type hints and sqlalchemy.UUID
for column definitions, following FastAPI+SQLAlchemy best practices.
Line range hint 10-22
: LGTM! Well-structured SQLAlchemy 2.0 relationship
The groups relationship is properly defined using:
- Type-safe
Mapped[list["Group"]]
annotation - Correct secondary table reference
- Bi-directional relationship with back_populates
Line range hint 10-22
: Verify search history relationship requirements
Given that this PR implements search query logging, should the User model include a relationship to track search history?
Let's check if there's a separate search history model that references users:
cognee/modules/users/models/ACL.py (1)
4-5
: LGTM! Good practice using direct imports from sqlalchemy.
The change to import UUID directly from sqlalchemy instead of through an intermediate module is a good practice as it reduces indirection and potential dependency issues.
cognee/modules/data/models/Data.py (1)
5-5
: LGTM! Verify database compatibility.
The change to import UUID directly from SQLAlchemy aligns with best practices and the project's standardization efforts.
Let's verify if this change affects any existing migrations or data:
✅ Verification successful
Import change is safe and consistent with codebase patterns
The change to import UUID directly from SQLAlchemy is verified to be safe because:
- The Data model uses UUID consistently as a primary key type (
id = Column(UUID, primary_key = True, default = uuid4)
) - Other models in the codebase consistently import and use UUID from SQLAlchemy for database columns (e.g., Result, Query, Pipeline models)
- The pattern of importing UUID from SQLAlchemy for column definitions and from uuid for type hints/operations is well established across the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any UUID-related migrations and verify column type consistency
# Look for existing migrations containing UUID definitions
rg -l "UUID" | grep -i "migration"
# Check for any UUID column definitions in models
ast-grep --pattern 'Column($_, UUID, $$$)'
Length of output: 347
Script:
#!/bin/bash
# Let's check the original UUID import and usage in the Data model
rg -A 5 "class Data" cognee/modules/data/models/Data.py
# Check for any other models using UUID to understand the pattern
rg "from.*UUID" --type py
Length of output: 5275
value = Column(Text) | ||
query_id = Column(UUID) | ||
user_id = Column(UUID, index = True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding a foreign key constraint for query_id.
The query_id
field appears to reference another table but lacks a foreign key constraint. This could lead to data integrity issues.
- query_id = Column(UUID)
+ query_id = Column(UUID, ForeignKey("queries.id"), nullable=False)
Also, fix PEP 8 spacing in user_id
definition:
- user_id = Column(UUID, index = True)
+ user_id = Column(UUID, index=True)
Committable suggestion skipped: line range outside the PR's diff.
from datetime import datetime, timezone | ||
from uuid import uuid4 | ||
from sqlalchemy import Column, DateTime, Text, UUID | ||
from cognee.infrastructure.databases.relational import Base | ||
|
||
class Result(Base): | ||
__tablename__ = "results" | ||
|
||
id = Column(UUID, primary_key = True, default = uuid4) | ||
|
||
value = Column(Text) | ||
query_id = Column(UUID) | ||
user_id = Column(UUID, index = True) | ||
|
||
created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc)) | ||
updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Add a foreign key constraint for query_id referencing Query.id
The Query model exists at cognee/modules/search/models/Query.py
with a matching UUID primary key, but the Result model's query_id
column lacks a proper foreign key constraint. You should modify the query_id
column definition to:
query_id = Column(UUID, ForeignKey("queries.id"))
- Location:
cognee/modules/search/models/Result.py
, line 12
🔗 Analysis chain
Verify Query model exists for the foreign key relationship.
Let me gather more information about the Query model structure to verify the relationship properly.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify Query model exists and its structure
# Look for Query model definition
echo "Searching for Query model definition..."
rg -l "class Query.*Base" --type py
# Check Query model structure
echo "Checking Query model structure..."
ast-grep --pattern 'class Query {
$$$
id = Column($$$)
$$$
}'
Length of output: 311
Script:
#!/bin/bash
# Get the content of Query model to verify its structure
echo "Content of Query model:"
cat cognee/modules/search/models/Query.py
# Check if there are any existing relationships between Query and Result
echo -e "\nChecking for existing relationships:"
rg "relationship|ForeignKey.*results|ForeignKey.*queries" --type py
Length of output: 11320
from sqlalchemy import Column, Text, DateTime, UUID | ||
from cognee.infrastructure.databases.relational import Base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Standardize UUID imports across remaining model files
Two model files still use the local UUID import:
cognee/modules/pipelines/models/TaskRun.py
cognee/modules/pipelines/models/Task.py
These files should be updated to import UUID directly from SQLAlchemy for consistency with the rest of the codebase.
🔗 Analysis chain
LGTM! Good standardization of UUID type handling.
The change to import UUID directly from SQLAlchemy instead of a local module is a good practice as it:
- Uses SQLAlchemy's native type system
- Ensures consistent UUID handling across different database backends
- Reduces dependency on local implementations
Let's verify this change is consistently applied across all model files:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining local UUID imports in model files
# and verify all UUID column definitions are using SQLAlchemy's UUID type
# Check for any remaining imports from local UUID module
echo "Checking for remaining local UUID imports..."
rg "from cognee\.infrastructure\.databases\.relational import.*UUID" "cognee/modules"
# Verify UUID column definitions in model files
echo "Checking UUID column definitions..."
rg "Column\(UUID" "cognee/modules"
Length of output: 3343
4efdeda
to
09f27e8
Compare
bcf9965
to
e1a3241
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two questions on naming, but looks good to me overall!
@@ -14,15 +17,17 @@ class SearchType(Enum): | |||
INSIGHTS = "INSIGHTS" | |||
CHUNKS = "CHUNKS" | |||
|
|||
async def search(search_type: SearchType, query: str, user: User = None) -> list: | |||
async def search(query_type: SearchType, query_text: str, user: User = None) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe rename SearchType
to QueryType
?
from cognee.infrastructure.databases.relational import Base | ||
|
||
class Result(Base): | ||
__tablename__ = "results" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we rename results
to query_results
?
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
13947935 | Triggered | Generic High Entropy Secret | 0e14b0c | cognee/shared/utils.py | View secret |
9573981 | Triggered | Generic Password | 8b2f8c2 | notebooks/cognee_llama_index.ipynb | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Summary by CodeRabbit
Release Notes
New Features
SearchView
component to automatically fetch chat history on mount.Bug Fixes
Tests