-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/cog 539 implementing additional retriever approaches #262
Feature/cog 539 implementing additional retriever approaches #262
Conversation
WalkthroughThe changes introduce a new class named Changes
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
…mplementing-additional-retriever-approaches
test: Update gh actions so they can run outside of PR to main
… into fix/graph-extraction-from-model
…mplementing-additional-retriever-approaches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
cognee/tasks/summarization/models.py (3)
17-17
: Add class documentationPlease add a docstring to document the purpose and usage of the
CodeSummary
class.class CodeSummary(DataPoint): + """Represents a summary of code content with indexable text. + + Attributes: + text (str): The summary text content + made_from (CodeFile): The source code file this summary is derived from + """ __tablename__ = "code_summary"
Line range hint
17-22
: Consider using more explicit type hintsThe type hints could be more explicit using the typing module for better static type checking.
+from typing import ClassVar, Dict, List + class CodeSummary(DataPoint): __tablename__ = "code_summary" - text: str - made_from: CodeFile + text: str + made_from: CodeFile - _metadata: dict = { + _metadata: ClassVar[Dict[str, List[str]]] = { "index_fields": ["text"], }
Line range hint
21-22
: Consider code-specific indexing strategyThe current indexing strategy is identical to
TextSummary
. Consider whether code summaries might benefit from additional indexed fields specific to code content (e.g., function names, class names).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
cognee/tasks/summarization/models.py
(1 hunks)
🔇 Additional comments (1)
cognee/tasks/summarization/models.py (1)
Line range hint 17-22
: Verify CodeFile relationship implementation
Let's verify that the CodeFile
relationship is properly implemented in the database schema and that all necessary foreign key constraints are in place.
This is work in progress please do not merge to main! |
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
9573981 | Triggered | Generic Password | 51b825e | .github/workflows/test_deduplication.yml | View secret |
9573981 | Triggered | Generic Password | 51b825e | cognee/tests/test_deduplication.py | View secret |
9573981 | Triggered | Generic Password | 3ef3376 | .github/workflows/test_deduplication.yml | View secret |
9573981 | Triggered | Generic Password | 3ef3376 | cognee/tests/test_deduplication.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
…aches' of https://github.com/topoteretes/cognee into feature/cog-539-implementing-additional-retriever-approaches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (9)
cognee/modules/retrieval/description_to_codepart_search.py (3)
12-12
: PEP 8 Style: Remove spaces around '=' in default argumentsIn the function definition, there are extra spaces around the
=
in default arguments. According to PEP 8 style guidelines, there should be no spaces around=
in function signatures with default parameter values.Apply this diff to fix the style issue:
- async def code_description_to_code_part_search(query: str, user: User = None, top_k = 2) -> list: + async def code_description_to_code_part_search(query: str, user: User=None, top_k=2) -> list:
69-69
: Reminder: Address the TODO comment related to codegraph structureThe TODO comment indicates that this section needs updating when the codegraph structure changes. Proactively addressing this will prevent future compatibility issues.
Would you like assistance in updating this section to accommodate the upcoming codegraph structure changes?
84-94
: Consider moving test code under__main__
to a separate test moduleWhile the test code in the
__main__
block is useful for demonstration, it's better practice to place tests in a separate test file using a testing framework likepytest
. This improves maintainability and aligns with best testing practices.cognee/modules/graph/cognee_graph/CogneeGraphElements.py (2)
68-73
: Add return type annotations to new methodsFor better code clarity and type checking, consider adding return type annotations to the
get_skeleton_edges
andget_skeleton_neighbours
methods in theNode
class.Apply this diff:
def get_skeleton_edges(self) -> List["Edge"]: return self.skeleton_edges def get_skeleton_neighbours(self) -> List["Node"]: return self.skeleton_neighbours
121-126
: Add return type annotations to new methodsTo improve code readability and facilitate type checking, add return type annotations to the
get_node_from
andget_node_to
methods in theEdge
class.Apply this diff:
def get_node_from(self) -> "Node": return self.node1 def get_node_to(self) -> "Node": return self.node2cognee/modules/graph/cognee_graph/CogneeGraph.py (4)
Line range hint
98-100
: Replace print statements with logging and reconsider exception handling inproject_graph_from_db
The method catches exceptions and prints error messages, which is not ideal for production code. Use the
logging
module to log errors, and consider re-raising exceptions or handling them appropriately to avoid silent failures.Apply this diff:
except (ValueError, TypeError) as e: - print(f"Error projecting graph: {e}") + logging.error(f"Error projecting graph: {e}") + raise except Exception as ex: - print(f"Unexpected error: {ex}") + logging.exception("Unexpected error in project_graph_from_db") + raiseMake sure to import the
logging
module if it's not already imported.
Line range hint
109-110
: Replace print statements with logging inmap_vector_distances_to_graph_nodes
Using
Apply this diff:
else: - print(f"Node with id {node_id} not found in the graph.") + logging.warning(f"Node with id {node_id} not found in the graph.")
Line range hint
142-143
: Replace print statements with logging inmap_vector_distances_to_graph_edges
Replace
logging
to improve log management and avoid cluttering the console.Apply this diff:
- print(f"Edge {edge} has an unknown or missing relationship type.") + logging.warning(f"Edge {edge} has an unknown or missing relationship type.")
Line range hint
145-146
: Replace print statement with logging in exception handlingUsing
logging.exception
to log the error along with the traceback, which aids in debugging.Apply this diff:
except Exception as ex: - print(f"Error mapping vector distances to edges: {ex}") + logging.exception("Error mapping vector distances to edges")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
cognee/modules/graph/cognee_graph/CogneeGraph.py
(1 hunks)cognee/modules/graph/cognee_graph/CogneeGraphElements.py
(2 hunks)cognee/modules/retrieval/description_to_codepart_search.py
(1 hunks)
… our current codegraph generation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
cognee/tests/unit/modules/retriever/test_description_to_codepart_search.py (7)
4-6
: Remove unnecessary empty linesMultiple consecutive empty lines can be reduced to one for better readability.
from unittest.mock import AsyncMock, patch - - @pytest.mark.asyncio
7-21
: Enhance test coverage with parameter verificationWhile the test correctly verifies the empty result case, it would be more robust to also verify that the mock was called with the expected parameters.
from cognee.modules.retrieval.description_to_codepart_search import code_description_to_code_part result = await code_description_to_code_part("search query", mock_user, 2) assert result == [] + mock_vector_engine.search.assert_called_once_with("search query", 2)
24-31
: Consider using parametrize for comprehensive input validationThe test could be enhanced to cover more invalid input cases using pytest's parametrize decorator.
+@pytest.mark.parametrize("invalid_query", [ + "", + None, + 123, + [], +]) @pytest.mark.asyncio -async def test_code_description_to_code_part_invalid_query(): +async def test_code_description_to_code_part_invalid_query(invalid_query): """Test that code_description_to_code_part raises ValueError for invalid query.""" mock_user = AsyncMock() with pytest.raises(ValueError, match="The query must be a non-empty string."): from cognee.modules.retrieval.description_to_codepart_search import code_description_to_code_part - await code_description_to_code_part("", mock_user, 2) + await code_description_to_code_part(invalid_query, mock_user, 2)
33-40
: Extend top_k validation testsThe test should cover more invalid cases for the top_k parameter.
+@pytest.mark.parametrize("invalid_top_k,expected_message", [ + (0, "top_k must be a positive integer."), + (-1, "top_k must be a positive integer."), + (1.5, "top_k must be a positive integer."), + ("2", "top_k must be a positive integer."), +]) @pytest.mark.asyncio -async def test_code_description_to_code_part_invalid_top_k(): +async def test_code_description_to_code_part_invalid_top_k(invalid_top_k, expected_message): """Test that code_description_to_code_part raises ValueError for invalid top_k.""" mock_user = AsyncMock() - with pytest.raises(ValueError, match="top_k must be a positive integer."): + with pytest.raises(ValueError, match=expected_message): from cognee.modules.retrieval.description_to_codepart_search import code_description_to_code_part - await code_description_to_code_part("search query", mock_user, 0) + await code_description_to_code_part("search query", mock_user, invalid_top_k)
42-52
: Verify exception chain and test different initialization scenariosConsider testing different initialization failure scenarios and verifying the exception chain.
+@pytest.mark.parametrize("init_error,expected_message", [ + (Exception("Engine init failed"), "System initialization error. Please try again later."), + (ConnectionError("Database unreachable"), "System initialization error. Please try again later."), + (ValueError("Invalid configuration"), "System initialization error. Please try again later."), +]) @pytest.mark.asyncio -async def test_code_description_to_code_part_initialization_error(): +async def test_code_description_to_code_part_initialization_error(init_error, expected_message): """Test that code_description_to_code_part raises RuntimeError for engine initialization errors.""" mock_user = AsyncMock() - with patch("cognee.modules.retrieval.description_to_codepart_search.get_vector_engine", side_effect=Exception("Engine init failed")), \ + with patch("cognee.modules.retrieval.description_to_codepart_search.get_vector_engine", side_effect=init_error), \ patch("cognee.modules.retrieval.description_to_codepart_search.get_graph_engine", return_value=AsyncMock()): from cognee.modules.retrieval.description_to_codepart_search import code_description_to_code_part - with pytest.raises(RuntimeError, match="System initialization error. Please try again later."): + with pytest.raises(RuntimeError, match=expected_message) as exc_info: await code_description_to_code_part("search query", mock_user, 2) + assert exc_info.value.__cause__ == init_error
54-68
: Enhance execution error testingConsider testing different execution error scenarios and verifying the search attempt was made.
+@pytest.mark.parametrize("exec_error,expected_message", [ + (Exception("Execution error"), "An error occurred while processing your request."), + (TimeoutError("Search timeout"), "An error occurred while processing your request."), + (ValueError("Invalid search parameters"), "An error occurred while processing your request."), +]) @pytest.mark.asyncio -async def test_code_description_to_code_part_execution_error(): +async def test_code_description_to_code_part_execution_error(exec_error, expected_message): """Test that code_description_to_code_part raises RuntimeError for execution errors.""" mock_user = AsyncMock() mock_user.id = "user123" mock_vector_engine = AsyncMock() - mock_vector_engine.search.side_effect = Exception("Execution error") + mock_vector_engine.search.side_effect = exec_error with patch("cognee.modules.retrieval.description_to_codepart_search.get_vector_engine", return_value=mock_vector_engine), \ patch("cognee.modules.retrieval.description_to_codepart_search.get_graph_engine", return_value=AsyncMock()), \ patch("cognee.modules.retrieval.description_to_codepart_search.CogneeGraph", return_value=AsyncMock()): from cognee.modules.retrieval.description_to_codepart_search import code_description_to_code_part - with pytest.raises(RuntimeError, match="An error occurred while processing your request."): + with pytest.raises(RuntimeError, match=expected_message) as exc_info: await code_description_to_code_part("search query", mock_user, 2) + # Verify search was attempted + mock_vector_engine.search.assert_called_once() + assert exc_info.value.__cause__ == exec_error
1-68
: Consider adding tests for missing scenariosWhile the current test coverage is good for error cases, consider adding tests for:
- Success case with actual search results (including result transformation)
- Edge cases with large top_k values
- Authentication/authorization scenarios
- Rate limiting scenarios
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
cognee/modules/graph/cognee_graph/CogneeGraphElements.py
(2 hunks)cognee/modules/retrieval/description_to_codepart_search.py
(1 hunks)cognee/tests/unit/modules/retriever/test_description_to_codepart_search.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- cognee/modules/retrieval/description_to_codepart_search.py
- cognee/modules/graph/cognee_graph/CogneeGraphElements.py
Summary by CodeRabbit
New Features
CodeSummary
class for improved data handling related to code summaries.Node
andEdge
classes with new methods for better data retrieval.Bug Fixes
add_edge
method to log messages instead of raising exceptions.Tests
test_add_duplicate_edge
function from the test suite for theCogneeGraph
class.code_description_to_code_part
function, covering various scenarios including input validation and error handling.