-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COG 870 Remove duplicate edges from the code graph #293
Conversation
WalkthroughThe pull request introduces significant modifications across four files, primarily focusing on the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
evals/eval_swe_bench.py (2)
Line range hint
44-49
: Consider moving imports inside the function to the top-level scopeThe imports within the
run_code_graph_pipeline
function can be moved to the top of the module unless there is a specific reason to keep them inside the function (e.g., to avoid circular dependencies). This improves readability and follows Python's best practices.Proposed code change:
-async def run_code_graph_pipeline(repo_path): - import os - import pathlib - import cognee - from cognee.infrastructure.databases.relational import create_db_and_tables +import os +import pathlib +import cognee +from cognee.infrastructure.databases.relational import create_db_and_tables +async def run_code_graph_pipeline(repo_path):
71-71
: Remove unused parametersearch_type
The parameter
search_type
in the functiongenerate_patch_with_cognee
is not used in the function body. Consider removing it to clean up the function signature.Proposed code change:
-async def generate_patch_with_cognee(instance, llm_client, search_type=SearchType.CHUNKS): +async def generate_patch_with_cognee(instance, llm_client):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
cognee/tasks/summarization/summarize_code.py
(1 hunks)evals/eval_swe_bench.py
(3 hunks)examples/python/code_graph_example.py
(1 hunks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
cognee/tasks/summarization/summarize_code.py (2)
14-14
: 🛠️ Refactor suggestionUpdate return type annotation to reflect generator
The function
summarize_code
now yieldsDataPoint
objects instead of returning a list. Update the return type annotation to-> AsyncGenerator[DataPoint, None]
to accurately represent its behavior.Apply this diff:
- ) -> list[DataPoint]: + ) -> AsyncGenerator[DataPoint, None]:
3-3
: 🛠️ Refactor suggestionImport
AsyncGenerator
fromtyping
Since
summarize_code
is now an async generator, importAsyncGenerator
from thetyping
module to properly annotate the return type.Apply this diff:
- from typing import Type + from typing import Type, AsyncGenerator
🧹 Nitpick comments (6)
cognee/api/v1/cognify/code_graph_pipeline.py (5)
113-116
: Consider moving imports to the module levelThe imports inside the
run_code_graph_pipeline
function (lines 113-116) can be moved to the top of the file. This enhances readability and adheres to standard Python conventions.Apply this diff to move the imports:
+ import os + import pathlib + import cognee + from cognee.infrastructure.databases.relational import create_db_and_tables async def run_code_graph_pipeline(repo_path): - import os - import pathlib - import cognee - from cognee.infrastructure.databases.relational import create_db_and_tables
114-114
: Remove redundant import ofpathlib
The module
pathlib
is already imported at the module level on line 3 (from pathlib import Path
). The import on line 114 is redundant and can be removed.Apply this diff to remove the redundant import:
- import pathlib
116-116
: Movecreate_db_and_tables
import to the module levelFor consistency and clarity, consider moving the import of
create_db_and_tables
to the module level with other imports.Apply this diff:
+ from cognee.infrastructure.databases.relational import create_db_and_tables async def run_code_graph_pipeline(repo_path): - from cognee.infrastructure.databases.relational import create_db_and_tables
119-122
: Simplify path manipulations usingPath
objectsCurrently, both
os.path.join
andpathlib.Path
are used for path manipulations. For better readability and consistency, consider usingPath
objects throughout.Apply this diff to refactor the path handling:
- data_directory_path = str(pathlib.Path(os.path.join(file_path, ".data_storage/code_graph")).resolve()) - cognee_directory_path = str(pathlib.Path(os.path.join(file_path, ".cognee_system/code_graph")).resolve()) + data_directory_path = (file_path / ".data_storage" / "code_graph").resolve() + cognee_directory_path = (file_path / ".cognee_system" / "code_graph").resolve()
112-136
: Add a docstring torun_code_graph_pipeline
Adding a docstring to the
run_code_graph_pipeline
function will improve code readability and maintainability by explaining the purpose and usage of the function.cognee/tasks/summarization/summarize_code.py (1)
18-18
: Ensure generator function concludes appropriatelyThe
return
statement at line 18 exits the generator function without yielding any values ifcode_graph_nodes
is empty. This is acceptable, but for clarity, consider usingreturn None
to explicitly indicate the end of the generator.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cognee/api/v1/cognify/code_graph_pipeline.py
(3 hunks)cognee/tasks/summarization/summarize_code.py
(1 hunks)evals/eval_swe_bench.py
(2 hunks)examples/python/code_graph_example.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/python/code_graph_example.py
🔇 Additional comments (2)
evals/eval_swe_bench.py (2)
10-10
: Refactored run_code_graph_pipeline
import improves modularity
Good job moving run_code_graph_pipeline
to code_graph_pipeline.py
and importing it here. This enhances code organization and reusability.
39-44
: Function generate_patch_with_cognee
correctly utilizes the pipeline
The updated function properly calls run_code_graph_pipeline
and processes the results asynchronously, which aligns with best practices for async operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
cognee/api/v1/cognify/code_graph_pipeline.py (3)
1-4
: Enhance deprecation notice with timelineThe deprecation notice is clear but could be more helpful by including:
- When these functions will be removed
- Migration guide or link to documentation
# NOTICE: This module contains deprecated functions. # Use only the run_code_graph_pipeline function; all other functions are deprecated. +# These functions will be removed in version X.Y.Z. See migration guide at <link> # Related issue: COG-906
133-139
: Make task configurations configurableThe task configurations use hardcoded batch sizes which might not be optimal for all scenarios.
Consider making these configurable:
+ def get_batch_size(task_name: str) -> int: + return cognee.config.get(f"tasks.{task_name}.batch_size", default=50) + tasks = [ Task(get_repo_file_dependencies), - Task(enrich_dependency_graph, task_config={"batch_size": 50}), - Task(expand_dependency_graph, task_config={"batch_size": 50}), - Task(summarize_code, summarization_model=SummarizedContent, task_config={"batch_size": 50}), - Task(add_data_points, task_config={"batch_size": 50}), + Task(enrich_dependency_graph, task_config={"batch_size": get_batch_size("enrich_dependency_graph")}), + Task(expand_dependency_graph, task_config={"batch_size": get_batch_size("expand_dependency_graph")}), + Task(summarize_code, summarization_model=SummarizedContent, task_config={"batch_size": get_batch_size("summarize_code")}), + Task(add_data_points, task_config={"batch_size": get_batch_size("add_data_points")}), ]
124-127
: Consider using temporary directories for isolationUsing fixed directory names under the package directory could cause conflicts between different runs or in different environments.
Consider using
tempfile.mkdtemp()
to create isolated directories for each run, or make the paths configurable through environment variables.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/api/v1/cognify/code_graph_pipeline.py
(4 hunks)
🔇 Additional comments (1)
cognee/api/v1/cognify/code_graph_pipeline.py (1)
62-62
: Verify error handling migration
While the deprecation notice is good, ensure that the error handling and telemetry capabilities from run_pipeline
are properly migrated to run_code_graph_pipeline
.
✅ Verification successful
Error handling and telemetry are properly migrated
The new run_code_graph_pipeline
implementation includes comprehensive error handling and telemetry:
- Sends telemetry events at pipeline start, completion, and error states
- Wraps core execution in try-except block
- Updates pipeline status appropriately for both success and error cases
- Maintains consistent error reporting pattern with the deprecated version
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for error handling and telemetry in the new implementation
rg -A 5 "try|except|send_telemetry" cognee/api/v1/cognify/code_graph_pipeline.py
Length of output: 1600
Refactored the
summarize_code
task into a generator that does not call theadd_points
function.summarize_code
now:This ensures all data points reach the
add_datapoints
function.Moved the
add_datapoints
task to the end of the pipeline.Extracted
run_code_graph_pipeline
into a standalone function.Added a script example for running the code graph pipeline with the new function.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Deprecation Notices
Documentation