feat: Added base modal runner and changed eval script #448
Conversation
Walkthrough

This pull request introduces several new files and modifications to support a Modal-based evaluation system for processing repositories.
Actionable comments posted: 7
🧹 Nitpick comments (2)
evals/eval_with_modal.py (1)

`23-24`: Document GPU and concurrency requirements.

The function uses a specific GPU type and concurrency limit but lacks documentation about these requirements. Add a docstring explaining resource requirements and limitations:

```diff
 @app.function(image=image, gpu="T4", concurrency_limit=5)
 def run_single_repo(instance_data: dict, disable_cognee: bool = False):
+    """Process a single repository using Modal with T4 GPU.
+
+    Args:
+        instance_data: Repository instance data
+        disable_cognee: Flag to disable Cognee processing
+
+    Note:
+        Uses T4 GPU with max 5 concurrent executions
+    """
```
evals/process_single_repo.py (1)

`40-42`: Use logging instead of print statements.

Using print statements for logging makes it difficult to track and manage output.

```diff
+import logging
+
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
 async for result in run_code_graph_pipeline(repo_path, include_docs=include_docs):
-    print(result)
+    logger.info("Pipeline result: %s", result)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- evals/eval_with_modal.py (1 hunks)
- evals/get_started.py (1 hunks)
- evals/process_single_repo.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
evals/get_started.py
[warning] File requires formatting with Ruff formatter
evals/process_single_repo.py
[warning] File requires formatting with Ruff formatter
evals/eval_with_modal.py
[warning] File requires formatting with Ruff formatter
🪛 Ruff (0.8.2)
evals/eval_with_modal.py
27-27: Redefinition of unused `os` from line 4. Remove definition: `os` (F811)
🪛 GitHub Actions: ruff lint
evals/eval_with_modal.py
[error] 27-27: Redefinition of unused 'os' from line 4. The 'os' module is already imported on line 4 and is not being used.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: docker-compose-test
🔇 Additional comments (1)
evals/get_started.py (1)
`1-14`: Fix formatting issues flagged by Ruff.

The file requires formatting with the Ruff formatter; run `ruff format evals/get_started.py` to fix it.

Additionally, since this appears to be an example file, consider whether it should be included in the production codebase or moved to a documentation/examples directory.
evals/eval_with_modal.py
Outdated
```python
# LOCAL_COGNEE_PATH = os.path.dirname(os.path.abspath(__file__))
LOCAL_COGNEE_PATH = "/Users/vasilije/cognee"
```
Replace hardcoded path with environment variable or configuration.

The hardcoded local path `/Users/vasilije/cognee` is not portable and will fail for other users. Replace with:

```diff
-LOCAL_COGNEE_PATH = "/Users/vasilije/cognee"
+LOCAL_COGNEE_PATH = os.getenv("COGNEE_PATH", os.path.dirname(os.path.abspath(__file__)))
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# LOCAL_COGNEE_PATH = os.path.dirname(os.path.abspath(__file__))
LOCAL_COGNEE_PATH = os.getenv("COGNEE_PATH", os.path.dirname(os.path.abspath(__file__)))
```
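The environment-variable fallback pattern in the suggestion can be sketched on its own (a minimal sketch; `COGNEE_PATH` is the variable name proposed in the diff, and the current-directory fallback here stands in for the script-directory fallback used in the real script):

```python
import os

# Prefer an explicit COGNEE_PATH environment variable; otherwise fall back
# to a local default (the real script would use the script's own directory,
# via os.path.dirname(os.path.abspath(__file__))).
LOCAL_COGNEE_PATH = os.getenv("COGNEE_PATH", os.path.abspath(os.curdir))
print(f"Using Cognee path: {LOCAL_COGNEE_PATH}")
```

With this pattern, other users can point the script at their checkout with `COGNEE_PATH=/path/to/cognee` without editing the source.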
evals/eval_with_modal.py
Outdated
```python
work_dir = "/root/cognee"
subprocess.run(cmd, cwd=work_dir, check=True)
```
Add error handling for subprocess execution.

The subprocess execution lacks error handling and could fail silently.

```diff
 work_dir = "/root/cognee"
-subprocess.run(cmd, cwd=work_dir, check=True)
+try:
+    subprocess.run(cmd, cwd=work_dir, check=True, capture_output=True, text=True)
+except subprocess.CalledProcessError as e:
+    print(f"Error executing process_single_repo.py: {e.stderr}", file=sys.stderr)
+    return handle_subprocess_error(e)
```
Committable suggestion skipped: line range outside the PR's diff.
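The error-handling pattern suggested above can be sketched independently of the Modal script (`run_checked` is an illustrative name, not from the PR):

```python
import subprocess
import sys


def run_checked(cmd, cwd=None):
    """Run a command; surface stderr on failure instead of failing silently."""
    try:
        completed = subprocess.run(
            cmd, cwd=cwd, check=True, capture_output=True, text=True
        )
        return completed.stdout
    except subprocess.CalledProcessError as e:
        print(f"Command {cmd} failed with code {e.returncode}: {e.stderr}",
              file=sys.stderr)
        return None


# check=True raises CalledProcessError on a nonzero exit status,
# which the wrapper converts to None after logging stderr.
output = run_checked([sys.executable, "-c", "print('ok')"])
```

A caller can then branch on `None` instead of letting a failed subprocess crash or pass unnoticed.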
evals/eval_with_modal.py
Outdated
```python
results = []
for call in calls:
    filename, content = call
    if content:
        with open(filename, "w") as f:
            f.write(content)
        print(f"Saved (unknown) locally.")
        results.append(filename)
```
🛠️ Refactor suggestion

Add cleanup for temporary files.

The code saves files locally but never cleans them up, which could lead to disk space issues over time.

```diff
 results = []
+temp_files = []
 for call in calls:
     filename, content = call
     if content:
         with open(filename, "w") as f:
             f.write(content)
         print(f"Saved (unknown) locally.")
         results.append(filename)
+        temp_files.append(filename)
+
+# After merging files, clean up temporary files
+def cleanup_temp_files():
+    for temp_file in temp_files:
+        try:
+            os.remove(temp_file)
+            print(f"Cleaned up {temp_file}")
+        except OSError as e:
+            print(f"Error cleaning up {temp_file}: {e}", file=sys.stderr)
```
Committable suggestion skipped: line range outside the PR's diff.
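The cleanup suggestion can be sketched as a pair of helpers (a sketch under assumptions: the function names are illustrative, and the real script writes `pred_*.json` files into its working directory):

```python
import os
import sys


def save_results(named_contents):
    """Write non-empty results to disk and track the files for later cleanup."""
    temp_files = []
    for filename, content in named_contents:
        if content:
            with open(filename, "w") as f:
                f.write(content)
            temp_files.append(filename)
    return temp_files


def cleanup_temp_files(temp_files):
    """Best-effort removal of the files created by save_results."""
    for temp_file in temp_files:
        try:
            os.remove(temp_file)
        except OSError as e:
            print(f"Error cleaning up {temp_file}: {e}", file=sys.stderr)
```

Calling `cleanup_temp_files` in a `finally` block after the merge step guarantees the intermediate files are removed even if merging fails.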
evals/eval_with_modal.py
Outdated
```python
def run_single_repo(instance_data: dict, disable_cognee: bool = False):
    import subprocess
    import json
    import os
```
Fix duplicate import and add error handling.

The `os` module is imported twice, and there's no error handling for the subprocess execution.

```diff
-    import os
     import subprocess
     import json
-    import os
+    from typing import Tuple
+
+    def handle_subprocess_error(error: subprocess.CalledProcessError) -> Tuple[str, str]:
+        return (
+            f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_data['instance_id']}.json",
+            json.dumps({"error": str(error)})
+        )
```

Committable suggestion skipped: line range outside the PR's diff.
```python
if __name__ == "__main__":
    import asyncio

    asyncio.run(main(), debug=True)
```
Remove debug mode in production.

Running asyncio with debug=True in production can impact performance.

```diff
 if __name__ == "__main__":
     import asyncio
-    asyncio.run(main(), debug=True)
+    asyncio.run(main())
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
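If debug output is still wanted during development, one compromise is to gate it behind an environment variable rather than hardcoding it (a minimal sketch; `ASYNCIO_DEBUG` is an assumed variable name, not from the PR):

```python
import asyncio
import os


async def main():
    return "done"


# Enable asyncio debug mode only when explicitly requested
# (e.g. ASYNCIO_DEBUG=1), so production runs avoid the extra overhead.
debug = os.getenv("ASYNCIO_DEBUG") == "1"
result = asyncio.run(main(), debug=debug)
```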
```python
def check_install_package(package_name):
    """Check if a pip package is installed and install it if not."""
    try:
        __import__(package_name)
        return True
    except ImportError:
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
            return True
        except subprocess.CalledProcessError:
            return False
```
Replace runtime package installation with proper dependency management.

Installing packages at runtime is risky and could lead to inconsistent environments. Move dependencies to `pyproject.toml` or `requirements.txt`:

```diff
-def check_install_package(package_name):
-    """Check if a pip package is installed and install it if not."""
-    try:
-        __import__(package_name)
-        return True
-    except ImportError:
-        try:
-            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
-            return True
-        except subprocess.CalledProcessError:
-            return False
```

Committable suggestion skipped: line range outside the PR's diff.
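A safer alternative to installing at runtime is to merely check importability and fail fast with a clear message, leaving installation to the package manager (a sketch; `is_installed` is an illustrative helper, not part of the PR):

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package is importable, without importing
    or installing it (no side effects, unlike __import__ plus pip)."""
    return importlib.util.find_spec(package_name) is not None


# Example: report which dependencies are missing instead of pip-installing them.
missing = [name for name in ("json", "subprocess") if not is_installed(name)]
```

The caller can raise a clear error listing `missing`, pointing users at `poetry install` rather than mutating the environment mid-run.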
```python
async def generate_patch_with_cognee(instance):
    repo_path = download_github_repo(instance, "../RAW_GIT_REPOS")
    include_docs = True
    problem_statement = instance["problem_statement"]
    instructions = read_query_prompt("patch_gen_kg_instructions.txt")

    async for result in run_code_graph_pipeline(repo_path, include_docs=include_docs):
        print(result)

    retrieved_codeparts = await code_description_to_code_part_search(
        problem_statement, include_docs=include_docs
    )

    prompt = "\n".join(
        [
            problem_statement,
            "<patch>",
            PATCH_EXAMPLE,
            "</patch>",
            "Additional context to solve the problem:",
            retrieved_codeparts,
        ]
    )

    llm_client = get_llm_client()
    answer_prediction = await llm_client.acreate_structured_output(
        text_input=prompt,
        system_prompt=instructions,
        response_model=str,
    )

    return answer_prediction
```
🛠️ Refactor suggestion

Add error handling and type hints to async functions.

The async functions lack proper error handling and type hints.

```diff
-async def generate_patch_with_cognee(instance):
+async def generate_patch_with_cognee(instance: dict) -> str:
+    """Generate patch using Cognee.
+
+    Args:
+        instance: Repository instance data
+
+    Returns:
+        str: Generated patch
+
+    Raises:
+        ValueError: If required instance data is missing
+        RuntimeError: If patch generation fails
+    """
+    if not instance.get("problem_statement"):
+        raise ValueError("Missing problem statement in instance")
+
     repo_path = download_github_repo(instance, "../RAW_GIT_REPOS")
     include_docs = True
     problem_statement = instance["problem_statement"]
     instructions = read_query_prompt("patch_gen_kg_instructions.txt")
-    async for result in run_code_graph_pipeline(repo_path, include_docs=include_docs):
-        print(result)
+    try:
+        async for result in run_code_graph_pipeline(repo_path, include_docs=include_docs):
+            print(result)
+    except Exception as e:
+        raise RuntimeError(f"Failed to run code graph pipeline: {e}")
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def generate_patch_with_cognee(instance: dict) -> str:
    """Generate patch using Cognee.

    Args:
        instance: Repository instance data

    Returns:
        str: Generated patch

    Raises:
        ValueError: If required instance data is missing
        RuntimeError: If patch generation fails
    """
    if not instance.get("problem_statement"):
        raise ValueError("Missing problem statement in instance")

    repo_path = download_github_repo(instance, "../RAW_GIT_REPOS")
    include_docs = True
    problem_statement = instance["problem_statement"]
    instructions = read_query_prompt("patch_gen_kg_instructions.txt")

    try:
        async for result in run_code_graph_pipeline(repo_path, include_docs=include_docs):
            print(result)
    except Exception as e:
        raise RuntimeError(f"Failed to run code graph pipeline: {e}")

    retrieved_codeparts = await code_description_to_code_part_search(
        problem_statement, include_docs=include_docs
    )

    prompt = "\n".join(
        [
            problem_statement,
            "<patch>",
            PATCH_EXAMPLE,
            "</patch>",
            "Additional context to solve the problem:",
            retrieved_codeparts,
        ]
    )

    llm_client = get_llm_client()
    answer_prediction = await llm_client.acreate_structured_output(
        text_input=prompt,
        system_prompt=instructions,
        response_model=str,
    )

    return answer_prediction
```
Actionable comments posted: 0
♻️ Duplicate comments (4)
evals/eval_with_modal.py (4)
`10-11`: ⚠️ Potential issue — Replace hardcoded path with environment variable or configuration.

The hardcoded local path `/Users/vasilije/cognee` is not portable and will fail for other users. Apply this fix:

```diff
-# LOCAL_COGNEE_PATH = os.path.dirname(os.path.abspath(__file__))
-LOCAL_COGNEE_PATH = "/Users/vasilije/cognee"
+LOCAL_COGNEE_PATH = os.getenv("COGNEE_PATH", os.path.dirname(os.path.abspath(__file__)))
```
`24-28`: ⚠️ Potential issue — Fix duplicate import and add type hints.

The `os` module is imported twice, and the function could benefit from better type hints.

```diff
-    import os
     import subprocess
     import json
-    import os
+    from typing import Tuple
+
+    def handle_subprocess_error(error: subprocess.CalledProcessError) -> Tuple[str, str]:
+        return (
+            f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_data['instance_id']}.json",
+            json.dumps({"error": str(error)})
+        )
```
`39-41`: ⚠️ Potential issue — Add error handling for subprocess execution.

The subprocess execution lacks error handling and could fail silently.

```diff
 work_dir = "/root/cognee"
-subprocess.run(cmd, cwd=work_dir, check=True)
+try:
+    subprocess.run(cmd, cwd=work_dir, check=True, capture_output=True, text=True)
+except subprocess.CalledProcessError as e:
+    print(f"Error executing process_single_repo.py: {e.stderr}", file=sys.stderr)
+    return handle_subprocess_error(e)
```
`76-84`: 🛠️ Refactor suggestion — Add cleanup for temporary files.

The code saves files locally but never cleans them up, which could lead to disk space issues over time.

```diff
 results = []
+temp_files = []
 for call in calls:
     filename, content = call
     if content:
         with open(filename, "w") as f:
             f.write(content)
         print(f"Saved (unknown) locally.")
         results.append(filename)
+        temp_files.append(filename)
+
+# After merging files, clean up temporary files
+def cleanup_temp_files():
+    for temp_file in temp_files:
+        try:
+            os.remove(temp_file)
+            print(f"Cleaned up {temp_file}")
+        except OSError as e:
+            print(f"Error cleaning up {temp_file}: {e}", file=sys.stderr)
```
🧹 Nitpick comments (1)
evals/eval_with_modal.py (1)
`69-75`: Add progress tracking for long-running operations.

For better user experience, consider adding progress tracking for the parallel processing of repositories.

```diff
 swe_dataset = load_swebench_dataset(dataset_name, split="test")
 swe_dataset = swe_dataset[:num_samples]
+print(f"Processing {len(swe_dataset)} repositories...")
 calls = []
 for instance in swe_dataset:
+    print(f"Submitting repository {instance['instance_id']}...")
     calls.append(run_single_repo.remote(instance, disable_cognee=disable_cognee))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- evals/eval_with_modal.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
evals/eval_with_modal.py
27-27: Redefinition of unused `os` from line 4. Remove definition: `os` (F811)
🪛 GitHub Actions: ruff format
evals/eval_with_modal.py
[warning] File requires formatting with Ruff formatter
🪛 GitHub Actions: ruff lint
evals/eval_with_modal.py
[error] 27-27: Redefinition of unused 'os' from line 4. Remove redundant import.
⏰ Context from checks skipped due to timeout of 90000ms (18)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: profiler
🔇 Additional comments (1)
evals/eval_with_modal.py (1)
`13-20`: LGTM! Good use of slim base image and poetry.

The image setup follows best practices with a slim base image and proper dependency management using poetry.
Actionable comments posted: 0
♻️ Duplicate comments (4)
evals/eval_with_modal.py (4)
`10-11`: ⚠️ Potential issue — Replace hardcoded path with environment variable or configuration.

The hardcoded local path `/Users/vasilije/cognee` is not portable and will fail for other users.

```diff
-# LOCAL_COGNEE_PATH = os.path.dirname(os.path.abspath(__file__))
-LOCAL_COGNEE_PATH = "/Users/vasilije/cognee"
+LOCAL_COGNEE_PATH = os.getenv("COGNEE_PATH", os.path.dirname(os.path.abspath(__file__)))
```
`37-39`: ⚠️ Potential issue — Fix duplicate imports.

The `os` module is imported twice (line 4 and line 39).

```diff
 import subprocess
 import json
-import os
```
`51-53`: ⚠️ Potential issue — Add error handling for subprocess execution.

The subprocess execution lacks error handling and could fail silently.

```diff
 work_dir = "/root/cognee"
-subprocess.run(cmd, cwd=work_dir, check=True)
+try:
+    result = subprocess.run(cmd, cwd=work_dir, check=True, capture_output=True, text=True)
+    if result.stderr:
+        print(f"Warning from subprocess: {result.stderr}", file=sys.stderr)
+except subprocess.CalledProcessError as e:
+    print(f"Error executing process_single_repo.py: {e.stderr}", file=sys.stderr)
+    return (filename, json.dumps({"error": str(e)}))
```
`88-96`: 🛠️ Refactor suggestion — Add cleanup for temporary files.

The code saves files locally but never cleans them up, which could lead to disk space issues over time.

```diff
 results = []
+temp_files = []
 for call in calls:
     filename, content = call
     if content:
         with open(filename, "w") as f:
             f.write(content)
         print(f"Saved (unknown) locally.")
         results.append(filename)
+        temp_files.append(filename)
+
+try:
+    # Process the results...
+finally:
+    # Clean up temporary files
+    for temp_file in temp_files:
+        try:
+            os.remove(temp_file)
+            print(f"Cleaned up {temp_file}")
+        except OSError as e:
+            print(f"Error cleaning up {temp_file}: {e}", file=sys.stderr)
```
🧹 Nitpick comments (4)
evals/eval_with_modal.py (4)
`6-6`: Remove unused import.

The `Optional` type hint is imported but never used in the code.

```diff
-from typing import Optional
```
`13-32`: Specify Python version explicitly in the image configuration.

The image configuration doesn't specify a Python version, which could lead to version mismatches across different environments. Consider pinning the Python version.

```diff
 image = (
     modal.Image.debian_slim()
+    .run_commands("apt-get update && apt-get install -y python3.9")
     .pip_install("poetry")
     .add_local_dir(LOCAL_COGNEE_PATH, "/root/cognee", copy=True,
```
`62-63`: Log when output file is missing.

The function silently returns an empty string when the output file is missing, which could mask issues.

```diff
 else:
+    print(f"Warning: Output file {path_in_container} not found", file=sys.stderr)
     return (filename, "")
```
`84-87`: Add progress tracking for long-running operations.

For large datasets, it would be helpful to track progress of the remote executions.

```diff
 calls = []
+total = len(swe_dataset)
+print(f"Processing {total} repositories...")
 for instance in swe_dataset:
     calls.append(run_single_repo.remote(instance, disable_cognee=disable_cognee))
+
+completed = 0
+print("Waiting for results...")
```

And update the results processing:

```diff
 for call in calls:
     filename, content = call
+    completed += 1
+    print(f"Progress: {completed}/{total} repositories processed")
     if content:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- evals/eval_with_modal.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
evals/eval_with_modal.py
39-39: Redefinition of unused `os` from line 4. Remove definition: `os` (F811)
🪛 GitHub Actions: ruff lint
evals/eval_with_modal.py
[error] 39-39: Redefinition of unused 'os' from line 4. Remove redundant import.
🪛 GitHub Actions: ruff format
evals/eval_with_modal.py
[warning] File requires formatting with Ruff formatter
evals/eval_with_modal.py
Outdated
```python
app = modal.App("cognee-runner")

# LOCAL_COGNEE_PATH = os.path.dirname(os.path.abspath(__file__))
LOCAL_COGNEE_PATH = "/Users/vasilije/cognee"
```
Hardcoded path.
Will fix once I finish running it.
evals/eval_with_modal.py
Outdated
```python
@app.function(image=image, gpu="T4", concurrency_limit=5)
def run_single_repo(instance_data: dict, disable_cognee: bool = False):
    import subprocess
    import json
```
Double imports here
```python
@app.function()
def square(x):
```
Do we need this?
I will turn it into a test
evals/eval_with_modal.py
Outdated
```python
@app.local_entrypoint()
def main(disable_cognee: bool = False, num_samples: int = 2):
```
How do we run it?
modal python_function
Actionable comments posted: 6
🧹 Nitpick comments (3)
eval_with_modal.py (2)

`144-145`: Remove unnecessary `f` prefix from strings.

The `f` prefix is unnecessary and can be removed to clean up the code:

```diff
-    print(f"Configuration:")
+    print("Configuration:")
-    print(f"• Running in Modal mode")
+    print("• Running in Modal mode")
```
`67-120`: Remove large block of commented-out code for clarity.

Lines 67-120 contain a large block of commented-out code. Keeping obsolete or commented-out code can clutter the codebase and reduce readability. Consider removing it to improve code clarity.
Dockerfile.modal (1)

`19-23`: Remove redundant `ENV` and `WORKDIR` commands.

The `ENV PYTHONPATH=/app` and `WORKDIR /app` commands are set twice (lines 6 and 22, and lines 19 and 23, respectively). This redundancy is unnecessary and can be removed to simplify the Dockerfile.

```diff
-ENV PYTHONPATH=/app
-WORKDIR /app
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
Dockerfile.modal
(1 hunks)entrypoint-old.sh
(0 hunks)eval_with_modal.py
(1 hunks)process_single_repo.py
(1 hunks)
💤 Files with no reviewable changes (1)
- entrypoint-old.sh
🧰 Additional context used
🪛 GitHub Actions: ruff format
process_single_repo.py
[warning] File requires formatting with Ruff formatter
eval_with_modal.py
[warning] File requires formatting with Ruff formatter
🪛 Ruff (0.8.2)
eval_with_modal.py
49-49: Redefinition of unused `json` from line 6. Remove definition: `json` (F811)
125-125: Redefinition of unused `json` from line 6. Remove definition: `json` (F811)
144-144: f-string without any placeholders. Remove extraneous `f` prefix (F541)
145-145: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 GitHub Actions: ruff lint
eval_with_modal.py
[error] 49-49: Redefinition of unused `json` from line 6
[error] 125-125: Redefinition of unused `json` from line 6
[error] 144-144: f-string without any placeholders. Remove extraneous `f` prefix
[error] 145-145: f-string without any placeholders. Remove extraneous `f` prefix
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: docker-compose-test
```python
# Install required dependencies
pip.main(['install', "pydantic>=2.0.0", "pydantic-settings>=2.0.0"])
```
🛠️ Refactor suggestion

Avoid installing packages at runtime; manage dependencies via Poetry.

Installing packages at runtime using `pip.main()` is not recommended. Dependencies should be managed through Poetry and installed prior to runtime to ensure consistency and avoid potential issues.

Consider adding `pydantic>=2.0.0` and `pydantic-settings>=2.0.0` to your `pyproject.toml` and installing them via Poetry during the image build process.
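A hedged sketch of what the Poetry declaration could look like (the Python constraint is an assumption for illustration; the pydantic versions come from the `pip.main()` call above):

```toml
# pyproject.toml — declare dependencies here instead of calling pip.main() at runtime
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
pydantic = ">=2.0.0"
pydantic-settings = ">=2.0.0"
```

With the packages declared here, the Modal image build's `poetry install` step picks them up and the runtime install call can be deleted.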
eval_with_modal.py
Outdated
# # if disable_cognee:
# #     cmd.append("--disable-cognee")
# #
# # subprocess.run(cmd, cwd="/app", check=True, env={
# #     "ENV": os.getenv('ENV'),
# #     "LLM_API_KEY": os.getenv("LLM_API_KEY")
# # })
# venv_python = os.path.join("venv", "bin", "python")  # Use "Scripts" instead of "bin" on Windows
#
# cmd = [
#     "poetry",
#     "run",
#     "process_single_repo.py",
#     f"--instance_json={instance_json_str}",
# ]
# if disable_cognee:
#     cmd.append("--disable-cognee")
#
# subprocess.run(cmd, cwd="/app", check=True, env={
#     "ENV": os.getenv('ENV'),
#     "LLM_API_KEY": os.getenv("LLM_API_KEY")
# })
# instance_id = instance_data["instance_id"]
# filename = f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_id}.json"
# path_in_container = os.path.join("/app", filename)
#
# if os.path.exists(path_in_container):
#     with open(path_in_container, "r") as f:
#         content = f.read()
#     return (filename, content)
# else:
#     return (filename, "")
@app.local_entrypoint()
async def main(disable_cognee: bool = False, num_samples: int = 2):
    import subprocess
    import json
    import os

    def check_install_package(package_name):
        """Check if a pip package is installed and install it if not."""
        try:
            __import__(package_name)
            return True
        except ImportError:
            try:
                subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
                return True
            except subprocess.CalledProcessError:
                return False

    for dependency in ["transformers", "sentencepiece", "swebench", "python-dotenv"]:
        check_install_package(dependency)

    from swebench.harness.utils import load_swebench_dataset

    print(f"Configuration:")
    print(f"• Running in Modal mode")
    print(f"• Disable Cognee: {disable_cognee}")
    print(f"• Number of samples: {num_samples}")

    dataset_name = (
        "princeton-nlp/SWE-bench_Lite_bm25_13K" if disable_cognee
        else "princeton-nlp/SWE-bench_Lite"
    )

    swe_dataset = load_swebench_dataset(dataset_name, split="test")
    swe_dataset = swe_dataset[:num_samples]

    print(f"Processing {num_samples} samples from {dataset_name}")
    import pip

    # Install required dependencies
    pip.main(['install', "pydantic>=2.0.0", "pydantic-settings>=2.0.0"])

    tasks = [
        run_single_repo.remote(instance, disable_cognee=disable_cognee)
        for instance in swe_dataset
    ]
    import asyncio
    # Run all tasks concurrently
    results = await asyncio.gather(*tasks)

    # Process results
    merged = []
    for filename, content in results:
        if content:
            with open(filename, "w") as f:
                f.write(content)
            print(f"Saved {filename} locally.")
            merged.append(json.loads(content))

    # Save merged results
    merged_filename = "merged_nocognee.json" if disable_cognee else "merged_cognee.json"
    with open(merged_filename, "w") as f:
        json.dump(merged, f, indent=2)

    print(f"Merged {len(merged)} repos into {merged_filename}!")
    print("Done!")
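The `check_install_package` helper above imports the module just to test for it, then shells out to pip on failure. A lighter-weight variant — sketched below, not the script's actual helper — probes `importlib.util.find_spec` first, which checks importability without triggering the import side effects of heavy packages such as `transformers`; the `import_name` parameter is an added assumption to cover distributions whose install name differs from their module name.

```python
import importlib.util
import subprocess
import sys

def ensure_package(package_name: str, import_name: str = None) -> bool:
    """Return True if the package is importable, installing it via pip only if missing.

    `import_name` covers packages whose distribution name differs from the
    module name (e.g. "python-dotenv" installs the module "dotenv").
    """
    module = import_name or package_name
    # find_spec reports importability without executing the module itself
    if importlib.util.find_spec(module) is not None:
        return True
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        return importlib.util.find_spec(module) is not None
    except subprocess.CalledProcessError:
        return False

print(ensure_package("json"))  # stdlib module: already present, no pip call
```

This keeps repeated startup cheap, since the common case (package already installed) never spawns a subprocess.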
Format code using Ruff formatter
The file requires formatting according to Ruff formatter, as indicated by the pipeline failures. Please format the code to comply with the project's style guidelines.
🧰 Tools
🪛 Ruff (0.8.2)
49-49: Redefinition of unused `json` from line 6. Remove definition: `json` (F811)
125-125: Redefinition of unused `json` from line 6. Remove definition: `json` (F811)
144-144: f-string without any placeholders. Remove extraneous `f` prefix (F541)
145-145: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🪛 GitHub Actions: ruff format
[warning] File requires formatting with Ruff formatter
🪛 GitHub Actions: ruff lint
[error] 49-49: Redefinition of unused `json` from line 6
[error] 125-125: Redefinition of unused `json` from line 6
[error] 144-144: f-string without any placeholders. Remove extraneous `f` prefix
[error] 145-145: f-string without any placeholders. Remove extraneous `f` prefix
@app.local_entrypoint()
async def main(disable_cognee: bool = False, num_samples: int = 2):
    import subprocess
    import json
Remove duplicate import of `json` module
The `json` module is already imported at line 6. The re-import at line 125 is unnecessary and can be removed to avoid redundancy.
Apply this diff to remove the redundant import:
- import json
🧰 Tools
🪛 Ruff (0.8.2)
125-125: Redefinition of unused `json` from line 6. Remove definition: `json` (F811)
🪛 GitHub Actions: ruff format
[warning] File requires formatting with Ruff formatter
🪛 GitHub Actions: ruff lint
[error] 125-125: Redefinition of unused `json` from line 6
)
async def run_single_repo(instance_data: dict, disable_cognee: bool = False):
    import os
    import json
Remove duplicate import of `json` module
The `json` module is already imported at line 6. The re-import at line 49 is unnecessary and can be removed to avoid redundancy.
Apply this diff to remove the redundant import:
- import json
🧰 Tools
🪛 Ruff (0.8.2)
49-49: Redefinition of unused `json` from line 6. Remove definition: `json` (F811)
🪛 GitHub Actions: ruff format
[warning] File requires formatting with Ruff formatter
🪛 GitHub Actions: ruff lint
[error] 49-49: Redefinition of unused `json` from line 6
async def generate_patch_with_cognee(instance):
    repo_path = download_github_repo(instance, "../RAW_GIT_REPOS")
    include_docs = True
    problem_statement = instance["problem_statement"]
    instructions = read_query_prompt("patch_gen_kg_instructions.txt")

    async for result in run_code_graph_pipeline(repo_path, include_docs=include_docs):
        print(result)

    retrieved_codeparts = await code_description_to_code_part_search(
        problem_statement, include_docs=include_docs
    )

    prompt = "\n".join(
        [
            problem_statement,
            "<patch>",
            PATCH_EXAMPLE,
            "</patch>",
            "Additional context to solve the problem:",
            retrieved_codeparts,
        ]
    )

    llm_client = get_llm_client()
    answer_prediction = await llm_client.acreate_structured_output(
        text_input=prompt,
        system_prompt=instructions,
        response_model=str,
    )

    return answer_prediction


async def generate_patch_without_cognee(instance, llm_client):
    instructions = read_query_prompt("patch_gen_instructions.txt")
    answer_prediction = await llm_client.acreate_structured_output(
        text_input=instance["text"],
        system_prompt=instructions,
        response_model=str,
    )
    return answer_prediction


async def process_repo(instance, disable_cognee=False):
    """
    Process a single repository (a single instance).
    """
    if not disable_cognee:
        model_patch = await generate_patch_with_cognee(instance)
        model_name = "with_cognee"
    else:
        llm_client = get_llm_client()
        model_patch = await generate_patch_without_cognee(instance, llm_client)
        model_name = "without_cognee"

    return {
        "instance_id": instance["instance_id"],
        "model_patch": model_patch,
        "model_name_or_path": model_name,
    }


async def main():
    """
    Main entry: expects a single repository (instance) in JSON form.
    Example usage:
        python process_single_repo.py --instance_json='{"instance_id": "abc123", ...}'
    or called as an imported function from Modal.
    """
    parser = argparse.ArgumentParser(description="Process a single repo from SWE-Bench")
    parser.add_argument("--instance_json", type=str, required=True)
    parser.add_argument("--disable-cognee", action="store_true", help="Disable Cognee for evaluation")
    args = parser.parse_args()

    # Install dependencies if needed
    for dependency in ["transformers", "sentencepiece", "swebench"]:
        check_install_package(dependency)

    # Parse the instance JSON from CLI
    instance = json.loads(args.instance_json)

    # Get the prediction
    result = await process_repo(instance, disable_cognee=args.disable_cognee)

    # Construct a file name for the single result
    instance_id = instance["instance_id"]
    out_name = f"pred_{'nocognee' if args.disable_cognee else 'cognee'}_{instance_id}.json"

    with open(out_name, "w") as f:
        json.dump(result, f, indent=2)

    print(f"Finished processing instance_id={instance_id}. Saved to {out_name}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main(), debug=True)
Format code using Ruff formatter
The file requires formatting according to Ruff formatter, as indicated by the pipeline failures. Please format the code to comply with the project's style guidelines.
🧰 Tools
🪛 GitHub Actions: ruff format
[warning] File requires formatting with Ruff formatter
[warning] File requires formatting with Ruff formatter
RUN pip install python-dotenv
RUN pip install transformers
RUN pip install swebench
RUN pip install sentencepiece
🛠️ Refactor suggestion
Consolidate dependency management with Poetry
The packages `python-dotenv`, `transformers`, `swebench`, and `sentencepiece` are being installed separately using `pip` after running `poetry install`. To maintain consistency and proper dependency management, it's recommended to add these packages to your `pyproject.toml` and install them via `poetry`.
Apply this diff to remove the `pip` installations:
- RUN pip install python-dotenv
- RUN pip install transformers
- RUN pip install swebench
- RUN pip install sentencepiece
Then add these dependencies to your `pyproject.toml` and re-run `poetry install` during the build process.
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 6
♻️ Duplicate comments (1)
eval_with_modal.py (1)
160-162: ⚠️ Potential issue — Avoid installing packages at runtime
Installing packages at runtime using `pip.main()` is not recommended for security and reliability reasons. Add these dependencies to your `pyproject.toml` and install them during the image build process.
🧹 Nitpick comments (5)
eval_with_modal.py (3)
19-19: Fix inconsistent spacing in path definition
Remove extra spaces in the parentheses for better code style.
-MODAL_DOCKERFILE_PATH = Path( "Dockerfile.modal")
+MODAL_DOCKERFILE_PATH = Path("Dockerfile.modal")
34-39: Improve readability of Modal image configuration
The image configuration is too condensed and hard to read. Consider breaking it into multiple lines for better maintainability.
-image = modal.Image.from_dockerfile(
-    path=MODAL_DOCKERFILE_PATH,
-    gpu="T4",
-    force_build=False,
-    ignore=IGNORE_PATTERNS
-).copy_local_file("pyproject.toml", "pyproject.toml").copy_local_file("poetry.lock", "poetry.lock").env({"ENV": os.getenv('ENV'), "LLM_API_KEY": os.getenv("LLM_API_KEY")}).poetry_install_from_file(poetry_pyproject_toml="pyproject.toml")
+image = (
+    modal.Image.from_dockerfile(
+        path=MODAL_DOCKERFILE_PATH,
+        gpu="T4",
+        force_build=False,
+        ignore=IGNORE_PATTERNS
+    )
+    .copy_local_file("pyproject.toml", "pyproject.toml")
+    .copy_local_file("poetry.lock", "poetry.lock")
+    .env({
+        "ENV": os.getenv('ENV'),
+        "LLM_API_KEY": os.getenv("LLM_API_KEY")
+    })
+    .poetry_install_from_file(poetry_pyproject_toml="pyproject.toml")
+)
145-146: Remove unnecessary f-strings
These strings don't contain any placeholders, so the f-prefix is unnecessary.
-    print(f"Configuration:")
-    print(f"• Running in Modal mode")
+    print("Configuration:")
+    print("• Running in Modal mode")
🧰 Tools
🪛 Ruff (0.8.2)
145-145: f-string without any placeholders. Remove extraneous `f` prefix (F541)
146-146: f-string without any placeholders. Remove extraneous `f` prefix (F541)
cognee-mcp/cognee_mcp/server.py (1)
Line range hint
176-183
: Consider adding error handling for the search operation.The search operation could potentially fail, but there's no explicit error handling. Consider wrapping the search operation in a try-catch block to handle potential exceptions gracefully.
- search_results = await cognee.search(SearchType.COMPLETION, query_text=search_query) - - results = retrieved_edges_to_string(search_results) - - return [ - types.TextContent( - type="text", - text=results, - ) - ] + try: + search_results = await cognee.search(SearchType.COMPLETION, query_text=search_query) + results = retrieved_edges_to_string(search_results) + return [ + types.TextContent( + type="text", + text=results, + ) + ] + except Exception as e: + raise ValueError(f"Search operation failed: {str(e)}")pyproject.toml (1)
80-103: Standardize version constraint format.
The dependencies use inconsistent version constraint formats:
- Some use `^` (e.g., protobuf = "^5.29.3")
- Others use exact versions (e.g., asttokens = "3.0.0")
Consider standardizing to use `^` for all dependencies unless there's a specific reason for pinning exact versions.
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (4)
Dockerfile.modal
(1 hunks)cognee-mcp/cognee_mcp/server.py
(1 hunks)eval_with_modal.py
(1 hunks)pyproject.toml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Dockerfile.modal
🧰 Additional context used
🪛 Ruff (0.8.2)
eval_with_modal.py
50-50: Redefinition of unused json
from line 6
Remove definition: json
(F811)
126-126: Redefinition of unused json
from line 6
Remove definition: json
(F811)
145-145: f-string without any placeholders
Remove extraneous f
prefix
(F541)
146-146: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🔇 Additional comments (4)
eval_with_modal.py (2)
49-51
: Remove duplicate importsThe
json
module is already imported at line 6. The re-import is unnecessary.- import json
🧰 Tools
🪛 Ruff (0.8.2)
50-50: Redefinition of unused
json
from line 6Remove definition:
json
(F811)
125-127
: Remove duplicate importsThe
json
module is already imported at line 6. The re-import is unnecessary.- import json
🧰 Tools
🪛 Ruff (0.8.2)
126-126: Redefinition of unused
json
from line 6Remove definition:
json
(F811)
pyproject.toml (2)
87-89
: Verify HTTP stack compatibility.The addition of low-level HTTP dependencies (
h2
,hpack
,hyperframe
) alongsiderequests-toolbelt
might cause version conflicts with the existingrequests = "2.32.3"
dependency.Run this script to check for potential conflicts:
Also applies to: 99-99
✅ Verification successful
HTTP stack dependencies appear compatible
The new HTTP-related dependencies (
h2
,hpack
,hyperframe
) are not directly used in the codebase and appear to be indirect dependencies. The existingrequests
usage in the codebase is isolated and doesn't interact with these packages, suggesting minimal risk of conflicts.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check compatibility between HTTP-related packages # Get dependency tree for requests poetry show requests --tree > /tmp/requests_deps # Check if any of the new dependencies conflict for dep in h2 hpack hyperframe requests-toolbelt; do if grep -q "^$dep" /tmp/requests_deps; then echo "Potential conflict: $dep is already a dependency of requests" fi doneLength of output: 496
Script:
#!/bin/bash # Search for imports and requirements files echo "=== Searching for imports ===" rg "^from (requests|h2|hpack|hyperframe|requests_toolbelt) import" -g "!{test,tests}/*" echo -e "\n=== Searching for requirements files ===" fd "requirements.*\.txt$" echo -e "\n=== Checking package imports co-location ===" rg "import (requests|h2|hpack|hyperframe|requests_toolbelt)" -l | while read -r file; do echo "=== $file ===" rg "^import .*(requests|h2|hpack|hyperframe|requests_toolbelt).*" "$file" doneLength of output: 864
90-92
: Verify if notebook-related dependencies are necessary.The project already includes notebook-related dependencies in
[tool.poetry.extras]
under thenotebook
extra. The following dependencies might be redundant:
ipython
matplotlib-inline
Consider removing these if they're already covered by the notebook extras.
Run this script to check for dependency overlap:
# async def run_single_repo(instance_data: dict, disable_cognee: bool = False):
#     import subprocess
#     import json
#     import os
#
#     # Install project dependencies
#     subprocess.run(
#         ["poetry", "install", "--no-interaction"],
#         cwd="/app",
#         check=True
#     )
#
#     instance_json_str = json.dumps(instance_data)
#
#     # cmd = [
#     #     "poetry",
#     #     "run",
#     #     "python",
#     #     "process_single_repo.py",
#     #     f"--instance_json={instance_json_str}",
#     # ]
#     # if disable_cognee:
#     #     cmd.append("--disable-cognee")
#     #
#     # subprocess.run(cmd, cwd="/app", check=True, env={
#     #     "ENV": os.getenv('ENV'),
#     #     "LLM_API_KEY": os.getenv("LLM_API_KEY")
#     # })
#     venv_python = os.path.join("venv", "bin", "python")  # Use "Scripts" instead of "bin" on Windows
#
#     cmd = [
#         "poetry",
#         "run",
#         "process_single_repo.py",
#         f"--instance_json={instance_json_str}",
#     ]
#     if disable_cognee:
#         cmd.append("--disable-cognee")
#
#     subprocess.run(cmd, cwd="/app", check=True, env={
#         "ENV": os.getenv('ENV'),
#         "LLM_API_KEY": os.getenv("LLM_API_KEY")
#     })
#     instance_id = instance_data["instance_id"]
#     filename = f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_id}.json"
#     path_in_container = os.path.join("/app", filename)
#
#     if os.path.exists(path_in_container):
#         with open(path_in_container, "r") as f:
#             content = f.read()
#         return (filename, content)
#     else:
#         return (filename, "")
🛠️ Refactor suggestion
Remove commented out code
Large blocks of commented out code should be removed as they're already tracked in version control. This improves code readability and maintainability.
]
import asyncio
# Run all tasks concurrently
results = await asyncio.gather(*tasks)
🛠️ Refactor suggestion
Add error handling for concurrent execution
The `asyncio.gather()` call should include error handling to manage potential failures in concurrent tasks.
- results = await asyncio.gather(*tasks)
+ try:
+     results = await asyncio.gather(*tasks)
+ except Exception as e:
+     print(f"Error during concurrent execution: {e}")
+     raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try:
    results = await asyncio.gather(*tasks)
except Exception as e:
    print(f"Error during concurrent execution: {e}")
    raise
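An alternative to wrapping `gather` in `try/except` is `return_exceptions=True`, which keeps the successful results when individual repos fail instead of aborting the whole batch. A minimal self-contained sketch (the `process` coroutine stands in for the real per-repo task and is invented for the demo):

```python
import asyncio

async def process(i: int) -> int:
    # Hypothetical per-repo task; repo 2 deliberately fails
    if i == 2:
        raise RuntimeError(f"repo {i} failed")
    return i * 10

async def main():
    tasks = [process(i) for i in range(4)]
    # Failed tasks come back as exception objects instead of raising
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed

ok, failed = asyncio.run(main())
print(ok)           # [0, 10, 30]
print(len(failed))  # 1
```

For an evaluation run over many repositories this is often preferable to the `try/except` form, since one flaky instance does not discard the predictions already computed.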
with open(path_in_container, "w") as f:
    json.dump(result, f, indent=2)

with open(path_in_container, "r") as f:
    content = f.read()
🛠️ Refactor suggestion
Add error handling for file operations
File operations should include proper error handling to manage potential I/O exceptions.
- with open(path_in_container, "w") as f:
-     json.dump(result, f, indent=2)
-
- with open(path_in_container, "r") as f:
-     content = f.read()
+ try:
+     with open(path_in_container, "w") as f:
+         json.dump(result, f, indent=2)
+
+     with open(path_in_container, "r") as f:
+         content = f.read()
+ except IOError as e:
+     print(f"Error handling file {path_in_container}: {e}")
+     raise
try:
    with open(path_in_container, "w") as f:
        json.dump(result, f, indent=2)

    with open(path_in_container, "r") as f:
        content = f.read()
except IOError as e:
    print(f"Error handling file {path_in_container}: {e}")
    raise
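The suggested pattern can be exercised in isolation — the snippet below round-trips a prediction dict through JSON with the I/O wrapped exactly as proposed (paths and dict keys are made up for the demo):

```python
import json
import os
import tempfile

def write_and_read_prediction(path: str, result: dict) -> str:
    """Write the result as JSON, then read it back, surfacing I/O errors with context."""
    try:
        with open(path, "w") as f:
            json.dump(result, f, indent=2)
        with open(path, "r") as f:
            return f.read()
    except IOError as e:
        print(f"Error handling file {path}: {e}")
        raise

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pred_demo.json")
    content = write_and_read_prediction(path, {"instance_id": "demo", "model_patch": ""})
    print(json.loads(content)["instance_id"])  # demo
```

Catching `IOError` (an alias of `OSError` in Python 3) covers disk-full and permission failures while still re-raising, so the caller sees both the log line and the original exception.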
@@ -173,7 +173,7 @@ async def handle_call_tool(
    search_query = arguments.get("query")
-   search_results = await cognee.search(SearchType.INSIGHTS, query_text=search_query)
+   search_results = await cognee.search(SearchType.COMPLETION, query_text=search_query)
💡 Codebase verification
Incorrect search type usage may cause runtime errors
The change from `SearchType.INSIGHTS` to `SearchType.COMPLETION` is problematic because:
- The `retrieved_edges_to_string` function expects search results in triplet format (node1, edge, node2), which is provided by `SearchType.INSIGHTS`
- `SearchType.COMPLETION` uses a different task module that may not return results in the expected format
- Consider using `SearchType.GRAPH_COMPLETION` if graph-based completion is needed, or keep using `SearchType.INSIGHTS` for triplet-based results
Verify the impact of changing the search type from INSIGHTS to COMPLETION.
The change from SearchType.INSIGHTS
to SearchType.COMPLETION
could significantly alter the search behavior and results. Please ensure this change:
- Aligns with the intended functionality in the Modal-based evaluation system
- Maintains backward compatibility with existing clients
- Returns the expected result format for the
retrieved_edges_to_string
function
Let's verify the SearchType usage across the codebase:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check how SearchType is used across the codebase
rg -A 3 "SearchType\."
# Check for any documentation or tests related to search types
fd -e md -e rst -e txt | xargs rg -l "SearchType"
# Look for any configuration files that might reference search types
fd -e yaml -e json -e toml | xargs rg -l "search.*type"
Length of output: 13984
Script:
#!/bin/bash
# Check the implementation of query_completion and retrieved_edges_to_string
ast-grep --pattern 'async def query_completion($$$) {
$$$
}'
ast-grep --pattern 'def retrieved_edges_to_string($$$) {
$$$
}'
# Check for any tests or examples using COMPLETION type
rg -B 2 -A 5 "query_completion|retrieved_edges_to_string"
Length of output: 9698
ipython = "8.31.0"
jedi = "0.19.2"
🛠️ Refactor suggestion
Move development-related dependencies to the dev group.
Several dependencies appear to be development/debugging tools and should be moved to [tool.poetry.group.dev.dependencies]:
- `ipython`, `jedi` (IDE/REPL support)
- `asttokens`, `executing`, `pure-eval`, `stack-data` (debugging tools)
- `pexpect`, `ptyprocess` (terminal interaction)
This helps keep the production dependencies lean and separates development tools from runtime requirements.
Also applies to: 81-85, 96-97, 101-102
h2 = "4.1.0"
hpack = "4.0.0"
hyperframe = "6.0.1"
ipython = "8.31.0"
💡 Codebase verification
Consider using a more flexible version constraint for ipython
The specific version `8.31.0` is unnecessarily strict. Since the project uses IPython primarily for Jupyter notebooks and no code requires this exact version, consider using `^8.31.0` or `>=8.31.0`. Other dependencies only require `>=7.23.1`, and using a flexible constraint will help avoid future dependency conflicts while maintaining compatibility.
🔗 Analysis chain
Verify specific ipython version requirement.
The unusually specific version `ipython = "8.31.0"` might cause dependency conflicts. Consider using a more flexible version constraint unless this exact version is required.
Run this script to check for potential conflicts:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if other dependencies require different ipython versions
# Get all dependencies that depend on ipython
poetry show --tree | rg -B 5 'ipython' > /tmp/ipython_deps
# Check for version constraints
cat /tmp/ipython_deps | rg 'ipython.*[0-9]' || echo "No version conflicts found"
Length of output: 244
Script:
#!/bin/bash
# Search for ipython imports and usage
echo "=== Searching for ipython imports and usage ==="
rg -l "import.*ipython|from.*ipython" || echo "No ipython imports found"
echo -e "\n=== Searching for ipython in documentation ==="
fd -e md -e rst | xargs rg -l "ipython" || echo "No ipython mentions in docs"
echo -e "\n=== Searching for ipython in test files ==="
fd -e py "test_" | xargs rg -l "ipython" || echo "No ipython usage in tests"
echo -e "\n=== Checking other requirement files ==="
fd "requirements.*\.txt|setup\.py" | xargs rg "ipython" || echo "No other requirement files with ipython"
Length of output: 5132
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit
Release Notes
New Features
Infrastructure
Development Tools
Cleanup
Dependency Updates