feat: Added base modal runner and changed eval script #448

Closed
wants to merge 8 commits
Changes from 7 commits
45 changes: 45 additions & 0 deletions Dockerfile.modal
@@ -0,0 +1,45 @@
FROM python:3.11-slim

# Set environment variables
ENV PIP_NO_CACHE_DIR=true
ENV PATH="${PATH}:/root/.poetry/bin"
ENV PYTHONPATH=/app
ENV RUN_MODE=modal
ENV SKIP_MIGRATIONS=true

# System dependencies
RUN apt-get update && apt-get install -y \
gcc \
libpq-dev \
git \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app


ENV PYTHONPATH=/app
WORKDIR /app
COPY pyproject.toml poetry.lock /app/


RUN pip install poetry

# Don't create virtualenv since docker is already isolated
RUN poetry config virtualenvs.create false

# Install the dependencies
RUN poetry install --all-extras --no-root --without dev

RUN pip install python-dotenv
RUN pip install transformers
RUN pip install swebench
RUN pip install sentencepiece
Comment on lines +28 to +31

Contributor

🛠️ Refactor suggestion
Consolidate dependency management with Poetry

The packages python-dotenv, transformers, swebench, and sentencepiece are installed separately with pip after poetry install has already run. For consistency and proper dependency resolution, declare these packages in pyproject.toml and install them via Poetry.

Apply this diff to remove the pip installations:

- RUN pip install python-dotenv
- RUN pip install transformers
- RUN pip install swebench
- RUN pip install sentencepiece

Then add these dependencies to your pyproject.toml and re-run poetry install during the build process.

Committable suggestion skipped: line range outside the PR's diff.
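Concretely, the consolidation could look like the fragment below. This is a sketch of a `[tool.poetry.dependencies]` table; the version constraints are illustrative, not taken from the PR's actual pyproject.toml:

```toml
[tool.poetry.dependencies]
python = "^3.11"
python-dotenv = "^1.0"
transformers = "^4.40"
swebench = "*"
sentencepiece = "*"
```

With these declared, the existing `RUN poetry install --all-extras --no-root --without dev` layer picks them up, and the four separate `RUN pip install` layers can be dropped.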



COPY cognee/ /app/cognee
COPY evals/ /app/evals
COPY README.md /app/README.md
COPY process_single_repo.py /app/

35 changes: 0 additions & 35 deletions entrypoint-old.sh

This file was deleted.

187 changes: 187 additions & 0 deletions eval_with_modal.py
@@ -0,0 +1,187 @@
# File: eval_with_modal.py
from shutil import ignore_patterns

import modal
import os
import json
from pathlib import Path
from typing import Optional
import sys
import dotenv

dotenv.load_dotenv()
app = modal.App("cognee-runner")

# Get the parent directory path
PARENT_DIR = Path(__file__).resolve().parent.parent


MODAL_DOCKERFILE_PATH = Path("Dockerfile.modal")


# Define ignore patterns
IGNORE_PATTERNS = [
".venv/**/*",
"__pycache__",
"*.pyc",
".git",
".pytest_cache",
"*.egg-info",
"RAW_GIT_REPOS/**/*"
]

# Create image from Modal-specific Dockerfile
image = (
    modal.Image.from_dockerfile(
        path=MODAL_DOCKERFILE_PATH,
        gpu="T4",
        force_build=False,
        ignore=IGNORE_PATTERNS,
    )
    .copy_local_file("pyproject.toml", "pyproject.toml")
    .copy_local_file("poetry.lock", "poetry.lock")
    .env({"ENV": os.getenv("ENV"), "LLM_API_KEY": os.getenv("LLM_API_KEY")})
    .poetry_install_from_file(poetry_pyproject_toml="pyproject.toml")
)


@app.function(
image=image,
gpu="T4",
concurrency_limit=5
)
async def run_single_repo(instance_data: dict, disable_cognee: bool = False):
import os
import json
Contributor

⚠️ Potential issue

Remove duplicate import of json module

The json module is already imported at line 6. The re-import at line 49 is unnecessary and can be removed to avoid redundancy.

Apply this diff to remove the redundant import:

- import json
🧰 Tools
🪛 Ruff (0.8.2)

49-49: Redefinition of unused json from line 6

Remove definition: json

(F811)

🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

🪛 GitHub Actions: ruff lint

[error] 49-49: Redefinition of unused json from line 6

from process_single_repo import process_repo # Import the async function directly

# Process the instance
result = await process_repo(instance_data, disable_cognee=disable_cognee)

# Save the result
instance_id = instance_data["instance_id"]
filename = f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_id}.json"
path_in_container = os.path.join("/app", filename)

with open(path_in_container, "w") as f:
json.dump(result, f, indent=2)

with open(path_in_container, "r") as f:
content = f.read()

Comment on lines +61 to +66
Contributor

🛠️ Refactor suggestion

Add error handling for file operations

File operations should include proper error handling to manage potential I/O exceptions.

-    with open(path_in_container, "w") as f:
-        json.dump(result, f, indent=2)
-
-    with open(path_in_container, "r") as f:
-        content = f.read()
+    try:
+        with open(path_in_container, "w") as f:
+            json.dump(result, f, indent=2)
+
+        with open(path_in_container, "r") as f:
+            content = f.read()
+    except IOError as e:
+        print(f"Error handling file {path_in_container}: {e}")
+        raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

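Going a step beyond a try/except, a hypothetical atomic-write helper (not part of the PR) would also prevent a half-written prediction file if the container dies mid-dump:

```python
import json
import os
import tempfile


def write_json_atomic(path: str, obj) -> None:
    # Write to a temp file in the target directory, then atomically
    # replace the destination, so readers never see a truncated file.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f, indent=2)
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The container would then call `write_json_atomic(path_in_container, result)` before reading the file back.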

return (filename, content)
# async def run_single_repo(instance_data: dict, disable_cognee: bool = False):
# import subprocess
# import json
# import os
#
# # Install project dependencies
# subprocess.run(
# ["poetry", "install", "--no-interaction"],
# cwd="/app",
# check=True
# )
#
# instance_json_str = json.dumps(instance_data)
#
# # cmd = [
# # "poetry",
# # "run",
# # "python",
# # "process_single_repo.py",
# # f"--instance_json={instance_json_str}",
# # ]
# # if disable_cognee:
# # cmd.append("--disable-cognee")
# #
# # subprocess.run(cmd, cwd="/app", check=True, env={
# # "ENV": os.getenv('ENV'),
# # "LLM_API_KEY": os.getenv("LLM_API_KEY")
# # })
# venv_python = os.path.join("venv", "bin", "python") # Use "Scripts" instead of "bin" on Windows
#
# cmd = [
# "poetry",
# "run",
# "process_single_repo.py",
# f"--instance_json={instance_json_str}",
# ]
# if disable_cognee:
# cmd.append("--disable-cognee")
#
# subprocess.run(cmd, cwd="/app", check=True, env={
# "ENV": os.getenv('ENV'),
# "LLM_API_KEY": os.getenv("LLM_API_KEY")
# })
# instance_id = instance_data["instance_id"]
# filename = f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_id}.json"
# path_in_container = os.path.join("/app", filename)
#
# if os.path.exists(path_in_container):
# with open(path_in_container, "r") as f:
# content = f.read()
# return (filename, content)
# else:
# return (filename, "")

Comment on lines +68 to +121
Contributor

🛠️ Refactor suggestion

Remove commented out code

Large blocks of commented out code should be removed as they're already tracked in version control. This improves code readability and maintainability.


@app.local_entrypoint()
async def main(disable_cognee: bool = False, num_samples: int = 2):
import subprocess
import json
Contributor

⚠️ Potential issue

Remove duplicate import of json module

The json module is already imported at line 6. The re-import at line 125 is unnecessary and can be removed to avoid redundancy.

Apply this diff to remove the redundant import:

- import json
🧰 Tools
🪛 Ruff (0.8.2)

125-125: Redefinition of unused json from line 6

Remove definition: json

(F811)

🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

🪛 GitHub Actions: ruff lint

[error] 125-125: Redefinition of unused json from line 6

import os
def check_install_package(package_name):
"""Check if a pip package is installed and install it if not."""
try:
__import__(package_name)
return True
except ImportError:
try:
subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
return True
except subprocess.CalledProcessError:
return False

for dependency in ["transformers", "sentencepiece", "swebench","python-dotenv"]:
check_install_package(dependency)

from swebench.harness.utils import load_swebench_dataset

print(f"Configuration:")
print(f"• Running in Modal mode")
print(f"• Disable Cognee: {disable_cognee}")
print(f"• Number of samples: {num_samples}")

dataset_name = (
"princeton-nlp/SWE-bench_Lite_bm25_13K" if disable_cognee
else "princeton-nlp/SWE-bench_Lite"
)

swe_dataset = load_swebench_dataset(dataset_name, split="test")
swe_dataset = swe_dataset[:num_samples]

print(f"Processing {num_samples} samples from {dataset_name}")
import pip

# Install required dependencies
pip.main(['install', "pydantic>=2.0.0", "pydantic-settings>=2.0.0"])
Comment on lines +160 to +162
Contributor

🛠️ Refactor suggestion

Avoid installing packages at runtime; manage dependencies via Poetry

Installing packages at runtime using pip.main() is not recommended. Dependencies should be managed through poetry and installed prior to runtime to ensure consistency and avoid potential issues.

Consider adding pydantic>=2.0.0 and pydantic-settings>=2.0.0 to your pyproject.toml and installing them via poetry during the image build process.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter


tasks = [
run_single_repo.remote(instance, disable_cognee=disable_cognee)
for instance in swe_dataset
]
import asyncio
# Run all tasks concurrently
results = await asyncio.gather(*tasks)
Contributor

🛠️ Refactor suggestion

Add error handling for concurrent execution

The asyncio.gather() call should include error handling to manage potential failures in concurrent tasks.

-    results = await asyncio.gather(*tasks)
+    try:
+        results = await asyncio.gather(*tasks)
+    except Exception as e:
+        print(f"Error during concurrent execution: {e}")
+        raise
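An alternative to wrapping the whole gather in try/except: `return_exceptions=True` keeps one failed repo from discarding the results of all the others. A minimal sketch (the helper name is hypothetical, not the PR's code):

```python
import asyncio


async def gather_with_isolation(tasks):
    # With return_exceptions=True, a failing task comes back as an
    # Exception object instead of cancelling the remaining tasks.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return succeeded, failed


async def _demo():
    async def ok():
        return "done"

    async def boom():
        raise ValueError("repo failed")

    return await gather_with_isolation([ok(), boom()])


succeeded, failed = asyncio.run(_demo())
```

The caller can then log `failed` and still merge the `succeeded` predictions, which matters when a single SWE-bench instance errors out of a large batch.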


# Process results
merged = []
for filename, content in results:
if content:
with open(filename, "w") as f:
f.write(content)
print(f"Saved {filename} locally.")
merged.append(json.loads(content))

# Save merged results
merged_filename = "merged_nocognee.json" if disable_cognee else "merged_cognee.json"
with open(merged_filename, "w") as f:
json.dump(merged, f, indent=2)

print(f"Merged {len(merged)} repos into {merged_filename}!")
print("Done!")

Contributor

⚠️ Potential issue

Format code using Ruff formatter

The file requires formatting according to Ruff formatter, as indicated by the pipeline failures. Please format the code to comply with the project's style guidelines.

🧰 Tools
🪛 Ruff (0.8.2)

49-49: Redefinition of unused json from line 6

Remove definition: json

(F811)


125-125: Redefinition of unused json from line 6

Remove definition: json

(F811)


144-144: f-string without any placeholders

Remove extraneous f prefix

(F541)


145-145: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

🪛 GitHub Actions: ruff lint

[error] 49-49: Redefinition of unused json from line 6


[error] 125-125: Redefinition of unused json from line 6


[error] 144-144: f-string without any placeholders. Remove extraneous f prefix


[error] 145-145: f-string without any placeholders. Remove extraneous f prefix

14 changes: 14 additions & 0 deletions evals/get_started.py
@@ -0,0 +1,14 @@
import modal

app = modal.App("example-get-started")


@app.function()
def square(x):
    print("This code is running on a remote worker!")
    return x**2

Contributor

Do we need this?

Contributor Author

I will turn it into a test


@app.local_entrypoint()
def main():
print("the square is", square.remote(42))
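Following up on the thread above: one way to turn this example into a test without needing a live Modal runtime is to keep the computation in a plain helper that the Modal function delegates to. This is a hypothetical refactor, not the PR's actual change:

```python
def square_impl(x: int) -> int:
    # Pure computation, unit-testable without Modal.
    return x ** 2


def test_square_impl():
    assert square_impl(42) == 1764
    assert square_impl(-3) == 9


test_square_impl()
```

The `@app.function()` wrapper would then just `return square_impl(x)`, so the remote path and the test exercise the same logic.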