-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/cog 946 abstract eval dataset #418
Conversation
…. Json schema validation for datasets.
WalkthroughThe changes in Changes
Sequence DiagramsequenceDiagram
participant CLI as Command Line
participant Eval as Evaluation Script
participant Download as Dataset Downloader
participant Load as Dataset Loader
participant Cognee as Cognee System
CLI->>Eval: Select dataset
Eval->>Download: Download dataset
Download-->>Eval: Dataset file
Eval->>Load: Load dataset
Load-->>Eval: Parsed dataset
Eval->>Cognee: Evaluate with dataset
Cognee-->>Eval: Evaluation results
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
evals/eval_on_hotpot.py (2)
47-60
: Handle dataset download exceptions and improve user instructionsIn the
download_qa_dataset
function, consider automating the download and extraction of the "2wikimultihop" dataset or provide clearer instructions. Additionally, improve the string formatting for better readability.Apply this diff to enhance exception handling and string formatting:
def download_qa_dataset(dataset_name: str, dir: str): if dataset_name not in qa_datasets: raise ValueError(f"{dataset_name} is not a supported dataset.") url = qa_datasets[dataset_name]["URL"] if dataset_name == "2wikimultihop": - raise Exception("Please download 2wikimultihop dataset (data.zip) manually from \ - https://www.dropbox.com/scl/fi/heid2pkiswhfaqr5g0piw/data.zip?rlkey=ira57daau8lxfj022xvk1irju&e=1 \ - and unzip it.") + raise Exception("""Please download the 2wikimultihop dataset (data.zip) manually from +https://www.dropbox.com/scl/fi/heid2pkiswhfaqr5g0piw/data.zip?rlkey=ira57daau8lxfj022xvk1irju&e=1 +and unzip it into the specified directory.""") wget.download(url, out=dir)Alternatively, you might consider automating the download and extraction process using
requests
andzipfile
modules.
Line range hint
165-167
: Improve help message formatting for clarityThe help message for the
--metric
argument uses backslashes for line continuation, which may not render correctly. Use implicit string concatenation for better readability.Apply this diff to adjust the help message:
parser.add_argument("--metric", type=str, default="correctness_metric", - help="Valid options are Deepeval metrics (e.g. AnswerRelevancyMetric) \ - and metrics defined in evals/deepeval_metrics.py, e.g. f1_score_metric") + help=("Valid options are Deepeval metrics (e.g., AnswerRelevancyMetric) " + "and metrics defined in evals/deepeval_metrics.py, e.g., f1_score_metric"))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
evals/eval_on_hotpot.py
(5 hunks)
🔇 Additional comments (5)
evals/eval_on_hotpot.py (5)
11-11
: Importjsonschema
for dataset validationGood use of the
jsonschema
library to validate the structure of QA datasets, enhancing data integrity.
21-30
: Introduceqa_datasets
dictionary for flexible dataset handlingThe
qa_datasets
dictionary effectively stores metadata for multiple QA datasets, allowing for scalable and flexible evaluation across different datasets.
32-44
: Define JSON schema for QA dataset validationDefining
qa_json_schema
ensures that loaded datasets conform to the expected structure, which helps prevent runtime errors due to malformed data.
95-95
: Update dataset name to 'QA' in Cognee systemChanging the
dataset_name
to 'QA' when adding sentences ensures consistency across different datasets within the Cognee system.
21-30
: Verify the accessibility of dataset URLsThe URLs specified in
qa_datasets
may change or become unavailable. To ensure smooth execution, verify that these URLs are accessible and handle potential download errors gracefully.Run the following script to check the accessibility of the dataset URLs:
✅ Verification successful
Dataset URLs are accessible and functional ✅
Both dataset URLs are currently accessible:
- HotpotQA dataset URL is directly accessible
- 2WikiMultihop dataset URL properly redirects to the downloadable resource
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the accessibility of dataset URLs # List of dataset URLs urls=( "http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_fullwiki_v1.json" "https://www.dropbox.com/scl/fi/heid2pkiswhfaqr5g0piw/data.zip?rlkey=ira57daau8lxfj022xvk1irju&e=1" ) # Check each URL for url in "${urls[@]}"; do echo "Checking $url" status_code=$(curl -o /dev/null -s -w "%{http_code}\n" "$url") echo "HTTP Status Code: $status_code" doneLength of output: 1018
evals/eval_on_hotpot.py
Outdated
data_root_dir = get_absolute_path("../.data") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure consistent use of Path
objects for file paths
To support path manipulations using the /
operator, data_root_dir
should be a Path
object rather than a string.
Apply this diff to fix the type inconsistency:
- data_root_dir = get_absolute_path("../.data")
+ data_root_dir = Path(get_absolute_path("../.data"))
Also applies to: 142-143
evals/eval_on_hotpot.py
Outdated
@@ -109,6 +160,7 @@ | |||
if __name__ == "__main__": | |||
parser = argparse.ArgumentParser() | |||
|
|||
parser.add_argument("--dataset", type=str, choices=list(qa_datasets.keys()), help="Which dataset to evaluate on") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the --dataset
argument required
The --dataset
argument is currently optional, but the code expects it to be provided. If not specified, args.dataset
will be None
, leading to a KeyError
. Make the argument required to prevent this issue.
Apply this diff to enforce the requirement:
parser = argparse.ArgumentParser()
- parser.add_argument("--dataset", type=str, choices=list(qa_datasets.keys()), help="Which dataset to evaluate on")
+ parser.add_argument("--dataset", type=str, choices=list(qa_datasets.keys()), required=True, help="Which dataset to evaluate on")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
parser.add_argument("--dataset", type=str, choices=list(qa_datasets.keys()), help="Which dataset to evaluate on") | |
parser.add_argument("--dataset", type=str, choices=list(qa_datasets.keys()), required=True, help="Which dataset to evaluate on") |
evals/eval_on_hotpot.py
Outdated
validate(instance=dataset, schema=qa_json_schema) | ||
except ValidationError as e: | ||
print("File is not a valid QA dataset:", e.message) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raise an exception when dataset validation fails
Currently, if the dataset fails validation, the code prints an error message but continues execution, which could lead to downstream errors. To prevent this, raise an exception after catching the ValidationError
.
Apply this diff to handle validation errors appropriately:
try:
validate(instance=dataset, schema=qa_json_schema)
except ValidationError as e:
print("File is not a valid QA dataset:", e.message)
+ raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
validate(instance=dataset, schema=qa_json_schema) | |
except ValidationError as e: | |
print("File is not a valid QA dataset:", e.message) | |
validate(instance=dataset, schema=qa_json_schema) | |
except ValidationError as e: | |
print("File is not a valid QA dataset:", e.message) | |
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
evals/qa_dataset_utils.py
Outdated
@@ -0,0 +1,74 @@ | |||
from cognee.root_dir import get_absolute_path | |||
import json | |||
import wget |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wget
is not in pyproject.toml
. Also, it's a package that has not been maintained since 2015. Maybe we can use the built in requests.get
instead of wget.download
?
QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.
Summary by CodeRabbit
New Features
Improvements