
Support 4 different rag options in eval #439

Merged: 29 commits from feat/cog-954-rag-choice into dev on Jan 15, 2025
Conversation

@alekszievr (Contributor) commented Jan 14, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a flexible context retrieval system for question-answering tasks.
    • Added multiple context provider strategies (raw context, cognee search, simple RAG, brute-force triplet search).
    • Implemented a new command-line option for selecting context retrieval method.
  • Refactor

    • Removed dependency on previous context retrieval implementation.
    • Replaced cognee-specific context management with a more modular approach.
  • Chores

    • Updated evaluation script to support new context provider selection mechanism.

alekszievr and others added 25 commits January 8, 2025 17:41

coderabbitai bot commented Jan 14, 2025

Walkthrough

The pull request reworks context retrieval in the evaluation scripts. eval_on_hotpot.py no longer calls the cognee library directly: its get_context_with_cognee function is removed (it moves into the new module) and an answer_qa_instance function is added. A new module, qa_context_provider_utils.py, hosts the various context retrieval strategies behind a common interface, making the question-answering evaluation process more flexible.
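In sketch form, here is the flow the walkthrough describes, assembled from the review diff later in this thread; the import paths are assumptions about the cognee codebase, not shown in this PR:

```python
# Assumed import paths; the PR itself does not show them here.
from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import read_query_prompt, render_prompt


async def answer_qa_instance(instance: dict, context_provider) -> str:
    # context_provider is any async callable mapping an instance to a context string.
    context = await context_provider(instance)

    args = {"question": instance["question"], "context": context}
    user_prompt = render_prompt("context_for_question.txt", args)
    system_prompt = read_query_prompt("answer_hotpot_using_cognee_search.txt")

    # The LLM answers the question given only the provider-supplied context.
    llm_client = get_llm_client()
    return await llm_client.acreate_structured_output(
        text_input=user_prompt,
        system_prompt=system_prompt,
        response_model=str,
    )
```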

Changes

  • evals/eval_on_hotpot.py
    • Removed the get_context_with_cognee function
    • Added the answer_qa_instance function
    • Modified eval_on_QA_dataset to select a context provider by name
    • Renamed the command-line argument --with_cognee to --rag_option
  • evals/qa_context_provider_utils.py (new)
    • Added async context retrieval methods: get_raw_context, cognify_instance, get_context_with_cognee, get_context_with_simple_rag, get_context_with_brute_force_triplet_search
    • Created the qa_context_providers dictionary mapping strategy names to context retrieval functions (see the sketch below)
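A hedged sketch of that mapping; the string keys are illustrative (the summary does not list them), and the stubs stand in for the module's real implementations:

```python
from typing import Awaitable, Callable


async def get_raw_context(instance: dict) -> str:
    # The raw option returns the gold context shipped with the QA instance.
    return instance["context"]

# Stubs standing in for the real providers in qa_context_provider_utils.py.
async def get_context_with_cognee(instance: dict) -> str: ...
async def get_context_with_simple_rag(instance: dict) -> str: ...
async def get_context_with_brute_force_triplet_search(instance: dict) -> str: ...

qa_context_providers: dict[str, Callable[[dict], Awaitable[str]]] = {
    "no_rag": get_raw_context,  # keys are illustrative, not from the PR
    "cognee": get_context_with_cognee,
    "simple_rag": get_context_with_simple_rag,
    "brute_force": get_context_with_brute_force_triplet_search,
}
```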

Possibly related PRs

  • Feat/cog 946 abstract eval dataset #418: also modifies eval_on_hotpot.py's eval_on_QA_dataset function and its command-line arguments, sharing this PR's focus on improving the QA-dataset evaluation process.

Suggested reviewers

  • lxobr

Poem

🐰 A Rabbit's Ode to Context Retrieval 🔍
From Cognee's grasp, we now break free,
Providers dance with agile glee
Context flows like river's might
Flexible, swift, a coder's delight!
RAG options bloom, no limits seen


Base automatically changed from feat/COG-950-improve-metric-selection to dev on January 15, 2025 at 09:45
@coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
evals/eval_on_hotpot.py (1)

Line range hint 17-29: Remove duplicate and incomplete definition of 'answer_qa_instance'

The function answer_qa_instance is defined twice. The first definition (lines 17-20) is incomplete and contains errors:

  • context is assigned but never used.
  • search_results is referenced but not defined.

This duplication leads to confusion and errors in the code. Remove the first, incomplete definition to retain only the correct implementation.

Apply this diff to remove the duplicate function:

-async def answer_qa_instance(instance, context_provider):
-    context = await context_provider(instance)
-
-    search_results_str = "\n".join([context_item["text"] for context_item in search_results])
-
-    return search_results_str
-
 async def get_context_without_cognee(instance):
     return instance["context"]
🧰 Tools
🪛 Ruff (0.8.2)

18-18: Local variable context is assigned to but never used

Remove assignment to unused variable context

(F841)


20-20: Undefined name search_results

(F821)

🪛 GitHub Actions: ruff lint

[error] 18-18: Local variable context is assigned to but never used


[error] 20-20: Undefined name search_results

🧹 Nitpick comments (4)
evals/qa_context_provider_utils.py (4)

13-14: Potential performance impact due to frequent data pruning

The cognify_instance function calls await cognee.prune.prune_data() and await cognee.prune.prune_system(metadata=True) every time it's executed. Repeatedly pruning data and system metadata may lead to significant performance degradation, especially if cognify_instance is called frequently. Consider optimizing the pruning strategy, such as pruning less often or only when necessary.
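One hedged way to act on this suggestion is to guard the reset behind a module-level flag so the full prune runs at most once per process. The flag and wrapper below are hypothetical, not part of the PR:

```python
import cognee

_system_reset = False  # hypothetical guard, not in the PR


async def reset_cognee_once() -> None:
    """Prune cognee data and system metadata at most once per process."""
    global _system_reset
    if not _system_reset:
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)
        _system_reset = True
```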


28-28: Handle potential KeyError when accessing 'text' in search results

In the list comprehension [context_item["text"] for context_item in search_results], there is an assumption that every context_item contains a "text" key. If any item lacks this key, a KeyError will be raised. Consider adding error handling to account for missing keys.

Apply this diff to add error handling:

 search_results_str = "\n".join(
-    [context_item["text"] for context_item in search_results]
+    [context_item.get("text", "") for context_item in search_results]
 )

39-39: Ensure 'text' key exists in found chunks

In get_context_with_simple_rag, the comprehension [context_item.payload["text"] for context_item in found_chunks] assumes that the "text" key exists in context_item.payload. If it's missing, a KeyError will occur. Consider verifying the presence of the key or using .get().

Apply this diff to prevent potential errors:

 search_results_str = "\n".join(
-    [context_item.payload["text"] for context_item in found_chunks]
+    [context_item.payload.get("text", "") for context_item in found_chunks]
 )

22-22: Optimize repeated calls to 'cognify_instance'

Each of the context provider functions (get_context_with_cognee, get_context_with_simple_rag, get_context_with_brute_force_triplet_search) calls await cognify_instance(instance). This may lead to redundant processing if the same instance is handled multiple times. Consider restructuring the code to call cognify_instance once per instance or caching the result.

Also applies to: 34-34, 45-45
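For illustration, the caching variant could memoize on a per-instance key. This sketch assumes each instance carries a stable "_id" field (true of HotpotQA records, but an assumption here); cognify_instance refers to the PR's setup function:

```python
_cognified_ids: set = set()  # hypothetical cache, not in the PR


async def cognify_instance_cached(instance: dict) -> None:
    """Run cognify_instance at most once per distinct QA instance."""
    if instance["_id"] not in _cognified_ids:
        await cognify_instance(instance)  # the PR's per-instance setup
        _cognified_ids.add(instance["_id"])
```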

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6653d73 and 3a8f62e.

📒 Files selected for processing (2)
  • evals/eval_on_hotpot.py (3 hunks)
  • evals/qa_context_provider_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
evals/eval_on_hotpot.py

18-18: Local variable context is assigned to but never used

Remove assignment to unused variable context

(F841)

🪛 GitHub Actions: ruff lint
evals/eval_on_hotpot.py

[error] 18-18: Local variable context is assigned to but never used


[error] 20-20: Undefined name search_results


[error] 29-29: Redefinition of unused answer_qa_instance from line 17

⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
  • GitHub Check: profiler
🔇 Additional comments (3)
evals/qa_context_provider_utils.py (1)

37-37: Verify the correctness of the search field 'document_chunk_text'

In the vector_engine.search call, the search field is specified as "document_chunk_text". Please verify that this is the correct field name used in your vector engine schema for searching document chunks.

evals/eval_on_hotpot.py (2)

18-18: Eliminate unused variable 'context'

The variable context is assigned but not used in the first definition of answer_qa_instance. After removing the duplicate definition as suggested, this issue will be resolved.



20-20: Fix undefined variable 'search_results'

The variable search_results is referenced but not defined in the first definition of answer_qa_instance. Removing the incomplete function definition as previously suggested will eliminate this error.


evals/qa_context_provider_utils.py — one inline review comment (resolved)
@alekszievr force-pushed the feat/cog-954-rag-choice branch from 3a8f62e to 3921b48 on January 15, 2025 at 09:56
@coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
evals/eval_on_hotpot.py (1)

Line range hint 17-32: Add error handling for context provider failures.

The answer_qa_instance function should handle potential failures in context retrieval and LLM calls. Consider wrapping the operations in try-catch blocks to gracefully handle exceptions.

 async def answer_qa_instance(instance, context_provider):
-    context = await context_provider(instance)
+    try:
+        context = await context_provider(instance)
+    except Exception as e:
+        logger.error(f"Failed to retrieve context: {e}")
+        return None
 
     args = {
         "question": instance["question"],
         "context": context,
     }
     user_prompt = render_prompt("context_for_question.txt", args)
     system_prompt = read_query_prompt("answer_hotpot_using_cognee_search.txt")
 
     llm_client = get_llm_client()
-    answer_prediction = await llm_client.acreate_structured_output(
-        text_input=user_prompt,
-        system_prompt=system_prompt,
-        response_model=str,
-    )
+    try:
+        answer_prediction = await llm_client.acreate_structured_output(
+            text_input=user_prompt,
+            system_prompt=system_prompt,
+            response_model=str,
+        )
+    except Exception as e:
+        logger.error(f"Failed to generate answer: {e}")
+        return None
 
     return answer_prediction

(The diff assumes a module-level logger is configured in the script.)
🧹 Nitpick comments (1)
evals/eval_on_hotpot.py (1)

12-12: Great architectural improvement!

The transition to a pluggable context provider system is a significant improvement that:

  • Enhances modularity and extensibility
  • Makes it easier to add new RAG options
  • Follows good separation of concerns principles

Also applies to: 67-70
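To make the extensibility point concrete: under this design, adding a new RAG option amounts to registering one more async callable. Everything below is a hypothetical illustration, not code from the PR:

```python
from evals.qa_context_provider_utils import qa_context_providers


async def get_context_with_my_retriever(instance: dict) -> str:
    # Hypothetical provider: any retrieval logic that returns a context
    # string for the given instance plugs in without touching the eval loop.
    return "retrieved context for: " + instance["question"]

qa_context_providers["my_retriever"] = get_context_with_my_retriever
```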

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a8f62e and 3921b48.

📒 Files selected for processing (1)
  • evals/eval_on_hotpot.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (2)
evals/eval_on_hotpot.py (2)

Line range hint 67-76: LGTM! Good separation of concerns.

The refactored eval_on_QA_dataset function cleanly separates the context provider selection from its usage. The implementation properly handles both promptfoo and deepeval metrics.


84-90: LGTM! Well-structured CLI argument handling.

The new rag_option argument is well-implemented with:

  • Proper type checking
  • Validation against available providers
  • Clear help message
  • Required flag to prevent missing arguments

Also applies to: 97-97
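A hedged sketch of the CLI wiring being described, assuming the valid choices come straight from qa_context_providers (the PR's exact key names and help text may differ):

```python
import argparse

from evals.qa_context_provider_utils import qa_context_providers

parser = argparse.ArgumentParser()
parser.add_argument(
    "--rag_option",
    type=str,                            # proper type checking
    choices=list(qa_context_providers),  # validation against available providers
    required=True,                       # prevents a missing argument
    help="Which context retrieval strategy to evaluate",
)
args = parser.parse_args()
context_provider = qa_context_providers[args.rag_option]
```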

evals/eval_on_hotpot.py — one inline review comment (resolved)
@lxobr self-requested a review on January 15, 2025 at 10:55