
Support 4 different rag options in eval #439

Merged: 29 commits from feat/cog-954-rag-choice into dev on Jan 15, 2025
Conversation

@alekszievr (Contributor) commented Jan 14, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a flexible context retrieval system for question-answering tasks.
    • Added multiple context provider strategies (raw context, cognee search, simple RAG, brute-force triplet search).
    • Implemented a new command-line option for selecting context retrieval method.
  • Refactor

    • Removed dependency on previous context retrieval implementation.
    • Replaced cognee-specific context management with a more modular approach.
  • Chores

    • Updated evaluation script to support new context provider selection mechanism.

alekszievr and others added 25 commits January 8, 2025 17:41

coderabbitai bot commented Jan 14, 2025

Walkthrough

The pull request reworks context retrieval in the evaluation scripts. eval_on_hotpot.py no longer calls the cognee library directly: its get_context_with_cognee function is removed (it moves into the new module) and an answer_qa_instance function is added. A new module, qa_context_provider_utils.py, hosts the various context retrieval strategies behind a common interface, making the question-answering evaluation process more flexible.
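In sketch form, here is the flow the walkthrough describes, assembled from the review diff later in this thread; the import paths are assumptions about the cognee codebase, not shown in this PR:

```python
# Assumed import paths; the PR itself does not show them here.
from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import read_query_prompt, render_prompt


async def answer_qa_instance(instance: dict, context_provider) -> str:
    # context_provider is any async callable mapping an instance to a context string.
    context = await context_provider(instance)

    args = {"question": instance["question"], "context": context}
    user_prompt = render_prompt("context_for_question.txt", args)
    system_prompt = read_query_prompt("answer_hotpot_using_cognee_search.txt")

    # The LLM answers the question given only the provider-supplied context.
    llm_client = get_llm_client()
    return await llm_client.acreate_structured_output(
        text_input=user_prompt,
        system_prompt=system_prompt,
        response_model=str,
    )
```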

Changes

  • evals/eval_on_hotpot.py
    • Removed the get_context_with_cognee function
    • Added the answer_qa_instance function
    • Modified eval_on_QA_dataset to select a context provider by name
    • Renamed the command-line argument --with_cognee to --rag_option
  • evals/qa_context_provider_utils.py (new)
    • Added async context retrieval methods: get_raw_context, cognify_instance, get_context_with_cognee, get_context_with_simple_rag, get_context_with_brute_force_triplet_search
    • Created the qa_context_providers dictionary mapping strategy names to context retrieval functions (see the sketch below)
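A hedged sketch of that mapping; the string keys are illustrative (the summary does not list them), and the stubs stand in for the module's real implementations:

```python
from typing import Awaitable, Callable


async def get_raw_context(instance: dict) -> str:
    # The raw option returns the gold context shipped with the QA instance.
    return instance["context"]

# Stubs standing in for the real providers in qa_context_provider_utils.py.
async def get_context_with_cognee(instance: dict) -> str: ...
async def get_context_with_simple_rag(instance: dict) -> str: ...
async def get_context_with_brute_force_triplet_search(instance: dict) -> str: ...

qa_context_providers: dict[str, Callable[[dict], Awaitable[str]]] = {
    "no_rag": get_raw_context,  # keys are illustrative, not from the PR
    "cognee": get_context_with_cognee,
    "simple_rag": get_context_with_simple_rag,
    "brute_force": get_context_with_brute_force_triplet_search,
}
```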

Possibly related PRs

  • Feat/cog 946 abstract eval dataset #418: also modifies eval_on_hotpot.py's eval_on_QA_dataset function and its command-line arguments, sharing this PR's focus on improving the QA-dataset evaluation process.

Suggested reviewers

  • lxobr

Poem

🐰 A Rabbit's Ode to Context Retrieval 🔍
From Cognee's grasp, we now break free,
Providers dance with agile glee
Context flows like river's might
Flexible, swift, a coder's delight!
RAG options bloom, no limits seen


Base automatically changed from feat/COG-950-improve-metric-selection to dev on January 15, 2025 at 09:45
@coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
evals/eval_on_hotpot.py (1)

Line range hint 17-29: Remove duplicate and incomplete definition of 'answer_qa_instance'

The function answer_qa_instance is defined twice. The first definition (lines 17-20) is incomplete and contains errors:

  • context is assigned but never used.
  • search_results is referenced but not defined.

This duplication leads to confusion and errors in the code. Remove the first, incomplete definition to retain only the correct implementation.

Apply this diff to remove the duplicate function:

-async def answer_qa_instance(instance, context_provider):
-    context = await context_provider(instance)
-
-    search_results_str = "\n".join([context_item["text"] for context_item in search_results])
-
-    return search_results_str
-
 async def get_context_without_cognee(instance):
     return instance["context"]
🧰 Tools
🪛 Ruff (0.8.2)

18-18: Local variable context is assigned to but never used

Remove assignment to unused variable context

(F841)


20-20: Undefined name search_results

(F821)

🪛 GitHub Actions: ruff lint

[error] 18-18: Local variable context is assigned to but never used


[error] 20-20: Undefined name search_results

🧹 Nitpick comments (4)
evals/qa_context_provider_utils.py (4)

13-14: Potential performance impact due to frequent data pruning

The cognify_instance function calls await cognee.prune.prune_data() and await cognee.prune.prune_system(metadata=True) every time it's executed. Repeatedly pruning data and system metadata may lead to significant performance degradation, especially if cognify_instance is called frequently. Consider optimizing the pruning strategy, such as pruning less often or only when necessary.
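One hedged way to act on this suggestion is to guard the reset behind a module-level flag so the full prune runs at most once per process. The flag and wrapper below are hypothetical, not part of the PR:

```python
import cognee

_system_reset = False  # hypothetical guard, not in the PR


async def reset_cognee_once() -> None:
    """Prune cognee data and system metadata at most once per process."""
    global _system_reset
    if not _system_reset:
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)
        _system_reset = True
```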


28-28: Handle potential KeyError when accessing 'text' in search results

In the list comprehension [context_item["text"] for context_item in search_results], there is an assumption that every context_item contains a "text" key. If any item lacks this key, a KeyError will be raised. Consider adding error handling to account for missing keys.

Apply this diff to add error handling:

 search_results_str = "\n".join(
-    [context_item["text"] for context_item in search_results]
+    [context_item.get("text", "") for context_item in search_results]
 )

39-39: Ensure 'text' key exists in found chunks

In get_context_with_simple_rag, the comprehension [context_item.payload["text"] for context_item in found_chunks] assumes that the "text" key exists in context_item.payload. If it's missing, a KeyError will occur. Consider verifying the presence of the key or using .get().

Apply this diff to prevent potential errors:

 search_results_str = "\n".join(
-    [context_item.payload["text"] for context_item in found_chunks]
+    [context_item.payload.get("text", "") for context_item in found_chunks]
 )

22-22: Optimize repeated calls to 'cognify_instance'

Each of the context provider functions (get_context_with_cognee, get_context_with_simple_rag, get_context_with_brute_force_triplet_search) calls await cognify_instance(instance). This may lead to redundant processing if the same instance is handled multiple times. Consider restructuring the code to call cognify_instance once per instance or caching the result.

Also applies to: 34-34, 45-45
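For illustration, the caching variant could memoize on a per-instance key. This sketch assumes each instance carries a stable "_id" field (true of HotpotQA records, but an assumption here); cognify_instance refers to the PR's setup function:

```python
_cognified_ids: set = set()  # hypothetical cache, not in the PR


async def cognify_instance_cached(instance: dict) -> None:
    """Run cognify_instance at most once per distinct QA instance."""
    if instance["_id"] not in _cognified_ids:
        await cognify_instance(instance)  # the PR's per-instance setup
        _cognified_ids.add(instance["_id"])
```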

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6653d73 and 3a8f62e.

📒 Files selected for processing (2)
  • evals/eval_on_hotpot.py (3 hunks)
  • evals/qa_context_provider_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
evals/eval_on_hotpot.py

18-18: Local variable context is assigned to but never used

Remove assignment to unused variable context

(F841)

🪛 GitHub Actions: ruff lint
evals/eval_on_hotpot.py

[error] 18-18: Local variable context is assigned to but never used


[error] 20-20: Undefined name search_results


[error] 29-29: Redefinition of unused answer_qa_instance from line 17

⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
  • GitHub Check: profiler
🔇 Additional comments (3)
evals/qa_context_provider_utils.py (1)

37-37: Verify the correctness of the search field 'document_chunk_text'

In the vector_engine.search call, the search field is specified as "document_chunk_text". Please verify that this is the correct field name used in your vector engine schema for searching document chunks.

evals/eval_on_hotpot.py (2)

18-18: Eliminate unused variable 'context'

The variable context is assigned but not used in the first definition of answer_qa_instance. After removing the duplicate definition as suggested, this issue will be resolved.



20-20: Fix undefined variable 'search_results'

The variable search_results is referenced but not defined in the first definition of answer_qa_instance. Removing the incomplete function definition as previously suggested will eliminate this error.


evals/qa_context_provider_utils.py — one inline review comment (resolved)
@alekszievr force-pushed the feat/cog-954-rag-choice branch from 3a8f62e to 3921b48 on January 15, 2025 at 09:56
@coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
evals/eval_on_hotpot.py (1)

Line range hint 17-32: Add error handling for context provider failures.

The answer_qa_instance function should handle potential failures in context retrieval and LLM calls. Consider wrapping the operations in try-catch blocks to gracefully handle exceptions.

 async def answer_qa_instance(instance, context_provider):
-    context = await context_provider(instance)
+    try:
+        context = await context_provider(instance)
+    except Exception as e:
+        logger.error(f"Failed to retrieve context: {e}")
+        return None
 
     args = {
         "question": instance["question"],
         "context": context,
     }
     user_prompt = render_prompt("context_for_question.txt", args)
     system_prompt = read_query_prompt("answer_hotpot_using_cognee_search.txt")
 
     llm_client = get_llm_client()
-    answer_prediction = await llm_client.acreate_structured_output(
-        text_input=user_prompt,
-        system_prompt=system_prompt,
-        response_model=str,
-    )
+    try:
+        answer_prediction = await llm_client.acreate_structured_output(
+            text_input=user_prompt,
+            system_prompt=system_prompt,
+            response_model=str,
+        )
+    except Exception as e:
+        logger.error(f"Failed to generate answer: {e}")
+        return None
 
     return answer_prediction

(The diff assumes a module-level logger is configured in the script.)
🧹 Nitpick comments (1)
evals/eval_on_hotpot.py (1)

12-12: Great architectural improvement!

The transition to a pluggable context provider system is a significant improvement that:

  • Enhances modularity and extensibility
  • Makes it easier to add new RAG options
  • Follows good separation of concerns principles

Also applies to: 67-70
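To make the extensibility point concrete: under this design, adding a new RAG option amounts to registering one more async callable. Everything below is a hypothetical illustration, not code from the PR:

```python
from evals.qa_context_provider_utils import qa_context_providers


async def get_context_with_my_retriever(instance: dict) -> str:
    # Hypothetical provider: any retrieval logic that returns a context
    # string for the given instance plugs in without touching the eval loop.
    return "retrieved context for: " + instance["question"]

qa_context_providers["my_retriever"] = get_context_with_my_retriever
```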

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a8f62e and 3921b48.

📒 Files selected for processing (1)
  • evals/eval_on_hotpot.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: profiler
  • GitHub Check: docker-compose-test
🔇 Additional comments (2)
evals/eval_on_hotpot.py (2)

Line range hint 67-76: LGTM! Good separation of concerns.

The refactored eval_on_QA_dataset function cleanly separates the context provider selection from its usage. The implementation properly handles both promptfoo and deepeval metrics.


84-90: LGTM! Well-structured CLI argument handling.

The new rag_option argument is well-implemented with:

  • Proper type checking
  • Validation against available providers
  • Clear help message
  • Required flag to prevent missing arguments

Also applies to: 97-97
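A hedged sketch of the CLI wiring being described, assuming the valid choices come straight from qa_context_providers (the PR's exact key names and help text may differ):

```python
import argparse

from evals.qa_context_provider_utils import qa_context_providers

parser = argparse.ArgumentParser()
parser.add_argument(
    "--rag_option",
    type=str,                            # proper type checking
    choices=list(qa_context_providers),  # validation against available providers
    required=True,                       # prevents a missing argument
    help="Which context retrieval strategy to evaluate",
)
args = parser.parse_args()
context_provider = qa_context_providers[args.rag_option]
```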

evals/eval_on_hotpot.py — one inline review comment (resolved)
@lxobr self-requested a review on January 15, 2025 at 10:55