Support 4 different rag options in eval #439
Conversation
Walkthrough

The pull request introduces substantial modifications to the context retrieval mechanisms in the evaluation scripts.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
evals/eval_on_hotpot.py (1)
Line range hint 17-29: Remove duplicate and incomplete definition of `answer_qa_instance`

The function `answer_qa_instance` is defined twice. The first definition (lines 17-20) is incomplete and contains errors: `context` is assigned but never used, and `search_results` is referenced but not defined. This duplication leads to confusion and errors in the code. Remove the first, incomplete definition to retain only the correct implementation.

Apply this diff to remove the duplicate function:

```diff
-async def answer_qa_instance(instance, context_provider):
-    context = await context_provider(instance)
-
-    search_results_str = "\n".join([context_item["text"] for context_item in search_results])
-
-    return search_results_str
-
-
 async def get_context_without_cognee(instance):
     return instance["context"]
```

🧰 Tools
🪛 Ruff (0.8.2)

18-18: Local variable `context` is assigned to but never used. Remove assignment to unused variable `context` (F841)

20-20: Undefined name `search_results` (F821)

🪛 GitHub Actions: ruff lint

[error] 18-18: Local variable `context` is assigned to but never used
[error] 20-20: Undefined name `search_results`
🧹 Nitpick comments (4)
evals/qa_context_provider_utils.py (4)
13-14: Potential performance impact due to frequent data pruning

The `cognify_instance` function calls `await cognee.prune.prune_data()` and `await cognee.prune.prune_system(metadata=True)` every time it's executed. Repeatedly pruning data and system metadata may lead to significant performance degradation, especially if `cognify_instance` is called frequently. Consider optimizing the pruning strategy, such as pruning less often or only when necessary.
28-28: Handle potential KeyError when accessing 'text' in search results

In the list comprehension `[context_item["text"] for context_item in search_results]`, there is an assumption that every `context_item` contains a `"text"` key. If any item lacks this key, a `KeyError` will be raised. Consider adding error handling to account for missing keys.

Apply this diff to add error handling:

```diff
 search_results_str = "\n".join(
-    [context_item["text"] for context_item in search_results]
+    [context_item.get("text", "") for context_item in search_results]
 )
```
39-39: Ensure 'text' key exists in found chunks

In `get_context_with_simple_rag`, the comprehension `[context_item.payload["text"] for context_item in found_chunks]` assumes that the `"text"` key exists in `context_item.payload`. If it's missing, a `KeyError` will occur. Consider verifying the presence of the key or using `.get()`.

Apply this diff to prevent potential errors:

```diff
 search_results_str = "\n".join(
-    [context_item.payload["text"] for context_item in found_chunks]
+    [context_item.payload.get("text", "") for context_item in found_chunks]
 )
```
22-22: Optimize repeated calls to 'cognify_instance'

Each of the context provider functions (`get_context_with_cognee`, `get_context_with_simple_rag`, `get_context_with_brute_force_triplet_search`) calls `await cognify_instance(instance)`. This may lead to redundant processing if the same instance is handled multiple times. Consider restructuring the code to call `cognify_instance` once per instance or caching the result.

Also applies to: 34-34, 45-45
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
evals/eval_on_hotpot.py (3 hunks)
evals/qa_context_provider_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
evals/eval_on_hotpot.py
18-18: Local variable `context` is assigned to but never used. Remove assignment to unused variable `context` (F841)
🪛 GitHub Actions: ruff lint
evals/eval_on_hotpot.py
[error] 18-18: Local variable `context` is assigned to but never used
[error] 20-20: Undefined name search_results
[error] 29-29: Redefinition of unused `answer_qa_instance` from line 17
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: profiler
🔇 Additional comments (3)
evals/qa_context_provider_utils.py (1)
37-37: Verify the correctness of the search field 'document_chunk_text'

In the `vector_engine.search` call, the search field is specified as `"document_chunk_text"`. Please verify that this is the correct field name used in your vector engine schema for searching document chunks.

evals/eval_on_hotpot.py (2)
18-18: Eliminate unused variable 'context'

The variable `context` is assigned but not used in the first definition of `answer_qa_instance`. After removing the duplicate definition as suggested, this issue will be resolved.

🧰 Tools
🪛 Ruff (0.8.2)

18-18: Local variable `context` is assigned to but never used. Remove assignment to unused variable `context` (F841)

🪛 GitHub Actions: ruff lint

[error] 18-18: Local variable `context` is assigned to but never used
20-20: Fix undefined variable 'search_results'

The variable `search_results` is referenced but not defined in the first definition of `answer_qa_instance`. Removing the incomplete function definition as previously suggested will eliminate this error.

🧰 Tools
🪛 Ruff (0.8.2)

20-20: Undefined name `search_results` (F821)

🪛 GitHub Actions: ruff lint

[error] 20-20: Undefined name `search_results`
3a8f62e to 3921b48
Actionable comments posted: 1
🔭 Outside diff range comments (1)
evals/eval_on_hotpot.py (1)
Line range hint 17-32: Add error handling for context provider failures.

The `answer_qa_instance` function should handle potential failures in context retrieval and LLM calls. Consider wrapping the operations in try/except blocks to gracefully handle exceptions.

```diff
 async def answer_qa_instance(instance, context_provider):
+    try:
-    context = await context_provider(instance)
+        context = await context_provider(instance)
+    except Exception as e:
+        logger.error(f"Failed to retrieve context: {e}")
+        return None

     args = {
         "question": instance["question"],
         "context": context,
     }
     user_prompt = render_prompt("context_for_question.txt", args)
     system_prompt = read_query_prompt("answer_hotpot_using_cognee_search.txt")

     llm_client = get_llm_client()
+    try:
-    answer_prediction = await llm_client.acreate_structured_output(
-        text_input=user_prompt,
-        system_prompt=system_prompt,
-        response_model=str,
-    )
+        answer_prediction = await llm_client.acreate_structured_output(
+            text_input=user_prompt,
+            system_prompt=system_prompt,
+            response_model=str,
+        )
+    except Exception as e:
+        logger.error(f"Failed to generate answer: {e}")
+        return None

     return answer_prediction
```
🧹 Nitpick comments (1)
evals/eval_on_hotpot.py (1)
12-12: Great architectural improvement!

The transition to a pluggable context provider system is a significant improvement that:
- Enhances modularity and extensibility
- Makes it easier to add new RAG options
- Follows good separation of concerns principles

Also applies to: 67-70
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
evals/eval_on_hotpot.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: profiler
- GitHub Check: docker-compose-test
🔇 Additional comments (2)
evals/eval_on_hotpot.py (2)
Line range hint 67-76: LGTM! Good separation of concerns.

The refactored `eval_on_QA_dataset` function cleanly separates the context provider selection from its usage. The implementation properly handles both promptfoo and deepeval metrics.
84-90: LGTM! Well-structured CLI argument handling.

The new `rag_option` argument is well-implemented with:
- Proper type checking
- Validation against available providers
- Clear help message
- Required flag to prevent missing arguments

Also applies to: 97-97
Summary by CodeRabbit

New Features

Refactor
- Replaced `cognee`-specific context management with a more modular approach.

Chores