
Shiny new LLMOps #110

Closed
wants to merge 10 commits into from

Conversation


@Vasilije1990 Vasilije1990 commented Jun 15, 2024

Summary by CodeRabbit

  • New Features

    • Introduced detailed documentation on the development of Cognee and its applications in AI.
    • Added search methods for evaluating datasets asynchronously.
  • Bug Fixes

    • Fixed active link for the blog post on knowledge graphs.
  • Dependencies

    • Updated dspy-ai to version 2.4.10.
    • Added deepeval and evals dependencies.
  • Enhancements

    • Improved configuration options for graph topology tasks.
    • Enhanced dataset management and evaluation processes.


coderabbitai bot commented Jun 15, 2024

Walkthrough

This release introduces significant enhancements across multiple components of the Cognee system, focusing on improvements in graph topology processing, dataset management, and the configuration setup. Key updates include the addition of new functions for asynchronous operations, adjustments to dependencies, and a new blog post detailing Cognee's development and its practical applications.

Changes

File Path / Summary

  • cognee/api/v1/cognify/cognify.py: Added imports, print statements, variable assignments, and function calls related to graph topology and configuration adjustments.
  • cognee/api/v1/config/config.py: Added static methods in the Config class to set graph topology task configurations.
  • cognee/infrastructure/databases/graph/config.py: Changed graph_topology_task to True, renamed graph_topology to graph_topology_model, and updated to_dict.
  • cognee/infrastructure/llm/prompts/extract_topology.txt: Updated instructions for extracting topology information from text.
  • cognee/modules/topology/infer_data_topology.py: Updated assignment of graph_topology to graph_topology_model.
  • evals/simple_rag_vs_cognee_eval.py: Major restructuring; added asynchronous tasks, new functions for dataset loading and evaluation, and logging.
  • pyproject.toml: Updated dependencies (deepeval, dspy-ai); added evals under notebook.
  • docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md: New blog post detailing Cognee's development and practical applications with Dynamo.fyi.
  • docs/blog/index.md: Updated the link for the blog post "Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi".

Poem

In code's grand waltz, changes align,
Graphs and tasks redefined.
With async strides and datasets vast,
Configs tuned, improvements last.
Dependencies now freshly sown,
Cognee's strengths have grown.
📚✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Outside diff range and nitpick comments (1)
cognee/api/v1/cognify/cognify.py (1)

Line range hint 241-241: Avoid using a bare except statement. It's a best practice to specify the exceptions you're catching to avoid masking unexpected errors.

- except:
+ except ExpectedExceptionType:
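As a sketch of what the narrowed handler could look like (the helper name and exception types below are illustrative, not the actual cognify code):

```python
import logging

logger = logging.getLogger(__name__)

def read_topology_file(path: str) -> str:
    """Illustrative helper: catch only the errors we expect instead of a bare except."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except (FileNotFoundError, PermissionError) as error:
        # Expected failure modes are logged and handled; anything
        # unexpected (e.g. KeyboardInterrupt, SystemExit) still propagates.
        logger.warning("Could not read %s: %s", path, error)
        return ""

print(read_topology_file("no_such_topology.txt"))  # → prints an empty string
```

The key difference from a bare `except:` is that control-flow exceptions and genuinely unexpected errors are no longer silently swallowed.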
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 79d299e and 16139f7.

Files selected for processing (5)
  • cognee/api/v1/cognify/cognify.py (3 hunks)
  • cognee/api/v1/config/config.py (1 hunks)
  • cognee/infrastructure/databases/graph/config.py (2 hunks)
  • cognee/infrastructure/llm/prompts/extract_topology.txt (1 hunks)
  • cognee/modules/topology/infer_data_topology.py (1 hunks)
Additional context used
Ruff
cognee/api/v1/cognify/cognify.py

151-151: Local variable graph_topology is assigned to but never used (F841)

Remove assignment to unused variable graph_topology


241-241: Do not use bare except (E722)


306-306: Local variable dataset_name is assigned to but never used (F841)

Remove assignment to unused variable dataset_name


308-308: Local variable data_dir is assigned to but never used (F841)

Remove assignment to unused variable data_dir


328-328: Local variable graph is assigned to but never used (F841)

Remove assignment to unused variable graph

Additional comments not posted (4)
cognee/infrastructure/llm/prompts/extract_topology.txt (1)

4-6: The updated instructions clarify the process and ensure the correct usage of document IDs for node identification, aligning well with the intended graph topology extraction.

cognee/modules/topology/infer_data_topology.py (1)

10-10: Correctly updated the variable to match the new attribute name in the graph configuration. This ensures consistency across the codebase.

cognee/infrastructure/databases/graph/config.py (1)

22-23: The updates to the GraphConfig class are well-implemented, ensuring that the new graph_topology_model attribute and the graph_topology_task setting are correctly integrated and reflected in the configuration dictionary.

Also applies to: 42-42

cognee/api/v1/config/config.py (1)

135-149: The addition of new static methods for managing graph topology settings is well-executed. These methods enhance the configurability of the system and ensure greater flexibility in handling different graph topology scenarios.
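The shape of such static setters might look like the following (a hypothetical sketch based on the PR summary; the real Config class in cognee/api/v1/config/config.py will differ):

```python
class Config:
    """Hypothetical stand-in for cognee's Config class; method names
    follow the PR summary, not the actual source."""

    # Class-level settings shared by every caller of the static setters.
    graph_topology_task: bool = False
    infer_graph_topology: bool = False

    @staticmethod
    def set_graph_topology_task(enabled: bool) -> None:
        # A static method mutates the class attribute, so all later
        # reads of Config.graph_topology_task observe the new value.
        Config.graph_topology_task = enabled

    @staticmethod
    def set_infer_graph_topology(enabled: bool) -> None:
        Config.infer_graph_topology = enabled

Config.set_graph_topology_task(True)
print(Config.graph_topology_task)  # → True
```

One thing worth noting with this pattern: a line like `config.set_infer_graph_topology=True` (which appears later in this diff) rebinds the attribute to True instead of calling the method; `config.set_infer_graph_topology(True)` is presumably what was intended.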

Comment on lines +153 to +155
print(graph_config.infer_graph_topology)
print(graph_config.graph_topology_task)

@coderabbitai coderabbitai bot Jun 15, 2024

Multiple print statements for debugging purposes can clutter the production code. Consider using a more robust logging approach that can be easily toggled or configured.

- print(graph_config.infer_graph_topology)
- print(graph_config.graph_topology_task)
- print('infered topology added')
- print('provided topology added')
+ logger.debug(f"Infer graph topology: {graph_config.infer_graph_topology}")
+ logger.debug(f"Graph topology task: {graph_config.graph_topology_task}")
+ logger.debug("Inferred topology added")
+ logger.debug("Provided topology added")

Also applies to: 162-163, 168-169
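A minimal sketch of the logger setup that suggestion assumes (the logger name and the two flags are placeholders standing in for graph_config attributes, not actual cognee code):

```python
import logging

# Placeholder values standing in for graph_config attributes.
infer_graph_topology = True
graph_topology_task = False

logging.basicConfig(format="%(name)s %(levelname)s %(message)s", level=logging.DEBUG)
logger = logging.getLogger("cognee.cognify")

logger.debug("Infer graph topology: %s", infer_graph_topology)
logger.debug("Graph topology task: %s", graph_topology_task)

# Unlike print(), the debug output can be silenced in production with one line:
logging.getLogger("cognee").setLevel(logging.WARNING)
```

Because child loggers inherit the parent's level, raising "cognee" to WARNING disables the debug lines everywhere without touching the call sites.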

Contributor

@Vasilije1990 This is a good suggestion

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@@ -294,14 +303,30 @@

from cognee.api.v1.add import add

dataset_name = "explanations"
Contributor

Several local variables (dataset_name, data_dir, graph) are assigned but never used. Consider removing these if they are unnecessary.

- dataset_name = "explanations"
- data_dir = os.path.abspath("../../../.data")
- graph = await cognify("explanations")

Also applies to: 308-308, 328-328

Tools
Ruff

306-306: Local variable dataset_name is assigned to but never used (F841)

Remove assignment to unused variable dataset_name

await add([text], "example_dataset")

from cognee.api.v1.config.config import config
config.set_chunk_engine(ChunkEngine.LANGCHAIN_ENGINE )
config.set_chunk_strategy(ChunkStrategy.LANGCHAIN_CHARACTER)
config.embedding_engine = LiteLLMEmbeddingEngine()
config.set_chunk_engine(ChunkEngine.LANGCHAIN_ENGINE)
@coderabbitai coderabbitai bot Jun 15, 2024

Duplicate assignment to config.set_chunk_engine. If this is intentional, consider adding a comment explaining why; otherwise, it seems like an oversight.

- config.set_chunk_engine(ChunkEngine.LANGCHAIN_ENGINE)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
config.set_chunk_engine(ChunkEngine.LANGCHAIN_ENGINE)

Contributor

@Vasilije1990 This one

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

config.set_infer_graph_topology=True

from cognee.api.v1.datasets.datasets import datasets
print(datasets.list_datasets())
@coderabbitai coderabbitai bot Jun 15, 2024

Printing the list of datasets might not be necessary in the production environment. Consider removing or replacing it with a debug log.

- print(datasets.list_datasets())
+ logger.debug("Available datasets: " + str(datasets.list_datasets()))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(datasets.list_datasets())
logger.debug("Available datasets: " + str(datasets.list_datasets()))


Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@@ -149,15 +150,23 @@
graph_config = get_graph_config()
graph_topology = graph_config.graph_model
Contributor

Local variable graph_topology is assigned but never used. Consider removing it if it's not necessary.

- graph_topology = graph_config.graph_model
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
graph_topology = graph_config.graph_model
Tools
Ruff

151-151: Local variable graph_topology is assigned to but never used (F841)

Remove assignment to unused variable graph_topology

@@ -1,4 +1,5 @@
import asyncio
import os
Contributor

Consider consolidating imports from the same module on a single line for better readability.

- import os
+ import os, asyncio

Committable suggestion was skipped due to low confidence.
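For reference, PEP 8 actually recommends separate lines for plain `import` statements rather than comma-chaining them; the conventional layout looks like this (an illustrative header, not the eval script's):

```python
# PEP 8 import order: standard library first, then third-party, then
# local application imports, each group separated by a blank line and
# one module per plain `import` statement.
import asyncio
import logging
import os

from typing import List  # "from" imports may name several objects on one line

def modules_loaded() -> List[str]:
    # Tiny check that the grouped imports above are all usable.
    return sorted(module.__name__ for module in (asyncio, logging, os))

print(modules_loaded())  # → ['asyncio', 'logging', 'os']
```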

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Outside diff range and nitpick comments (1)
evals/simple_rag_vs_cognee_eval.py (1)

Line range hint 72-87: The function cognify_search_base_rag and cognify_search_graph are using dynamic imports which can be inefficient and harder to track. Consider moving these imports to the top of the file.

+ from cognee.api.v1.search.search import search
async def cognify_search_base_rag(content:str):
    base_config = get_base_config()
    cognee_directory_path = os.path.abspath(".cognee_system")
    base_config.system_root_directory = cognee_directory_path
    vector_engine = get_vector_engine()
    return_ = await vector_engine.search(collection_name="basic_rag", query_text=content, limit=10)
    print("results", return_)
    return return_

async def cognify_search_graph(content:str, search_type= 'SIMILARITY'):
-    from cognee.api.v1.search.search import search
    params = {'query': content}
    results = await search(search_type, params)
    print("results", results)
    return results
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 16139f7 and 7ff87bb.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (2)
  • evals/simple_rag_vs_cognee_eval.py (5 hunks)
  • pyproject.toml (2 hunks)
Files skipped from review due to trivial changes (1)
  • pyproject.toml
Additional context used
Ruff
evals/simple_rag_vs_cognee_eval.py

12-12: Module level import not at top of file (E402)


13-13: Module level import not at top of file (E402)


14-14: Module level import not at top of file (E402)


17-17: Module level import not at top of file (E402)

Additional comments not posted (2)
evals/simple_rag_vs_cognee_eval.py (2)

127-131: The main function is correctly using async/await patterns, but ensure that all asynchronous calls are properly awaited to avoid runtime issues.
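A compact illustration of the point (the names here are made up; this is not the eval script itself):

```python
import asyncio

async def evaluate(dataset: str) -> str:
    # Stand-in for one asynchronous evaluation step.
    await asyncio.sleep(0)
    return f"evaluated {dataset}"

async def main() -> list:
    # Forgetting `await` here would leave unfinished coroutines and a
    # "coroutine was never awaited" RuntimeWarning; gather awaits both
    # concurrently and returns their results in order.
    return await asyncio.gather(evaluate("naive_rag"), evaluate("graph"))

print(asyncio.run(main()))  # → ['evaluated naive_rag', 'evaluated graph']
```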


4-10: Ensure that all imports are placed at the top of the file to comply with PEP 8 standards.

- import os
- from cognee.base_config import get_base_config
- from cognee.infrastructure.databases.vector import get_vector_engine
- import logging
+ # Imports should be moved to the top of the file

Likely invalid or redundant comment.

Comment on lines 58 to 69
async def run_cognify_task(dataset:str= 'test_datasets', dataset_name:str = 'initial_test' ):
    from cognee.api.v1.add import add
    from cognee.api.v1.prune import prune
    from cognee.api.v1.cognify.cognify import cognify

    await prune.prune_system()

-   await add("data://test_datasets", "initial_test")
-   graph = await cognify("initial_test")
+   await add(f"data://{dataset}", dataset_name)
+   graph = await cognify(dataset_name)

    return graph
Contributor

The function run_cognify_task seems to be well-defined for asynchronous operation. However, consider adding a docstring to improve maintainability and readability.

  async def run_cognify_task(dataset:str= 'test_datasets', dataset_name:str = 'initial_test' ):
+     """
+     Asynchronously runs the cognify task.
+
+     Parameters:
+         dataset (str): The dataset to use, defaults to 'test_datasets'.
+         dataset_name (str): The name of the dataset, defaults to 'initial_test'.
+
+     Returns:
+         graph: The result of the cognify operation.
+     """
      from cognee.api.v1.add import add
      from cognee.api.v1.prune import prune
      from cognee.api.v1.cognify.cognify import cognify

      await prune.prune_system()
      await add(f"data://{dataset}", dataset_name)
      graph = await cognify(dataset_name)
      return graph
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- async def run_cognify_task(dataset:str= 'test_datasets', dataset_name:str = 'initial_test' ):
-     from cognee.api.v1.add import add
-     from cognee.api.v1.prune import prune
-     from cognee.api.v1.cognify.cognify import cognify
-     await prune.prune_system()
-     await add("data://test_datasets", "initial_test")
-     graph = await cognify("initial_test")
-     await add(f"data://{dataset}", dataset_name)
-     graph = await cognify(dataset_name)
-     return graph
+ async def run_cognify_task(dataset: str = 'test_datasets', dataset_name: str = 'initial_test'):
+     """
+     Asynchronously runs the cognify task.
+
+     Parameters:
+         dataset (str): The dataset to use, defaults to 'test_datasets'.
+         dataset_name (str): The name of the dataset, defaults to 'initial_test'.
+
+     Returns:
+         graph: The result of the cognify operation.
+     """
+     from cognee.api.v1.add import add
+     from cognee.api.v1.prune import prune
+     from cognee.api.v1.cognify.cognify import cognify
+
+     await prune.prune_system()
+     await add(f"data://{dataset}", dataset_name)
+     graph = await cognify(dataset_name)
+     return graph

Comment on lines 94 to 115
async def convert_goldens_to_test_cases(test_cases_raw: List[LLMTestCase], context_type:str =None) -> List[LLMTestCase]:

    test_cases = []
    for case in test_cases_raw:
        if context_type == "naive_rag":
            context = await cognify_search_base_rag(case.input)
            print('naive rag', context)
        elif context_type == "graph":
            context = await cognify_search_graph(case.input)
            print('graph', context)
        else:
            context = case.context

        test_case = LLMTestCase(
            input=case.input,
            # Generate actual output using the 'input' and 'additional_metadata'
-           actual_output= str(get_answer(case.input, case.context).model_dump()['response']),
+           actual_output= str(get_answer(case.input, context).model_dump()['response']),
            expected_output=case.expected_output,
            context=case.context,
            retrieval_context=["retrieval_context"],
        )
        test_cases.append(test_case)
Contributor

Tip

Codebase Verification

The previous script did not find any try or except blocks within the convert_goldens_to_test_cases function, indicating a lack of error handling around asynchronous calls.

  • The convert_goldens_to_test_cases function lacks proper error handling for its asynchronous calls.
Analysis chain

The function convert_goldens_to_test_cases is handling different context types and converting them to test cases. It is crucial to ensure that the error handling is robust, especially since it relies on asynchronous calls which might fail.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the robustness of error handling in asynchronous calls within `convert_goldens_to_test_cases`.

# Test: Check for proper error handling in async functions. Expect: Proper try-except blocks around async calls.
rg --type python $'try { $$$ } catch { $$$ }' evals/simple_rag_vs_cognee_eval.py

Length of output: 195



Script:

#!/bin/bash
# Description: Verify the robustness of error handling in asynchronous calls within `convert_goldens_to_test_cases`.

# Test: Check for proper error handling in async functions. Expect: Proper try-except blocks around async calls.
rg 'try:' -A 3 evals/simple_rag_vs_cognee_eval.py

Length of output: 133



Script:

#!/bin/bash
# Description: Verify the robustness of error handling in asynchronous calls within `convert_goldens_to_test_cases`.

# Test: Check for proper error handling in async functions within the `convert_goldens_to_test_cases` function. Expect: Proper try-except blocks around async calls.
ast-grep --lang python --pattern $'def convert_goldens_to_test_cases($_, $_):\n    $$$' evals/simple_rag_vs_cognee_eval.py | rg 'try:|except:'

Length of output: 142

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7ff87bb and f3a1d2e.

Files selected for processing (1)
  • evals/simple_rag_vs_cognee_eval.py (1 hunks)
Additional context used
Ruff
evals/simple_rag_vs_cognee_eval.py

174-174: Module level import not at top of file

(E402)

Additional comments not posted (9)
evals/simple_rag_vs_cognee_eval.py (9)

25-26: LGTM!


29-43: LGTM!


46-60: LGTM!


63-75: LGTM!


78-92: LGTM!


95-112: LGTM!


149-172: LGTM!


176-187: LGTM!


189-207: LGTM!

Comment on lines +1 to +20
import asyncio
import logging
import os
from typing import List, Type

from pydantic import BaseModel
from dotenv import load_dotenv

from typing import List, Type
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase
import dotenv
dotenv.load_dotenv()

from deepeval.metrics import HallucinationMetric, BaseMetric
from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.base_config import get_base_config
from cognee.infrastructure.databases.vector import get_vector_engine
from cognee.api.v1.add import add
from cognee.api.v1.prune import prune
from cognee.api.v1.cognify.cognify import cognify
from cognee.api.v1.search.search import search

dataset = EvaluationDataset()
dataset.add_test_cases_from_json_file(
    # file_path is the absolute path to your .json file
    file_path="./synthetic_data/20240519_185842.json",
    input_key_name="input",
    actual_output_key_name="actual_output",
    expected_output_key_name="expected_output",
    context_key_name="context"
)

print(dataset)
# from deepeval.synthesizer import Synthesizer
#
# synthesizer = Synthesizer(model="gpt-3.5-turbo")
#
# dataset = EvaluationDataset()
# dataset.generate_goldens_from_docs(
#     synthesizer=synthesizer,
#     document_paths=['natural_language_processing.txt', 'soldiers_home.pdf', 'trump.txt'],
#     max_goldens_per_document=10,
#     num_evolutions=5,
#     enable_breadth_evolve=True,
# )


print(dataset.goldens)
print(dataset)
load_dotenv()
Contributor

Fix the import order.

The import statement at line 174 should be moved to the top of the file to comply with PEP 8 guidelines.

- from typing import Type

Also applies to: 174-174

Comment on lines +115 to +146
-def convert_goldens_to_test_cases(test_cases_raw: List[LLMTestCase]) -> List[LLMTestCase]:
+async def convert_goldens_to_test_cases(test_cases_raw: List[LLMTestCase], context_type: str = None) -> List[LLMTestCase]:
+    """
+    Converts raw test cases into LLMTestCase objects with actual outputs generated.
+
+    Args:
+        test_cases_raw (List[LLMTestCase]): List of raw test cases.
+        context_type (str, optional): Type of context to use. Defaults to None.
+
+    Returns:
+        List[LLMTestCase]: List of LLMTestCase objects with actual outputs populated.
+    """
     test_cases = []
     for case in test_cases_raw:
+        if context_type == "naive_rag":
+            context = await cognify_search_base_rag(case.input)
+        elif context_type == "graph":
+            context = await cognify_search_graph(case.input)
+        else:
+            context = case.context
+
+        actual_output = str((await get_answer(case.input, context)).response)
+
         test_case = LLMTestCase(
             input=case.input,
-            # Generate actual output using the 'input' and 'additional_metadata'
-            actual_output= str(get_answer(case.input, case.context).model_dump()['response']),
+            actual_output=actual_output,
             expected_output=case.expected_output,
             context=case.context,
             retrieval_context=["retrieval_context"],
         )
         test_cases.append(test_case)

Contributor

Add error handling for asynchronous calls.

The function lacks proper error handling for its asynchronous calls. Ensure that all asynchronous calls are wrapped in try-except blocks.

+    try:
+        if context_type == "naive_rag":
+            context = await cognify_search_base_rag(case.input)
+        elif context_type == "graph":
+            context = await cognify_search_graph(case.input)
+        else:
+            context = case.context
+
+        actual_output = str((await get_answer(case.input, context)).response)
+    except Exception as error:
+        logger.error("Error converting test case: %s", error, exc_info=True)
+        raise error
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    test_cases = []
    for case in test_cases_raw:
        try:
            if context_type == "naive_rag":
                context = await cognify_search_base_rag(case.input)
            elif context_type == "graph":
                context = await cognify_search_graph(case.input)
            else:
                context = case.context

            actual_output = str((await get_answer(case.input, context)).response)
        except Exception as error:
            logger.error("Error converting test case: %s", error, exc_info=True)
            raise error

        test_case = LLMTestCase(
            input=case.input,
            actual_output=actual_output,
            expected_output=case.expected_output,
            context=case.context,
            retrieval_context=["retrieval_context"],
        )
        test_cases.append(test_case)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f3a1d2e and 2d09379.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (1)
  • pyproject.toml (3 hunks)
Additional comments not posted (2)
pyproject.toml (2)

76-76: Verify compatibility of the new deepeval dependency.

Ensure that adding deepeval version ^0.21.55 does not introduce any compatibility issues with the rest of the codebase.


59-59: Verify compatibility of the updated dspy-ai dependency.

Ensure that updating dspy-ai to version 2.4.10 does not introduce any compatibility issues with the rest of the codebase.

Verification successful

The update to dspy-ai version 2.4.10 is compatible with the existing codebase.

There are no breaking changes or significant updates in the new version that would affect the current usage patterns.

  • The changelog for version 2.4.10 does not indicate any breaking changes.
  • The search results show extensive usage of dspy in the codebase, and no compatibility issues are expected.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify compatibility of the updated `dspy-ai` dependency.

# Test: Run the test suite. Expect: All tests to pass.
poetry install
poetry run pytest

Length of output: 184



Script:

#!/bin/bash
# Search for all import statements related to `dspy-ai` in the codebase.
rg --fixed-strings "import dspy-ai" --glob "*.py"

# Search for all usage instances of `dspy-ai` in the codebase.
rg --fixed-strings "dspy-ai." --glob "*.py"

Length of output: 95



Script:

#!/bin/bash
# Search for all import statements related to `dspy_ai` in the codebase (alternative import style).
rg --fixed-strings "import dspy_ai" --glob "*.py"

# Search for all usage instances of `dspy_ai` in the codebase (alternative usage pattern).
rg --fixed-strings "dspy_ai." --glob "*.py"

# Search for all import statements related to `dspy` in the codebase (potential module import).
rg --fixed-strings "import dspy" --glob "*.py"

# Search for all usage instances of `dspy` in the codebase (potential module usage).
rg --fixed-strings "dspy." --glob "*.py"

Length of output: 4542



Script:

#!/bin/bash
# Fetch the changelog or release notes for `dspy-ai` version `2.4.10`.
curl -s https://pypi.org/pypi/dspy-ai/2.4.10/json | jq -r '.info.description'

Length of output: 36729

@@ -149,15 +150,23 @@ async def process_batch(files_batch):
graph_config = get_graph_config()
graph_topology = graph_config.graph_model

print(graph_config.infer_graph_topology)
Let's delete these print lines.
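Rather than deleting the diagnostics outright, another option is to demote them to debug-level logging so they can be re-enabled without code changes. A sketch, where the attribute names mirror the diff above but the helper and logger setup are assumptions, not cognee's actual logging configuration:

```python
import logging

logger = logging.getLogger(__name__)

def log_topology_settings(graph_config):
    # Debug-level logging is silenced in production by default, unlike print().
    logger.debug("infer_graph_topology=%s",
                 getattr(graph_config, "infer_graph_topology", None))
    logger.debug("graph_topology_task=%s",
                 getattr(graph_config, "graph_topology_task", None))
```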

Comment on lines +153 to +155
print(graph_config.infer_graph_topology)
print(graph_config.graph_topology_task)

@Vasilije1990 This is a good suggestion

await add([text], "example_dataset")

from cognee.api.v1.config.config import config
config.set_chunk_engine(ChunkEngine.LANGCHAIN_ENGINE )
config.set_chunk_strategy(ChunkStrategy.LANGCHAIN_CHARACTER)
config.embedding_engine = LiteLLMEmbeddingEngine()
config.set_chunk_engine(ChunkEngine.LANGCHAIN_ENGINE)
@Vasilije1990 This one

config.set_infer_graph_topology=True

from cognee.api.v1.datasets.datasets import datasets
print(datasets.list_datasets())
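Note that `config.set_infer_graph_topology=True` in the excerpt above rebinds the setter to the value `True` instead of calling it, so the underlying flag never changes. A toy stand-in class (hypothetical, not cognee's actual `Config`) makes the difference visible:

```python
class GraphConfig:
    """Toy stand-in for a config object with a setter method."""
    def __init__(self):
        self.infer_graph_topology = False

    def set_infer_graph_topology(self, value: bool):
        self.infer_graph_topology = value

config = GraphConfig()

# Bug: this shadows the bound method with the value True
# and never touches infer_graph_topology.
config.set_infer_graph_topology = True
assert config.infer_graph_topology is False

# Intended usage: call the setter.
config = GraphConfig()
config.set_infer_graph_topology(True)
assert config.infer_graph_topology is True
```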
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Outside diff range and nitpick comments (6)
docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md (6)

15-15: Remove trailing spaces.

There is an unnecessary trailing space at the end of this line.

- 
+

27-27: Remove multiple consecutive blank lines.

There are two consecutive blank lines here, which is unnecessary.

- 
- 
+ 
Tools
Markdownlint

27-27: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


50-50: Remove trailing spaces.

There is an unnecessary trailing space at the end of this line.

- 
+
Tools
Markdownlint

50-50: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


68-68: Remove multiple consecutive blank lines.

There are two consecutive blank lines here, which is unnecessary.

- 
- 
+ 
Tools
Markdownlint

68-68: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


78-78: Remove trailing spaces.

There is an unnecessary trailing space at the end of this line.

- 
+
Tools
Markdownlint

78-78: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


82-82: Remove multiple consecutive blank lines.

There are two consecutive blank lines here, which is unnecessary.

- 
- 
+ 
Tools
Markdownlint

82-82: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)
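The MD009/MD012 fixes above are mechanical enough to script. A small sketch that strips trailing whitespace and collapses runs of blank lines — deliberately naive, since it would also remove intentional two-trailing-space hard breaks and blank lines inside fenced code blocks, so a real tool such as `markdownlint --fix` is preferable:

```python
def fix_md009_md012(text: str) -> str:
    """Strip trailing spaces (MD009) and collapse blank-line runs (MD012)."""
    out = []
    for line in (l.rstrip() for l in text.split("\n")):
        if line == "" and out and out[-1] == "":
            continue  # keep at most one consecutive blank line
        out.append(line)
    return "\n".join(out)

print(repr(fix_md009_md012("a  \n\n\nb \n")))  # prints 'a\n\nb\n'
```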

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2d09379 and f322a6c.

Files ignored due to path filters (4)
  • docs/blog/posts/where-do-knowledge-graphs-fit/cognee_results.png is excluded by !**/*.png
  • docs/blog/posts/where-do-knowledge-graphs-fit/dynamo_fyi_demo.jpg is excluded by !**/*.jpg
  • docs/blog/posts/where-do-knowledge-graphs-fit/initial_results.png is excluded by !**/*.png
  • docs/blog/posts/where-do-knowledge-graphs-fit/timeline.jpg is excluded by !**/*.jpg
Files selected for processing (1)
  • docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md (1 hunks)
Additional context used
LanguageTool
docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md

[uncategorized] ~52-~52: Possible missing comma found.
Context: ...ersonalization layer above the Firebase store where all customer data is located. How...

(AI_HYDRA_LEO_MISSING_COMMA)

Markdownlint
docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md

36-36: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


50-50: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


78-78: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


27-27: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


68-68: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


82-82: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)

Additional comments not posted (2)
docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md (2)

1-12: Front matter looks good.

The metadata is correctly formatted.


42-47: Section looks good.

The description of Dynamo's use case and the recommender system is clear and informative.


Upon arrival at Dynamo, we found a rudimentary agent support system connected to OpenAI with one templated query. Instead of building a GraphRAG, we needed to replicate the existing functionality and create an infrastructure layer for future evolution towards GraphRAG.

GraphRAG would provide a semantic and personalization layer above the Firebase store where all customer data is located. However, this would only be useful if we had a system to release our agent support assistant to production.
Add missing comma.

There is a possible missing comma in this sentence.

- layer above the Firebase store where all customer data is located
+ layer above the Firebase store, where all customer data is located
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
GraphRAG would provide a semantic and personalization layer above the Firebase store where all customer data is located. However, this would only be useful if we had a system to release our agent support assistant to production.
GraphRAG would provide a semantic and personalization layer above the Firebase store, where all customer data is located. However, this would only be useful if we had a system to release our agent support assistant to production.
Tools
LanguageTool

[uncategorized] ~52-~52: Possible missing comma found.
Context: ...ersonalization layer above the Firebase store where all customer data is located. How...

(AI_HYDRA_LEO_MISSING_COMMA)


gitguardian bot commented Jul 3, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
7122536 Triggered Generic High Entropy Secret d8c6e6f docker-compose.yml View secret
7122536 Triggered Generic High Entropy Secret d8c6e6f docker-compose.yml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
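GitGuardian's "high entropy" classification flags long tokens whose characters are close to uniformly distributed. A toy illustration of the idea — not GitGuardian's actual detector; the threshold, regex, and sample key below are made up:

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def flag_high_entropy_values(compose_text: str, threshold: float = 4.0):
    """Flag keys whose values look like hardcoded secrets (long, high-entropy)."""
    hits = []
    for key, value in re.findall(r'(\w+)[:=]\s*"?([A-Za-z0-9+/=_\-]{20,})"?',
                                 compose_text):
        if shannon_entropy(value) > threshold:
            hits.append(key)
    return hits

sample = '''
services:
  api:
    environment:
      API_KEY: "q9X2mZ7vLk4RtY8wPbN3sD6fGhJ1aQeU"
      LOG_LEVEL: "debug"
'''
print(flag_high_entropy_values(sample))  # ['API_KEY']
```

The remediation itself is to move such values out of `docker-compose.yml` into an untracked `.env` file and reference them as `${API_KEY}` in the compose file.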

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f322a6c and d8c6e6f.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (1)
  • pyproject.toml (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d8c6e6f and 688d757.

Files selected for processing (1)
  • docs/blog/index.md (1 hunks)
Additional context used
Markdownlint
docs/blog/index.md

8-8: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)

@@ -8,7 +8,7 @@ The goal of the blog is to discuss broader topics around the cognee project, inc

1. [LLMOps stack + Graphs](posts/Shiny_new_LLMOps.md)

[//]: # (2. [Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi]&#40;posts/where-do-knowledge-graphs-fit.md&#41;)
2. [Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi](posts/wwhere-do-knowledge-graphs-fit-and-where-they-dont.md)
Verify the new blog post link.

The link to the new blog post "Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi" is correctly formatted. However, there is a typographical error in the filename wwhere-do-knowledge-graphs-fit-and-where-they-dont.md.

- 2. [Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi](posts/wwhere-do-knowledge-graphs-fit-and-where-they-dont.md)
+ 2. [Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi](posts/where-do-knowledge-graphs-fit-and-where-they-dont.md)
Committable suggestion

Suggested change
2. [Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi](posts/wwhere-do-knowledge-graphs-fit-and-where-they-dont.md)
2. [Where do knowledge graphs fit, and where do they not? A case study with dynamo.fyi](posts/where-do-knowledge-graphs-fit-and-where-they-dont.md)

@@ -8,7 +8,7 @@

Remove consecutive blank lines.

There are multiple consecutive blank lines which are not necessary and should be removed to adhere to markdown best practices.

- 8-8: Expected: 1; Actual: 2
+ 8: Expected: 1; Actual: 1

Committable suggestion was skipped due to low confidence.

Tools
Markdownlint

8-8: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 688d757 and 77e06e2.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (1)
  • docs/blog/index.md (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • docs/blog/index.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 77e06e2 and a8ed752.

Files selected for processing (1)
  • docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • docs/blog/posts/where-do-knowledge-graphs-fit-and-where-they-dont.md
