
COG-485 - Fix/integration test warnings #176

Merged (6 commits) on Nov 5, 2024

Conversation

0xideas (Contributor) commented Nov 4, 2024

Small changes to remove warnings

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced API error handling with consistent JSON responses for not found scenarios.
    • Improved handling of optional fields in data models, making certain fields default to None.
    • Added new evaluation and training functions to streamline processing and improve functionality.
  • Bug Fixes

    • Corrected print statement formatting in multiple test files for clearer output.
  • Chores

    • Updated various dependencies in the pyproject.toml file to ensure compatibility and access to the latest features.
  • Refactor

    • Streamlined initialization processes in data models and updated method calls for improved clarity.
    • Enhanced error handling in ontology processing and data validation workflows.
  • Removed

    • Deprecated the FastembedEmbeddingEngine class to simplify the codebase.

0xideas requested a review from borisarzentar on November 4, 2024 15:16
0xideas self-assigned this on Nov 4, 2024

coderabbitai bot commented Nov 4, 2024

Warning

Rate limit exceeded

@0xideas has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 19 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between cb20506 and 67f5d07.

Walkthrough

This pull request introduces several updates across multiple files within the cognee project. Key changes include modifications to data transfer objects (DTOs) to make certain fields optional, improvements in API endpoint error handling, and adjustments to the initialization of various classes. Additionally, several print statement formatting issues were corrected in test files. The pyproject.toml file underwent significant dependency updates, including version upgrades and the addition of new packages, while one file was removed entirely. Overall, the changes enhance clarity, consistency, and functionality throughout the codebase.

Changes

  • cognee/api/client.py: Updated updated_at fields in DatasetDTO and DataDTO to default to None. Improved error handling for the get_dataset_data and get_raw_data endpoints to return JSON responses. Enhanced the add endpoint to handle URLs and clone GitHub repositories.
  • cognee/modules/pipelines/Pipeline.py: Changed the description attribute in PipelineConfig to default to None. The Pipeline constructor now assigns description from pipeline_config.
  • cognee/modules/pipelines/operations/run_parallel.py: Updated the tasks parameter type from [Task] to List[Task] in the run_tasks_parallel function.
  • cognee/shared/GithubTopology.py: Replaced Directory.update_forward_refs() with Directory.model_rebuild() in the Directory class.
  • cognee/shared/SourceCodeGraph.py: Set default values to None for several optional fields in BaseClass, Class, Function, and Variable.
  • cognee/tasks/infer_data_ontology/models/models.py: Replaced NodeModel.update_forward_refs() with NodeModel.model_rebuild().
  • cognee/tests/test_library.py: Corrected print statement formatting for extracted summaries.
  • cognee/tests/test_neo4j.py: Corrected print statement formatting for extracted summaries.
  • cognee/tests/test_qdrant.py: Corrected print statement formatting for extracted summaries.
  • cognee/tests/test_weaviate.py: Corrected print statement formatting for extracted summaries.
  • pyproject.toml: Upgraded multiple dependencies, added new dependencies, and removed several obsolete ones.
  • cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py: Removed the FastembedEmbeddingEngine class and its methods.
  • cognee/modules/cognify/evaluate.py: Added evaluate and evaluate_answer functions to handle dataset evaluation.
  • cognee/modules/cognify/train.py: Updated the train function to utilize dspy and simplified error handling in evaluate_answer.
  • cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py: Updated the validation method for DataPoint[Keyword] instances from Keyword.parse_obj to Keyword.model_validate.
  • cognee/tasks/infer_data_ontology/infer_data_ontology.py: Updated the recursive_flatten method to use NodeModel.model_validate. Enhanced error handling in add_graph_ontology.
  • cognee/tasks/save_chunks_to_store/save_chunks_to_store.py: Updated the payload validation method for existing chunks from DocumentChunk.parse_obj to DocumentChunk.model_validate.


Suggested reviewers

  • borisarzentar

Poem

🐇 In the code where bunnies play,
Changes hop in bright array.
Optional fields now take a bow,
Error handling, oh so wow!
Dependencies sprout, new ones bloom,
In the code, there's plenty of room!
Let's celebrate this joyful day! 🎉


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
cognee/modules/pipelines/Pipeline.py (1)

Line range hint 13-19: Fix type mismatch between Pipeline and PipelineConfig description fields.

There's a type safety issue: Pipeline.description is typed as str (non-optional) but is initialized from PipelineConfig.description which is Optional[str]. This could lead to runtime issues if None is passed.

Consider one of these solutions:

class Pipeline():
    id: UUID = uuid4()
    name: str
-    description: str
+    description: Optional[str]
    tasks: list[Task] = []

Or add validation in the constructor:

def __init__(self, name: str, pipeline_config: PipelineConfig):
    self.name = name
+    if pipeline_config.description is None:
+        raise ValueError("description is required")
    self.description = pipeline_config.description
cognee/modules/pipelines/operations/run_parallel.py (1)

Line range hint 5-11: Review result handling logic

The current implementation has a potential issue in its result handling:

  1. When results has exactly one item, it returns an empty list instead of the single result
  2. The behavior of returning only the last result (when multiple exist) or an empty list isn't clearly documented

Consider this improved implementation:

 def run_tasks_parallel(tasks: List[Task]) -> Callable[[Any], Generator[Any, Any, Any]]:
+    """Execute multiple tasks in parallel and return the final result.
+    
+    When multiple tasks are executed:
+    - Returns the last result if there are multiple results
+    - Returns the single result if there's exactly one result
+    - Returns an empty list if no results
+    """
     async def parallel_run(*args, **kwargs):
         parallel_tasks = [asyncio.create_task(task.run(*args, **kwargs)) for task in tasks]
 
         results = await asyncio.gather(*parallel_tasks)
-        return results[len(results) - 1] if len(results) > 1 else []
+        if not results:
+            return []
+        return results[-1]
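
For illustration, here is a self-contained sketch of how the corrected function behaves. The Task stand-in and the coroutines below are hypothetical; the real Task class lives elsewhere in cognee:

import asyncio

def run_tasks_parallel(tasks):
    # Fixed implementation from the suggestion above.
    async def parallel_run(*args, **kwargs):
        parallel_tasks = [asyncio.create_task(task.run(*args, **kwargs)) for task in tasks]
        results = await asyncio.gather(*parallel_tasks)
        return results[-1] if results else []
    return parallel_run

class Task:
    # Hypothetical stand-in for cognee's Task; only run() matters here.
    def __init__(self, coro_fn):
        self.coro_fn = coro_fn

    async def run(self, *args, **kwargs):
        return await self.coro_fn(*args, **kwargs)

async def double(x):
    return x * 2

async def triple(x):
    return x * 3

async def main():
    runner = run_tasks_parallel([Task(double), Task(triple)])
    print(await runner(5))  # prints 15: the last task's result

asyncio.run(main())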
cognee/tasks/infer_data_ontology/models/models.py (1)

Line range hint 1-28: Consider adding field descriptions for better documentation.

While the model structure is clean, adding Field descriptions (like you did for OntologyNode.id) would improve the self-documentation of the models.

Example improvement:

 class NodeModel(BaseModel):
-    node_id: str
-    name: str
-    default_relationship: Optional[RelationshipModel] = None
-    children: List[Union[Dict[str, Any], "NodeModel"]] = Field(default_factory=list)
+    node_id: str = Field(..., description="Unique identifier for the node")
+    name: str = Field(..., description="Display name of the node")
+    default_relationship: Optional[RelationshipModel] = Field(
+        None, description="Default relationship type for child nodes"
+    )
+    children: List[Union[Dict[str, Any], "NodeModel"]] = Field(
+        default_factory=list,
+        description="Nested child nodes or raw dictionaries"
+    )
cognee/shared/SourceCodeGraph.py (1)

41-41: Consider adding field documentation.

While setting None as the default value is correct, the field name default_value with a default of None could be confusing. Consider adding a docstring or comment to clarify the purpose of this field.

Add a docstring to clarify the field's purpose:

    is_static: Optional[bool] = False
+    # The default value assigned to this variable in the source code, if any
    default_value: Optional[str] = None
cognee/api/client.py (1)

203-203: LGTM! Consider documenting the field behavior.

The change to make updated_at optional with a default value of None is good. For better maintainability, consider adding a docstring to both DTOs explaining when updated_at is set vs. when it remains None.

Example docstring:

class DataDTO(OutDTO):
    """Data transfer object for file data.
    
    Attributes:
        updated_at: Timestamp of the last update. None if never updated.
    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cc77d84 and 79642cf.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • cognee/api/client.py (2 hunks)
  • cognee/modules/pipelines/Pipeline.py (1 hunks)
  • cognee/modules/pipelines/operations/run_parallel.py (1 hunks)
  • cognee/shared/GithubTopology.py (1 hunks)
  • cognee/shared/SourceCodeGraph.py (3 hunks)
  • cognee/tasks/infer_data_ontology/models/models.py (1 hunks)
  • cognee/tests/test_library.py (1 hunks)
  • cognee/tests/test_neo4j.py (1 hunks)
  • cognee/tests/test_qdrant.py (1 hunks)
  • cognee/tests/test_weaviate.py (1 hunks)
  • pyproject.toml (2 hunks)
✅ Files skipped from review due to trivial changes (4)
  • cognee/tests/test_library.py
  • cognee/tests/test_neo4j.py
  • cognee/tests/test_qdrant.py
  • cognee/tests/test_weaviate.py
🔇 Additional comments (10)
cognee/modules/pipelines/Pipeline.py (1)

8-8: LGTM! Good practice for optional fields.

Adding a default value of None for the optional description field aligns with best practices and helps prevent potential initialization issues.
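
As a standalone illustration (not the project's actual class): in Pydantic v2 an Optional annotation alone still makes a field required, so the explicit default is what actually prevents the initialization issues.

from typing import Optional

from pydantic import BaseModel

class PipelineConfig(BaseModel):
    name: str
    description: Optional[str] = None  # without "= None" this field would still be required

config = PipelineConfig(name="demo")  # valid: description defaults to None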

cognee/modules/pipelines/operations/run_parallel.py (2)

1-1: LGTM! Clean import organization

The addition of List to the typing imports is properly organized and necessary for the updated type hints.


5-5: LGTM! Proper type hint update

The change from [Task] to List[Task] follows PEP 484 guidelines and helps eliminate type checking warnings. This aligns well with the PR's objective of fixing integration test warnings.

cognee/tasks/infer_data_ontology/models/models.py (1)

15-15: LGTM! Correct migration from deprecated update_forward_refs().

The change from update_forward_refs() to model_rebuild() is the correct approach for handling forward references in Pydantic v2.

Let's verify the Pydantic version and check for any remaining instances of the deprecated method:

✅ Verification successful

Migration to model_rebuild() is complete and consistent with Pydantic v2.8.2

The verification confirms:

  • Project uses Pydantic v2.8.2 which requires model_rebuild()
  • No remaining instances of deprecated update_forward_refs() found in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check Pydantic version in pyproject.toml
echo "Checking Pydantic version:"
grep -A 1 "pydantic" pyproject.toml

# Look for any remaining instances of update_forward_refs
echo -e "\nChecking for remaining instances of deprecated method:"
rg "update_forward_refs\(\)" --type py

Length of output: 361

cognee/shared/SourceCodeGraph.py (3)

9-9: LGTM! Good improvement to the BaseClass model.

Setting None as the default value for the optional constructor_parameters field follows Pydantic best practices and will help prevent validation warnings.


31-31: LGTM! Consistent with other model improvements.

Setting None as the default value for the optional parameters field aligns with the changes in other models.


16-17: Verify circular dependency handling.

While the default values are correct, the from_class: Optional[BaseClass] creates a circular type dependency since Class is referenced in BaseClass. This might cause issues with Pydantic's model validation.

Let's check if there are any other circular dependencies in the codebase:

pyproject.toml (2)

58-58: Verify compatibility with LiteLLM version 1.51.3

The LiteLLM update from 1.38.10 to 1.51.3 is substantial and should be carefully reviewed, especially given its dependency on the OpenAI SDK.

Consider the following:

  1. Review the LiteLLM changelog for breaking changes
  2. Ensure compatibility with OpenAI SDK 1.52.0
  3. Consider updating to the latest stable version (1.21.0 as of April 2024)
✅ Verification successful

LiteLLM 1.51.3 update appears safe to proceed

The codebase uses LiteLLM in a very limited way, only for embeddings via the aembedding function with a simple API surface. The implementation in LiteLLMEmbeddingEngine.py uses basic functionality that has remained stable across versions:

  • Only uses aembedding function with standard parameters (model, input, api_key)
  • No usage of advanced features, Router, Cache, or complex configurations
  • Simple async wrapper for embedding generation

The update from 1.38.10 to 1.51.3 should not impact this basic embedding functionality as it's a core feature that maintains backward compatibility.
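
For context, that surface looks roughly like this (a hedged sketch, not the engine's actual code; the model id and key are placeholders):

import litellm

async def embed_texts(texts: list[str]) -> list[list[float]]:
    # aembedding is litellm's async embedding entry point.
    response = await litellm.aembedding(
        model="openai/text-embedding-3-small",  # illustrative model id
        input=texts,
        api_key="sk-placeholder",
    )
    return [item["embedding"] for item in response.data]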

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for LiteLLM usage patterns that might be affected by breaking changes
rg -l 'litellm\.(completion|embedding|Router|Cache)'
rg -l 'litellm\.utils\.(completion|embedding)'

Length of output: 103


Script:

#!/bin/bash
# Let's try a broader search for litellm usage patterns
rg -l "litellm"

# Also search for specific imports
rg "^from litellm|^import litellm" -A 2

# Look for configuration related to litellm
rg "litellm" -g "*.py" -g "*.yaml" -g "*.yml" -g "*.json" -g "*.toml"

Length of output: 1771


Script:

#!/bin/bash
# Let's examine the LiteLLMEmbeddingEngine implementation to understand usage patterns
cat cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py

# Check poetry.lock for current and previous versions
rg "litellm" poetry.lock -A 5 -B 5

Length of output: 2761


22-22: Verify compatibility with OpenAI SDK version 1.52.0

The OpenAI SDK update from 1.27.0 to 1.52.0 is a significant jump that may introduce breaking changes, particularly in error handling and response structures. This could be related to the integration test warnings you're addressing.

Consider the following:

  1. Review the OpenAI SDK changelog for breaking changes
  2. Test the integration with LiteLLM thoroughly
  3. Consider updating to the latest stable version (1.13.3 as of April 2024)
✅ Verification successful

OpenAI SDK version 1.52.0 update appears safe to proceed

The codebase uses OpenAI SDK in a limited and well-structured way through adapter patterns:

  • Primary usage is through AsyncOpenAI and OpenAI client initialization in adapter classes
  • LiteLLM integration is isolated to embeddings functionality
  • No direct usage of error handling classes or deprecated API resources found

The adapter pattern implementation provides a good isolation layer that will help manage any breaking changes from the SDK update. While version 1.52.0 isn't the latest, it's being used with LiteLLM 1.51.3, suggesting intentional version alignment between these interdependent packages.
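
As a rough sketch of that adapter-style isolation (hypothetical class and model name, not the project's actual adapter):

from openai import AsyncOpenAI

class OpenAIAdapter:
    # Thin wrapper so SDK changes stay confined to one class.
    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.client = AsyncOpenAI(api_key=api_key)
        self.model = model

    async def complete(self, prompt: str) -> str:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content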

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for OpenAI SDK usage patterns that might be affected by breaking changes
rg -l 'openai\.(Error|APIError|InvalidRequestError)' 
rg -l 'openai\.api_resources'

Length of output: 86


Script:

#!/bin/bash
# Let's search for any direct OpenAI SDK usage patterns
rg -l "from openai" 
rg -l "import openai"

# Search for specific OpenAI client initialization or API calls
rg -l "OpenAI|openai\.Client"

# Check if we're using LiteLLM as an abstraction layer
rg -l "from litellm" 
rg -l "import litellm"

Length of output: 1004


Script:

#!/bin/bash
# Let's examine the actual OpenAI SDK usage in the adapter files
rg -A 5 "openai\.Client|from openai import" cognee/infrastructure/llm/openai/adapter.py cognee/infrastructure/llm/generic_llm_api/adapter.py

# Check how LiteLLM is being used for embeddings
rg -A 5 "litellm" cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py

Length of output: 2192

cognee/api/client.py (1)

149-149: LGTM! Making updated_at optional improves DTO robustness.

The change to make updated_at optional with a default value of None is a good practice, as it handles the case where a dataset hasn't been updated yet.

@@ -18,7 +18,7 @@ class Directory(BaseModel):
     directories: List['Directory'] = []

 # Allows recursive Directory Model
-Directory.update_forward_refs()
+Directory.model_rebuild()

💡 Codebase verification

Several Pydantic v1 methods need to be updated to v2 equivalents

The codebase shows multiple instances of deprecated Pydantic v1 methods that should be updated to their v2 equivalents:

  • parse_obj()model_validate() in:
    • cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py
    • cognee/tasks/infer_data_ontology/infer_data_ontology.py
    • cognee/tasks/save_chunks_to_store/save_chunks_to_store.py
  • Schema-related usages in database adapters should be reviewed and updated to use Pydantic v2's model configuration
  • The imports look good as they're using the new style with pydantic_settings for settings classes

While the change from update_forward_refs() to model_rebuild() is correct, there are other v1 methods that should also be updated to maintain consistency with Pydantic v2.

🔗 Analysis chain

LGTM! Correct update to use Pydantic v2's model_rebuild()

The change from update_forward_refs() to model_rebuild() aligns with Pydantic v2's best practices for handling forward references in recursive models.
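
A minimal sketch of the pattern (standalone, not the project's actual model):

from typing import List

from pydantic import BaseModel

class Directory(BaseModel):
    name: str
    directories: List["Directory"] = []

# Pydantic v2 replacement for the deprecated update_forward_refs();
# it resolves the self-referencing "Directory" annotation.
Directory.model_rebuild()

tree = Directory(name="root", directories=[{"name": "src"}])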

Let's verify the Pydantic version and check for other potential v1 methods that might need updates:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Pydantic version and usage of deprecated v1 methods

# Test 1: Verify Pydantic version in dependencies
echo "Checking Pydantic version:"
grep -r "pydantic" pyproject.toml

# Test 2: Search for other potential v1 methods that might need updates
echo -e "\nChecking for other Pydantic v1 methods:"
rg -l "update_forward_refs|construct|validate|parse_obj|parse_raw|parse_file|schema|schema_json|validate_json" --type py

Length of output: 1287


Script:

#!/bin/bash
# Let's check the actual usage of these v1 methods in the files to determine if they need updates
echo "Checking specific v1 method usages:"
rg "update_forward_refs|construct|validate|parse_obj|parse_raw|parse_file|schema|schema_json|validate_json" --type py -C 2

# Also check for Pydantic imports to see if they're using v2 style imports
echo -e "\nChecking Pydantic import patterns:"
rg "from pydantic" --type py

Length of output: 27510

coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 79642cf and ad8943f.

📒 Files selected for processing (3)
  • .github/workflows/test_python_3_10.yml (1 hunks)
  • .github/workflows/test_python_3_11.yml (1 hunks)
  • .github/workflows/test_python_3_9.yml (1 hunks)
🔇 Additional comments (4)
.github/workflows/test_python_3_9.yml (2)

50-50: LGTM! Good optimization of CI dependencies.

Excluding documentation dependencies during test runs is a good optimization that can speed up CI execution and reduce resource usage.


50-50: Verify no test dependencies in docs group.

Please ensure that no test dependencies are accidentally grouped under the docs group in pyproject.toml.

✅ Verification successful

No test dependencies found in docs group

Based on the verification results, the docs group in pyproject.toml only contains documentation-related packages:

  • mkdocs-material: Documentation theme
  • mkdocs-minify-plugin: Documentation minification
  • mkdocstrings: Documentation string generator
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if any test-related packages are in the docs group
# Expected: No test packages should be found in the docs group

echo "Checking for test dependencies in docs group..."
rg -A 10 '^\[tool\.poetry\.group\.docs\.dependencies\]' pyproject.toml

Length of output: 461

.github/workflows/test_python_3_11.yml (2)

50-50: LGTM! Optimization to exclude unnecessary doc dependencies.

This change improves CI efficiency by not installing documentation dependencies during test runs.


50-50: Verify test dependencies are not affected.

Let's ensure no test dependencies are incorrectly marked as documentation dependencies.

✅ Verification successful

Let me gather more information to verify this thoroughly.


Based on the pyproject.toml content, I can now generate the final response:

Test dependencies are properly separated and not affected by docs exclusion

The test dependencies are correctly configured in the dev group, not in the docs group. The docs group only contains documentation-specific packages (mkdocs-related), while testing packages like pytest and pytest-asyncio are properly placed in the dev dependencies. The --without docs flag in the poetry install command will not affect any test dependencies.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if any test dependencies are marked as documentation dependencies
# Expected: No test-related packages should be in the docs group

# Check the poetry groups configuration
cat pyproject.toml | awk '/^\[tool\.poetry\.group\.docs\.dependencies\]/{flag=1;next}/^\[/{flag=0}flag' | \
  grep -iE 'test|pytest|coverage|mock|hypothesis|behave|cucumber|robot|selenium|playwright|cypress'

# Check if any test files import packages that might be in docs group
rg -l '(import|from) \w+' tests/ | \
  xargs cat | grep -oE '^(import|from) [a-zA-Z0-9_]+' | \
  sort -u | cut -d' ' -f2 > /tmp/test_imports.txt

echo "Packages imported in test files:"
cat /tmp/test_imports.txt

Length of output: 423


Script:

#!/bin/bash
# Let's check the entire pyproject.toml to understand the dependency structure
echo "=== Full pyproject.toml content ==="
cat pyproject.toml

echo -e "\n=== Test dependencies ==="
cat pyproject.toml | awk '/^\[tool\.poetry\.group\.test\.dependencies\]/{flag=1;next}/^\[/{flag=0}flag'

echo -e "\n=== Main dependencies ==="
cat pyproject.toml | awk '/^\[tool\.poetry\.dependencies\]/{flag=1;next}/^\[/{flag=0}flag'

echo -e "\n=== List all test files ==="
fd -e py -e pytest . tests/

Length of output: 4608

.github/workflows/test_python_3_10.yml: outdated review comment (resolved)
0xideas force-pushed the fix/integration-test-warnings branch from ad8943f to 115e93b on November 4, 2024 16:04
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
pyproject.toml (1)

89-93: New testing tools added to improve code quality.

The addition of deptry, pylint, and ruff will help catch potential issues:

  • deptry: Finds unused dependencies
  • pylint: Comprehensive linting
  • ruff: Fast Python linter

These tools should help prevent integration test warnings in the future.

Consider adding configurations for these tools to ensure consistent usage across the project:

  • Add .pylintrc for pylint configuration
  • Add pyproject.toml sections for ruff settings
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ad8943f and 115e93b.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py (0 hunks)
  • cognee/modules/cognify/evaluate.py (0 hunks)
  • cognee/modules/cognify/train.py (0 hunks)
  • pyproject.toml (2 hunks)
💤 Files with no reviewable changes (3)
  • cognee/infrastructure/databases/vector/embeddings/FastembedEmbeddingEngine.py
  • cognee/modules/cognify/evaluate.py
  • cognee/modules/cognify/train.py
🔇 Additional comments (2)
pyproject.toml (2)

22-22: Verify impact of dependency changes on integration tests.

Several significant changes that could affect integration tests:

  1. openai upgrade to 1.52.0
  2. New dependencies: requests, aiohttp, typing_extensions, dspy, nest_asyncio
  3. litellm upgrade to 1.49.1

Let's verify the removal of old dependencies didn't break anything:

#!/bin/bash
# Check for usage of removed packages that might affect tests
echo "Checking for potentially broken imports..."
rg -l "import (pyarrow|greenlet|fastembed|tokenizers|transformers)" || echo "No direct usage found"

# Check for OpenAI and LiteLLM usage that might need updates
echo -e "\nChecking OpenAI and LiteLLM usage patterns..."
rg -l "import (openai|litellm)" | while read -r file; do
  echo "=== $file ==="
  rg -A 2 "import (openai|litellm)" "$file"
done

Also applies to: 27-34, 62-62


49-49: ⚠️ Potential issue

Verify compatibility with tenacity major version upgrade.

The upgrade from tenacity ^8.2.3 to ^9.0.0 is a major version change that might introduce breaking changes in retry mechanisms, potentially affecting integration tests.

Run this script to check for tenacity usage that might need updates:

✅ Verification successful

The previous script didn't return any matches for tenacity usage. Let's try a different approach to search for tenacity usage patterns:


Let's check the imports to confirm which tenacity features are being used:


Tenacity v9.0.0 upgrade is safe to proceed

The codebase uses only basic tenacity features (@retry decorator with stop_after_attempt), which have remained stable across major versions. The usage pattern is consistent across all LLM adapters (OpenAI, Generic, Anthropic) and only involves simple retry logic with a maximum of 5 attempts. These core features are maintained in tenacity v9.0.0 with backward compatibility.
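
The pattern in question looks roughly like this (a hedged sketch; the decorated coroutine is hypothetical):

from tenacity import retry, stop_after_attempt

@retry(stop=stop_after_attempt(5))
async def acreate_structured_output(prompt: str) -> str:
    # tenacity re-invokes the coroutine on any exception and gives up
    # after five attempts in total; this decorator API is unchanged in v9.
    ...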

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tenacity decorators and retry configurations that might need updates
rg -l "from tenacity import|import tenacity|@retry|RetryError" | while read -r file; do
  echo "=== $file ==="
  rg -A 3 "from tenacity import|import tenacity|@retry|RetryError" "$file"
done

Length of output: 1466


Script:

#!/bin/bash
# Search for tenacity usage with more specific patterns
rg -A 5 "@retry|RetryError|stop=|wait=|retry_if|retry_unless" 

Length of output: 7187


Script:

#!/bin/bash
# Search for tenacity imports to see which features are used
rg "from tenacity import" -A 2 cognee/infrastructure/llm/

Length of output: 847

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
cognee/tasks/save_chunks_to_store/save_chunks_to_store.py (1)

Line range hint 4-4: Add/fix return type annotations to improve type safety.

To enhance type safety and prevent potential warnings:

  1. Add return type annotation to save_chunks_to_store
  2. Fix return type annotation for get_previous_chunk_id (currently shows DocumentChunk but returns str | None)
-async def save_chunks_to_store(data_chunks: list[DocumentChunk], collection_name: str):
+async def save_chunks_to_store(data_chunks: list[DocumentChunk], collection_name: str) -> list[DocumentChunk]:

-def get_previous_chunk_id(document_chunks: list[DocumentChunk], current_chunk: DocumentChunk) -> DocumentChunk:
+def get_previous_chunk_id(document_chunks: list[DocumentChunk], current_chunk: DocumentChunk) -> str | None:

Also applies to: 84-84

cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py (1)

Line range hint 62-67: LGTM! Good migration to Pydantic v2's preferred validation method.

The change from parse_obj to model_validate is correct and aligns with Pydantic v2's recommendations. This should help resolve deprecation warnings in integration tests.
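
For reference, the rename in isolation (illustrative model and payload, not the actual DataPoint[Keyword] generic):

from pydantic import BaseModel

class Keyword(BaseModel):
    uuid: str
    text: str

payload = {"uuid": "123", "text": "graph"}

# Pydantic v1 (emits a deprecation warning under v2):
# keyword = Keyword.parse_obj(payload)

# Pydantic v2:
keyword = Keyword.model_validate(payload)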

Consider extracting the payload creation into a separate helper method since this pattern is repeated twice in the code:

def create_keyword_payload(id: UUID, text: str, chunk: DocumentChunk) -> dict:
    return {
        "uuid": str(id),
        "text": text,
        "chunk_id": str(chunk.chunk_id),
        "document_id": str(chunk.document_id),
    }
cognee/tasks/infer_data_ontology/infer_data_ontology.py (4)

Line range hint 89-89: Fix error logging format.

The error logging uses string formatting with a dictionary key that doesn't match the object model. The file["id"] access will fail as file is a file object, not a dictionary.

-logger.warning("File (%s) has an unknown file type. We are skipping it.", file["id"])
+logger.warning("File (%s) has an unknown file type. We are skipping it.", base_file.id)

Line range hint 84-86: Add defensive checks for chunk processing.

The code assumes the chunking always produces results and accesses the first chunk without validation. This could raise an IndexError if chunking produces empty results.

-                        if chunks_with_ids[0][0] == 1:
+                        if chunks_with_ids and len(chunks_with_ids) > 0 and chunks_with_ids[0][0] == 1:
                             initial_chunks_and_ids.append({base_file.id: chunks_with_ids})

Line range hint 63-67: Enhance error message for unsupported file formats.

The error message could be more specific about which file formats are supported.

-                raise ValueError("Unsupported file format")
+                raise ValueError(f"Unsupported file format. File must be either .json or .csv, got: {file_path}")

57-61: Specify file encoding in file operations.

File operations should explicitly specify the encoding to ensure consistent behavior across different platforms.

-            async with aiofiles.open(file_path, mode="r") as f:
+            async with aiofiles.open(file_path, mode="r", encoding="utf-8") as f:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 115e93b and cb20506.

📒 Files selected for processing (3)
  • cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py (2 hunks)
  • cognee/tasks/infer_data_ontology/infer_data_ontology.py (1 hunks)
  • cognee/tasks/save_chunks_to_store/save_chunks_to_store.py (1 hunks)
🔇 Additional comments (4)
cognee/tasks/save_chunks_to_store/save_chunks_to_store.py (2)

14-17: LGTM! Correct migration to Pydantic v2 validation method.

The change from parse_obj to model_validate is the recommended approach for Pydantic v2 compatibility.


14-17: Verify other Pydantic v1 methods in the codebase.

Let's check for other instances of deprecated Pydantic v1 methods that might need similar updates.

✅ Verification successful

All Pydantic validation methods are using v2 compatible syntax

The codebase is already using the Pydantic v2 compatible model_validate method consistently across all files:

  • save_chunks_to_store.py: Uses model_validate
  • infer_data_ontology.py: Uses model_validate
  • chunk_naive_llm_classifier.py: Uses model_validate
  • modules/settings/get_settings.py: Uses model_validate

No deprecated Pydantic v1 validation methods were found in use.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other potential Pydantic v1 methods that need migration
echo "Checking for parse_obj usage:"
rg "parse_obj" -l

echo -e "\nChecking for other common Pydantic v1 methods:"
rg "parse_raw|parse_file|construct|validate" -l

echo -e "\nChecking for Schema usage (might need update to model_json_schema):"
rg "\.schema\(" -l

Length of output: 876


Script:

#!/bin/bash
# Let's examine the actual usage of validate methods in the relevant files
echo "Checking validate usage in save_chunks_to_store.py:"
rg "validate" -A 2 -B 2 cognee/tasks/save_chunks_to_store/save_chunks_to_store.py

echo -e "\nChecking validate usage in SourceCodeGraph.py:"
rg "validate" -A 2 -B 2 cognee/shared/SourceCodeGraph.py

echo -e "\nChecking validate usage in infer_data_ontology.py:"
rg "validate" -A 2 -B 2 cognee/tasks/infer_data_ontology/infer_data_ontology.py

echo -e "\nChecking validate usage in chunk_naive_llm_classifier.py:"
rg "validate" -A 2 -B 2 cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py

echo -e "\nChecking for model_validate usage (Pydantic v2):"
rg "model_validate" -A 2 -B 2

Length of output: 4783

cognee/tasks/chunk_naive_llm_classifier/chunk_naive_llm_classifier.py (1)

Line range hint 101-106: Consistent application of Pydantic v2 migration.

The change is correctly applied here as well, maintaining consistency with the previous instance.

Let's verify if there are any remaining instances of parse_obj that might need updating:

✅ Verification successful

Migration to model_validate is complete across the codebase

The verification confirms that there are no remaining instances of parse_obj in the codebase, while model_validate is consistently used across multiple files including the reviewed changes. The migration appears to be complete and correctly implemented.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining usage of parse_obj in the codebase
rg "parse_obj" --type python

Length of output: 59


Script:

#!/bin/bash
# Search for any remaining usage of parse_obj in the codebase without file type restriction
rg "parse_obj"

# Also search for model_validate to understand the migration coverage
rg "model_validate"

Length of output: 650

cognee/tasks/infer_data_ontology/infer_data_ontology.py (1)

59-59: LGTM: Pydantic v2 compatibility update.

The change from parse_obj to model_validate aligns with Pydantic v2's preferred validation method. This should help resolve integration test warnings.

Let's verify if there are any remaining instances of deprecated parse_obj usage:

✅ Verification successful

Pydantic v2 migration verified - change is consistent with codebase

The verification shows that:

  1. All validation methods in the codebase consistently use model_validate, with no remaining instances of deprecated parse_obj
  2. The NodeModel class is properly defined as a Pydantic model
  3. The codebase uses Pydantic v2 features (pydantic_settings imports indicate v2 usage)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining usage of deprecated parse_obj
rg "parse_obj" --type py

Length of output: 24


Script:

#!/bin/bash
# Let's try a more comprehensive search for Pydantic validation methods
# This will help us verify if the codebase consistently uses model_validate
rg "model_validate|parse_obj|parse_raw|parse_file" --type py

Length of output: 678


Script:

#!/bin/bash
# Let's also check for NodeModel usage to ensure we're looking at the right context
rg "class.*NodeModel" -A 5 --type py

Length of output: 544


Script:

#!/bin/bash
# Additionally, let's check for any Pydantic imports to ensure consistent version usage
rg "from pydantic|import pydantic" --type py

Length of output: 3656

0xideas changed the title from "Fix/integration test warnings" to "COG-485 - Fix/integration test warnings" on Nov 5, 2024
Vasilije1990 requested a review from dexters1 on November 5, 2024 12:16