feat: Add graph_model parameter to cognify api call #377

Open
wants to merge 3 commits into
base: dev

Conversation

@Vasilije1990 Vasilije1990 commented Dec 17, 2024

Summary by CodeRabbit

  • New Features
    • Introduced a new function to dynamically create Pydantic models from JSON schemas.
    • Enhanced the cognify endpoint to support flexible data types for the graph_model attribute.
  • Bug Fixes
    • Improved error handling for the cognify function to return appropriate error messages.

coderabbitai bot commented Dec 17, 2024

Walkthrough

The changes in the get_cognify_router.py file enhance the flexibility of graph model handling in the Cognify API. A new utility function json_to_pydantic_model is introduced to dynamically create Pydantic models from JSON schemas. The CognifyPayloadDTO class now accepts a more flexible type for the graph_model attribute, allowing for dynamic model creation at runtime. The cognify endpoint now includes logic to handle dynamic graph model conversion, providing more robust and adaptable model processing.

Changes

File: cognee/api/v1/cognify/routers/get_cognify_router.py
Change Summary:
  • Added json_to_pydantic_model function for dynamic Pydantic model creation
  • Updated CognifyPayloadDTO.graph_model type from Optional[BaseModel] to Optional[Any]
  • Enhanced cognify endpoint with dynamic graph model handling

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router
    participant CognifyPayload
    participant DynamicModel
    
    Client->>Router: Send Cognify Request
    Router->>CognifyPayload: Parse Payload
    alt Graph Model Provided
        CognifyPayload->>DynamicModel: Create Dynamic Model
        DynamicModel-->>Router: Dynamic Model Instance
    else No Graph Model
        CognifyPayload-->>Router: graph_model = None
    end
    Router->>Router: Process Cognify Request

Poem

🐰 Dynamically dancing, models so light,
JSON schemas transforming with might!
Flexible types, a rabbit's delight,
Pydantic magic takes playful flight
Cognify router, now smart and bright! 🌟

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)

28-36: Consider separating model conversion logic

The endpoint currently handles both model conversion and business logic. Consider extracting the model conversion into a separate service or utility class for better maintainability and testability.

Suggested structure:

  1. Create a GraphModelConverter service class
  2. Add specific exception types for different failure scenarios
  3. Move the conversion logic out of the route handler

Would you like me to provide a detailed implementation of this architectural improvement?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45cb2c3 and 57ae93b.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/routers/get_cognify_router.py (3 hunks)
🔇 Additional comments (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (2)

2-3: LGTM! Import changes align with implementation needs

The addition of Any type and create_model supports the enhanced flexibility in graph model handling.


12-12: Consider adding runtime type validation for graph_model

While changing to Optional[Any] enables flexibility, it bypasses Pydantic's validation of the field, since any value is accepted. Consider adding runtime validation to ensure that graph_model contains the expected schema structure.
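
One way the runtime validation suggested above could look is sketched below. This is a minimal, standard-library-only illustration, not code from the PR: the function name check_graph_model and the specific rules (non-empty object, keys usable as Python identifiers) are assumptions chosen for the example.

```python
from typing import Any, Optional


def check_graph_model(value: Any) -> Optional[dict]:
    """Return the payload unchanged if it looks like a usable schema, else raise.

    Hypothetical helper: the rules below are assumptions, not the PR's behavior.
    """
    if value is None:
        # No custom model supplied; the caller falls back to its default.
        return None
    if not isinstance(value, dict):
        raise ValueError("graph_model must be a JSON object")
    if not value:
        raise ValueError("graph_model must define at least one field")
    # Field names become attribute names on the dynamic model, so they
    # need to be strings that are valid Python identifiers.
    bad_keys = [key for key in value if not isinstance(key, str) or not key.isidentifier()]
    if bad_keys:
        raise ValueError(f"graph_model has invalid field names: {bad_keys}")
    return value
```

A check like this could be called from the route handler before json_to_pydantic_model, or wrapped in a Pydantic validator on CognifyPayloadDTO.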

Comment on lines +15 to +19
def json_to_pydantic_model(name: str, json_schema: dict) -> BaseModel:
    """
    Create a Pydantic model on the fly from JSON.
    """
    return create_model(name, **{k: (type(v), ...) for k, v in json_schema.items()})

@coderabbitai coderabbitai bot Dec 17, 2024

⚠️ Potential issue

Add schema validation and improve type mapping

The current implementation has several potential issues:

  1. No validation of the input schema structure
  2. Assumes all fields are required (...)
  3. Simple type inference might not handle complex nested types correctly

Consider this improved implementation:

def json_to_pydantic_model(name: str, json_schema: dict) -> BaseModel:
    """
    Create a Pydantic model on the fly from JSON.
    """
-    return create_model(name, **{k: (type(v), ...) for k, v in json_schema.items()})
+    def get_field_type(value: Any) -> tuple:
+        if isinstance(value, dict):
+            return (dict, ...)
+        if isinstance(value, list):
+            return (List[Any], ...)
+        return (type(value), None)  # Make fields optional by default
+
+    if not isinstance(json_schema, dict):
+        raise ValueError("JSON schema must be a dictionary")
+
+    return create_model(
+        name,
+        **{k: get_field_type(v) for k, v in json_schema.items()}
+    )

Worth merging?

@borisarzentar, yes, the suggested improvements provide additional schema validation and handle nested data structures more robustly. By making fields optional by default, they also reduce the chances of unexpected errors. Unless you foresee any compatibility issues, it seems worth merging.
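
To make the trade-off in this thread concrete, here is a hedged sketch of the helper, assuming Pydantic is installed. The helper infer_field and the sample payload are hypothetical; only create_model, BaseModel, and ValidationError come from Pydantic. It illustrates two points: create_model returns a model class (so Type[BaseModel] is arguably a more accurate return annotation than BaseModel), and a field declared with ... is required, so omitting it raises ValidationError.

```python
from typing import Any, Dict, List, Tuple, Type

from pydantic import BaseModel, ValidationError, create_model


def infer_field(value: Any) -> Tuple[Any, Any]:
    """Map a sample JSON value to a (type, default) pair for create_model.

    Hypothetical helper: `...` as the default marks the field as required.
    """
    if isinstance(value, dict):
        return (Dict[str, Any], ...)
    if isinstance(value, list):
        return (List[Any], ...)
    return (type(value), ...)


def json_to_pydantic_model(name: str, sample: dict) -> Type[BaseModel]:
    """Build a model class whose fields mirror the keys of `sample`."""
    if not isinstance(sample, dict):
        raise ValueError("JSON schema must be a dictionary")
    fields = {key: infer_field(value) for key, value in sample.items()}
    return create_model(name, **fields)


# Illustrative payload, used both as the "schema" and as the instance data,
# mirroring how the PR calls GraphModelDynamic(**graph_model_schema).
sample = {"name": "Alice", "age": 30, "tags": ["a", "b"]}
Person = json_to_pydantic_model("Person", sample)
person = Person(**sample)
```

Because the types are inferred from sample values, a payload with a null value or an unusual key name would need extra handling that this sketch does not attempt.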

Comment on lines 28 to 36
        try:
            # Dynamic conversion of `graph_model` to Pydantic
            if payload.graph_model:
                graph_model_schema = payload.graph_model
                GraphModelDynamic = json_to_pydantic_model("GraphModelDynamic", graph_model_schema)
                graph_model_instance = GraphModelDynamic(**graph_model_schema)
                print(graph_model_instance)
        else:
            graph_model_instance = None

⚠️ Potential issue

Fix control flow and remove debug code

Several issues need attention:

  1. Remove the print statement
  2. Fix indentation of the else clause
  3. Add error handling for model creation
  4. Restructure the nested try-except blocks

Apply this fix:

        try:
            # Dynamic conversion of `graph_model` to Pydantic
            if payload.graph_model:
                graph_model_schema = payload.graph_model
                GraphModelDynamic = json_to_pydantic_model("GraphModelDynamic", graph_model_schema)
                graph_model_instance = GraphModelDynamic(**graph_model_schema)
-                print(graph_model_instance)
-        else:
-            graph_model_instance = None
-        try:
-            await cognee_cognify(payload.datasets, user, payload.graph_model)
-        except Exception as error:
+            else:
+                graph_model_instance = None
+            
+            await cognee_cognify(payload.datasets, user, graph_model_instance)
+        except ValueError as error:
+            return JSONResponse(
+                status_code=400,
+                content={"error": f"Invalid graph model schema: {str(error)}"}
+            )
+        except Exception as error:
            return JSONResponse(
                status_code=409,
                content={"error": str(error)}
            )
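
The control flow proposed above can be exercised in isolation. The following is a simplified, synchronous sketch under stated assumptions: handle_cognify, build_model, and run_cognify are hypothetical stand-ins for the route handler, json_to_pydantic_model, and cognee_cognify, and the success body is a placeholder. It returns a (status_code, body) tuple instead of a JSONResponse so that it runs without FastAPI.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def handle_cognify(
    graph_model: Optional[dict],
    build_model: Callable[[dict], Any],
    run_cognify: Callable[[Any], None],
) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, body) roughly the way the route handler would."""
    # Step 1: schema problems are the client's fault, so they map to 400.
    try:
        instance = build_model(graph_model) if graph_model else None
    except (TypeError, ValueError) as error:
        return 400, {"error": f"Invalid graph model schema: {error}"}

    # Step 2: anything that fails inside the pipeline maps to 409,
    # matching the status code used in the existing handler.
    try:
        run_cognify(instance)
    except Exception as error:
        return 409, {"error": str(error)}

    # Placeholder success body; the real endpoint's response is not shown here.
    return 200, {"message": "ok"}
```

Using two separate try blocks is a design choice of this sketch: a ValueError raised inside the pipeline is then reported as 409 rather than being mistaken for a schema problem.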

Resolve syntax issue with cognify router

Fix

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (2)

17-21: ⚠️ Potential issue

Previous concerns about schema validation remain unaddressed

The current implementation still lacks proper schema validation and type mapping as mentioned in the previous review.

The implementation needs to:

  1. Validate input schema structure
  2. Handle optional fields
  3. Support complex nested types
  4. Add proper error handling

Please refer to the previous review comment for the suggested implementation.


31-40: ⚠️ Potential issue

Clean up implementation and improve error handling

Several issues need to be addressed:

  1. Remove debug print statement
  2. Fix control flow
  3. Add specific error handling for model creation

Please refer to the previous review comment for the detailed implementation fix.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57ae93b and a42ab3e.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/routers/get_cognify_router.py (2 hunks)
🔇 Additional comments (2)
cognee/api/v1/cognify/routers/get_cognify_router.py (2)

2-2: LGTM! Required imports added for new functionality.

The addition of Any type is necessary for the flexible graph_model type in CognifyPayloadDTO.


31-40: Verify compatibility with existing implementations

Let's verify the impact of these changes on the existing codebase.

✅ Verification successful

Changes are compatible with existing implementations

The changes in the code are safe and compatible with the existing codebase for the following reasons:

  1. The default behavior is preserved by using KnowledgeGraph when graph_model is not provided.
  2. The cognee_cognify function in cognify_v2.py already accepts a graph_model parameter of type BaseModel, making it compatible with both the default KnowledgeGraph and any dynamically created models.
  3. The dynamic model creation follows Pydantic's schema, ensuring type safety and compatibility with the graph extraction pipeline.
  4. The existing graph model implementations in data_models.py and models.py show that all graph-related models follow the Pydantic BaseModel pattern, which is consistent with the dynamic model approach.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for existing implementations and usages

# Find all files that import or use cognify endpoint
rg -l "cognify" --type python

# Find existing graph model implementations
rg -l "class.*Graph.*\(BaseModel\)" --type python

# Check for existing calls to cognee_cognify
rg "cognee_cognify\(" --type python -A 2

Length of output: 217


Script:

#!/bin/bash
# Let's try without the python type filter since it's not recognized

# Find all files that import or use cognify endpoint
rg -l "cognify"

# Find existing graph model implementations
rg -l "class.*Graph.*\(BaseModel\)"

# Check for existing calls to cognee_cognify
rg "cognee_cognify\(" -A 2

# Look for any potential graph model related files
fd "graph|model" --type f

# Check the content of the current file to understand the context better
cat "cognee/api/v1/cognify/routers/get_cognify_router.py"

Length of output: 5774


Script:

#!/bin/bash
# Let's check the existing graph model implementations and their structure

# Check KnowledgeGraph implementation
cat "cognee/shared/data_models.py"

# Check other potential graph model implementations
cat "cognee/tasks/graph/models.py"

# Look for any usage patterns of cognee_cognify
cat "cognee/api/v1/cognify/cognify_v2.py"

# Check test files for usage patterns
cat "cognee/tests/test_library.py"

Length of output: 20186

cognee/api/v1/cognify/routers/get_cognify_router.py (outdated, resolved)

class CognifyPayloadDTO(BaseModel):
    datasets: List[str]
-    graph_model: Optional[BaseModel] = None
+    graph_model: Optional[Any] = None

🛠️ Refactor suggestion

Consider using a more specific type than Any

Using Optional[Any] reduces type safety and could lead to runtime errors. Consider using a Union type with specific expected types or creating a base interface that all graph models must implement.

-    graph_model: Optional[Any] = None
+    graph_model: Optional[Union[dict, BaseModel]] = None
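
As a possible alternative, assuming clients always send graph_model as a JSON object, the field could be typed as a dictionary so that Pydantic rejects other shapes at the boundary. The class name CognifyPayloadSketch is hypothetical, and this is only an illustration of the idea, not the PR's code.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError


class CognifyPayloadSketch(BaseModel):
    """Hypothetical variant of CognifyPayloadDTO with a dict-typed graph_model."""

    datasets: List[str]
    graph_model: Optional[Dict[str, Any]] = None


# A JSON object is accepted and kept as a plain dictionary.
accepted = CognifyPayloadSketch(datasets=["docs"], graph_model={"name": "x"})

# A non-object value is rejected during validation of the request body.
try:
    CognifyPayloadSketch(datasets=["docs"], graph_model="not a schema")
    rejected = False
except ValidationError:
    rejected = True
```

With this typing, the handler would no longer need its own isinstance check before building the dynamic model.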

@borisarzentar changed the title from "Update get_cognify_router.py" to "Add graph_model parameter to cognify api call" on Dec 18, 2024
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
cognee/api/v1/cognify/routers/get_cognify_router.py (3)

14-14: ⚠️ Potential issue

Use more specific types instead of Any

Using Optional[Any] significantly reduces type safety. A more specific type would prevent potential runtime errors and provide better IDE support.

-    graph_model: Optional[Any] = None
+    graph_model: Optional[Union[dict, BaseModel]] = None

17-21: ⚠️ Potential issue

Enhance schema validation and type mapping

The current implementation lacks proper validation and type handling:

  1. No schema structure validation
  2. All fields are marked as required
  3. Complex nested types aren't properly handled

The previous suggestion for a more robust implementation is still valid and should be implemented.


31-40: ⚠️ Potential issue

Remove debug code and fix control flow structure

The current implementation contains debug code and has suboptimal control flow structure.

Previous suggestions for fixing control flow and removing debug code should be implemented.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a42ab3e and a34d010.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/routers/get_cognify_router.py (2 hunks)
🔇 Additional comments (1)
cognee/api/v1/cognify/routers/get_cognify_router.py (1)

2-2: LGTM: Import changes are appropriate

The addition of Any from typing is necessary for the new type annotation in CognifyPayloadDTO.

Comment on lines +31 to +40
# Dynamic conversion of `graph_model` to Pydantic
if payload.graph_model:
    graph_model_schema = payload.graph_model
    GraphModelDynamic = json_to_pydantic_model("GraphModelDynamic", graph_model_schema)
    graph_model_instance = GraphModelDynamic(**graph_model_schema)
    print(graph_model_instance)
else:
    graph_model_instance = KnowledgeGraph()

await cognee_cognify(payload.datasets, user, graph_model_instance)

⚠️ Potential issue

Add proper validation and error handling

The current implementation lacks proper validation before dynamic model creation.

Add validation and specific error handling:

        try:
            if payload.graph_model:
+               if not isinstance(payload.graph_model, dict):
+                   raise ValueError("graph_model must be a dictionary")
                graph_model_schema = payload.graph_model
                try:
                    GraphModelDynamic = json_to_pydantic_model("GraphModelDynamic", graph_model_schema)
                    graph_model_instance = GraphModelDynamic(**graph_model_schema)
                except ValidationError as e:
                    return JSONResponse(
                        status_code=400,
                        content={"error": f"Invalid graph model schema: {str(e)}"}
                    )
            else:
                graph_model_instance = KnowledgeGraph()

💡 Codebase verification

Custom graph models must include nodes and edges fields

The dynamic graph model must be compatible with the base KnowledgeGraph model structure which requires:

  • nodes: List[Node] where each Node has: id, name, type, and description
  • edges: List[Edge] where each Edge has: source_node_id, target_node_id, relationship_name, and optional properties

The code should validate that any custom graph_model schema includes these required fields before creating the dynamic model.
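
A check along these lines might look like the following sketch. It relies only on the field names listed above; the function name missing_graph_fields and the wording of the messages are assumptions made for illustration, and it inspects a plain dictionary rather than the real Node and Edge classes.

```python
from typing import Any, List

# Required keys taken from the description of Node and Edge above;
# the optional Edge `properties` field is deliberately not required.
NODE_FIELDS = {"id", "name", "type", "description"}
EDGE_FIELDS = {"source_node_id", "target_node_id", "relationship_name"}


def missing_graph_fields(schema: Any) -> List[str]:
    """List readable problems; an empty list means the shape looks compatible."""
    if not isinstance(schema, dict):
        return ["schema must be a JSON object"]
    problems = []
    for key, required in (("nodes", NODE_FIELDS), ("edges", EDGE_FIELDS)):
        items = schema.get(key)
        if not isinstance(items, list):
            problems.append(f"'{key}' must be a list")
            continue
        for index, item in enumerate(items):
            # Non-dict entries are treated as having no fields at all.
            present = set(item) if isinstance(item, dict) else set()
            missing = sorted(required - present)
            if missing:
                problems.append(f"{key}[{index}] is missing {missing}")
    return problems
```

Returning a list of problems instead of raising on the first one lets the endpoint report every issue in a single 400 response, if that is the desired behavior.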

🔗 Analysis chain

Verify graph model compatibility

The code assumes any valid schema will work with cognee_cognify.

Let's verify the expected graph model structure:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for graph model usage patterns in cognee_cognify
ast-grep --pattern 'async def cognify($$$, graph_model: $_) {
  $$$
}'

# Search for KnowledgeGraph usage to understand required interface
rg -A 5 "class KnowledgeGraph"

Length of output: 485


Script:

#!/bin/bash
# Let's check the Node and Edge models to understand the required structure
rg -A 5 "class Node" cognee/shared/data_models.py
rg -A 5 "class Edge" cognee/shared/data_models.py

# Let's also check how cognee_cognify uses the graph_model
ast-grep --pattern 'async def cognee_cognify($_, $_, graph_model: $_) {
  $$$
}'

# Search for any validation or type checking of graph_model
rg -B2 -A5 "graph_model" cognee/api/v1/cognify/

Length of output: 6080

graph_model_schema = payload.graph_model
GraphModelDynamic = json_to_pydantic_model("GraphModelDynamic", graph_model_schema)
graph_model_instance = GraphModelDynamic(**graph_model_schema)
print(graph_model_instance)

Please remove the print statement


class CognifyPayloadDTO(BaseModel):
    datasets: List[str]
-    graph_model: Optional[BaseModel] = None
+    graph_model: Optional[Any] = None

Why change this to Any? We assume that the graph model will be a Pydantic model.

Comment on lines +15 to +19
def json_to_pydantic_model(name: str, json_schema: dict) -> BaseModel:
    """
    Create a Pydantic model on the fly from JSON.
    """
    return create_model(name, **{k: (type(v), ...) for k, v in json_schema.items()})

Worth merging?

@Vasilije1990 changed the title from "Add graph_model parameter to cognify api call" to "feat: Add graph_model parameter to cognify api call" on Jan 17, 2025
@borisarzentar

@Vasilije1990 Please take care of this one.

3 participants