Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implements graph visualization method for cognee #493

Merged
merged 12 commits into from
Feb 6, 2025

Conversation

hajdul88
Copy link
Contributor

@hajdul88 hajdul88 commented Feb 3, 2025

Description

This PR contains the improvement of the visualization endpoint

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features
    • Launched an enhanced interactive network visualization utility that renders dynamic, browser-based graphs. The new feature simplifies execution by directly generating an HTML file showcasing the visualization—complete with interactive elements and an on-screen confirmation—providing a more intuitive and efficient experience.

Copy link
Contributor

coderabbitai bot commented Feb 3, 2025

Walkthrough

The pull request refactors the visualization implementation. In the API module, the visualize_graph function is updated to remove its label parameter and instead calls a new asynchronous function, cognee_network_visualization. A main execution block is added to set up logging and an asyncio event loop. The legacy function create_cognee_style_network_with_logo is removed from both the shared utilities and the tests, with related imports updated accordingly. A new module provides the implementation for cognee_network_visualization, and corresponding tests have been added to verify its HTML output.

Changes

Files Change Summary
cognee/api/v1/visualize/visualize.py Modified visualize_graph: removed the label parameter, updated the internal call to use cognee_network_visualization, and added a main block for setting up logging and an asyncio event loop.
cognee/shared/utils.py
cognee/__init__.py
Removed the create_cognee_style_network_with_logo function and its import; added a new import for cognee_network_visualization.
cognee/modules/visualization/cognee_network_visualization.py Added a new asynchronous function cognee_network_visualization that constructs a directed graph from input data and generates an interactive HTML visualization using D3.js.
cognee/tests/unit/processing/utils/utils_test.py
cognee/tests/unit/modules/visualization/visualization_test.py
Removed the obsolete test for create_cognee_style_network_with_logo and its associated imports; added a new test to validate the HTML output of cognee_network_visualization.

Sequence Diagram(s)

sequenceDiagram
    participant Main
    participant EventLoop
    participant VisualizeGraph
    participant NetworkVisualization

    Main->>EventLoop: Create asyncio event loop
    EventLoop->>VisualizeGraph: Run visualize_graph()
    VisualizeGraph->>NetworkVisualization: Call cognee_network_visualization(graph_data)
    NetworkVisualization-->>VisualizeGraph: Return HTML content
    VisualizeGraph-->>EventLoop: Complete execution
    EventLoop->>Main: Shutdown loop
Loading
sequenceDiagram
    participant Test
    participant NetworkVisualization

    Test->>NetworkVisualization: Invoke with sample graph data
    NetworkVisualization-->>Test: Return generated HTML
    Test->>Test: Assert HTML structure (e.g., <html>, D3.js script)
Loading

Poem

I hopped through lines of code so bright,
Unburdening functions with all my might.
Async flows and visuals now take flight,
Tests and updates in the moonlit night.
A joyful rabbit coding with delight!
🐇✨

Tip

🌐 Web search-backed reviews and chat
  • We have enabled web search-based reviews and chat for all users. This feature allows CodeRabbit to access the latest documentation and information on the web.
  • You can disable this feature by setting web_search: false in the knowledge_base settings.
  • Please share any feedback in the Discord discussion.
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hajdul88 hajdul88 marked this pull request as draft February 3, 2025 16:00
@hajdul88 hajdul88 requested a review from Vasilije1990 February 3, 2025 16:45
@hajdul88 hajdul88 marked this pull request as ready for review February 4, 2025 09:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
cognee/shared/utils.py (1)

374-501: Consider configuring file output path and adding error handling.
The function currently writes the HTML to a file in the user’s home directory. If the user lacks permission or if the path is invalid, this could fail silently. Adding optional parameters for the target file path or introducing more robust exception handling (e.g., logging errors or raising an exception) would improve usability.

cognee/api/v1/visualize/visualize.py (1)

10-15: Add docstring and guard for empty graph data.
The new visualize_graph() function currently logs and calls the utility but lacks a docstring describing its purpose or a check if graph_data is empty. Consider adding brief documentation and verifying data before proceeding.

+"""
+Asynchronously fetches the graph data, logs it, and generates a styleable HTML network visualization.
+"""
 async def visualize_graph():
+    if not graph_data or not any(graph_data):
+        logging.warning("No graph data found; skipping visualization.")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d3acc3 and 16a1c95.

📒 Files selected for processing (3)
  • cognee/api/v1/visualize/visualize.py (1 hunks)
  • cognee/shared/utils.py (2 hunks)
  • cognee/tests/unit/processing/utils/utils_test.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: docker-compose-test
🔇 Additional comments (3)
cognee/shared/utils.py (1)

340-341: Function signature appears clear and concise.
The new async def create_cognee_style_network_with_logo(graph_data) function properly accepts a single graph_data parameter. Streamlining the signature makes usage more straightforward throughout the codebase.

cognee/api/v1/visualize/visualize.py (1)

6-7: Async support and logging setup.
Importing asyncio and setup_logging aligns well with the new asynchronous approach. This addition is appropriate and helps maintain a robust application structure.

cognee/tests/unit/processing/utils/utils_test.py (1)

85-112: Comprehensive test coverage for HTML output.
This test method thoroughly checks the generated HTML content for crucial elements (e.g., <html>, D3 script tags). It effectively validates that the function produces a valid visualization scaffold.

cognee/shared/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
cognee/tests/unit/processing/utils/utils_test.py (2)

89-104: Consider moving test data to fixtures.

Moving the test data to fixtures would improve reusability and make the test more maintainable.

+@pytest.fixture
+def graph_data():
+    nodes_data = [
+        (1, {"pydantic_type": "Entity", "name": "Node1", "updated_at": 123, "created_at": 123}),
+        (2, {
+            "pydantic_type": "DocumentChunk",
+            "name": "Node2",
+            "updated_at": 123,
+            "created_at": 123,
+        }),
+    ]
+    edges_data = [(1, 2, "related_to", {})]
+    return (nodes_data, edges_data)
+
 @pytest.mark.asyncio
-async def test_create_cognee_style_network_with_logo():
+async def test_create_cognee_style_network_with_logo(graph_data):
     from pathlib import Path
-
-    nodes_data = [
-        (1, {"pydantic_type": "Entity", "name": "Node1", "updated_at": 123, "created_at": 123}),
-        (
-            2,
-            {
-                "pydantic_type": "DocumentChunk",
-                "name": "Node2",
-                "updated_at": 123,
-                "created_at": 123,
-            },
-        ),
-    ]
-    edges_data = [
-        (1, 2, "related_to", {}),
-    ]
-    graph_data = (nodes_data, edges_data)

108-113: Add more comprehensive assertions.

The current assertions only verify basic HTML structure. Consider adding more specific assertions to validate:

  • Node properties in the output
  • Edge properties in the output
  • Graph styling and configuration
     assert isinstance(html_output, str)
 
     assert "<html>" in html_output
     assert '<script src="https://d3js.org/d3.v5.min.js"></script>' in html_output
     assert "var nodes =" in html_output
     assert "var links =" in html_output
+    # Verify node properties
+    assert '"pydantic_type":"Entity"' in html_output
+    assert '"name":"Node1"' in html_output
+    # Verify edge properties
+    assert '"relation":"related_to"' in html_output
+    # Verify styling
+    assert "linear-gradient(90deg, #101010, #1a1a2e)" in html_output
cognee/infrastructure/visualization/cognee_network_visualization.py (1)

12-17: Make color mapping configurable.

Consider moving the color mapping to a configuration file or accepting it as a parameter for better flexibility.

-    color_map = {
-        "Entity": "#f47710",
-        "EntityType": "#6510f4",
-        "DocumentChunk": "#801212",
-        "default": "#D3D3D3",
-    }
+    color_map = get_color_mapping() # Load from config or accept as parameter
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16a1c95 and 74051c2.

📒 Files selected for processing (4)
  • cognee/__init__.py (1 hunks)
  • cognee/api/v1/visualize/visualize.py (1 hunks)
  • cognee/infrastructure/visualization/cognee_network_visualization.py (1 hunks)
  • cognee/tests/unit/processing/utils/utils_test.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/api/v1/visualize/visualize.py
⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: docker-compose-test
  • GitHub Check: profiler
🔇 Additional comments (2)
cognee/__init__.py (1)

8-10: LGTM! Good code organization.

Moving the visualization code to a dedicated module improves code organization and maintainability.

cognee/infrastructure/visualization/cognee_network_visualization.py (1)

24-27: Address the TODO comment.

The TODO comment indicates a need to decide which properties to show on nodes and edges.

Please review and decide which properties should be displayed. Consider:

  • Essential properties for visualization
  • Performance impact of including additional properties
  • User experience and readability

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (3)
cognee/shared/utils.py (3)

30-34: Add error handling for NLTK downloads.

The quiet=True parameter could mask download failures. Consider adding error handling to ensure required NLTK resources are available.

-    nltk.download("maxent_ne_chunker", quiet=True)
+    try:
+        nltk.download("maxent_ne_chunker", quiet=True)
+    except Exception as e:
+        logging.error(f"Failed to download NLTK resource: {e}")
+        raise

57-71: Add error handling for file operations.

File operations should have proper error handling to gracefully handle permission issues or disk errors.

-    if not os.path.isdir(home_dir):
-        os.makedirs(home_dir, exist_ok=True)
-    anonymous_id_file = os.path.join(home_dir, ".anon_id")
-    if not os.path.isfile(anonymous_id_file):
-        anonymous_id = str(uuid4())
-        with open(anonymous_id_file, "w", encoding="utf-8") as f:
-            f.write(anonymous_id)
-    else:
-        with open(anonymous_id_file, "r", encoding="utf-8") as f:
-            anonymous_id = f.read()
+    try:
+        if not os.path.isdir(home_dir):
+            os.makedirs(home_dir, exist_ok=True)
+        anonymous_id_file = os.path.join(home_dir, ".anon_id")
+        if not os.path.isfile(anonymous_id_file):
+            anonymous_id = str(uuid4())
+            with open(anonymous_id_file, "w", encoding="utf-8") as f:
+                f.write(anonymous_id)
+        else:
+            with open(anonymous_id_file, "r", encoding="utf-8") as f:
+                anonymous_id = f.read()
+    except OSError as e:
+        logging.error(f"Failed to handle anonymous ID file: {e}")
+        anonymous_id = str(uuid4())  # Fallback to temporary ID

96-96: Add timeout to HTTP request.

The requests.post call should include a timeout to prevent hanging on slow responses.

-    response = requests.post(proxy_url, json=payload)
+    response = requests.post(proxy_url, json=payload, timeout=10)
🧹 Nitpick comments (5)
cognee/shared/utils.py (5)

26-27: Consider making the proxy URL configurable.

The proxy URL is hardcoded to a test environment. Consider moving this to a configuration file or environment variable for better maintainability and environment-specific settings.

-# Analytics Proxy Url, currently hosted by Vercel
-proxy_url = "https://test.prometh.ai"
+# Analytics Proxy Url
+proxy_url = os.getenv("ANALYTICS_PROXY_URL", "https://test.prometh.ai")

340-349: Add return type hint for graph_to_tuple function.

The function's return type should be explicitly defined for better code clarity and type checking.

-def graph_to_tuple(graph):
+def graph_to_tuple(graph) -> tuple[list, list]:

295-295: Move logo image to a separate asset file.

The base64 encoded logo image should be moved to a separate asset file for better maintainability.

Consider creating an assets directory and loading the image from there:

-    logo_url = "..."
+    with open("assets/logo.base64", "r") as f:
+        logo_url = f"data:image/png;base64,{f.read().strip()}"

313-315: Remove unused imports.

The following imports are not used in the function:

  • output_file
  • show
-    from bokeh.plotting import figure, from_networkx
-    from bokeh.models import Circle, MultiLine, HoverTool, ColumnDataSource, Range1d
-    from bokeh.plotting import output_file, show
+    from bokeh.plotting import from_networkx
+    from bokeh.models import Circle, MultiLine

352-366: Improve function docstring with type hints and return value.

Add parameter and return type information to the docstring for better documentation.

-def setup_logging(log_level=logging.INFO):
-    """Sets up the logging configuration."""
+def setup_logging(log_level: int = logging.INFO) -> None:
+    """Sets up the logging configuration.
+    
+    Args:
+        log_level (int): The logging level to use. Defaults to logging.INFO.
+    
+    Returns:
+        None
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74051c2 and c3f5dec.

📒 Files selected for processing (1)
  • cognee/shared/utils.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: docker-compose-test
  • GitHub Check: profiler

import os


async def create_cognee_style_network_with_logo(graph_data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put it in modules, as it is not related to infrastructure. Also please rename the file to match the function that is inside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -82,27 +84,30 @@ def test_prepare_nodes():

@pytest.mark.asyncio
async def test_create_cognee_style_network_with_logo():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the test to its own file as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (1)
cognee/tests/unit/modules/visualization/visualization_test.py (1)

8-34: Enhance test coverage with edge cases.

The test only covers the happy path. Add tests for:

  • Empty graph data
  • Large graphs
  • Special characters in node names
  • Invalid node/edge data
+@pytest.mark.asyncio
+async def test_empty_graph():
+    nodes_data = []
+    edges_data = []
+    graph_data = (nodes_data, edges_data)
+    html_output = await cognee_network_visualization(graph_data)
+    assert isinstance(html_output, str)
+    assert "<html>" in html_output

+@pytest.mark.asyncio
+async def test_special_characters():
+    nodes_data = [(1, {"name": "<script>alert('xss')</script>", "pydantic_type": "Entity"})]
+    edges_data = []
+    graph_data = (nodes_data, edges_data)
+    html_output = await cognee_network_visualization(graph_data)
+    assert "<script>alert('xss')</script>" not in html_output
🧹 Nitpick comments (3)
cognee/modules/visualization/cognee_network_visualization.py (1)

24-27: Consider using a configuration for node properties.

The TODO comment suggests uncertainty about which properties to display. Consider moving this configuration to a separate settings file.

+    # Import from configuration
+    EXCLUDED_NODE_PROPERTIES = ["updated_at", "created_at"]
+
-        del node_info["updated_at"]
-        del node_info["created_at"]
+        for prop in EXCLUDED_NODE_PROPERTIES:
+            node_info.pop(prop, None)
cognee/api/v1/visualize/visualize.py (1)

23-31: Consider using asyncio.run() for main block.

The manual event loop management could be simplified using asyncio.run().

 if __name__ == "__main__":
     setup_logging(logging.ERROR)
-    loop = asyncio.new_event_loop()
-    asyncio.set_event_loop(loop)
-    try:
-        loop.run_until_complete(visualize_graph())
-    finally:
-        loop.run_until_complete(loop.shutdown_asyncgens())
+    asyncio.run(visualize_graph())
cognee/tests/unit/modules/visualization/visualization_test.py (1)

28-33: Add more specific HTML structure assertions.

The current assertions are too basic. Add more specific checks for the graph structure.

     assert isinstance(html_output, str)
     assert "<html>" in html_output
     assert '<script src="https://d3js.org/d3.v5.min.js"></script>' in html_output
     assert "var nodes =" in html_output
     assert "var links =" in html_output
+    # Check for specific node data
+    assert '"pydantic_type":"Entity"' in html_output
+    assert '"name":"Node1"' in html_output
+    # Check for specific edge data
+    assert '"relation":"related_to"' in html_output
+    # Check for SVG elements
+    assert '<svg style="position: fixed;"' in html_output
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f5dec and 91e61e0.

📒 Files selected for processing (5)
  • cognee/__init__.py (1 hunks)
  • cognee/api/v1/visualize/visualize.py (1 hunks)
  • cognee/modules/visualization/cognee_network_visualization.py (1 hunks)
  • cognee/tests/unit/modules/visualization/visualization_test.py (1 hunks)
  • cognee/tests/unit/processing/utils/utils_test.py (0 hunks)
💤 Files with no reviewable changes (1)
  • cognee/tests/unit/processing/utils/utils_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/init.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: docker-compose-test
🔇 Additional comments (1)
cognee/modules/visualization/cognee_network_visualization.py (1)

45-45: Verify D3.js source integrity.

Using an external CDN without integrity checks poses a security risk.

@hajdul88 hajdul88 merged commit bcd3265 into dev Feb 6, 2025
26 of 27 checks passed
@hajdul88 hajdul88 deleted the feature/cog-1036-improving-visualization-endpoint branch February 6, 2025 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants