feat: metrics in neo4j adapter [COG-1082] #487
base: dev
…-tokens-to-metric-table
…add-num-tokens-to-metric-table
Walkthrough
This pull request updates various functions and methods across multiple modules to incorporate an …
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Client
    participant Store as store_descriptive_metrics
    participant GraphEngine as get_graph_metrics
    participant Neo4jAdapter
    Client->>Store: Call store_descriptive_metrics(data_points, include_optional)
    Store->>GraphEngine: get_graph_metrics(include_optional)
    GraphEngine->>Neo4jAdapter: Check graph_exists("myGraph")
    alt Graph does not exist
        Neo4jAdapter->>Neo4jAdapter: project_entire_graph("myGraph")
    end
    Neo4jAdapter->>Neo4jAdapter: Compute detailed graph metrics
    Neo4jAdapter-->>GraphEngine: Return computed metrics
    GraphEngine-->>Store: Return metrics
    Store-->>Client: Return final result
```
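The flow in the diagram can be sketched in plain Python. This is an illustration under assumptions, not the adapter's code: `FakeGraphAdapter` and `store_descriptive_metrics_sketch` are hypothetical stand-ins, the Neo4j/GDS calls are replaced by an in-memory set of projection names, and the persistence step that the real `store_descriptive_metrics` performs is omitted.

```python
import asyncio


class FakeGraphAdapter:
    """Hypothetical in-memory stand-in for the Neo4j adapter."""

    def __init__(self):
        self.projections = set()  # names of "projected" in-memory graphs
        self.nodes = ["a", "b", "c"]
        self.edges = [("a", "b"), ("b", "c")]

    async def graph_exists(self, graph_name):
        return graph_name in self.projections

    async def project_entire_graph(self, graph_name):
        self.projections.add(graph_name)

    async def get_graph_metrics(self, include_optional=False):
        graph_name = "myGraph"
        # Project only when the named graph is missing, as in the diagram.
        if not await self.graph_exists(graph_name):
            await self.project_entire_graph(graph_name)
        metrics = {"num_nodes": len(self.nodes), "num_edges": len(self.edges)}
        if include_optional:
            # Placeholder: optional metrics are not computed in this sketch.
            metrics["diameter"] = None
        return metrics


async def store_descriptive_metrics_sketch(graph_engine, include_optional):
    # The real function also persists the metrics; that step is omitted here.
    return await graph_engine.get_graph_metrics(include_optional)


adapter = FakeGraphAdapter()
result = asyncio.run(store_descriptive_metrics_sketch(adapter, include_optional=True))
print(result)  # {'num_nodes': 3, 'num_edges': 2, 'diameter': None}
```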
…og-1082-metrics-in-networkx-adapter
…g-1082-metrics-in-neo4j-adapter
```python
async def _get_num_connected_components():
    graph_name = "myGraph"
    await self.drop_graph(graph_name)
```
Is there a reason we call `drop_graph` and `project_entire_graph` again here?
We only need to do it once. I kept it here so that these methods stay interchangeable, in case we want to remove one of them or change their order.
What do you think: should I remove one of them?
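One way to keep the helpers interchangeable while projecting only once, sketched under assumptions: each metric helper calls a shared, idempotent `ensure` step instead of dropping and re-projecting on its own. `ProjectionCache`, `drop` and `project` are hypothetical names; the two coroutines stand in for `drop_graph` and `project_entire_graph` and only record their calls.

```python
import asyncio


class ProjectionCache:
    """Hypothetical helper that drops and re-projects a graph at most once."""

    def __init__(self, drop, project):
        self._drop = drop
        self._project = project
        self._projected = set()
        self._lock = asyncio.Lock()

    async def ensure(self, graph_name="myGraph"):
        # The lock keeps two concurrent helpers from projecting twice.
        async with self._lock:
            if graph_name not in self._projected:
                await self._drop(graph_name)
                await self._project(graph_name)
                self._projected.add(graph_name)


calls = []


async def drop(graph_name):
    calls.append(("drop", graph_name))


async def project(graph_name):
    calls.append(("project", graph_name))


async def num_connected_components(cache):
    await cache.ensure()
    return "components"


async def num_selfloops(cache):
    await cache.ensure()
    return "selfloops"


async def main():
    # Build the cache inside the running loop; on Python < 3.10 an
    # asyncio.Lock binds to the event loop that is current at creation.
    cache = ProjectionCache(drop, project)
    # Helper order no longer matters: each one ensures the projection.
    return await asyncio.gather(num_connected_components(cache), num_selfloops(cache))


results = asyncio.run(main())
print(calls)  # [('drop', 'myGraph'), ('project', 'myGraph')]
```

The trade-off is staleness: once cached, the projection is not refreshed if the underlying graph changes, so a real version would need some invalidation step (for example, discarding the name from `_projected` after writes).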
e89c9b9 to 27feae8
27feae8 to af8e798
…g-1082-metrics-in-neo4j-adapter
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 2
🧹 Nitpick comments (8)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (4)
573-577: Add logging after dropping the graph.
It might be helpful to log whether the graph was successfully dropped or was absent, for better traceability in production.
```diff
 async def drop_graph(self, graph_name="myGraph"):
     if await self.graph_exists(graph_name):
         drop_query = f"CALL gds.graph.drop('{graph_name}');"
         await self.query(drop_query)
+        logger.debug(f"Dropped graph '{graph_name}' successfully.")
```
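The suggested diff only logs the success path. A sketch that also logs the "absent" case the comment mentions might look like this. `GdsGraphDropper` is a hypothetical stub whose `graph_exists` and `query` replace the real Neo4j calls, and returning a bool is an addition made here for testability.

```python
import asyncio
import logging

logger = logging.getLogger("neo4j_adapter_sketch")


class GdsGraphDropper:
    """Hypothetical stub: graph_exists/query stand in for Neo4j calls."""

    def __init__(self, existing):
        self._existing = set(existing)
        self.queries = []

    async def graph_exists(self, graph_name):
        return graph_name in self._existing

    async def query(self, cypher):
        self.queries.append(cypher)

    async def drop_graph(self, graph_name="myGraph"):
        if await self.graph_exists(graph_name):
            await self.query(f"CALL gds.graph.drop('{graph_name}');")
            self._existing.discard(graph_name)  # stub bookkeeping only
            logger.debug("Dropped graph '%s'.", graph_name)
            return True
        logger.debug("Graph '%s' does not exist; nothing to drop.", graph_name)
        return False


logging.basicConfig(level=logging.DEBUG)
dropper = GdsGraphDropper(existing=["myGraph"])
first = asyncio.run(dropper.drop_graph())   # logs the "Dropped graph" message
second = asyncio.run(dropper.drop_graph())  # logs the "does not exist" message
```

Passing the graph name as a logging argument rather than pre-formatting it defers the string formatting until a record is actually emitted.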
633-636: Diameter not yet implemented.
If diameter is critical, consider GDS Shortest Path or BFS expansions. Let us know if you'd like assistance with a workable approach.
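As a rough illustration of the BFS option, here is a pure-Python sketch of an unweighted diameter computation. It assumes the graph has already been pulled out of Neo4j into an adjacency dict (every node is a key, and each edge is listed in both directions, so the graph is treated as undirected). The function names are hypothetical, and returning `None` for disconnected graphs is a choice made here; NetworkX's `diameter` raises an error instead.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter(adjacency):
    """Longest shortest path; None if the graph is empty or disconnected."""
    nodes = list(adjacency)
    if not nodes:
        return None
    longest = 0
    for source in nodes:
        distances = bfs_distances(adjacency, source)
        if len(distances) < len(nodes):
            return None  # some node is unreachable: diameter is infinite
        longest = max(longest, max(distances.values()))
    return longest


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(diameter(path))  # 3
```

One BFS per node costs O(V·(V+E)), which is fine for small graphs; on large graphs this would argue for pushing the work into a GDS algorithm instead.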
637-642: Average shortest path not yet implemented.
Likewise, GDS offers built-in algorithms for average path length. Let us know if you'd like to integrate it.
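For reference, the average shortest path length of an unweighted graph is the sum of d(s, t) over all ordered pairs of distinct nodes divided by n(n - 1), which is the definition NetworkX's `average_shortest_path_length` uses. A pure-Python sketch under the same assumptions as a BFS diameter approach (symmetric adjacency dict with every node as a key; hypothetical function names; `None` rather than an exception for graphs that are too small or disconnected):

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def average_shortest_path_length(adjacency):
    """Mean hop distance over ordered pairs of distinct nodes, or None."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return None
    total = 0
    for source in nodes:
        distances = bfs_distances(adjacency, source)
        if len(distances) < n:
            return None  # disconnected: some pair has no path
        total += sum(distances.values())  # d(source, source) = 0 adds nothing
    return total / (n * (n - 1))


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(average_shortest_path_length(path))  # 1.3333333333333333
```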
643-645: Average clustering not yet implemented.
For completeness, you may explore GDS or external libraries to compute clustering.
cognee/infrastructure/databases/graph/graph_db_interface.py (1)
59-59: Document the new parameter `include_optional`.
Adding a short docstring describing its usage will help future maintainers understand which metrics are affected by this flag.
cognee/modules/data/methods/store_descriptive_metrics.py (1)
26-29: Add validation for the `include_optional` parameter.
Consider adding validation for the `include_optional` parameter to ensure it is a boolean value.
```diff
 async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool):
+    if not isinstance(include_optional, bool):
+        raise TypeError("include_optional must be a boolean value")
     db_engine = get_relational_engine()
     graph_engine = await get_graph_engine()
     graph_metrics = await graph_engine.get_graph_metrics(include_optional)
```
cognee/api/v1/cognify/cognify_v2.py (1)
168-168: Consider making `include_optional` configurable.
The `include_optional` parameter is hardcoded to `True`. Consider making it configurable through the cognify config to allow flexibility in whether optional metrics are computed.
```diff
- Task(store_descriptive_metrics, include_optional=True),
+ Task(store_descriptive_metrics, include_optional=cognee_config.include_optional_metrics),
```
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
416-422: Improve error handling in clustering coefficient calculation.
The current implementation logs only the exception message and drops the traceback. Consider logging the full exception traceback for better debugging.
```diff
 def _get_avg_clustering(graph):
     try:
         return nx.average_clustering(nx.DiGraph(graph))
-    except Exception as e:
-        logger.warning("Failed to calculate clustering coefficient: %s", e)
+    except Exception:
+        logger.warning("Failed to calculate clustering coefficient", exc_info=True)
         return None
```
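A self-contained sketch of the same error-handling pattern, with a pure-Python stand-in for the clustering call. Assumptions: `average_clustering` here is a hypothetical undirected implementation over a symmetric adjacency dict, not `nx.average_clustering(nx.DiGraph(graph))`, which applies the directed clustering definition and can give different values. Nodes with fewer than two neighbours count as 0, as with NetworkX's default `count_zeros=True`.

```python
import logging
from itertools import combinations

logger = logging.getLogger("networkx_adapter_sketch")


def average_clustering(adjacency):
    """Average local clustering coefficient of an undirected graph."""
    if not adjacency:
        raise ValueError("clustering is undefined for an empty graph")
    # Self-loops are ignored.
    neighbours = {node: set(adj) - {node} for node, adj in adjacency.items()}
    total = 0.0
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0 to the average
        # Count edges among the neighbours of `node` (closed triangles).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours)


def get_avg_clustering(adjacency):
    try:
        return average_clustering(adjacency)
    except Exception:
        # exc_info=True attaches the full traceback, not just the message.
        logger.warning("Failed to calculate clustering coefficient", exc_info=True)
        return None


graph = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
print(get_avg_clustering(graph))  # about 0.583 (7/12)
print(get_avg_clustering({}))  # None, and the traceback is logged
```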
🛑 Comments failed to post (2)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)
647-649: ⚠️ Potential issue
Potential index/key error in node/edge data extraction.
`nodes[0]["nodes"]` or `edges[0]["elements"]` might raise an exception if the query returns an empty list or no matching keys. Consider validating non-empty results.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
num_nodes = len(nodes[0].get("nodes", [])) if nodes and "nodes" in nodes[0] else 0
num_edges = len(edges[0].get("elements", [])) if edges and "elements" in edges[0] else 0
```
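The same guard could live in one small helper so both counts share it. This is a sketch under assumptions: `count_elements` is a hypothetical name, and the query results are assumed to be a list of dict-like records (anything with a `.get` method, such as plain dicts). Unlike the one-liners above, it also treats a key whose value is `None` as empty.

```python
def count_elements(records, key):
    """Length of records[0][key], or 0 if there are no records or no data."""
    if not records:
        return 0
    # `or []` also covers a present key whose value is None.
    return len(records[0].get(key) or [])


nodes = [{"nodes": ["n1", "n2", "n3"]}]
edges = []
num_nodes = count_elements(nodes, "nodes")     # 3
num_edges = count_elements(edges, "elements")  # 0
```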
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
442-447: 🛠️ Refactor suggestion
Use None instead of -1 for missing optional metrics.
Using -1 as a sentinel value for missing optional metrics could be misleading, as it might be interpreted as a valid metric value. Consider using None instead.
```diff
 optional_metrics = {
-    "num_selfloops": -1,
-    "diameter": -1,
-    "avg_shortest_path_length": -1,
-    "avg_clustering": -1,
+    "num_selfloops": None,
+    "diameter": None,
+    "avg_shortest_path_length": None,
+    "avg_clustering": None,
 }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
optional_metrics = {
    "num_selfloops": None,
    "diameter": None,
    "avg_shortest_path_length": None,
    "avg_clustering": None,
}
```
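A small sketch of why `None` is easier to handle downstream than `-1`, under assumptions: `build_metrics` is a hypothetical helper, not the adapter's method. `None` serializes to JSON `null` (and would map to SQL `NULL`, so any column storing these metrics would need to be nullable), and an `is not None` filter keeps legitimate zero values such as `num_selfloops == 0`.

```python
import json

OPTIONAL_KEYS = ("num_selfloops", "diameter", "avg_shortest_path_length", "avg_clustering")


def build_metrics(mandatory, include_optional, optional_values=None):
    """Merge mandatory metrics with optional ones; None means 'not computed'."""
    metrics = dict(mandatory)
    optional_values = optional_values or {}
    for key in OPTIONAL_KEYS:
        metrics[key] = optional_values.get(key) if include_optional else None
    return metrics


metrics = build_metrics({"num_nodes": 4, "num_edges": 3}, include_optional=False)
print(json.dumps(metrics))  # optional metrics serialize as null
computed = {k: v for k, v in metrics.items() if v is not None}

with_zero = build_metrics({"num_nodes": 4}, include_optional=True, optional_values={"num_selfloops": 0})
kept = {k: v for k, v in with_zero.items() if v is not None}  # keeps num_selfloops == 0
```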
81a4aa3 to f2ad1d4
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit
New Features
Improvements