-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pgvector search #360
Fix pgvector search #360
Conversation
Resolve issue regarding UUID being concatenated instead of string
…en using vector search Issue happens when search is called in a session without previously adding data or creating tables as an import of Vector column type was missing Fix
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the WalkthroughThe changes primarily involve modifications to the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
cognee/modules/graph/utils/retrieve_existing_edges.py (1)
56-56
: Approve explicit string conversion but suggest a more robust solutionThe explicit string conversion for edge components is a good fix for type consistency. However, consider using a more robust key construction method:
- existing_edges_map[str(edge[0]) + str(edge[1]) + edge[2]] = True + existing_edges_map[f"{edge[0]}:{edge[1]}:{edge[2]}"] = TrueUsing an f-string with a separator makes the key format more explicit and easier to debug.
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
74-74
: Consider refactoring vector search operationsThere's an opportunity to improve the vector-related operations:
- Extract common vector search logic from
get_distance_from_collection_elements
andsearch
methods- Implement the TODO for similarity score normalization
Example refactor for common search logic:
async def _execute_vector_search( self, collection_name: str, query_vector: List[float], limit: Optional[int] = None ) -> List[tuple]: PGVectorDataPoint = await self.get_table(collection_name) query = select( PGVectorDataPoint, PGVectorDataPoint.c.vector.cosine_distance(query_vector).label("similarity"), ).order_by("similarity") if limit: query = query.limit(limit) async with self.get_async_session() as session: results = await session.execute(query) return [row for row in results]For similarity score normalization:
def _normalize_similarity_scores(self, results: List[tuple]) -> List[ScoredResult]: similarities = [row.similarity for row in results] normalized_scores = normalize_distances(similarities) return [ ScoredResult( id=UUID(str(row.id)), payload=row.payload, score=normalized_scores[idx] ) for idx, row in enumerate(results) ]Also applies to: 111-111
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py
(3 hunks)cognee/modules/graph/utils/retrieve_existing_edges.py
(1 hunks)
🔇 Additional comments (2)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
74-74
: Approve consistent Vector column definitions
The vector column definitions now consistently use self.Vector, which aligns with the class-level import.
Also applies to: 111-111
41-44
: Approve class-level Vector import but verify table reflection
Moving the Vector import to class level is good practice. However, let's verify that all table operations handle the Vector type correctly.
✅ Verification successful
Let me analyze the usage of Vector across different adapters to ensure there are no conflicts or inconsistencies.
Vector import and usage is correctly implemented
The verification shows that:
- The
Vector
import in PGVectorAdapter is unique and doesn't conflict with other adapters - LanceDB uses a different
Vector
class fromlancedb.pydantic
- All Vector column definitions in PGVectorAdapter use
self.Vector
, ensuring consistent usage - Table operations properly handle the Vector type through dynamic table loading with
get_table
method
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if Vector type is used consistently across the codebase
# Look for any other Vector imports that might conflict
rg "from pgvector.sqlalchemy import Vector"
# Check for direct Vector usage without self.Vector
rg "Vector\(" --type py
Length of output: 969
Script:
#!/bin/bash
# Check the full context of Vector usage in LanceDB adapter
ast-grep --pattern 'from $_ import Vector'
# Get more context around Vector usage in LanceDB
rg "Vector" -B 2 -A 2 cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py
# Check if there are any table reflection or dynamic table loading code
rg "Table|MetaData" --type py cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py
Length of output: 1856
Resolve issue with poetry lock Fix
Summary by CodeRabbit
New Features
Bug Fixes
Refactor