-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cog 170 pgvector adapter #158
Changes from 31 commits
d68a3be
c62dfdd
268396a
9fbf2d8
9b9ae6c
aa26eab
02cd240
58e5854
325e6cd
2cd2557
7f7b015
d2772d2
240c660
05e4ef3
f8babba
9f4b8f2
4c381a3
9461ba0
2cedcbe
71c1374
4a73505
a358168
7b2022e
88ded6e
8002db7
dbc86e2
c7ed46d
6b9a142
c78627f
d30c337
0e1533a
195929e
dc46304
0c6f019
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
name: test | pgvector | ||
|
||
on: | ||
pull_request: | ||
branches: | ||
- main | ||
workflow_dispatch: | ||
|
||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} | ||
cancel-in-progress: true | ||
|
||
env: | ||
RUNTIME__LOG_LEVEL: ERROR | ||
|
||
jobs: | ||
get_docs_changes: | ||
name: docs changes | ||
uses: ./.github/workflows/get_docs_changes.yml | ||
|
||
run_pgvector_integration_test: | ||
name: test | ||
needs: get_docs_changes | ||
if: needs.get_docs_changes.outputs.changes_outside_docs == 'true' | ||
runs-on: ubuntu-latest | ||
defaults: | ||
run: | ||
shell: bash | ||
services: | ||
postgres: | ||
image: pgvector/pgvector:pg17 | ||
env: | ||
POSTGRES_USER: cognee | ||
POSTGRES_PASSWORD: cognee | ||
POSTGRES_DB: cognee_db | ||
options: >- | ||
--health-cmd pg_isready | ||
--health-interval 10s | ||
--health-timeout 5s | ||
--health-retries 5 | ||
ports: | ||
- 5432:5432 | ||
|
||
steps: | ||
- name: Check out | ||
uses: actions/checkout@master | ||
|
||
- name: Setup Python | ||
uses: actions/setup-python@v5 | ||
with: | ||
python-version: '3.11.x' | ||
|
||
- name: Install Poetry | ||
uses: snok/[email protected] | ||
with: | ||
virtualenvs-create: true | ||
virtualenvs-in-project: true | ||
installer-parallel: true | ||
|
||
- name: Install dependencies | ||
run: poetry install -E postgres --no-interaction | ||
|
||
- name: Run default PGVector | ||
env: | ||
ENV: 'dev' | ||
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
run: poetry run python ./cognee/tests/test_pgvector.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
from typing import Dict | ||
|
||
from ..relational.config import get_relational_config | ||
|
||
class VectorConfig(Dict): | ||
vector_db_url: str | ||
vector_db_key: str | ||
|
@@ -26,6 +28,25 @@ def create_vector_engine(config: VectorConfig, embedding_engine): | |
api_key = config["vector_db_key"], | ||
embedding_engine = embedding_engine | ||
) | ||
elif config["vector_db_provider"] == "pgvector": | ||
from .pgvector.PGVectorAdapter import PGVectorAdapter | ||
|
||
# Get configuration for postgres database | ||
relational_config = get_relational_config() | ||
db_username = relational_config.db_username | ||
db_password = relational_config.db_password | ||
db_host = relational_config.db_host | ||
db_port = relational_config.db_port | ||
db_name = relational_config.db_name | ||
|
||
connection_string: str = ( | ||
f"postgresql+asyncpg://{db_username}:{db_password}@{db_host}:{db_port}/{db_name}" | ||
) | ||
Comment on lines
+42
to
+44
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Use SQLAlchemy's Leveraging SQLAlchemy's Refactor the code as follows: +from sqlalchemy.engine import URL
+
# Get configuration for postgres database
relational_config = get_relational_config()
db_username = relational_config.db_username
db_password = relational_config.db_password
db_host = relational_config.db_host
db_port = relational_config.db_port
db_name = relational_config.db_name
- connection_string: str = (
- f"postgresql+asyncpg://{db_username}:{db_password}@{db_host}:{db_port}/{db_name}"
- )
+ connection_url = URL.create(
+ drivername="postgresql+asyncpg",
+ username=db_username,
+ password=db_password,
+ host=db_host,
+ port=db_port,
+ database=db_name
+ ) Update the adapter initialization: - return PGVectorAdapter(connection_string,
+ return PGVectorAdapter(connection_url,
config["vector_db_key"],
embedding_engine
)
Ensure special characters in credentials are URL-encoded When constructing the connection string, special characters in Apply this diff to URL-encode the credentials: +from urllib.parse import quote_plus
+
# Get configuration for postgres database
relational_config = get_relational_config()
db_username = relational_config.db_username
db_password = relational_config.db_password
db_host = relational_config.db_host
db_port = relational_config.db_port
db_name = relational_config.db_name
connection_string: str = (
- f"postgresql+asyncpg://{db_username}:{db_password}@{db_host}:{db_port}/{db_name}"
+ f"postgresql+asyncpg://{quote_plus(db_username)}:{quote_plus(db_password)}@{db_host}:{db_port}/{db_name}"
)
|
||
|
||
return PGVectorAdapter(connection_string, | ||
config["vector_db_key"], | ||
embedding_engine | ||
) | ||
else: | ||
from .lancedb.LanceDBAdapter import LanceDBAdapter | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize Database Initialization Based on Configuration
Initializing both databases every time
add
is called may introduce unnecessary overhead, especially if only one database is in use. Consider initializing databases conditionally based on the configuration or the specific requirements of the operation.Would you like assistance in modifying the code to initialize databases conditionally based on configuration settings?
Consider Adding Error Handling for Database Initialization
Initializing both databases without error handling might lead to unhandled exceptions if a database fails to initialize. Consider wrapping the initialization calls in try-except blocks to handle potential exceptions gracefully.
Apply this diff to include exception handling:
📝 Committable suggestion