added boilerplate for user management #123

Merged Aug 6, 2024 (48 commits)

Changes shown from 2 commits.

Commits:
866270b
added boilerplate for user management
Vasilije1990 Jul 21, 2024
77e8c1b
add sqlalchemy engine
Vasilije1990 Jul 21, 2024
e785b30
Initial functional user auth
Vasilije1990 Jul 22, 2024
14e1eba
Fixes for user flow with group management
Vasilije1990 Jul 23, 2024
36e156e
Fixes to the model and adding the read info to the graph
Vasilije1990 Jul 23, 2024
3f5e665
Update cognee/api/client.py
Vasilije1990 Jul 24, 2024
e7b0e71
Fixes to translation services
Vasilije1990 Jul 24, 2024
a93bd53
Merge remote-tracking branch 'origin/COG-206' into COG-206
Vasilije1990 Jul 24, 2024
9616545
Exposed user management
Vasilije1990 Jul 26, 2024
218d322
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
b4d1a73
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
7930586
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
797e7ba
Updates to searches
Vasilije1990 Jul 27, 2024
66749fa
Updates to functions
Vasilije1990 Jul 27, 2024
2717272
Merge remote-tracking branch 'origin/main' into COG-206
borisarzentar Aug 1, 2024
401167b
fix: enable sqlalchemy adapter
borisarzentar Aug 4, 2024
07e2bc1
Fixes to the pipeline
Vasilije1990 Aug 5, 2024
085ca5e
Fixes to the users
Vasilije1990 Aug 5, 2024
7d3e124
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
b5a3b69
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
9a2cde9
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
6035010
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
709a10c
fix: add dataset and data models
borisarzentar Aug 5, 2024
73dd6c2
Fix to neo4j flow
Vasilije1990 Aug 6, 2024
0519986
Merge remote-tracking branch 'origin/COG-206' into COG-206
Vasilije1990 Aug 6, 2024
cb9bfa2
fix: search results preview
borisarzentar Aug 6, 2024
4749995
Merge remote-tracking branch 'origin/COG-206' into COG-206
borisarzentar Aug 6, 2024
7846291
fix: add buildx action
borisarzentar Aug 6, 2024
acf41ff
fix: upgrade setup-buildx-action
borisarzentar Aug 6, 2024
3f35a45
fix: add setup_docker job
borisarzentar Aug 6, 2024
3e3134b
fix: fix debugpy version
borisarzentar Aug 6, 2024
0dad12c
fix: upgrade buildx action
borisarzentar Aug 6, 2024
a590c6a
fix: run tests on ubuntu
borisarzentar Aug 6, 2024
896a2ce
fix: add postgres service to tests
borisarzentar Aug 6, 2024
e492a18
fix: add env variables to test workflows
borisarzentar Aug 6, 2024
ed6e8eb
fix: wait for postgres to be ready before tests
borisarzentar Aug 6, 2024
a34acbc
fix: update neo4j lib
borisarzentar Aug 6, 2024
1c89119
fix: uuid parsing of search results
borisarzentar Aug 6, 2024
8a63aa3
fix: run buildx on all platforms
borisarzentar Aug 6, 2024
4f72eb5
fix: install buildx on all platforms
borisarzentar Aug 6, 2024
6d38bcd
fix: add QEMU
borisarzentar Aug 6, 2024
887d4e1
fix: add install_docker step in github action workflow
borisarzentar Aug 6, 2024
fd20fac
fix: add steps
borisarzentar Aug 6, 2024
8fa572a
fix: add runs-on
borisarzentar Aug 6, 2024
9927855
fix: move if
borisarzentar Aug 6, 2024
841e7b5
fix: add needs
borisarzentar Aug 6, 2024
f808cf1
fix: move install docker job
borisarzentar Aug 6, 2024
e3c3f35
fix: remove macos test
borisarzentar Aug 6, 2024
71 changes: 71 additions & 0 deletions cognee/api/client.py
@@ -43,6 +43,77 @@
allow_headers=["*"],
)

from contextlib import asynccontextmanager

from fastapi import Depends, FastAPI

from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
Contributor:

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Committable suggestion was skipped due to low confidence.
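As a hedged sketch of what the hoisted imports would look like (module paths copied from the diff; the CORS/middleware setup that precedes them in the real file is elided), the top of cognee/api/client.py would read roughly:

# Sketch only: import grouping, not the merged file.
from contextlib import asynccontextmanager

from fastapi import Depends, FastAPI

from cognee.infrastructure.databases.relational.user_authentication.authentication_db import (
    User,
    create_db_and_tables,
)
from cognee.infrastructure.databases.relational.user_authentication.schemas import (
    UserCreate,
    UserRead,
    UserUpdate,
)
from cognee.infrastructure.databases.relational.user_authentication.users import (
    auth_backend,
    current_active_user,
    fastapi_users,
)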

Tools: Ruff
46-46: Module level import not at top of file (E402)
48-48: Module level import not at top of file (E402)
50-50: Module level import not at top of file (E402)
51-51: Module level import not at top of file (E402)
52-52: Module level import not at top of file (E402)


app.include_router(
    fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(
    fastapi_users.get_register_router(UserRead, UserCreate),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_reset_password_router(),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_verify_router(UserRead),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_users_router(UserRead, UserUpdate),
    prefix="/users",
    tags=["users"],
)


@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
    return {"message": f"Hello {user.email}!"}

coderabbitai (bot), Jul 23, 2024:

Remove duplicate function authenticated_route.

The function authenticated_route is defined twice, which can cause conflicts.

- @app.get("/authenticated-route")
- async def authenticated_route(user: User = Depends(current_active_user)):
-    return {"message": f"Hello {user.email}!"}

Also applies to: 120-123

Contributor:

@Vasilije1990 This one

coderabbitai (bot):

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Not needed if you setup a migration system like Alembic
    await create_db_and_tables()
    yield


app.include_router(
    fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(
    fastapi_users.get_register_router(UserRead, UserCreate),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_reset_password_router(),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_verify_router(UserRead),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_users_router(UserRead, UserUpdate),
    prefix="/users",
    tags=["users"],
)


@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
    return {"message": f"Hello {user.email}!"}
Vasilije1990 marked this conversation as resolved.
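For reference, a hedged sketch of the deduplicated wiring the reviewers are asking for: one lifespan handler, each fastapi-users router registered once, and a single authenticated_route. Names come from the diff; passing the lifespan into the FastAPI constructor is an assumption, since the diff defines it but never attaches it:

# Sketch only, not the merged code.
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create the auth tables at startup; a migration tool such as Alembic would replace this.
    await create_db_and_tables()
    yield


# In the real file the app (and its CORS middleware) is created earlier; the lifespan
# handler only takes effect if it is passed to that constructor.
app = FastAPI(lifespan=lifespan)

app.include_router(fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"])
app.include_router(fastapi_users.get_register_router(UserRead, UserCreate), prefix="/auth", tags=["auth"])
app.include_router(fastapi_users.get_reset_password_router(), prefix="/auth", tags=["auth"])
app.include_router(fastapi_users.get_verify_router(UserRead), prefix="/auth", tags=["auth"])
app.include_router(fastapi_users.get_users_router(UserRead, UserUpdate), prefix="/users", tags=["users"])


@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
    return {"message": f"Hello {user.email}!"}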

@app.get("/")
async def root():
"""
Expand Down
5 changes: 4 additions & 1 deletion cognee/infrastructure/databases/relational/config.py
@@ -11,9 +11,11 @@ class RelationalConfig(BaseSettings):
db_port: str = "5432"
db_user: str = "cognee"
db_password: str = "cognee"
database_engine: object = create_relational_engine(db_path, db_name)
db_provider: str = "duckdb"
database_engine: object = create_relational_engine(db_path, db_name, db_provider)
db_file_path: str = os.path.join(db_path, db_name)


model_config = SettingsConfigDict(env_file = ".env", extra = "allow")

def create_engine(self):
@@ -29,6 +31,7 @@ def to_dict(self) -> dict:
"db_user": self.db_user,
"db_password": self.db_password,
"db_engine": self.database_engine,
"db_provider": self.db_provider,
}

@lru_cache
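To illustrate how the new db_provider field is meant to be consumed, a hedged sketch using the existing .env/environment mechanism; get_relationaldb_config and the field names come from the diff, while the environment-variable mapping relies on pydantic-settings' default case-insensitive behaviour:

# Sketch only: select the Postgres adapter through environment configuration.
import os

os.environ["DB_PROVIDER"] = "postgres"   # maps onto RelationalConfig.db_provider
os.environ["DB_USER"] = "cognee"
os.environ["DB_PASSWORD"] = "cognee"

from cognee.infrastructure.databases.relational import get_relationaldb_config

relational_config = get_relationaldb_config()   # cached with lru_cache, so set env vars first
print(relational_config.db_provider)            # -> "postgres"
print(relational_config.to_dict()["db_provider"])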
@@ -1,10 +1,33 @@
from enum import Enum

from cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter import SQLAlchemyAdapter
from cognee.infrastructure.files.storage import LocalStorage
from cognee.infrastructure.databases.relational import DuckDBAdapter
from cognee.infrastructure.databases.relational import DuckDBAdapter, get_relationaldb_config


class DBProvider(Enum):
    DUCKDB = "duckdb"
    POSTGRES = "postgres"


def create_relational_engine(db_path: str, db_name: str):
def create_relational_engine(db_path: str, db_name: str, db_provider: str):
    LocalStorage.ensure_directory_exists(db_path)

    return DuckDBAdapter(
        db_name = db_name,
        db_path = db_path,
    )
    llm_config = get_relationaldb_config()

    provider = DBProvider(llm_config.llm_provider)

    if provider == DBProvider.DUCKDB:
        return DuckDBAdapter(
            db_name = db_name,
            db_path = db_path,
        )
    elif provider == DBProvider.POSTGRES:
        return SQLAlchemyAdapter(
            db_name = db_name,
            db_path = db_path,
            db_type = db_provider,
        )
Contributor:

Add error handling for invalid llm_config.llm_provider.

The function does not handle the case where llm_config.llm_provider is not a valid DBProvider. Consider adding error handling for this scenario.

+    try:
+        provider = DBProvider(llm_config.llm_provider)
+    except ValueError:
+        raise ValueError(f"Invalid database provider: {llm_config.llm_provider}")

Suggested change:
def create_relational_engine(db_path: str, db_name: str, db_provider: str):
    LocalStorage.ensure_directory_exists(db_path)
    llm_config = get_relationaldb_config()
    try:
        provider = DBProvider(llm_config.llm_provider)
    except ValueError:
        raise ValueError(f"Invalid database provider: {llm_config.llm_provider}")
    if provider == DBProvider.DUCKDB:
        return DuckDBAdapter(
            db_name = db_name,
            db_path = db_path,
        )
    elif provider == DBProvider.POSTGRES:
        return SQLAlchemyAdapter(
            db_name = db_name,
            db_path = db_path,
            db_type = db_provider,
        )
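Taking this thread together with the llm_config naming comment further down (llm_config -> relationaldb_config), a hedged sketch of the factory if the provider were derived from the db_provider argument instead of the LLM-style config attribute; this is an assumption about the intent, not the merged code:

# Sketch only: provider selection driven by the db_provider argument.
def create_relational_engine(db_path: str, db_name: str, db_provider: str):
    LocalStorage.ensure_directory_exists(db_path)

    try:
        provider = DBProvider(db_provider)
    except ValueError:
        raise ValueError(f"Invalid database provider: {db_provider}")

    if provider == DBProvider.DUCKDB:
        return DuckDBAdapter(db_name=db_name, db_path=db_path)

    # POSTGRES is the only other member of DBProvider in this diff.
    return SQLAlchemyAdapter(db_name=db_name, db_path=db_path, db_type=db_provider)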

@@ -0,0 +1,125 @@
import os
from sqlalchemy import create_engine, MetaData, Table, Column, String, Boolean, TIMESTAMP, text
from sqlalchemy.orm import sessionmaker

class SQLAlchemyAdapter():
    def __init__(self, db_type: str, db_path: str, db_name: str):
        self.db_location = os.path.abspath(os.path.join(db_path, db_name))
        self.engine = create_engine(f"{db_type}:///{self.db_location}")
        self.Session = sessionmaker(bind=self.engine)

    def get_datasets(self):
        with self.engine.connect() as connection:
            result = connection.execute(text("SELECT DISTINCT schema_name FROM information_schema.tables;"))
            tables = [row['schema_name'] for row in result]
            return list(
                filter(
                    lambda schema_name: not schema_name.endswith("staging") and schema_name != "cognee",
                    tables
                )
            )
Contributor:

Add type hints, docstrings, and improve filtering logic.

Include type hints and docstrings for better readability and maintainability. Improve filtering logic for readability.

+    def get_datasets(self) -> List[str]:
+        """
+        Retrieves distinct schema names from the database, excluding staging schemas and 'cognee'.
+        """
-        return list(
-            filter(
-                lambda schema_name: not schema_name.endswith("staging") and schema_name != "cognee",
-                tables
-            )
-        )
+        return [schema_name for schema_name in tables if not schema_name.endswith("staging") and schema_name != "cognee"]

Suggested change:
    def get_datasets(self) -> List[str]:
        """
        Retrieves distinct schema names from the database, excluding staging schemas and 'cognee'.
        """
        with self.engine.connect() as connection:
            result = connection.execute(text("SELECT DISTINCT schema_name FROM information_schema.tables;"))
            tables = [row['schema_name'] for row in result]
            return [schema_name for schema_name in tables if not schema_name.endswith("staging") and schema_name != "cognee"]
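Note that List[str] (and the Dict/Any hints suggested in later comments on this file) are not imported anywhere in the diff; a minimal sketch of the import this assumes at the top of the adapter module:

# Assumed addition if the suggested type hints are adopted.
from typing import Any, Dict, List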


    def get_files_metadata(self, dataset_name: str):
        with self.engine.connect() as connection:
            result = connection.execute(text(f"SELECT id, name, file_path, extension, mime_type FROM {dataset_name}.file_metadata;"))
            return [dict(row) for row in result]
Contributor:

Parameterize the SQL query to prevent SQL injection.

The SQL query should be parameterized to prevent SQL injection.

-            result = connection.execute(text(f"SELECT id, name, file_path, extension, mime_type FROM {dataset_name}.file_metadata;"))
+            result = connection.execute(text("SELECT id, name, file_path, extension, mime_type FROM :dataset_name.file_metadata;"), {'dataset_name': dataset_name})

Committable suggestion was skipped due to low confidence.


    def create_table(self, schema_name: str, table_name: str, table_config: list[dict]):
        fields_query_parts = [f"{item['name']} {item['type']}" for item in table_config]
        with self.engine.connect() as connection:
            connection.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema_name};"))
            connection.execute(text(f"CREATE TABLE IF NOT EXISTS {schema_name}.{table_name} ({', '.join(fields_query_parts)});"))
Contributor:

Parameterize the SQL queries to prevent SQL injection.

The SQL queries should be parameterized to prevent SQL injection.

-            connection.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema_name};"))
-            connection.execute(text(f"CREATE TABLE IF NOT EXISTS {schema_name}.{table_name} ({', '.join(fields_query_parts)});"))
+            connection.execute(text("CREATE SCHEMA IF NOT EXISTS :schema_name;"), {'schema_name': schema_name})
+            connection.execute(text("CREATE TABLE IF NOT EXISTS :schema_name.:table_name (:fields_query_parts);"), {'schema_name': schema_name, 'table_name': table_name, 'fields_query_parts': ', '.join(fields_query_parts)})

Committable suggestion was skipped due to low confidence.


    def delete_table(self, table_name: str):
        with self.engine.connect() as connection:
            connection.execute(text(f"DROP TABLE IF EXISTS {table_name};"))
Contributor:

Parameterize the SQL query to prevent SQL injection.

The SQL query should be parameterized to prevent SQL injection.

-            connection.execute(text(f"DROP TABLE IF EXISTS {table_name};"))
+            connection.execute(text("DROP TABLE IF EXISTS :table_name;"), {'table_name': table_name})

Suggested change:
    def delete_table(self, table_name: str):
        with self.engine.connect() as connection:
            connection.execute(text("DROP TABLE IF EXISTS :table_name;"), {'table_name': table_name})


    def insert_data(self, schema_name: str, table_name: str, data: list[dict]):
        columns = ", ".join(data[0].keys())
        values = ", ".join([f"({', '.join([f':{key}' for key in row.keys()])})" for row in data])
        insert_query = text(f"INSERT INTO {schema_name}.{table_name} ({columns}) VALUES {values};")
        with self.engine.connect() as connection:
            connection.execute(insert_query, data)
Contributor:

Parameterize the SQL query to prevent SQL injection.

The SQL query should be parameterized to prevent SQL injection.

-        insert_query = text(f"INSERT INTO {schema_name}.{table_name} ({columns}) VALUES {values};")
+        insert_query = text("INSERT INTO :schema_name.:table_name (:columns) VALUES :values;")
+        with self.engine.connect() as connection:
+            connection.execute(insert_query, {'schema_name': schema_name, 'table_name': table_name, 'columns': columns, 'values': values, **data})

Committable suggestion was skipped due to low confidence.


    def get_data(self, table_name: str, filters: dict = None):
        with self.engine.connect() as connection:
            query = f"SELECT * FROM {table_name}"
            if filters:
                filter_conditions = " AND ".join([
                    f"{key} IN ({', '.join([f':{key}{i}' for i in range(len(value))])})" if isinstance(value, list)
                    else f"{key} = :{key}" for key, value in filters.items()
                ])
                query += f" WHERE {filter_conditions};"
                query = text(query)
                results = connection.execute(query, filters)
            else:
                query += ";"
                query = text(query)
                results = connection.execute(query)
            return {result["data_id"]: result["status"] for result in results}
Contributor:

Parameterize the SQL query to prevent SQL injection.

The SQL query should be parameterized to prevent SQL injection.

-            query = f"SELECT * FROM {table_name}"
+            query = "SELECT * FROM :table_name"
-                filter_conditions = " AND ".join([
-                    f"{key} IN ({', '.join([f':{key}{i}' for i in range(len(value))])})" if isinstance(value, list)
-                    else f"{key} = :{key}" for key, value in filters.items()
-                ])
-                query += f" WHERE {filter_conditions};"
+                filter_conditions = " AND ".join([
+                    f"{key} IN (:values)" if isinstance(value, list)
+                    else f"{key} = :{key}" for key, value in filters.items()
+                ])
+                query += f" WHERE {filter_conditions};"
+                query = text(query)
+                results = connection.execute(query, {'table_name': table_name, 'values': ', '.join([f':{key}{i}' for i in range(len(value))]), **filters})

Committable suggestion was skipped due to low confidence.


    def execute_query(self, query):
        with self.engine.connect() as connection:
            result = connection.execute(text(query))
            return [dict(row) for row in result]
Contributor:

Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def execute_query(self, query: str) -> List[Dict[str, Any]]:
+        """
+        Executes the provided SQL query.
+        """
-            result = connection.execute(text(query))
+            result = connection.execute(text(":query"), {'query': query})

Committable suggestion was skipped due to low confidence.


    def load_cognify_data(self, data):
        metadata = MetaData()
        cognify_table = Table(
            'cognify', metadata,
            Column('document_id', String),
            Column('layer_id', String),
            Column('created_at', TIMESTAMP, server_default=text('CURRENT_TIMESTAMP')),
            Column('updated_at', TIMESTAMP, nullable=True, default=None),
            Column('processed', Boolean, default=False),
            Column('document_id_target', String, nullable=True)
        )
        metadata.create_all(self.engine)
        insert_query = cognify_table.insert().values(document_id=text(':document_id'), layer_id=text(':layer_id'))
        with self.engine.connect() as connection:
            connection.execute(insert_query, data)
Contributor:

Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def load_cognify_data(self, data: List[Dict[str, Any]]) -> None:
+        """
+        Loads data into the `cognify` table.
+        """

Committable suggestion was skipped due to low confidence.


    def fetch_cognify_data(self, excluded_document_id: str):
        with self.engine.connect() as connection:
            connection.execute(text("""
                CREATE TABLE IF NOT EXISTS cognify (
                    document_id STRING,
                    layer_id STRING,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updated_at TIMESTAMP DEFAULT NULL,
                    processed BOOLEAN DEFAULT FALSE,
                    document_id_target STRING NULL
                );
            """))
            query = text(f"""
Contributor:

If we just created the table this query won't return anything, so we can just set records to []. Otherwise we can execute the query.

                SELECT document_id, layer_id, created_at, updated_at, processed
                FROM cognify
                WHERE document_id != :excluded_document_id AND processed = FALSE;
            """)
            records = connection.execute(query, {'excluded_document_id': excluded_document_id}).fetchall()
            if records:
                document_ids = tuple(record['document_id'] for record in records)
                update_query = text(f"UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
                connection.execute(update_query, {'document_ids': document_ids})
            return [dict(record) for record in records]
Contributor:

Add type hints, docstrings, and remove extraneous f prefixes.

Include type hints and docstrings for better readability and maintainability. Remove extraneous f prefixes from f-strings without placeholders.

+    def fetch_cognify_data(self, excluded_document_id: str) -> List[Dict[str, Any]]:
+        """
+        Fetches data from the `cognify` table based on the provided `excluded_document_id`.
+        """
-            query = text(f"""
-                SELECT document_id, layer_id, created_at, updated_at, processed
-                FROM cognify
-                WHERE document_id != :excluded_document_id AND processed = FALSE;
-            """)
+            query = text("""
+                SELECT document_id, layer_id, created_at, updated_at, processed
+                FROM cognify
+                WHERE document_id != :excluded_document_id AND processed = FALSE;
+            """)
-                update_query = text(f"UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
+                update_query = text("UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")

Suggested change:
    def fetch_cognify_data(self, excluded_document_id: str) -> List[Dict[str, Any]]:
        """
        Fetches data from the `cognify` table based on the provided `excluded_document_id`.
        """
        with self.engine.connect() as connection:
            connection.execute(text("""
                CREATE TABLE IF NOT EXISTS cognify (
                    document_id STRING,
                    layer_id STRING,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updated_at TIMESTAMP DEFAULT NULL,
                    processed BOOLEAN DEFAULT FALSE,
                    document_id_target STRING NULL
                );
            """))
            query = text("""
                SELECT document_id, layer_id, created_at, updated_at, processed
                FROM cognify
                WHERE document_id != :excluded_document_id AND processed = FALSE;
            """)
            records = connection.execute(query, {'excluded_document_id': excluded_document_id}).fetchall()
            if records:
                document_ids = tuple(record['document_id'] for record in records)
                update_query = text("UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
                connection.execute(update_query, {'document_ids': document_ids})
            return [dict(record) for record in records]
Tools: Ruff
117-121: f-string without any placeholders; remove extraneous f prefix (F541)
125-125: f-string without any placeholders; remove extraneous f prefix (F541)


    def delete_cognify_data(self):
        with self.engine.connect() as connection:
            connection.execute(text("""
Contributor:

We create a table here and delete it immediately in the next line. Should we check if table exists and delete if it does?

                CREATE TABLE IF NOT EXISTS cognify (
                    document_id STRING,
                    layer_id STRING,
                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updated_at TIMESTAMP DEFAULT NULL,
                    processed BOOLEAN DEFAULT FALSE,
                    document_id_target STRING NULL
                );
            """))
            connection.execute(text("DELETE FROM cognify;"))
            connection.execute(text("DROP TABLE cognify;"))
Contributor:

Add type hints, docstrings, and remove redundant table creation logic.

Include type hints and docstrings for better readability and maintainability. Remove redundant table creation logic before deletion.

+    def delete_cognify_data(self) -> None:
+        """
+        Deletes data from the `cognify` table.
+        """
-            connection.execute(text("""
-                CREATE TABLE IF NOT EXISTS cognify (
-                    document_id STRING,
-                    layer_id STRING,
-                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-                    updated_at TIMESTAMP DEFAULT NULL,
-                    processed BOOLEAN DEFAULT FALSE,
-                    document_id_target STRING NULL
-                );
-            """))

Suggested change:
    def delete_cognify_data(self) -> None:
        """
        Deletes data from the `cognify` table.
        """
        with self.engine.connect() as connection:
            connection.execute(text("DELETE FROM cognify;"))
            connection.execute(text("DROP TABLE cognify;"))


    def delete_database(self):
        from cognee.infrastructure.files.storage import LocalStorage
        LocalStorage.remove(self.db_location)
        if LocalStorage.file_exists(self.db_location + ".wal"):
            LocalStorage.remove(self.db_location + ".wal")
Contributor:

Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def delete_database(self) -> None:
+        """
+        Deletes the database file and its write-ahead log file.
+        """

Committable suggestion was skipped due to low confidence.

Empty file.
@@ -0,0 +1,41 @@
from typing import AsyncGenerator

from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase

from cognee.infrastructure.databases.relational import get_relationaldb_config
from cognee.infrastructure.databases.relational.create_relational_engine import create_relational_engine

DATABASE_URL = "sqlite+aiosqlite:///./test.db"


class Base(DeclarativeBase):
    pass


class User(SQLAlchemyBaseUserTableUUID, Base):
    pass


llm_config = get_relationaldb_config()
Contributor:

llm_config -> relationaldb_config. In line 26 as well.

Contributor Author:

fixed this



engine = create_relational_engine(llm_config.db_path, llm_config.db_name, llm_config.db_provider)
async_session_maker = async_sessionmaker(engine, expire_on_commit=False)


async def create_db_and_tables():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)


async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    async with async_session_maker() as session:
        yield session


async def get_user_db(session: AsyncSession = Depends(get_async_session)):
    yield SQLAlchemyUserDatabase(session, User)
@@ -0,0 +1,15 @@
import uuid

from fastapi_users import schemas


class UserRead(schemas.BaseUser[uuid.UUID]):
    pass


class UserCreate(schemas.BaseUserCreate):
    pass


class UserUpdate(schemas.BaseUserUpdate):
    pass
@@ -0,0 +1,55 @@
import uuid
from typing import Optional

from fastapi import Depends, Request
from fastapi_users import BaseUserManager, FastAPIUsers, UUIDIDMixin, models
from fastapi_users.authentication import (
AuthenticationBackend,
BearerTransport,
JWTStrategy,
)
from fastapi_users.db import SQLAlchemyUserDatabase

from app.db import User, get_user_db
Contributor:

What is app.db?

Contributor Author:

fixed this


SECRET = "SECRET"


class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
    reset_password_token_secret = SECRET
    verification_token_secret = SECRET

    async def on_after_register(self, user: User, request: Optional[Request] = None):
        print(f"User {user.id} has registered.")
Contributor:

Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

-    print(f"User {user.id} has registered.")
+    logger.info(f"User {user.id} has registered.")

Suggested change:
    async def on_after_register(self, user: User, request: Optional[Request] = None):
        logger.info(f"User {user.id} has registered.")


    async def on_after_forgot_password(
        self, user: User, token: str, request: Optional[Request] = None
    ):
        print(f"User {user.id} has forgot their password. Reset token: {token}")
Contributor:

Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

-    print(f"User {user.id} has forgot their password. Reset token: {token}")
+    logger.info(f"User {user.id} has forgot their password. Reset token: {token}")

Suggested change:
        logger.info(f"User {user.id} has forgot their password. Reset token: {token}")


    async def on_after_request_verify(
        self, user: User, token: str, request: Optional[Request] = None
    ):
        print(f"Verification requested for user {user.id}. Verification token: {token}")
Contributor:

Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

-        print(f"Verification requested for user {user.id}. Verification token: {token}")
+        logger.info(f"Verification requested for user {user.id}. Verification token: {token}")

Suggested change:
        logger.info(f"Verification requested for user {user.id}. Verification token: {token}")



async def get_user_manager(user_db: SQLAlchemyUserDatabase = Depends(get_user_db)):
    yield UserManager(user_db)


bearer_transport = BearerTransport(tokenUrl="auth/jwt/login")


def get_jwt_strategy() -> JWTStrategy[models.UP, models.ID]:
    return JWTStrategy(secret=SECRET, lifetime_seconds=3600)


auth_backend = AuthenticationBackend(
    name="jwt",
    transport=bearer_transport,
    get_strategy=get_jwt_strategy,
)

fastapi_users = FastAPIUsers[User, uuid.UUID](get_user_manager, [auth_backend])

current_active_user = fastapi_users.current_user(active=True)
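To close the loop, a hedged sketch of how a client would exercise the exposed user management once the service is running; the endpoint paths follow the routers registered in client.py above, while the host, port, and credentials are purely illustrative:

# Sketch only: register, log in through the JWT backend, and call the protected route.
import httpx

BASE = "http://localhost:8000"  # assumed local development address

# Register a user (fastapi-users register router, JSON body).
httpx.post(f"{BASE}/auth/register", json={"email": "user@example.com", "password": "change-me"})

# Log in; the JWT login endpoint expects form-encoded username/password and returns a bearer token.
login = httpx.post(
    f"{BASE}/auth/jwt/login",
    data={"username": "user@example.com", "password": "change-me"},
)
token = login.json()["access_token"]

# Call the protected route with the bearer token.
response = httpx.get(
    f"{BASE}/authenticated-route",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json())  # {"message": "Hello user@example.com!"}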