Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added boilerplate for user management #123

Merged
merged 48 commits into from
Aug 6, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
866270b
added boilerplate for user management
Vasilije1990 Jul 21, 2024
77e8c1b
add sqlalchemy engine
Vasilije1990 Jul 21, 2024
e785b30
Initial functional user auth
Vasilije1990 Jul 22, 2024
14e1eba
Fixes for user flow with group management
Vasilije1990 Jul 23, 2024
36e156e
Fixes to the model and adding the read info to the graph
Vasilije1990 Jul 23, 2024
3f5e665
Update cognee/api/client.py
Vasilije1990 Jul 24, 2024
e7b0e71
Fixes to translation services
Vasilije1990 Jul 24, 2024
a93bd53
Merge remote-tracking branch 'origin/COG-206' into COG-206
Vasilije1990 Jul 24, 2024
9616545
Exposed user management
Vasilije1990 Jul 26, 2024
218d322
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
b4d1a73
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
7930586
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
797e7ba
Updates to searches
Vasilije1990 Jul 27, 2024
66749fa
Updates to functions
Vasilije1990 Jul 27, 2024
2717272
Merge remote-tracking branch 'origin/main' into COG-206
borisarzentar Aug 1, 2024
401167b
fix: enable sqlalchemy adapter
borisarzentar Aug 4, 2024
07e2bc1
Fixes to the pipeline
Vasilije1990 Aug 5, 2024
085ca5e
Fixes to the users
Vasilije1990 Aug 5, 2024
7d3e124
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
b5a3b69
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
9a2cde9
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
6035010
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
709a10c
fix: add dataset and data models
borisarzentar Aug 5, 2024
73dd6c2
Fix to neo4j flow
Vasilije1990 Aug 6, 2024
0519986
Merge remote-tracking branch 'origin/COG-206' into COG-206
Vasilije1990 Aug 6, 2024
cb9bfa2
fix: search results preview
borisarzentar Aug 6, 2024
4749995
Merge remote-tracking branch 'origin/COG-206' into COG-206
borisarzentar Aug 6, 2024
7846291
fix: add buildx action
borisarzentar Aug 6, 2024
acf41ff
fix: upgrade setup-buildx-action
borisarzentar Aug 6, 2024
3f35a45
fix: add setup_docker job
borisarzentar Aug 6, 2024
3e3134b
fix: fix debugpy version
borisarzentar Aug 6, 2024
0dad12c
fix: upgrade buildx action
borisarzentar Aug 6, 2024
a590c6a
fix: run tests on ubuntu
borisarzentar Aug 6, 2024
896a2ce
fix: add postgres service to tests
borisarzentar Aug 6, 2024
e492a18
fix: add env variables to test workflows
borisarzentar Aug 6, 2024
ed6e8eb
fix: wait for postgres to be ready before tests
borisarzentar Aug 6, 2024
a34acbc
fix: update neo4j lib
borisarzentar Aug 6, 2024
1c89119
fix: uuid parsing of search results
borisarzentar Aug 6, 2024
8a63aa3
fix: run buildx on all platforms
borisarzentar Aug 6, 2024
4f72eb5
fix: install buildx on all platforms
borisarzentar Aug 6, 2024
6d38bcd
fix: add QEMU
borisarzentar Aug 6, 2024
887d4e1
fix: add install_docker step in github action workflow
borisarzentar Aug 6, 2024
fd20fac
fix: add steps
borisarzentar Aug 6, 2024
8fa572a
fix: add runs-on
borisarzentar Aug 6, 2024
9927855
fix: move if
borisarzentar Aug 6, 2024
841e7b5
fix: add needs
borisarzentar Aug 6, 2024
f808cf1
fix: move install docker job
borisarzentar Aug 6, 2024
e3c3f35
fix: remove macos test
borisarzentar Aug 6, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 79 additions & 4 deletions cognee/api/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Also applies to: 31-31, 53-59


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Also applies to: 31-32, 53-60

# Set up logging
logging.basicConfig(
level=logging.INFO, # Set the logging level (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL)
Expand All @@ -26,8 +28,13 @@
traces_sample_rate = 1.0,
profiles_sample_rate = 1.0,
)

app = FastAPI(debug = os.getenv("ENV") != "prod")
from contextlib import asynccontextmanager
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app = FastAPI(debug = os.getenv("ENV") != "prod", lifespan=lifespan)
Copy link
Contributor

@coderabbitai coderabbitai bot Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate lifespan context manager.

The lifespan context manager is defined twice, which can cause conflicts.

- @asynccontextmanager
- async def lifespan(app: FastAPI):
-    # Not needed if you setup a migration system like Alembic
-    await create_db_and_tables()
-    yield

Also applies to: 90-95

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate lifespan context manager.

The lifespan context manager is defined twice, which can cause conflicts.

- @asynccontextmanager
- async def lifespan(app: FastAPI):
-    # Not needed if you setup a migration system like Alembic
-    await create_db_and_tables()
-    yield

Also applies to: 90-95


origins = [
"http://frontend:3000",
Expand All @@ -43,6 +50,74 @@
allow_headers=["*"],
)

from contextlib import asynccontextmanager

from fastapi import Depends, FastAPI

from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Committable suggestion was skipped due to low confidence.

Tools
Ruff

46-46: Module level import not at top of file

(E402)


48-48: Module level import not at top of file

(E402)


50-50: Module level import not at top of file

(E402)


51-51: Module level import not at top of file

(E402)


52-52: Module level import not at top of file

(E402)


app.include_router(
fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(
fastapi_users.get_register_router(UserRead, UserCreate),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_reset_password_router(),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_verify_router(UserRead),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_users_router(UserRead, UserUpdate),
prefix="/users",
tags=["users"],
)

app.include_router(permission_router, prefix="/manage", tags=["management"])
@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
return {"message": f"Hello {user.email}!"}

Copy link
Contributor

@coderabbitai coderabbitai bot Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate function authenticated_route.

The function authenticated_route is defined twice, which can cause conflicts.

- @app.get("/authenticated-route")
- async def authenticated_route(user: User = Depends(current_active_user)):
-    return {"message": f"Hello {user.email}!"}

Also applies to: 120-123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate function authenticated_route.

The function authenticated_route is defined twice, which can cause conflicts.

- @app.get("/authenticated-route")
- async def authenticated_route(user: User = Depends(current_active_user)):
-    return {"message": f"Hello {user.email}!"}

Also applies to: 120-123

@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app.include_router(
fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(
fastapi_users.get_register_router(UserRead, UserCreate),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_reset_password_router(),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_verify_router(UserRead),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_users_router(UserRead, UserUpdate),
prefix="/users",
tags=["users"],
)



@app.get("/")
async def root():
"""
Expand Down Expand Up @@ -267,8 +342,8 @@ def start_api_server(host: str = "0.0.0.0", port: int = 8000):
relational_config.create_engine()

from cognee.modules.data.deletion import prune_system, prune_data
asyncio.run(prune_data())
asyncio.run(prune_system(metadata = True))
# asyncio.run(prune_data())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please uncomment this after you are done.

# asyncio.run(prune_system(metadata = True))

uvicorn.run(app, host = host, port = port)
except Exception as e:
Expand Down
24 changes: 22 additions & 2 deletions cognee/api/v1/cognify/cognify_v2.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,14 @@
import asyncio
import hashlib
import logging
import uuid
from typing import Union

from sqlalchemy.ext.asyncio import AsyncSession

from cognee.infrastructure.databases.graph import get_graph_config
from cognee.infrastructure.databases.relational.user_authentication.authentication_db import async_session_maker
from cognee.infrastructure.databases.relational.user_authentication.users import get_user_permissions, fastapi_users
from cognee.modules.cognify.config import get_cognify_config
from cognee.infrastructure.databases.relational.config import get_relationaldb_config
from cognee.modules.data.processing.document_types.AudioDocument import AudioDocument
Expand All @@ -25,7 +31,21 @@

update_status_lock = asyncio.Lock()

async def cognify(datasets: Union[str, list[str]] = None, root_node_id: str = None):
class PermissionDeniedException(Exception):
def __init__(self, message: str):
self.message = message
super().__init__(self.message)

async def cognify(datasets: Union[str, list[str]] = None, root_node_id: str = None, user_id:str="default_user"):
session: AsyncSession = async_session_maker()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put this login in a function in some module and just use it here.
Something like def check_permissions(user) for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this live

user = await fastapi_users.get_user_manager.get(user_id)
user_permissions = await get_user_permissions(user, session)
hash_object = hashlib.sha256(user.encode())
hashed_user_id = hash_object.hexdigest()
required_permission = "write"
if required_permission not in user_permissions:
raise PermissionDeniedException("Not enough permissions")

relational_config = get_relationaldb_config()
db_engine = relational_config.database_engine
create_task_status_table()
Expand Down Expand Up @@ -60,7 +80,7 @@ async def run_cognify_pipeline(dataset_name: str, files: list[dict]):
root_node_id = "ROOT"

tasks = [
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undefined variables hashed_user_id and user_permissions.

The variables hashed_user_id and user_permissions are used but not defined in the provided code. Ensure that these variables are correctly defined and passed to the Task instances.

-                Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
+                Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Tools
Ruff

83-83: Undefined name hashed_user_id

(F821)


83-83: Undefined name user_permissions

(F821)

Task(establish_graph_topology, topology_model = KnowledgeGraph), # Set the graph topology for the document chunk data
Task(expand_knowledge_graph, graph_model = KnowledgeGraph), # Generate knowledge graphs from the document chunks and attach it to chunk nodes
Task(filter_affected_chunks, collection_name = "chunks"), # Find all affected chunks, so we don't process unchanged chunks
Expand Down
4 changes: 2 additions & 2 deletions cognee/infrastructure/data/models/Data.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
from datetime import datetime, timezone
from sqlalchemy.orm import relationship, MappedColumn, Mapped
from sqlalchemy import Column, String, DateTime, UUID, Text, JSON
from cognee.infrastructure.databases.relational import ModelBase
from cognee.infrastructure.databases.relational import Base
from .DatasetData import DatasetData

class Data(ModelBase):
class Data(Base):
__tablename__ = "data"

id = Column(UUID, primary_key = True)
Expand Down
4 changes: 2 additions & 2 deletions cognee/infrastructure/data/models/Dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
from datetime import datetime, timezone
from sqlalchemy.orm import relationship, Mapped
from sqlalchemy import Column, Text, DateTime, UUID
from cognee.infrastructure.databases.relational import ModelBase
from cognee.infrastructure.databases.relational import Base
from .DatasetData import DatasetData

class Dataset(ModelBase):
class Dataset(Base):
__tablename__ = "dataset"

id = Column(UUID, primary_key = True)
Expand Down
4 changes: 2 additions & 2 deletions cognee/infrastructure/data/models/DatasetData.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
from uuid import uuid4
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, UUID, ForeignKey
from cognee.infrastructure.databases.relational import ModelBase
from cognee.infrastructure.databases.relational import Base

class DatasetData(ModelBase):
class DatasetData(Base):
__tablename__ = "dataset_data"

id = Column(UUID, primary_key = True, default = uuid4())
Expand Down
8 changes: 6 additions & 2 deletions cognee/infrastructure/databases/relational/ModelBase.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
from sqlalchemy.orm import declarative_base

ModelBase = declarative_base()
from sqlalchemy.orm import DeclarativeBase


class Base(DeclarativeBase):
pass

2 changes: 1 addition & 1 deletion cognee/infrastructure/databases/relational/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from .ModelBase import ModelBase
from .ModelBase import Base
from .DatabaseEngine import DatabaseEngine
from .sqlite.SqliteEngine import SqliteEngine
from .duckdb.DuckDBAdapter import DuckDBAdapter
Expand Down
10 changes: 7 additions & 3 deletions cognee/infrastructure/databases/relational/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,19 +6,22 @@

class RelationalConfig(BaseSettings):
db_path: str = os.path.join(get_absolute_path(".cognee_system"), "databases")
db_name: str = "cognee.db"
db_name: str = "cognee_db"
db_host: str = "localhost"
db_port: str = "5432"
db_user: str = "cognee"
db_password: str = "cognee"
database_engine: object = create_relational_engine(db_path, db_name)
db_provider: str = "postgresql+asyncpg"
# database_engine: object = create_relational_engine(db_path, db_name, db_provider)
db_file_path: str = os.path.join(db_path, db_name)


model_config = SettingsConfigDict(env_file = ".env", extra = "allow")

def create_engine(self):
self.db_file_path = os.path.join(self.db_path, self.db_name)
self.database_engine = create_relational_engine(self.db_path, self.db_name)
self.database_engine = create_relational_engine(self.db_path, self.db_name, self.db_provider, self.db_host, self.db_port, self.db_user, self.db_password)
return self.database_engine

def to_dict(self) -> dict:
return {
Expand All @@ -29,6 +32,7 @@ def to_dict(self) -> dict:
"db_user": self.db_user,
"db_password": self.db_password,
"db_engine": self.database_engine,
"db_provider": self.db_provider,
}

@lru_cache
Expand Down
Original file line number Diff line number Diff line change
@@ -1,10 +1,40 @@
from enum import Enum

from cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter import SQLAlchemyAdapter
from cognee.infrastructure.files.storage import LocalStorage
from cognee.infrastructure.databases.relational import DuckDBAdapter

def create_relational_engine(db_path: str, db_name: str):

class DBProvider(Enum):
DUCKDB = "duckdb"
POSTGRES = "postgresql+asyncpg"

def create_relational_engine(db_path: str, db_name: str, db_provider:str, db_host:str, db_port:str, db_user:str, db_password:str):
LocalStorage.ensure_directory_exists(db_path)

return DuckDBAdapter(
db_name = db_name,
db_path = db_path,
)
provider = DBProvider(db_provider)

if provider == DBProvider.DUCKDB:
# return DuckDBAdapter(
# db_name = db_name,
# db_path = db_path,
# )
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
db_host=db_host,
db_port=db_port,
db_user=db_user,
db_password=db_password
)
elif provider == DBProvider.POSTGRES:
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
db_host= db_host,
db_port= db_port,
db_user= db_user,
db_password= db_password
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented-out code and add error handling for invalid db_provider values.

The function correctly selects the database adapter based on the db_provider value. However, the commented-out code for DuckDBAdapter should be removed to maintain code cleanliness. Additionally, error handling for invalid db_provider values should be added.

-    # return DuckDBAdapter(
-    #     db_name = db_name,
-    #     db_path = db_path,
-    # )
+    try:
+        provider = DBProvider(db_provider)
+    except ValueError:
+        raise ValueError(f"Invalid database provider: {db_provider}")

Committable suggestion was skipped due to low confidence.

Loading
Loading