
added boilerplate for user management #123

Merged (48 commits) on Aug 6, 2024
Changes from 11 commits

Commits
866270b
added boilerplate for user management
Vasilije1990 Jul 21, 2024
77e8c1b
add sqlalchemy engine
Vasilije1990 Jul 21, 2024
e785b30
Initial functional user auth
Vasilije1990 Jul 22, 2024
14e1eba
Fixes for user flow with group management
Vasilije1990 Jul 23, 2024
36e156e
Fixes to the model and adding the read info to the graph
Vasilije1990 Jul 23, 2024
3f5e665
Update cognee/api/client.py
Vasilije1990 Jul 24, 2024
e7b0e71
Fixes to translation services
Vasilije1990 Jul 24, 2024
a93bd53
Merge remote-tracking branch 'origin/COG-206' into COG-206
Vasilije1990 Jul 24, 2024
9616545
Exposed user management
Vasilije1990 Jul 26, 2024
218d322
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
b4d1a73
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
7930586
Fixes to the ACL model
Vasilije1990 Jul 27, 2024
797e7ba
Updates to searches
Vasilije1990 Jul 27, 2024
66749fa
Updates to functions
Vasilije1990 Jul 27, 2024
2717272
Merge remote-tracking branch 'origin/main' into COG-206
borisarzentar Aug 1, 2024
401167b
fix: enable sqlalchemy adapter
borisarzentar Aug 4, 2024
07e2bc1
Fixes to the pipeline
Vasilije1990 Aug 5, 2024
085ca5e
Fixes to the users
Vasilije1990 Aug 5, 2024
7d3e124
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
b5a3b69
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
9a2cde9
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
6035010
Fixes to the sqlalchemy adapter
Vasilije1990 Aug 5, 2024
709a10c
fix: add dataset and data models
borisarzentar Aug 5, 2024
73dd6c2
Fix to neo4j flow
Vasilije1990 Aug 6, 2024
0519986
Merge remote-tracking branch 'origin/COG-206' into COG-206
Vasilije1990 Aug 6, 2024
cb9bfa2
fix: search results preview
borisarzentar Aug 6, 2024
4749995
Merge remote-tracking branch 'origin/COG-206' into COG-206
borisarzentar Aug 6, 2024
7846291
fix: add buildx action
borisarzentar Aug 6, 2024
acf41ff
fix: upgrade setup-buildx-action
borisarzentar Aug 6, 2024
3f35a45
fix: add setup_docker job
borisarzentar Aug 6, 2024
3e3134b
fix: fix debugpy version
borisarzentar Aug 6, 2024
0dad12c
fix: upgrade buildx action
borisarzentar Aug 6, 2024
a590c6a
fix: run tests on ubuntu
borisarzentar Aug 6, 2024
896a2ce
fix: add postgres service to tests
borisarzentar Aug 6, 2024
e492a18
fix: add env variables to test workflows
borisarzentar Aug 6, 2024
ed6e8eb
fix: wait for postgres to be ready before tests
borisarzentar Aug 6, 2024
a34acbc
fix: update neo4j lib
borisarzentar Aug 6, 2024
1c89119
fix: uuid parsing of search results
borisarzentar Aug 6, 2024
8a63aa3
fix: run buildx on all platforms
borisarzentar Aug 6, 2024
4f72eb5
fix: install buildx on all platforms
borisarzentar Aug 6, 2024
6d38bcd
fix: add QEMU
borisarzentar Aug 6, 2024
887d4e1
fix: add install_docker step in github action workflow
borisarzentar Aug 6, 2024
fd20fac
fix: add steps
borisarzentar Aug 6, 2024
8fa572a
fix: add runs-on
borisarzentar Aug 6, 2024
9927855
fix: move if
borisarzentar Aug 6, 2024
841e7b5
fix: add needs
borisarzentar Aug 6, 2024
f808cf1
fix: move install docker job
borisarzentar Aug 6, 2024
e3c3f35
fix: remove macos test
borisarzentar Aug 6, 2024
80 changes: 76 additions & 4 deletions cognee/api/client.py
@@ -13,6 +13,8 @@
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
Contributor:
Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Also applies to: 31-31, 53-59



# Set up logging
logging.basicConfig(
level=logging.INFO, # Set the logging level (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL)
Expand All @@ -26,8 +28,13 @@
traces_sample_rate = 1.0,
profiles_sample_rate = 1.0,
)

app = FastAPI(debug = os.getenv("ENV") != "prod")
from contextlib import asynccontextmanager
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app = FastAPI(debug = os.getenv("ENV") != "prod", lifespan=lifespan)
Contributor @coderabbitai bot, Jul 23, 2024:

Remove duplicate lifespan context manager.

The lifespan context manager is defined twice, which can cause conflicts.

- @asynccontextmanager
- async def lifespan(app: FastAPI):
-    # Not needed if you setup a migration system like Alembic
-    await create_db_and_tables()
-    yield

Also applies to: 90-95

Contributor:

@Vasilije1990 This one

Contributor:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!



origins = [
"http://frontend:3000",
Expand All @@ -43,6 +50,71 @@
allow_headers=["*"],
)

from contextlib import asynccontextmanager

from fastapi import Depends, FastAPI

from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
Contributor:

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Committable suggestion was skipped due to low confidence.

Tools (Ruff):
46-46: Module level import not at top of file (E402)
48-48: Module level import not at top of file (E402)
50-50: Module level import not at top of file (E402)
51-51: Module level import not at top of file (E402)
52-52: Module level import not at top of file (E402)


app.include_router(
fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(
fastapi_users.get_register_router(UserRead, UserCreate),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_reset_password_router(),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_verify_router(UserRead),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_users_router(UserRead, UserUpdate),
prefix="/users",
tags=["users"],
)

app.include_router(permission_router, prefix="/manage", tags=["management"])

@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app.include_router(
fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(
fastapi_users.get_register_router(UserRead, UserCreate),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_reset_password_router(),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_verify_router(UserRead),
prefix="/auth",
tags=["auth"],
)
app.include_router(
fastapi_users.get_users_router(UserRead, UserUpdate),
prefix="/users",
tags=["users"],
)



@app.get("/")
async def root():
"""
@@ -267,8 +339,8 @@ def start_api_server(host: str = "0.0.0.0", port: int = 8000):
relational_config.create_engine()

from cognee.modules.data.deletion import prune_system, prune_data
asyncio.run(prune_data())
asyncio.run(prune_system(metadata = True))
# asyncio.run(prune_data())
Contributor:

Please uncomment this after you are done.

# asyncio.run(prune_system(metadata = True))

uvicorn.run(app, host = host, port = port)
except Exception as e:
13 changes: 11 additions & 2 deletions cognee/api/v1/add/add.py
@@ -3,7 +3,11 @@
import asyncio
import dlt
import duckdb
from fastapi_users import fastapi_users

import cognee.modules.ingestion as ingestion
from cognee.infrastructure.databases.relational.user_authentication.users import give_permission_document, \
get_async_session_context, current_active_user, create_default_user
from cognee.infrastructure.files.storage import LocalStorage
from cognee.modules.ingestion import get_matched_datasets, save_data_to_file
from cognee.shared.utils import send_telemetry
@@ -48,7 +52,7 @@ async def add(data: Union[BinaryIO, List[BinaryIO], str, List[str]], dataset_nam

return []

async def add_files(file_paths: List[str], dataset_name: str):
async def add_files(file_paths: List[str], dataset_name: str, user_id: str = "default_user"):
base_config = get_base_config()
data_directory_path = base_config.data_root_directory

@@ -82,12 +86,17 @@ async def add_files(file_paths: List[str], dataset_name: str):
)

@dlt.resource(standalone = True, merge_key = "id")
def data_resources(file_paths: str):
def data_resources(file_paths: str, user_id: str = user_id):
for file_path in file_paths:
with open(file_path.replace("file://", ""), mode = "rb") as file:
classified_data = ingestion.classify(file)

data_id = ingestion.identify(classified_data)
async with get_async_session_context() as session:
if user_id is None:
current_active_user = create_default_user()

give_permission_document(current_active_user, data_id, "write", session= session)

file_metadata = classified_data.get_metadata()

1 change: 1 addition & 0 deletions cognee/api/v1/authenticate_user/__init__.py
@@ -0,0 +1 @@
from .authenticate_user import authenticate_user
9 changes: 9 additions & 0 deletions cognee/api/v1/authenticate_user/authenticate_user.py
@@ -0,0 +1,9 @@
from cognee.infrastructure.databases.relational.user_authentication.users import authenticate_user_method


async def authenticate_user():
"""
This function is used to authenticate a user.
"""
output = await authenticate_user_method()
return output
Contributor:

Add error handling and improve the docstring.

The function lacks error handling and the docstring could be more descriptive. Consider adding a try-except block and enhancing the docstring.

async def authenticate_user():
-    """
-    This function is used to authenticate a user.
-    """
+    """
+    Authenticate a user.
+
+    Returns:
+        output: The result of the user authentication.
+    """
+    try:
        output = await authenticate_user_method()
+    except Exception as e:
+        # Handle the exception (e.g., log the error, raise a custom exception, etc.)
+        raise e
    return output
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def authenticate_user():
"""
This function is used to authenticate a user.
"""
output = await authenticate_user_method()
return output
async def authenticate_user():
"""
Authenticate a user.
Returns:
output: The result of the user authentication.
"""
try:
output = await authenticate_user_method()
except Exception as e:
# Handle the exception (e.g., log the error, raise a custom exception, etc.)
raise e
return output

152 changes: 89 additions & 63 deletions cognee/api/v1/cognify/cognify_v2.py
@@ -1,8 +1,18 @@
import asyncio
import hashlib
import logging
import uuid
from typing import Union

from fastapi_users import fastapi_users
from sqlalchemy.ext.asyncio import AsyncSession

from cognee.infrastructure.databases.graph import get_graph_config
from cognee.infrastructure.databases.relational.user_authentication.authentication_db import async_session_maker
from cognee.infrastructure.databases.relational.user_authentication.users import has_permission_document, \
get_user_permissions, get_async_session_context
# from cognee.infrastructure.databases.relational.user_authentication.authentication_db import async_session_maker
# from cognee.infrastructure.databases.relational.user_authentication.users import get_user_permissions, fastapi_users
from cognee.modules.cognify.config import get_cognify_config
from cognee.infrastructure.databases.relational.config import get_relationaldb_config
from cognee.modules.data.processing.document_types.AudioDocument import AudioDocument
@@ -25,7 +35,13 @@

update_status_lock = asyncio.Lock()

async def cognify(datasets: Union[str, list[str]] = None, root_node_id: str = None):
class PermissionDeniedException(Exception):
def __init__(self, message: str):
self.message = message
super().__init__(self.message)

async def cognify(datasets: Union[str, list[str]] = None, root_node_id: str = None, user_id:str="default_user"):

relational_config = get_relationaldb_config()
db_engine = relational_config.database_engine
create_task_status_table()
@@ -35,68 +51,78 @@ async def cognify(datasets: Union[str, list[str]] = None, root_node_id: str = No


async def run_cognify_pipeline(dataset_name: str, files: list[dict]):
async with update_status_lock:
task_status = get_task_status([dataset_name])

if dataset_name in task_status and task_status[dataset_name] == "DATASET_PROCESSING_STARTED":
logger.info(f"Dataset {dataset_name} is being processed.")
return

update_task_status(dataset_name, "DATASET_PROCESSING_STARTED")
try:
cognee_config = get_cognify_config()
graph_config = get_graph_config()
root_node_id = None

if graph_config.infer_graph_topology and graph_config.graph_topology_task:
from cognee.modules.topology.topology import TopologyEngine
topology_engine = TopologyEngine(infer=graph_config.infer_graph_topology)
root_node_id = await topology_engine.add_graph_topology(files = files)
elif graph_config.infer_graph_topology and not graph_config.infer_graph_topology:
from cognee.modules.topology.topology import TopologyEngine
topology_engine = TopologyEngine(infer=graph_config.infer_graph_topology)
await topology_engine.add_graph_topology(graph_config.topology_file_path)
elif not graph_config.graph_topology_task:
root_node_id = "ROOT"

tasks = [
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(establish_graph_topology, topology_model = KnowledgeGraph), # Set the graph topology for the document chunk data
Task(expand_knowledge_graph, graph_model = KnowledgeGraph), # Generate knowledge graphs from the document chunks and attach it to chunk nodes
Task(filter_affected_chunks, collection_name = "chunks"), # Find all affected chunks, so we don't process unchanged chunks
Task(
save_data_chunks,
collection_name = "chunks",
), # Save the document chunks in vector db and as nodes in graph db (connected to the document node and between each other)
run_tasks_parallel([
Task(
summarize_text_chunks,
summarization_model = cognee_config.summarization_model,
collection_name = "chunk_summaries",
), # Summarize the document chunks
Task(
classify_text_chunks,
classification_model = cognee_config.classification_model,
),
]),
Task(remove_obsolete_chunks), # Remove the obsolete document chunks.
]

pipeline = run_tasks(tasks, [
PdfDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "pdf" else
AudioDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "audio" else
ImageDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "image" else
TextDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"])
for file in files
])

async for result in pipeline:
print(result)

update_task_status(dataset_name, "DATASET_PROCESSING_FINISHED")
except Exception as error:
update_task_status(dataset_name, "DATASET_PROCESSING_ERROR")
raise error

for file in files:
file["id"] = str(uuid.uuid4())
file["name"] = file["name"].replace(" ", "_")

async with get_async_session_context() as session:

out = await has_permission_document(user_id, file["id"], "write", session)
Contributor:

Remove assignment to unused variable out.

The variable out is assigned but never used.

-                out = await has_permission_document(user_id, file["id"], "write", session)
Committable suggestion


Suggested change
out = await has_permission_document(user_id, file["id"], "write", session)
await has_permission_document(user_id, file["id"], "write", session)
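The unused result is more than a style nit: as written, the permission check has no effect on the pipeline. A hedged sketch of acting on it, reusing the PermissionDeniedException this PR defines (has_permission_document is stubbed here; the real coroutine lives in cognee's user_authentication.users module):

```python
import asyncio

# From this PR's cognify_v2.py.
class PermissionDeniedException(Exception):
    def __init__(self, message: str):
        self.message = message
        super().__init__(self.message)

# Stub for cognee's has_permission_document coroutine: pretend only
# "default_user" holds write permission.
async def has_permission_document(user_id, document_id, permission, session):
    return user_id == "default_user" and permission == "write"

async def ensure_can_write(user_id, document_id, session=None):
    # Act on the check instead of discarding it into an unused variable.
    allowed = await has_permission_document(user_id, document_id, "write", session)
    if not allowed:
        raise PermissionDeniedException(f"user {user_id} may not write document {document_id}")

asyncio.run(ensure_can_write("default_user", "doc-1"))
```

Raising here stops the cognify pipeline before any unauthorized write, which appears to be what the otherwise-unused exception class was introduced for.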
Tools (Ruff):
61-61: Local variable out is assigned to but never used (F841)



async with update_status_lock:
task_status = get_task_status([dataset_name])

if dataset_name in task_status and task_status[dataset_name] == "DATASET_PROCESSING_STARTED":
logger.info(f"Dataset {dataset_name} is being processed.")
return

update_task_status(dataset_name, "DATASET_PROCESSING_STARTED")
try:
cognee_config = get_cognify_config()
graph_config = get_graph_config()
root_node_id = None

if graph_config.infer_graph_topology and graph_config.graph_topology_task:
from cognee.modules.topology.topology import TopologyEngine
topology_engine = TopologyEngine(infer=graph_config.infer_graph_topology)
root_node_id = await topology_engine.add_graph_topology(files = files)
elif graph_config.infer_graph_topology and not graph_config.infer_graph_topology:
from cognee.modules.topology.topology import TopologyEngine
topology_engine = TopologyEngine(infer=graph_config.infer_graph_topology)
await topology_engine.add_graph_topology(graph_config.topology_file_path)
elif not graph_config.graph_topology_task:
root_node_id = "ROOT"

tasks = [
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Contributor:

Undefined variables hashed_user_id and user_permissions.

The variables hashed_user_id and user_permissions are used but not defined in the provided code. Ensure that these variables are correctly defined and passed to the Task instances.

-                        Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
+                        Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Committable suggestion


Suggested change
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Tools (Ruff):
90-90: Undefined name hashed_user_id (F821)
90-90: Undefined name user_permissions (F821)

Task(establish_graph_topology, topology_model = KnowledgeGraph), # Set the graph topology for the document chunk data
Task(expand_knowledge_graph, graph_model = KnowledgeGraph), # Generate knowledge graphs from the document chunks and attach it to chunk nodes
Task(filter_affected_chunks, collection_name = "chunks"), # Find all affected chunks, so we don't process unchanged chunks
Task(
save_data_chunks,
collection_name = "chunks",
), # Save the document chunks in vector db and as nodes in graph db (connected to the document node and between each other)
run_tasks_parallel([
Task(
summarize_text_chunks,
summarization_model = cognee_config.summarization_model,
collection_name = "chunk_summaries",
), # Summarize the document chunks
Task(
classify_text_chunks,
classification_model = cognee_config.classification_model,
),
]),
Task(remove_obsolete_chunks), # Remove the obsolete document chunks.
]

pipeline = run_tasks(tasks, [
PdfDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "pdf" else
AudioDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "audio" else
ImageDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "image" else
TextDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"])
for file in files
])

async for result in pipeline:
print(result)

update_task_status(dataset_name, "DATASET_PROCESSING_FINISHED")
except Exception as error:
update_task_status(dataset_name, "DATASET_PROCESSING_ERROR")
raise error
Contributor:

Improve error handling in the task execution segment.

Catching a broad Exception and re-raising without logging loses context; log the error before re-raising.

-                except Exception as error:
+                except Exception as e:
+                    logger.error(f"Error during task execution: {e}")

Committable suggestion was skipped due to low confidence.

Tools (Ruff):
90-90: Undefined name hashed_user_id (F821)
90-90: Undefined name user_permissions (F821)



existing_datasets = db_engine.get_datasets()
1 change: 1 addition & 0 deletions cognee/api/v1/create_user/__init__.py
@@ -0,0 +1 @@
from .create_user import create_user
22 changes: 22 additions & 0 deletions cognee/api/v1/create_user/create_user.py
@@ -0,0 +1,22 @@
from cognee.infrastructure.databases.relational.user_authentication.users import create_user_method



async def create_user(email: str, password: str, is_superuser: bool = False):
output = await create_user_method(email=email, password=password, is_superuser=is_superuser)
return output


if __name__ == "__main__":
import asyncio
# Define an example user
example_email = "[email protected]"
example_password = "securepassword123"
example_is_superuser = False

# Create an event loop and run the create_user function
loop = asyncio.get_event_loop()
result = loop.run_until_complete(create_user(example_email, example_password, example_is_superuser))

# Print the result
print(result)
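The __main__ block above uses asyncio.get_event_loop().run_until_complete(), a pattern deprecated for this purpose since Python 3.10; asyncio.run() is the current idiom. A sketch of the same flow with create_user_method stubbed (the real coroutine lives in cognee's user_authentication.users module):

```python
import asyncio

# Stub standing in for cognee's create_user_method so the sketch runs on its own.
async def create_user_method(email: str, password: str, is_superuser: bool = False):
    return {"email": email, "is_superuser": is_superuser}

async def create_user(email: str, password: str, is_superuser: bool = False):
    return await create_user_method(email=email, password=password, is_superuser=is_superuser)

if __name__ == "__main__":
    # asyncio.run() creates, runs, and closes the event loop itself,
    # replacing the deprecated get_event_loop()/run_until_complete pair.
    result = asyncio.run(create_user("[email protected]", "securepassword123"))
    print(result)
```

The example email and password are placeholders, not values from this PR.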
1 change: 1 addition & 0 deletions cognee/api/v1/reset_user_password/__init__.py
@@ -0,0 +1 @@
from .create_user import create_user
1 change: 1 addition & 0 deletions cognee/api/v1/verify_user_token/__init__.py
@@ -0,0 +1 @@
from .verify_user_token import verify_user_token
8 changes: 8 additions & 0 deletions cognee/api/v1/verify_user_token/verify_user_token.py
@@ -0,0 +1,8 @@
from cognee.infrastructure.databases.relational.user_authentication.users import user_check_token



async def verify_user_token(token: str):

output = await user_check_token(token=token)
return output
Contributor:

Add error handling and a docstring.

The function lacks error handling and documentation. Consider adding a try-except block and a docstring.

+ """
+ Verify the user token.
+
+ Args:
+     token (str): The user token to verify.
+
+ Returns:
+     output: The result of the token verification.
+ """
async def verify_user_token(token: str):
+    try:
        output = await user_check_token(token=token)
+    except Exception as e:
+        # Handle the exception (e.g., log the error, raise a custom exception, etc.)
+        raise e
    return output

Committable suggestion was skipped due to low confidence.

4 changes: 2 additions & 2 deletions cognee/infrastructure/data/models/Data.py
@@ -2,10 +2,10 @@
from datetime import datetime, timezone
from sqlalchemy.orm import relationship, MappedColumn, Mapped
from sqlalchemy import Column, String, DateTime, UUID, Text, JSON
from cognee.infrastructure.databases.relational import ModelBase
from cognee.infrastructure.databases.relational import Base
from .DatasetData import DatasetData

class Data(ModelBase):
class Data(Base):
__tablename__ = "data"

id = Column(UUID, primary_key = True)