Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added boilerplate for user management #123

Merged
merged 48 commits into from
Aug 6, 2024
Merged

added boilerplate for user management #123

merged 48 commits into from
Aug 6, 2024

Conversation

Vasilije1990
Copy link
Contributor

@Vasilije1990 Vasilije1990 commented Jul 21, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a new job for setting up Docker Buildx in CI/CD workflows for efficient multi-platform builds.
    • Added a PostgreSQL service to the CI/CD workflow, improving the testing environment for database interactions.
    • Implemented a new step to ensure PostgreSQL is fully initialized before running tests, enhancing reliability.
  • Bug Fixes

    • Improved error handling and logging for user-related actions.
  • Documentation

    • Updated user management API documentation to reflect new endpoints and functionalities.

Copy link
Contributor

coderabbitai bot commented Jul 21, 2024

Warning

Rate limit exceeded

@borisarzentar has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 48 seconds before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Commits

Files that changed from the base of the PR and between 6d38bcd and e3c3f35.

Walkthrough

The recent enhancements to the FastAPI application significantly improve user management, database interactions, and overall application performance. New authentication routes, along with the SqlAlchemyAdapter, streamline user operations and database handling. Asynchronous processing and user-specific permission checks offer a robust framework for secure data interactions. These changes collectively elevate the functionalities, making the application more adaptable and efficient in managing user data and permissions.

Changes

Files Change Summary
cognee/api/client.py Introduced new routes for authentication and user management; added an asynchronous lifespan context manager for database initialization.
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py Added SqlAlchemyAdapter class to manage databases asynchronously, supporting DuckDB and PostgreSQL with flexible session handling.
cognee/api/v1/search/search.py Enhanced search function to include a user parameter for user-specific operations, improving the relevance of search results.
cognee/modules/users/permissions/methods/check_permissions_on_documents.py, .../get_document_ids_for_user.py Added asynchronous permission management functions for document access, enhancing security and user-specific operations.
.github/workflows/test_common.yml New job added for Docker setup and PostgreSQL service configured for robust testing; includes readiness checks for PostgreSQL.
.github/workflows/test_neo4j.yml, .github/workflows/test_qdrant.yml, .github/workflows/test_weaviate.yml Integrated PostgreSQL service configurations into GitHub Actions for a more robust testing environment, including readiness checks for services.
pyproject.toml Updated dependency versions for stability and added new libraries for user management and asynchronous database interactions.

Poem

In the meadow where bunnies play,
New routes for users brighten the day!
With hops and jumps, we now can see,
Secure access, oh joy, oh glee!
Authentication’s a breeze, we cheer,
For every user, friends are near! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 86c7aa2 and 866270b.

Files selected for processing (4)
  • cognee/api/client.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/schemas.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
Additional context used
Ruff
cognee/api/client.py

46-46: Module level import not at top of file

(E402)


48-48: Module level import not at top of file

(E402)


50-50: Module level import not at top of file

(E402)


51-51: Module level import not at top of file

(E402)


52-52: Module level import not at top of file

(E402)


114-114: Redefinition of unused authenticated_route from line 80

(F811)

Additional comments not posted (17)
cognee/infrastructure/databases/relational/user_authentication/schemas.py (3)

6-7: LGTM!

The class UserRead correctly inherits from schemas.BaseUser[uuid.UUID].


10-11: LGTM!

The class UserCreate correctly inherits from schemas.BaseUserCreate.


14-15: LGTM!

The class UserUpdate correctly inherits from schemas.BaseUserUpdate.

cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (5)

11-12: LGTM!

The class Base correctly inherits from DeclarativeBase.


15-16: LGTM!

The class User correctly inherits from SQLAlchemyBaseUserTableUUID and Base.


25-27: LGTM!

The function create_db_and_tables correctly creates the database and tables using SQLAlchemy.


30-32: LGTM!

The function get_async_session correctly provides an async session for database operations.


35-36: LGTM!

The function get_user_db correctly provides a user database using SQLAlchemy.

cognee/infrastructure/databases/relational/user_authentication/users.py (6)

18-33: LGTM!

The class UserManager correctly inherits from UUIDIDMixin and BaseUserManager. The methods for user registration, password reset, and verification are correctly implemented.


36-37: LGTM!

The function get_user_manager correctly provides a user manager using UserManager.


43-44: LGTM!

The function get_jwt_strategy correctly provides a JWT strategy for authentication.


47-51: LGTM!

The function auth_backend correctly sets up an authentication backend using JWT strategy.


53-53: LGTM!

The function fastapi_users correctly sets up FastAPI users using UserManager and auth_backend.


55-55: LGTM!

The function current_active_user correctly sets up the current active user using fastapi_users.

cognee/api/client.py (3)

54-76: LGTM! Verify router functionality.

The routers are correctly set up and prefixed. Ensure that they function as expected.


79-81: LGTM! Verify function usage.

The function correctly uses dependency injection for current_active_user. Ensure that the function is used as intended.


83-87: LGTM! Verify database setup.

The context manager correctly sets up the database and tables. Ensure that the database setup is functioning as expected.

Comment on lines 46 to 52
from contextlib import asynccontextmanager

from fastapi import Depends, FastAPI

from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Committable suggestion was skipped due to low confidence.

Tools
Ruff

46-46: Module level import not at top of file

(E402)


48-48: Module level import not at top of file

(E402)


50-50: Module level import not at top of file

(E402)


51-51: Module level import not at top of file

(E402)


52-52: Module level import not at top of file

(E402)

cognee/api/client.py Outdated Show resolved Hide resolved
Copy link

gitguardian bot commented Jul 21, 2024

⚠️ GitGuardian has uncovered 16 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9573981 Triggered Generic Password 77e8c1b cognee/infrastructure/databases/relational/config.py View secret
9573981 Triggered Generic Password e785b30 docker-compose.yml View secret
9573981 Triggered Generic Password e785b30 cognee/infrastructure/databases/relational/config.py View secret
9573981 Triggered Generic Password 401167b cognee/infrastructure/databases/relational/config.py View secret
9573981 Triggered Generic Password 896a2ce .github/workflows/test_common.yml View secret
9573981 Triggered Generic Password 896a2ce .github/workflows/test_qdrant.yml View secret
9573981 Triggered Generic Password 896a2ce .github/workflows/test_weaviate.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_weaviate.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_qdrant.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_qdrant.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_neo4j.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_common.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_weaviate.yml View secret
9573981 Triggered Generic Password e492a18 .github/workflows/test_common.yml View secret
13323556 Triggered Username Password 73dd6c2 .github/workflows/test_neo4j.yml View secret
13323556 Triggered Username Password 4749995 .github/workflows/test_neo4j.yml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 866270b and 77e8c1b.

Files selected for processing (4)
  • cognee/infrastructure/databases/relational/config.py (2 hunks)
  • cognee/infrastructure/databases/relational/create_relational_engine.py (1 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py
Additional context used
Ruff
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

94-98: f-string without any placeholders

Remove extraneous f prefix

(F541)


102-102: f-string without any placeholders

Remove extraneous f prefix

(F541)

Additional comments not posted (13)
cognee/infrastructure/databases/relational/create_relational_engine.py (1)

8-10: Enum DBProvider looks good.

The enum correctly defines the database types DUCKDB and POSTGRES.

cognee/infrastructure/databases/relational/config.py (1)

14-15: Configuration changes look good.

The db_provider configuration option is correctly added and the create_relational_engine function is updated accordingly. The to_dict method is also updated to include the new db_provider attribute.

Also applies to: 34-34

cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (11)

5-10: Initialization method looks good.

The __init__ method correctly initializes the SQLAlchemyAdapter with the provided db_type, db_path, and db_name.


11-20: Dataset retrieval method looks good.

The get_datasets method correctly retrieves and filters distinct schema names from the database.


22-25: File metadata retrieval method looks good.

The get_files_metadata method correctly retrieves file metadata from the specified dataset.


27-31: Table creation method looks good.

The create_table method correctly creates a table with the specified schema name, table name, and table configuration.


33-35: Table deletion method looks good.

The delete_table method correctly deletes the specified table.


37-42: Data insertion method looks good.

The insert_data method correctly inserts data into the specified table.


44-59: Data retrieval method looks good.

The get_data method correctly retrieves data from the specified table and applies filters if provided.


61-64: Query execution method looks good.

The execute_query method correctly executes a raw SQL query.


66-80: Data loading method looks good.

The load_cognify_data method correctly loads data into the cognify table.


106-119: Data deletion method looks good.

The delete_cognify_data method correctly deletes all data from the cognify table and drops the table.


121-125: Database deletion method looks good.

The delete_database method correctly deletes the database file and its write-ahead log file.

Comment on lines 14 to 33
def create_relational_engine(db_path: str, db_name: str, db_provider:str):
LocalStorage.ensure_directory_exists(db_path)

return DuckDBAdapter(
db_name = db_name,
db_path = db_path,
)
llm_config = get_relationaldb_config()

provider = DBProvider(llm_config.llm_provider)


if provider == DBProvider.DUCKDB:

return DuckDBAdapter(
db_name = db_name,
db_path = db_path,
)
elif provider == DBProvider.POSTGRES:
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for invalid llm_config.llm_provider.

The function does not handle the case where llm_config.llm_provider is not a valid DBProvider. Consider adding error handling for this scenario.

+    try:
+        provider = DBProvider(llm_config.llm_provider)
+    except ValueError:
+        raise ValueError(f"Invalid database provider: {llm_config.llm_provider}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def create_relational_engine(db_path: str, db_name: str, db_provider:str):
LocalStorage.ensure_directory_exists(db_path)
return DuckDBAdapter(
db_name = db_name,
db_path = db_path,
)
llm_config = get_relationaldb_config()
provider = DBProvider(llm_config.llm_provider)
if provider == DBProvider.DUCKDB:
return DuckDBAdapter(
db_name = db_name,
db_path = db_path,
)
elif provider == DBProvider.POSTGRES:
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
)
def create_relational_engine(db_path: str, db_name: str, db_provider:str):
LocalStorage.ensure_directory_exists(db_path)
llm_config = get_relationaldb_config()
try:
provider = DBProvider(llm_config.llm_provider)
except ValueError:
raise ValueError(f"Invalid database provider: {llm_config.llm_provider}")
if provider == DBProvider.DUCKDB:
return DuckDBAdapter(
db_name = db_name,
db_path = db_path,
)
elif provider == DBProvider.POSTGRES:
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
)

cognee/api/client.py Outdated Show resolved Hide resolved

def delete_cognify_data(self):
with self.engine.connect() as connection:
connection.execute(text("""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create a table here and delete it immediately in the next line. Should we check if table exists and delete if it does?

document_id_target STRING NULL
);
"""))
query = text(f"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just created the table this query won't return anything, so we can just set records to []. Otherwise we can execute the query.




llm_config = get_relationaldb_config()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llm_config -> relationaldb_config. In line 26 as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this

)
from fastapi_users.db import SQLAlchemyUserDatabase

from app.db import User, get_user_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is add.db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 77e8c1b and e785b30.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (8)
  • cognee/api/client.py (3 hunks)
  • cognee/infrastructure/databases/relational/config.py (2 hunks)
  • cognee/infrastructure/databases/relational/create_relational_engine.py (1 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
  • docker-compose.yml (1 hunks)
  • pyproject.toml (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • cognee/infrastructure/databases/relational/config.py
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py
  • cognee/infrastructure/databases/relational/user_authentication/users.py
Additional context used
Ruff
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

117-121: f-string without any placeholders

Remove extraneous f prefix

(F541)


125-125: f-string without any placeholders

Remove extraneous f prefix

(F541)

cognee/api/client.py

29-29: Module level import not at top of file

(E402)


51-51: Module level import not at top of file

(E402)


53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


56-56: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


119-119: Redefinition of unused authenticated_route from line 85

(F811)

Additional comments not posted (7)
cognee/infrastructure/databases/relational/create_relational_engine.py (4)

1-2: Import statement is correct.

The Enum import is necessary for defining the DBProvider enumeration.


3-7: Import statements are correct.

The imported modules and classes are necessary for the functionality of the create_relational_engine function.


8-11: Enumeration definition is correct.

The DBProvider enumeration is correctly defined and will facilitate the selection of the appropriate database adapter.


12-16: Function signature and directory existence check are correct.

The function signature has been expanded to include additional parameters for database configuration. The LocalStorage.ensure_directory_exists method ensures the directory exists.

docker-compose.yml (2)

63-75: Service configuration for PostgreSQL is correct.

The service configuration includes necessary environment variables, volume mapping, port exposure, and network connection.


81-83: Volume declaration for PostgreSQL data is correct.

The volume declaration ensures data persistence across container restarts.

pyproject.toml (1)

72-73: Dependencies are correctly added.

The fastapi-users and asyncpg dependencies are necessary for user management and asynchronous database operations.

Comment on lines 17 to 40
if provider == DBProvider.DUCKDB:
# return DuckDBAdapter(
# db_name = db_name,
# db_path = db_path,
# )
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
db_host=db_host,
db_port=db_port,
db_user=db_user,
db_password=db_password
)
elif provider == DBProvider.POSTGRES:
return SQLAlchemyAdapter(
db_name = db_name,
db_path = db_path,
db_type = db_provider,
db_host= db_host,
db_port= db_port,
db_user= db_user,
db_password= db_password
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented-out code and add error handling for invalid db_provider values.

The function correctly selects the database adapter based on the db_provider value. However, the commented-out code for DuckDBAdapter should be removed to maintain code cleanliness. Additionally, error handling for invalid db_provider values should be added.

-    # return DuckDBAdapter(
-    #     db_name = db_name,
-    #     db_path = db_path,
-    # )
+    try:
+        provider = DBProvider(db_provider)
+    except ValueError:
+        raise ValueError(f"Invalid database provider: {db_provider}")

Committable suggestion was skipped due to low confidence.

Comment on lines 24 to 27
async def get_async_session(self):
async_session_maker = self.Session
async with async_session_maker() as session:
yield session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    async def get_async_session(self) -> AsyncSession:
+        """
+        Yields an asynchronous SQLAlchemy session.
+        """
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def get_async_session(self):
async_session_maker = self.Session
async with async_session_maker() as session:
yield session
async def get_async_session(self) -> AsyncSession:
"""
Yields an asynchronous SQLAlchemy session.
"""
async_session_maker = self.Session
async with async_session_maker() as session:
yield session

Comment on lines 29 to 32
def get_session(self):
session_maker = self.Session
with session_maker() as session:
yield session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def get_session(self) -> sessionmaker:
+        """
+        Yields a synchronous SQLAlchemy session.
+        """

Committable suggestion was skipped due to low confidence.

Comment on lines 144 to 148
def delete_database(self):
from cognee.infrastructure.files.storage import LocalStorage
LocalStorage.remove(self.db_location)
if LocalStorage.file_exists(self.db_location + ".wal"):
LocalStorage.remove(self.db_location + ".wal")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def delete_database(self) -> None:
+        """
+        Deletes the database file and its write-ahead log file.
+        """

Committable suggestion was skipped due to low confidence.

Comment on lines 129 to 142
def delete_cognify_data(self):
with self.engine.connect() as connection:
connection.execute(text("""
CREATE TABLE IF NOT EXISTS cognify (
document_id STRING,
layer_id STRING,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT NULL,
processed BOOLEAN DEFAULT FALSE,
document_id_target STRING NULL
);
"""))
connection.execute(text("DELETE FROM cognify;"))
connection.execute(text("DROP TABLE cognify;"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints, docstrings, and remove redundant table creation logic.

Include type hints and docstrings for better readability and maintainability. Remove redundant table creation logic before deletion.

+    def delete_cognify_data(self) -> None:
+        """
+        Deletes data from the `cognify` table.
+        """
-            connection.execute(text("""
-                CREATE TABLE IF NOT EXISTS cognify (
-                    document_id STRING,
-                    layer_id STRING,
-                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-                    updated_at TIMESTAMP DEFAULT NULL,
-                    processed BOOLEAN DEFAULT FALSE,
-                    document_id_target STRING NULL
-                );
-            """))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def delete_cognify_data(self):
with self.engine.connect() as connection:
connection.execute(text("""
CREATE TABLE IF NOT EXISTS cognify (
document_id STRING,
layer_id STRING,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT NULL,
processed BOOLEAN DEFAULT FALSE,
document_id_target STRING NULL
);
"""))
connection.execute(text("DELETE FROM cognify;"))
connection.execute(text("DROP TABLE cognify;"))
def delete_cognify_data(self) -> None:
"""
Deletes data from the `cognify` table.
"""
with self.engine.connect() as connection:
connection.execute(text("DELETE FROM cognify;"))
connection.execute(text("DROP TABLE cognify;"))

Comment on lines 34 to 43
def get_datasets(self):
with self.engine.connect() as connection:
result = connection.execute(text("SELECT DISTINCT schema_name FROM information_schema.tables;"))
tables = [row['schema_name'] for row in result]
return list(
filter(
lambda schema_name: not schema_name.endswith("staging") and schema_name != "cognee",
tables
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints, docstrings, and improve filtering logic.

Include type hints and docstrings for better readability and maintainability. Improve filtering logic for readability.

+    def get_datasets(self) -> List[str]:
+        """
+        Retrieves distinct schema names from the database, excluding staging schemas and 'cognee'.
+        """
-        return list(
-            filter(
-                lambda schema_name: not schema_name.endswith("staging") and schema_name != "cognee",
-                tables
-            )
-        )
+        return [schema_name for schema_name in tables if not schema_name.endswith("staging") and schema_name != "cognee"]
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_datasets(self):
with self.engine.connect() as connection:
result = connection.execute(text("SELECT DISTINCT schema_name FROM information_schema.tables;"))
tables = [row['schema_name'] for row in result]
return list(
filter(
lambda schema_name: not schema_name.endswith("staging") and schema_name != "cognee",
tables
)
)
def get_datasets(self) -> List[str]:
"""
Retrieves distinct schema names from the database, excluding staging schemas and 'cognee'.
"""
with self.engine.connect() as connection:
result = connection.execute(text("SELECT DISTINCT schema_name FROM information_schema.tables;"))
tables = [row['schema_name'] for row in result]
return [schema_name for schema_name in tables if not schema_name.endswith("staging") and schema_name != "cognee"]

Comment on lines 89 to 103
def load_cognify_data(self, data):
metadata = MetaData()
cognify_table = Table(
'cognify', metadata,
Column('document_id', String),
Column('layer_id', String),
Column('created_at', TIMESTAMP, server_default=text('CURRENT_TIMESTAMP')),
Column('updated_at', TIMESTAMP, nullable=True, default=None),
Column('processed', Boolean, default=False),
Column('document_id_target', String, nullable=True)
)
metadata.create_all(self.engine)
insert_query = cognify_table.insert().values(document_id=text(':document_id'), layer_id=text(':layer_id'))
with self.engine.connect() as connection:
connection.execute(insert_query, data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def load_cognify_data(self, data: List[Dict[str, Any]]) -> None:
+        """
+        Loads data into the `cognify` table.
+        """

Committable suggestion was skipped due to low confidence.

Comment on lines 7 to 21
def __init__(self, db_type: str, db_path: str, db_name: str, db_user:str, db_password:str, db_host:str, db_port:str):
self.db_location = os.path.abspath(os.path.join(db_path, db_name))
# self.engine = create_engine(f"{db_type}:///{self.db_location}")
if db_type == "duckdb":
self.engine = create_engine(f"duckdb:///{self.db_location}")
self.Session = sessionmaker(bind=self.engine)

else:
print("Name: ", db_name)
print("User: ", db_user)
print("Password: ", db_password)
print("Host: ", db_host)
print("Port: ", db_port)
self.engine = create_async_engine(f"postgresql+asyncpg://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}")
self.Session = sessionmaker(bind=self.engine, class_=AsyncSession, expire_on_commit=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logging instead of print statements and conditionally construct self.db_location.

Replace print statements with logging for better maintainability. Construct self.db_location only if the database type is DuckDB.

-        self.db_location = os.path.abspath(os.path.join(db_path, db_name))
+        if db_type == "duckdb":
+            self.db_location = os.path.abspath(os.path.join(db_path, db_name))
-            print("Name: ", db_name)
-            print("User: ", db_user)
-            print("Password: ", db_password)
-            print("Host: ", db_host)
-            print("Port: ", db_port)
+            logger.info(f"Name: {db_name}")
+            logger.info(f"User: {db_user}")
+            logger.info(f"Password: {db_password}")
+            logger.info(f"Host: {db_host}")
+            logger.info(f"Port: {db_port}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, db_type: str, db_path: str, db_name: str, db_user:str, db_password:str, db_host:str, db_port:str):
self.db_location = os.path.abspath(os.path.join(db_path, db_name))
# self.engine = create_engine(f"{db_type}:///{self.db_location}")
if db_type == "duckdb":
self.engine = create_engine(f"duckdb:///{self.db_location}")
self.Session = sessionmaker(bind=self.engine)
else:
print("Name: ", db_name)
print("User: ", db_user)
print("Password: ", db_password)
print("Host: ", db_host)
print("Port: ", db_port)
self.engine = create_async_engine(f"postgresql+asyncpg://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}")
self.Session = sessionmaker(bind=self.engine, class_=AsyncSession, expire_on_commit=False)
def __init__(self, db_type: str, db_path: str, db_name: str, db_user:str, db_password:str, db_host:str, db_port:str):
if db_type == "duckdb":
self.db_location = os.path.abspath(os.path.join(db_path, db_name))
self.engine = create_engine(f"duckdb:///{self.db_location}")
self.Session = sessionmaker(bind=self.engine)
else:
logger.info(f"Name: {db_name}")
logger.info(f"User: {db_user}")
logger.info(f"Password: {db_password}")
logger.info(f"Host: {db_host}")
logger.info(f"Port: {db_port}")
self.engine = create_async_engine(f"postgresql+asyncpg://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}")
self.Session = sessionmaker(bind=self.engine, class_=AsyncSession, expire_on_commit=False)

Comment on lines 84 to 87
def execute_query(self, query):
with self.engine.connect() as connection:
result = connection.execute(text(query))
return [dict(row) for row in result]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def execute_query(self, query: str) -> List[Dict[str, Any]]:
+        """
+        Executes the provided SQL query.
+        """
-            result = connection.execute(text(query))
+            result = connection.execute(text(":query"), {'query': query})

Committable suggestion was skipped due to low confidence.

Comment on lines 105 to 127
def fetch_cognify_data(self, excluded_document_id: str):
with self.engine.connect() as connection:
connection.execute(text("""
CREATE TABLE IF NOT EXISTS cognify (
document_id STRING,
layer_id STRING,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT NULL,
processed BOOLEAN DEFAULT FALSE,
document_id_target STRING NULL
);
"""))
query = text(f"""
SELECT document_id, layer_id, created_at, updated_at, processed
FROM cognify
WHERE document_id != :excluded_document_id AND processed = FALSE;
""")
records = connection.execute(query, {'excluded_document_id': excluded_document_id}).fetchall()
if records:
document_ids = tuple(record['document_id'] for record in records)
update_query = text(f"UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
connection.execute(update_query, {'document_ids': document_ids})
return [dict(record) for record in records]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints, docstrings, and remove extraneous f prefixes.

Include type hints and docstrings for better readability and maintainability. Remove extraneous f prefixes from f-strings without placeholders.

+    def fetch_cognify_data(self, excluded_document_id: str) -> List[Dict[str, Any]]:
+        """
+        Fetches data from the `cognify` table based on the provided `excluded_document_id`.
+        """
-            query = text(f"""
-                SELECT document_id, layer_id, created_at, updated_at, processed
-                FROM cognify
-                WHERE document_id != :excluded_document_id AND processed = FALSE;
-            """)
+            query = text("""
+                SELECT document_id, layer_id, created_at, updated_at, processed
+                FROM cognify
+                WHERE document_id != :excluded_document_id AND processed = FALSE;
+            """)
-                update_query = text(f"UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
+                update_query = text("UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def fetch_cognify_data(self, excluded_document_id: str):
with self.engine.connect() as connection:
connection.execute(text("""
CREATE TABLE IF NOT EXISTS cognify (
document_id STRING,
layer_id STRING,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT NULL,
processed BOOLEAN DEFAULT FALSE,
document_id_target STRING NULL
);
"""))
query = text(f"""
SELECT document_id, layer_id, created_at, updated_at, processed
FROM cognify
WHERE document_id != :excluded_document_id AND processed = FALSE;
""")
records = connection.execute(query, {'excluded_document_id': excluded_document_id}).fetchall()
if records:
document_ids = tuple(record['document_id'] for record in records)
update_query = text(f"UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
connection.execute(update_query, {'document_ids': document_ids})
return [dict(record) for record in records]
def fetch_cognify_data(self, excluded_document_id: str) -> List[Dict[str, Any]]:
"""
Fetches data from the `cognify` table based on the provided `excluded_document_id`.
"""
with self.engine.connect() as connection:
connection.execute(text("""
CREATE TABLE IF NOT EXISTS cognify (
document_id STRING,
layer_id STRING,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT NULL,
processed BOOLEAN DEFAULT FALSE,
document_id_target STRING NULL
);
"""))
query = text("""
SELECT document_id, layer_id, created_at, updated_at, processed
FROM cognify
WHERE document_id != :excluded_document_id AND processed = FALSE;
""")
records = connection.execute(query, {'excluded_document_id': excluded_document_id}).fetchall()
if records:
document_ids = tuple(record['document_id'] for record in records)
update_query = text("UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
connection.execute(update_query, {'document_ids': document_ids})
return [dict(record) for record in records]
Tools
Ruff

117-121: f-string without any placeholders

Remove extraneous f prefix

(F541)


125-125: f-string without any placeholders

Remove extraneous f prefix

(F541)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (1)
cognee/api/client.py (1)

Line range hint 31-59:
Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Also applies to: 91-123

Tools
Ruff

31-31: Module level import not at top of file

(E402)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e785b30 and dab5c87.

Files selected for processing (4)
  • cognee/api/client.py (4 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/routers.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py
Additional context used
Ruff
cognee/infrastructure/databases/relational/user_authentication/users.py

12-12: Redefinition of unused Depends from line 4

Remove definition: Depends

(F811)

cognee/api/client.py

31-31: Module level import not at top of file

(E402)


53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)


122-122: Redefinition of unused authenticated_route from line 88

(F811)

Additional comments not posted (3)
cognee/infrastructure/databases/relational/user_authentication/users.py (3)

37-38: LGTM!

The function get_user_manager is correctly implemented.


44-45: LGTM!

The function get_jwt_strategy is correctly implemented.


64-70: LGTM!

The function has_permission is correctly implemented.

Comment on lines 8 to 18
@permission_router.post("/groups/{group_id}/permissions")
async def assign_permission_to_group(group_id: int, permission: str, db: Session = Depends(get_user_db)):
group = db.query(Group).filter(Group.id == group_id).first()
if not group:
raise HTTPException(status_code=404, detail="Group not found")
perm = db.query(Permission).filter(Permission.name == permission).first()
if not perm:
perm = Permission(name=permission)
db.add(perm)
group.permissions.append(perm)
db.commit()
return {"msg": "Permission added to group"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling and logging.

The function should handle database operation errors and log the actions for better traceability.

import logging

logger = logging.getLogger(__name__)

@permission_router.post("/groups/{group_id}/permissions")
async def assign_permission_to_group(group_id: int, permission: str, db: Session = Depends(get_user_db)):
    try:
        group = db.query(Group).filter(Group.id == group_id).first()
        if not group:
            raise HTTPException(status_code=404, detail="Group not found")
        perm = db.query(Permission).filter(Permission.name == permission).first()
        if not perm:
            perm = Permission(name=permission)
            db.add(perm)
        group.permissions.append(perm)
        db.commit()
        logger.info(f"Permission '{permission}' added to group '{group_id}'")
        return {"msg": "Permission added to group"}
    except Exception as e:
        db.rollback()
        logger.error(f"Error assigning permission to group: {e}")
        raise HTTPException(status_code=500, detail="Internal Server Error")

Comment on lines 21 to 29
@permission_router.post("/users/{user_id}/groups")
async def add_user_to_group(user_id: int, group_id: int, db: Session = Depends(get_user_db)):
user = db.query(User).filter(User.id == user_id).first()
group = db.query(Group).filter(Group.id == group_id).first()
if not user or not group:
raise HTTPException(status_code=404, detail="User or group not found")
user.groups.append(group)
db.commit()
return {"msg": "User added to group"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling and logging.

The function should handle database operation errors and log the actions for better traceability.

import logging

logger = logging.getLogger(__name__)

@permission_router.post("/users/{user_id}/groups")
async def add_user_to_group(user_id: int, group_id: int, db: Session = Depends(get_user_db)):
    try:
        user = db.query(User).filter(User.id == user_id).first()
        group = db.query(Group).filter(Group.id == group_id).first()
        if not user or not group:
            raise HTTPException(status_code=404, detail="User or group not found")
        user.groups.append(group)
        db.commit()
        logger.info(f"User '{user_id}' added to group '{group_id}'")
        return {"msg": "User added to group"}
    except Exception as e:
        db.rollback()
        logger.error(f"Error adding user to group: {e}")
        raise HTTPException(status_code=500, detail="Internal Server Error")

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Outside diff range, codebase verification and nitpick comments (1)
cognee/infrastructure/data/models/Dataset.py (1)

5-5: Verify the implications of changing the base class.

There are still several references to ModelBase in the codebase, indicating that the transition to Base is incomplete. Ensure all references to ModelBase are updated to Base and that all necessary migrations are handled.

  • cognee/infrastructure/pipeline/models/Operation.py
  • cognee/modules/pipelines/models/Task.py
  • cognee/modules/pipelines/models/Pipeline.py
  • cognee/modules/pipelines/models/PipelineTask.py
  • cognee/infrastructure/databases/relational/sqlite/SqliteEngine.py
Analysis chain

Verify the implications of changing the base class.

The change from ModelBase to Base might have significant implications on how the Dataset class interacts with the database. Ensure that this change is compatible with the rest of the codebase and that all necessary migrations are handled.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage and compatibility of the new base class `Base` in the codebase.

# Test: Search for the `Dataset` class usage. Expect: No issues with the new base class.
rg --type python -A 5 $'class Dataset(Base)'

Length of output: 76


Script:

#!/bin/bash
# Description: Verify the usage and compatibility of the new base class `Base` in the codebase.

# Test: Search for the `Dataset` class usage. Expect: No issues with the new base class.
rg --type py -A 5 $'class Dataset(Base)'

Length of output: 41


Script:

#!/bin/bash
# Description: Verify the current base class used by the `Dataset` class in the codebase.

# Test: Search for the `Dataset` class definition. Expect: Identify the current base class used.
rg --type py -A 5 $'class Dataset'

Length of output: 963


Script:

#!/bin/bash
# Description: Verify that there are no remaining references to `ModelBase` in the codebase.

# Test: Search for any references to `ModelBase`. Expect: No references found.
rg --type py $'ModelBase'

Length of output: 937

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dab5c87 and 14e1eba.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (11)
  • cognee/api/client.py (4 hunks)
  • cognee/infrastructure/data/models/Data.py (1 hunks)
  • cognee/infrastructure/data/models/Dataset.py (1 hunks)
  • cognee/infrastructure/data/models/DatasetData.py (1 hunks)
  • cognee/infrastructure/databases/relational/ModelBase.py (1 hunks)
  • cognee/infrastructure/databases/relational/init.py (1 hunks)
  • cognee/infrastructure/databases/relational/sqlite/SqliteEngine.py (2 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/routers.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
  • pyproject.toml (2 hunks)
Files skipped from review due to trivial changes (1)
  • cognee/infrastructure/databases/relational/sqlite/SqliteEngine.py
Files skipped from review as they are similar to previous changes (2)
  • cognee/infrastructure/databases/relational/user_authentication/routers.py
  • pyproject.toml
Additional context used
Ruff
cognee/infrastructure/databases/relational/user_authentication/users.py

12-12: Redefinition of unused Depends from line 4

Remove definition: Depends

(F811)

cognee/infrastructure/databases/relational/user_authentication/authentication_db.py

12-12: Redefinition of unused DeclarativeBase from line 7

Remove definition: DeclarativeBase

(F811)


12-12: Redefinition of unused sessionmaker from line 7

Remove definition: sessionmaker

(F811)


12-12: Redefinition of unused Session from line 7

Remove definition: Session

(F811)

cognee/api/client.py

31-31: Module level import not at top of file

(E402)


53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)


121-121: Redefinition of unused authenticated_route from line 87

(F811)

Additional comments not posted (14)
cognee/infrastructure/databases/relational/ModelBase.py (1)

2-6: LGTM! The new base class definition aligns with modern SQLAlchemy practices.

The use of DeclarativeBase is a more contemporary approach and improves maintainability.

cognee/infrastructure/databases/relational/__init__.py (1)

1-1: LGTM! The import statement change is consistent with the new base class definition.

The update ensures that the module imports the correct base class.

cognee/infrastructure/data/models/DatasetData.py (1)

4-6: LGTM! The class inheritance change is consistent with the new base class definition.

The update ensures that the DatasetData class inherits from the correct base class.

cognee/infrastructure/data/models/Data.py (1)

5-5: Verify the implications of changing the base class.

The change from ModelBase to Base might have significant implications on how the Data class interacts with the database. Ensure that this change is compatible with the rest of the codebase and that all necessary migrations are handled.

cognee/infrastructure/databases/relational/user_authentication/users.py (4)

37-38: LGTM!

The function get_user_manager correctly yields an instance of UserManager.


44-45: LGTM!

The function get_jwt_strategy correctly returns a JWTStrategy with the specified secret and lifetime.


59-63: LGTM!

The function get_user_permissions correctly retrieves and returns the permissions for a user.


65-71: LGTM!

The function has_permission correctly checks if a user has a specific permission.

cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (2)

44-57: LGTM!

The database configuration and session management are correctly implemented.


51-61: LGTM!

The utility functions for creating the database and tables, and for getting the async session and user database are correctly implemented.

cognee/api/client.py (4)

85-85: LGTM!

The permission router is correctly implemented.


Line range hint 124-139:
LGTM!

The root and health check endpoints are correctly implemented.

Tools
Ruff

53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)


121-121: Redefinition of unused authenticated_route from line 87

(F811)


Line range hint 140-261:
LGTM!

The dataset-related endpoints are correctly implemented.


Line range hint 262-349:
LGTM!

The endpoints for settings, adding data, cognitive processing, and searching are correctly implemented.

Comment on lines 14 to 22
# Association table for many-to-many relationship between users and groups
user_group = Table('user_group', Base.metadata,
Column('user_id', UUID, ForeignKey('users.id')),
Column('group_id', UUID, ForeignKey('groups.id')))

# Association table for many-to-many relationship between groups and permissions
group_permission = Table('group_permission', Base.metadata,
Column('group_id', UUID, ForeignKey('groups.id')),
Column('permission_id', UUID, ForeignKey('permissions.id')))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding explicit primary keys for association tables.

While the association tables are correctly defined, adding explicit primary keys can help with performance and data integrity.

- user_group = Table('user_group', Base.metadata,
-                   Column('user_id', UUID, ForeignKey('users.id')),
-                   Column('group_id', UUID, ForeignKey('groups.id')))
+ user_group = Table('user_group', Base.metadata,
+                   Column('id', UUID, primary_key=True, index=True),
+                   Column('user_id', UUID, ForeignKey('users.id')),
+                   Column('group_id', UUID, ForeignKey('groups.id')))

- group_permission = Table('group_permission', Base.metadata,
-                         Column('group_id', UUID, ForeignKey('groups.id')),
-                         Column('permission_id', UUID, ForeignKey('permissions.id')))
+ group_permission = Table('group_permission', Base.metadata,
+                         Column('id', UUID, primary_key=True, index=True),
+                         Column('group_id', UUID, ForeignKey('groups.id')),
+                         Column('permission_id', UUID, ForeignKey('permissions.id')))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Association table for many-to-many relationship between users and groups
user_group = Table('user_group', Base.metadata,
Column('user_id', UUID, ForeignKey('users.id')),
Column('group_id', UUID, ForeignKey('groups.id')))
# Association table for many-to-many relationship between groups and permissions
group_permission = Table('group_permission', Base.metadata,
Column('group_id', UUID, ForeignKey('groups.id')),
Column('permission_id', UUID, ForeignKey('permissions.id')))
# Association table for many-to-many relationship between users and groups
user_group = Table('user_group', Base.metadata,
Column('id', UUID, primary_key=True, index=True),
Column('user_id', UUID, ForeignKey('users.id')),
Column('group_id', UUID, ForeignKey('groups.id')))
# Association table for many-to-many relationship between groups and permissions
group_permission = Table('group_permission', Base.metadata,
Column('id', UUID, primary_key=True, index=True),
Column('group_id', UUID, ForeignKey('groups.id')),
Column('permission_id', UUID, ForeignKey('permissions.id')))

Comment on lines 26 to 41
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
groups = relationship('Group', secondary=user_group, back_populates='users')

class Group(Base):
__tablename__ = 'groups'
id = Column(UUID, primary_key=True, index=True)
name = Column(String, unique=True, index=True)
users = relationship('User', secondary=user_group, back_populates='groups')
permissions = relationship('Permission', secondary=group_permission, back_populates='groups')

class Permission(Base):
__tablename__ = 'permissions'
id = Column(UUID, primary_key=True, index=True)
name = Column(String, unique=True, index=True)
groups = relationship('Group', secondary=group_permission, back_populates='permissions')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add id column to the User model.

The User model is missing the id column, which is necessary for primary key identification.

- class User(SQLAlchemyBaseUserTableUUID, Base):
-    __tablename__ = 'users'
-    groups = relationship('Group', secondary=user_group, back_populates='users')
+ class User(SQLAlchemyBaseUserTableUUID, Base):
+    __tablename__ = 'users'
+    id = Column(UUID, primary_key=True, index=True)
+    groups = relationship('Group', secondary=user_group, back_populates='users')
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
groups = relationship('Group', secondary=user_group, back_populates='users')
class Group(Base):
__tablename__ = 'groups'
id = Column(UUID, primary_key=True, index=True)
name = Column(String, unique=True, index=True)
users = relationship('User', secondary=user_group, back_populates='groups')
permissions = relationship('Permission', secondary=group_permission, back_populates='groups')
class Permission(Base):
__tablename__ = 'permissions'
id = Column(UUID, primary_key=True, index=True)
name = Column(String, unique=True, index=True)
groups = relationship('Group', secondary=group_permission, back_populates='permissions')
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
id = Column(UUID, primary_key=True, index=True)
groups = relationship('Group', secondary=user_group, back_populates='users')
class Group(Base):
__tablename__ = 'groups'
id = Column(UUID, primary_key=True, index=True)
name = Column(String, unique=True, index=True)
users = relationship('User', secondary=user_group, back_populates='groups')
permissions = relationship('Permission', secondary=group_permission, back_populates='groups')
class Permission(Base):
__tablename__ = 'permissions'
id = Column(UUID, primary_key=True, index=True)
name = Column(String, unique=True, index=True)
groups = relationship('Group', secondary=group_permission, back_populates='permissions')

@@ -13,6 +13,8 @@
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Also applies to: 31-31, 53-59

Comment on lines 32 to 37
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app = FastAPI(debug = os.getenv("ENV") != "prod", lifespan=lifespan)
Copy link
Contributor

@coderabbitai coderabbitai bot Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate lifespan context manager.

The lifespan context manager is defined twice, which can cause conflicts.

- @asynccontextmanager
- async def lifespan(app: FastAPI):
-    # Not needed if you setup a migration system like Alembic
-    await create_db_and_tables()
-    yield

Also applies to: 90-95

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines 86 to 89
@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
return {"message": f"Hello {user.email}!"}

Copy link
Contributor

@coderabbitai coderabbitai bot Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate function authenticated_route.

The function authenticated_route is defined twice, which can cause conflicts.

- @app.get("/authenticated-route")
- async def authenticated_route(user: User = Depends(current_active_user)):
-    return {"message": f"Hello {user.email}!"}

Also applies to: 120-123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Outside diff range, codebase verification and nitpick comments (2)
cognee/modules/data/processing/process_documents.py (1)

4-4: Document the new parameters.

The new parameters user and user_permissions should be documented to explain their purpose and usage.

+    """
+    Process documents and add user-specific data.
+
+    Args:
+        documents (list[Document]): List of documents to process.
+        parent_node_id (str, optional): ID of the parent node. Defaults to None.
+        user (str, optional): User information. Defaults to None.
+        user_permissions (str, optional): User permissions. Defaults to None.
+    """
cognee/api/v1/cognify/cognify_v2.py (1)

39-39: Document the new user_id parameter.

The new parameter user_id should be documented to explain its purpose and usage.

+    """
+    Cognify datasets and add user-specific data.
+
+    Args:
+        datasets (Union[str, list[str]], optional): List of datasets to process. Defaults to None.
+        root_node_id (str, optional): ID of the root node. Defaults to None.
+        user_id (str, optional): User ID. Defaults to "default_user".
+    """
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 14e1eba and 36e156e.

Files selected for processing (3)
  • cognee/api/v1/cognify/cognify_v2.py (3 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
  • cognee/modules/data/processing/process_documents.py (2 hunks)
Additional context used
Ruff
cognee/infrastructure/databases/relational/user_authentication/authentication_db.py

12-12: Redefinition of unused DeclarativeBase from line 7

Remove definition: DeclarativeBase

(F811)


12-12: Redefinition of unused sessionmaker from line 7

Remove definition: sessionmaker

(F811)


12-12: Redefinition of unused Session from line 7

Remove definition: Session

(F811)

Additional comments not posted (4)
cognee/api/v1/cognify/cognify_v2.py (4)

1-11: Imports look good.

The imports include necessary modules for user authentication and permission management.


34-38: PermissionDeniedException class looks good.

The class is correctly defined and necessary for handling permission errors.


40-47: Permission check logic looks good.

The permission check logic ensures that only authorized users can proceed with the operations.


83-83: Pass user information to tasks.

The tasks now include user information, which is necessary for user-specific processing.

Comment on lines +19 to +21
document_dict = document.to_dict()
document_dict["user"] = user
document_dict["user_permissions"] = user_permissions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the unused document_dict.

The new dictionary document_dict is created but not used in the nodes list. Use document_dict instead of calling document.to_dict() again.

-            nodes.append((str(document.id), document.to_dict()))
+            nodes.append((str(document.id), document_dict))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
document_dict = document.to_dict()
document_dict["user"] = user
document_dict["user_permissions"] = user_permissions
document_dict = document.to_dict()
document_dict["user"] = user
document_dict["user_permissions"] = user_permissions
nodes.append((str(document.id), document_dict))

Comment on lines 26 to 28
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
groups = relationship('Group', secondary=user_group, back_populates='users')
Copy link
Contributor

@coderabbitai coderabbitai bot Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add id column to the User model.

The User model is missing the id column, which is necessary for primary key identification.

- class User(SQLAlchemyBaseUserTableUUID, Base):
-    __tablename__ = 'users'
-    groups = relationship('Group', secondary=user_group, back_populates='users')
+ class User(SQLAlchemyBaseUserTableUUID, Base):
+    __tablename__ = 'users'
+    id = Column(UUID, primary_key=True, index=True)
+    groups = relationship('Group', secondary=user_group, back_populates='users')
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
groups = relationship('Group', secondary=user_group, back_populates='users')
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
id = Column(UUID, primary_key=True, index=True)
groups = relationship('Group', secondary=user_group, back_populates='users')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it already does this when it inherits from SQLAlchemyBaseUserTableUUID

Comment on lines 1 to 12
from typing import AsyncGenerator, Generator

from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
from sqlalchemy import Column, String, ForeignKey, Table, Integer


from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redefined imports and organize them.

The imports DeclarativeBase, sessionmaker, and Session are redefined on line 12. Additionally, imports should be organized according to PEP 8 guidelines.

- from typing import AsyncGenerator, Generator
- from fastapi import Depends
- from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
- from sqlalchemy import create_engine, UUID
- from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
- from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
- from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
- from sqlalchemy import Column, String, ForeignKey, Table, Integer
- from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
+ from typing import AsyncGenerator, Generator
+ from fastapi import Depends
+ from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
+ from sqlalchemy import create_engine, UUID, Column, String, ForeignKey, Table, Integer
+ from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
+ from sqlalchemy.orm import sessionmaker, Session, relationship
+ from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import AsyncGenerator, Generator
from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
from sqlalchemy import Column, String, ForeignKey, Table, Integer
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
from typing import AsyncGenerator, Generator
from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID, Column, String, ForeignKey, Table, Integer
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import sessionmaker, Session, relationship
from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
Tools
Ruff

12-12: Redefinition of unused DeclarativeBase from line 7

Remove definition: DeclarativeBase

(F811)


12-12: Redefinition of unused sessionmaker from line 7

Remove definition: sessionmaker

(F811)


12-12: Redefinition of unused Session from line 7

Remove definition: Session

(F811)

Comment on lines 32 to 37
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app = FastAPI(debug = os.getenv("ENV") != "prod", lifespan=lifespan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

Comment on lines 86 to 89
@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
return {"message": f"Hello {user.email}!"}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

@@ -267,8 +345,8 @@ def start_api_server(host: str = "0.0.0.0", port: int = 8000):
relational_config.create_engine()

from cognee.modules.data.deletion import prune_system, prune_data
asyncio.run(prune_data())
asyncio.run(prune_system(metadata = True))
# asyncio.run(prune_data())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please uncomment this after you are done.

super().__init__(self.message)

async def cognify(datasets: Union[str, list[str]] = None, root_node_id: str = None, user_id:str="default_user"):
session: AsyncSession = async_session_maker()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put this login in a function in some module and just use it here.
Something like def check_permissions(user) for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this live

self.Session = sessionmaker(bind=self.engine)

else:
print("Name: ", db_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove print statements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship

# Association table for many-to-many relationship between users and groups
user_group = Table('user_group', Base.metadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define this like we did for Data and Dataset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you specify exactly what needs to be done?

Comment on lines 26 to 28
class User(SQLAlchemyBaseUserTableUUID, Base):
__tablename__ = 'users'
groups = relationship('Group', secondary=user_group, back_populates='users')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vasilije1990 This one

if not group:
raise HTTPException(status_code=404, detail="Group not found")
perm = db.query(Permission).filter(Permission.name == permission).first()
if not perm:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perm -> permission

Do we want to create a new permission freely here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't get it, what is the exact problem?

permissions.update(permission.name for permission in group.permissions)
return permissions

def has_permission(permission: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this function where we need to check for permissions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where exactly?

@@ -60,8 +60,24 @@ services:
- 3001:3000
networks:
- cognee-network
postgres:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space between.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Outside diff range, codebase verification and nitpick comments (1)
cognee/api/client.py (1)

345-346: Uncomment or remove the commented-out code.

Please uncomment this code if it is needed, or remove it if it is no longer necessary.

- # asyncio.run(prune_data())
- # asyncio.run(prune_system(metadata = True))
+ asyncio.run(prune_data())
+ asyncio.run(prune_system(metadata = True))
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 36e156e and 3f5e665.

Files selected for processing (1)
  • cognee/api/client.py (4 hunks)
Additional context used
Ruff
cognee/api/client.py

31-31: Module level import not at top of file

(E402)


53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)

Additional comments not posted (3)
cognee/api/client.py (3)

85-85: LGTM!

The inclusion of permission_router appears correct and follows the pattern used for other routers.


61-83: LGTM!

The inclusion of authentication and user management routers appears correct and follows the pattern used for other routers.

Also applies to: 96-117


Line range hint 1-345:
LGTM!

The overall structure and completeness of the file appear to be well-organized and functional.

Tools
Ruff

53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)

Comment on lines 16 to 17
from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from fastapi import Depends, FastAPI
- from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
- from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
- from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users
+ from contextlib import asynccontextmanager
+ from fastapi import Depends, FastAPI
+ from cognee.infrastructure.databases.relational.user_authentication.routers import permission_router
+ from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, create_db_and_tables
+ from cognee.infrastructure.databases.relational.user_authentication.schemas import UserCreate, UserRead, UserUpdate
+ from cognee.infrastructure.databases.relational.user_authentication.users import auth_backend, current_active_user, fastapi_users

Also applies to: 31-32, 53-60

Comment on lines 32 to 37
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield
app = FastAPI(debug = os.getenv("ENV") != "prod", lifespan=lifespan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate lifespan context manager.

The lifespan context manager is defined twice, which can cause conflicts.

- @asynccontextmanager
- async def lifespan(app: FastAPI):
-    # Not needed if you setup a migration system like Alembic
-    await create_db_and_tables()
-    yield

Also applies to: 90-95

Comment on lines 86 to 89
@app.get("/authenticated-route")
async def authenticated_route(user: User = Depends(current_active_user)):
return {"message": f"Hello {user.email}!"}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate function authenticated_route.

The function authenticated_route is defined twice, which can cause conflicts.

- @app.get("/authenticated-route")
- async def authenticated_route(user: User = Depends(current_active_user)):
-    return {"message": f"Hello {user.email}!"}

Also applies to: 120-123

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3f5e665 and a93bd53.

Files selected for processing (2)
  • cognee/api/client.py (4 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
Additional context used
Ruff
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py

112-116: f-string without any placeholders

Remove extraneous f prefix

(F541)


120-120: f-string without any placeholders

Remove extraneous f prefix

(F541)

cognee/api/client.py

31-31: Module level import not at top of file

(E402)


53-53: Module level import not at top of file

(E402)


55-55: Module level import not at top of file

(E402)


57-57: Module level import not at top of file

(E402)


58-58: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)

Additional comments not posted (14)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (14)

7-16: Add type hints, docstrings, and conditionally construct db_location.

Include type hints and docstrings for better readability and maintainability. Construct db_location only if the database type is DuckDB.

-    def __init__(self, db_type: str, db_path: str, db_name: str, db_user:str, db_password:str, db_host:str, db_port:str):
+    def __init__(self, db_type: str, db_path: str, db_name: str, db_user: str, db_password: str, db_host: str, db_port: str):
+        """
+        Initializes the SQLAlchemyAdapter with the specified database configuration.
+        """
-        self.db_location = os.path.abspath(os.path.join(db_path, db_name))
+        if db_type == "duckdb":
+            self.db_location = os.path.abspath(os.path.join(db_path, db_name))

19-22: Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    async def get_async_session(self) -> AsyncSession:
+        """
+        Yields an asynchronous SQLAlchemy session.
+        """

24-27: Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def get_session(self) -> sessionmaker:
+        """
+        Yields a synchronous SQLAlchemy session.
+        """

29-38: Add type hints, docstrings, and improve filtering logic.

Include type hints and docstrings for better readability and maintainability. Improve filtering logic for readability.

+    def get_datasets(self) -> List[str]:
+        """
+        Retrieves distinct schema names from the database, excluding staging schemas and 'cognee'.
+        """
-        return list(
-            filter(
-                lambda schema_name: not schema_name.endswith("staging") and schema_name != "cognee",
-                tables
-            )
-        )
+        return [schema_name for schema_name in tables if not schema_name.endswith("staging") and schema_name != "cognee"]

40-43: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def get_files_metadata(self, dataset_name: str) -> List[Dict[str, Any]]:
+        """
+        Retrieves file metadata from the specified dataset.
+        """
-            result = connection.execute(text(f"SELECT id, name, file_path, extension, mime_type FROM {dataset_name}.file_metadata;"))
+            result = connection.execute(text("SELECT id, name, file_path, extension, mime_type FROM :dataset_name.file_metadata;"), {'dataset_name': dataset_name})

45-49: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def create_table(self, schema_name: str, table_name: str, table_config: List[Dict[str, str]]) -> None:
+        """
+        Creates a table with the specified schema and configuration.
+        """
-            connection.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema_name};"))
-            connection.execute(text(f"CREATE TABLE IF NOT EXISTS {schema_name}.{table_name} ({', '.join(fields_query_parts)});"))
+            connection.execute(text("CREATE SCHEMA IF NOT EXISTS :schema_name;"), {'schema_name': schema_name})
+            connection.execute(text("CREATE TABLE IF NOT EXISTS :schema_name.:table_name (:fields_query_parts);"), {'schema_name': schema_name, 'table_name': table_name, 'fields_query_parts': ', '.join(fields_query_parts)})

51-53: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def delete_table(self, table_name: str) -> None:
+        """
+        Deletes the specified table.
+        """
-            connection.execute(text(f"DROP TABLE IF EXISTS {table_name};"))
+            connection.execute(text("DROP TABLE IF EXISTS :table_name;"), {'table_name': table_name})

55-60: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def insert_data(self, schema_name: str, table_name: str, data: List[Dict[str, Any]]) -> None:
+        """
+        Inserts data into the specified table.
+        """
-        insert_query = text(f"INSERT INTO {schema_name}.{table_name} ({columns}) VALUES {values};")
+        insert_query = text("INSERT INTO :schema_name.:table_name (:columns) VALUES :values;")
+        with self.engine.connect() as connection:
+            connection.execute(insert_query, {'schema_name': schema_name, 'table_name': table_name, 'columns': columns, 'values': values, **data})

62-77: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def get_data(self, table_name: str, filters: Optional[Dict[str, Union[str, List[str]]]] = None) -> Dict[str, Any]:
+        """
+        Retrieves data from the specified table based on the provided filters.
+        """
-            query = f"SELECT * FROM {table_name}"
+            query = "SELECT * FROM :table_name"
-                filter_conditions = " AND ".join([
-                    f"{key} IN ({', '.join([f':{key}{i}' for i in range(len(value))])})" if isinstance(value, list)
-                    else f"{key} = :{key}" for key, value in filters.items()
-                ])
-                query += f" WHERE {filter_conditions};"
+                filter_conditions = " AND ".join([
+                    f"{key} IN (:values)" if isinstance(value, list)
+                    else f"{key} = :{key}" for key, value in filters.items()
+                ])
+                query += f" WHERE {filter_conditions};"
+                query = text(query)
+                results = connection.execute(query, {'table_name': table_name, 'values': ', '.join([f':{key}{i}' for i in range(len(value))]), **filters})

79-82: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def execute_query(self, query: str) -> List[Dict[str, Any]]:
+        """
+        Executes the provided SQL query.
+        """
-            result = connection.execute(text(query))
+            result = connection.execute(text(":query"), {'query': query})

84-98: Add type hints, docstrings, and use parameterized queries.

Include type hints and docstrings for better readability and maintainability. Use parameterized queries to prevent SQL injection.

+    def load_cognify_data(self, data: List[Dict[str, Any]]) -> None:
+        """
+        Loads data into the `cognify` table.
+        """
-        insert_query = cognify_table.insert().values(document_id=text(':document_id'), layer_id=text(':layer_id'))
+        insert_query = cognify_table.insert().values(document_id=:document_id, layer_id=:layer_id)

100-122: Add type hints, docstrings, and remove redundant table creation logic.

Include type hints and docstrings for better readability and maintainability. Remove redundant table creation logic before fetching data. Remove extraneous f prefixes from f-strings without placeholders.

+    def fetch_cognify_data(self, excluded_document_id: str) -> List[Dict[str, Any]]:
+        """
+        Fetches data from the `cognify` table based on the provided `excluded_document_id`.
+        """
-            connection.execute(text("""
-                CREATE TABLE IF NOT EXISTS cognify (
-                    document_id STRING,
-                    layer_id STRING,
-                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-                    updated_at TIMESTAMP DEFAULT NULL,
-                    processed BOOLEAN DEFAULT FALSE,
-                    document_id_target STRING NULL
-                );
-            """))
-            query = text(f"""
+            query = text("""
+                SELECT document_id, layer_id, created_at, updated_at, processed
+                FROM cognify
+                WHERE document_id != :excluded_document_id AND processed = FALSE;
+            """)
-                update_query = text(f"UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
+                update_query = text("UPDATE cognify SET processed = TRUE WHERE document_id IN :document_ids;")
Tools
Ruff

112-116: f-string without any placeholders

Remove extraneous f prefix

(F541)


120-120: f-string without any placeholders

Remove extraneous f prefix

(F541)


124-137: Add type hints, docstrings, and remove redundant table creation logic.

Include type hints and docstrings for better readability and maintainability. Remove redundant table creation logic before deletion.

+    def delete_cognify_data(self) -> None:
+        """
+        Deletes data from the `cognify` table.
+        """
-            connection.execute(text("""
-                CREATE TABLE IF NOT EXISTS cognify (
-                    document_id STRING,
-                    layer_id STRING,
-                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-                    updated_at TIMESTAMP DEFAULT NULL,
-                    processed BOOLEAN DEFAULT FALSE,
-                    document_id_target STRING NULL
-                );
-            """))

139-143: Add type hints and docstrings.

Include type hints and docstrings for better readability and maintainability.

+    def delete_database(self) -> None:
+        """
+        Deletes the database file and its write-ahead log file.
+        """

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Outside diff range, codebase verification and nitpick comments (2)
cognee/api/v1/authenticate_user/authenticate_user.py (1)

1-1: The authenticate_user_method function does not exist in the authentication_db module.

  • The file cognee/infrastructure/databases/relational/user_authentication/authentication_db.py does not contain a function named authenticate_user_method.

Please review the import statement in cognee/api/v1/authenticate_user/authenticate_user.py and ensure that the correct module and function are being imported.

Analysis chain

Verify the import statement.

Ensure that the authenticate_user_method function exists in the authentication_db module.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of the `authenticate_user_method` function in the `authentication_db` module.

# Test: Search for the function definition. Expect: The function definition exists.
ast-grep --lang python --pattern $'def authenticate_user_method($_):'

Length of output: 70


Script:

#!/bin/bash
# Description: Locate the `authentication_db` module and verify the existence of the `authenticate_user_method` function.

# Locate the `authentication_db` module
fd 'authentication_db.py'

# Search for the function definition within the located file(s)
fd 'authentication_db.py' --exec ast-grep --lang python --pattern $'def authenticate_user_method($_):' {}

Length of output: 216

cognee/api/v1/create_user/create_user.py (1)

1-1: The create_user_method function does not exist in the authentication_db module.

  • The module cognee/infrastructure/databases/relational/user_authentication/authentication_db.py exists.
  • However, the function create_user_method was not found within this module.

Please review and correct the import statement in cognee/api/v1/create_user/create_user.py.

Analysis chain

Verify the import statement.

Ensure that the create_user_method function exists in the authentication_db module.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of the `create_user_method` function in the `authentication_db` module.

# Test: Search for the function definition. Expect: The function definition exists.
ast-grep --lang python --pattern $'def create_user_method($_, $_, $_):'

Length of output: 72


Script:

#!/bin/bash
# Description: Re-verify the existence of the `create_user_method` function in the `authentication_db` module with a broader search pattern.

# Test: Search for any function named `create_user_method`. Expect: The function definition exists.
ast-grep --lang python --pattern 'def create_user_method($$):'

# Additionally, verify the module path to ensure it exists.
fd 'authentication_db.py' | xargs -I {} echo "Found module: {}"

Length of output: 228

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a93bd53 and 9616545.

Files selected for processing (8)
  • cognee/api/v1/authenticate_user/init.py (1 hunks)
  • cognee/api/v1/authenticate_user/authenticate_user.py (1 hunks)
  • cognee/api/v1/create_user/init.py (1 hunks)
  • cognee/api/v1/create_user/create_user.py (1 hunks)
  • cognee/api/v1/reset_user_password/init.py (1 hunks)
  • cognee/api/v1/verify_user_token/init.py (1 hunks)
  • cognee/api/v1/verify_user_token/verify_user_token.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
Files skipped from review due to trivial changes (4)
  • cognee/api/v1/authenticate_user/init.py
  • cognee/api/v1/create_user/init.py
  • cognee/api/v1/reset_user_password/init.py
  • cognee/api/v1/verify_user_token/init.py
Additional context used
Ruff
cognee/infrastructure/databases/relational/user_authentication/authentication_db.py

15-15: Redefinition of unused DeclarativeBase from line 9

Remove definition: DeclarativeBase

(F811)


15-15: Redefinition of unused sessionmaker from line 9

Remove definition: sessionmaker

(F811)


15-15: Redefinition of unused Session from line 9

Remove definition: Session

(F811)


149-149: Local variable user_manager is assigned to but never used

Remove assignment to unused variable user_manager

(F841)


158-158: Do not use bare except

(E722)


174-174: Do not use bare except

(E722)

Additional comments not posted (10)
cognee/api/v1/verify_user_token/verify_user_token.py (1)

1-1: Verify the import statement.

Ensure that the user_check_token function exists in the authentication_db module.

Verification successful

The user_check_token function exists in the authentication_db module. The import statement is correct.

  • The user_check_token function is defined towards the end of the authentication_db module.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of the `user_check_token` function in the `authentication_db` module.

# Test: Search for the function definition. Expect: The function definition exists.
ast-grep --lang python --pattern $'def user_check_token($_):'

Length of output: 62


Script:

#!/bin/bash
# Description: List all function definitions in the `authentication_db` module to verify the presence of `user_check_token`.

# Locate the `authentication_db` module
auth_db_path=$(fd authentication_db.py)

# List all function definitions in the `authentication_db` module
ast-grep --lang python --pattern $'def $_($_):' $auth_db_path

Length of output: 258


Script:

#!/bin/bash
# Description: Display the contents of the `authentication_db` module to verify the presence of `user_check_token`.

# Locate the `authentication_db` module
auth_db_path=$(fd authentication_db.py)

# Display the contents of the `authentication_db` module
cat $auth_db_path

Length of output: 7948

cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (9)

20-27: Consider adding explicit primary keys for association tables.

While the association tables are correctly defined, adding explicit primary keys can help with performance and data integrity.

- user_group = Table('user_group', Base.metadata,
-                   Column('user_id', UUID, ForeignKey('users.id')),
-                   Column('group_id', UUID, ForeignKey('groups.id')))
+ user_group = Table('user_group', Base.metadata,
+                   Column('id', UUID, primary_key=True, index=True),
+                   Column('user_id', UUID, ForeignKey('users.id')),
+                   Column('group_id', UUID, ForeignKey('groups.id')))

- group_permission = Table('group_permission', Base.metadata,
-                         Column('group_id', UUID, ForeignKey('groups.id')),
-                         Column('permission_id', UUID, ForeignKey('permissions.id')))
+ group_permission = Table('group_permission', Base.metadata,
+                         Column('id', UUID, primary_key=True, index=True),
+                         Column('group_id', UUID, ForeignKey('groups.id')),
+                         Column('permission_id', UUID, ForeignKey('permissions.id')))

35-46: LGTM!

The model classes Group and Permission are correctly defined with primary keys and relationships.


60-65: LGTM!

The configuration section is correctly defined.


67-69: LGTM!

The function create_db_and_tables is correctly defined.


71-73: LGTM!

The function get_async_session is correctly defined.


76-77: LGTM!

The function get_user_db is correctly defined.


80-81: LGTM!

The function hash_password is correctly defined.


114-122: LGTM!

The function reset_user_password_method is correctly defined.


48-57: Verify the relationship to dataset_data.

The ACL class includes a relationship to dataset_data, which should be reviewed for correctness.

Verification successful

Verify the relationship to dataset_data.

The ACL class includes a relationship to dataset_data, which should be reviewed for correctness.


Verified: The relationship to dataset_data is correctly defined.

The ACL class includes a correctly defined relationship to dataset_data based on the table name specified in the DatasetData model.

  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py: Relationship to dataset_data is correctly defined.
  • cognee/infrastructure/data/models/DatasetData.py: __tablename__ is set to "dataset_data".
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the relationship to `dataset_data` in the `ACL` class.

# Test: Search for the `dataset_data` definition. Expect: Correct relationship definition.
rg --type python -A 5 $'dataset_data'

Length of output: 67


Script:

#!/bin/bash
# Description: Verify the relationship to `dataset_data` in the `ACL` class.

# Test: Search for the `dataset_data` definition. Expect: Correct relationship definition.
rg --type py -A 5 $'dataset_data'

Length of output: 4769

Comment on lines 5 to 8
async def verify_user_token(token: str):

output = await user_check_token(token=token)
return output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling and a docstring.

The function lacks error handling and documentation. Consider adding a try-except block and a docstring.

+ """
+ Verify the user token.
+
+ Args:
+     token (str): The user token to verify.
+
+ Returns:
+     output: The result of the token verification.
+ """
async def verify_user_token(token: str):
+    try:
        output = await user_check_token(token=token)
+    except Exception as e:
+        # Handle the exception (e.g., log the error, raise a custom exception, etc.)
+        raise e
    return output

Committable suggestion was skipped due to low confidence.

Comment on lines 124 to 142
# async def generate_verification_token(email: str, tokens_db: dict) -> str:
# async with get_async_session_context() as session:
# async with get_user_db_context(session) as user_db:
# if not await user_db.get_by_email(email):
# raise ValueError("User does not exist")
# token = str(uuid.uuid4())
# tokens_db[token] = email
# return token

# async def verify_user_method(token: str, tokens_db: dict) -> bool:
# async with get_async_session_context() as session:
# async with get_user_db_context(session) as user_db:
# email = tokens_db.get(token)
# if not email or not await user_db.get_by_email(email):
# return False
# user = await user_db.get_by_email(email)
# user.is_verified = True
# await user_db.update(user)
# return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address commented-out functions.

The commented-out functions for generating and verifying tokens should be reviewed for relevance and either implemented or removed.

Do you want me to help implement these functions or open a GitHub issue to track this task?

Comment on lines 4 to 9
async def authenticate_user():
"""
This function is used to authenticate a user.
"""
output = await authenticate_user_method()
return output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling and improve the docstring.

The function lacks error handling and the docstring could be more descriptive. Consider adding a try-except block and enhancing the docstring.

async def authenticate_user():
-    """
-    This function is used to authenticate a user.
-    """
+    """
+    Authenticate a user.
+
+    Returns:
+        output: The result of the user authentication.
+    """
+    try:
        output = await authenticate_user_method()
+    except Exception as e:
+        # Handle the exception (e.g., log the error, raise a custom exception, etc.)
+        raise e
    return output
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def authenticate_user():
"""
This function is used to authenticate a user.
"""
output = await authenticate_user_method()
return output
async def authenticate_user():
"""
Authenticate a user.
Returns:
output: The result of the user authentication.
"""
try:
output = await authenticate_user_method()
except Exception as e:
# Handle the exception (e.g., log the error, raise a custom exception, etc.)
raise e
return output

Comment on lines 100 to 112
async def authenticate_user_method(email: str, password: str) -> Optional[User]:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
credentials = OAuth2PasswordRequestForm(username=email, password=password)
user = await user_manager.authenticate(credentials)
if user is None or not user.is_active:
return None
return user
except Exception as e:
print(f"Error during authentication: {e}")
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve exception handling by logging the exception.

The function authenticate_user_method is correctly defined, but the exception handling can be improved by logging the exception.

-    except Exception as e:
-        print(f"Error during authentication: {e}")
+    except Exception as e:
+        print(f"Error during authentication: {e}")
+        raise e

Committable suggestion was skipped due to low confidence.

Comment on lines 88 to 98
async def create_user_method(email: str, password: str, is_superuser: bool = False):
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
user = await user_manager.create(
UserCreate(email=email, password=password, is_superuser=is_superuser)
)
print(f"User created: {user.email}")
except UserAlreadyExists:
print(f"User {email} already exists")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve exception handling by logging the exception.

The function create_user_method is correctly defined, but the exception handling can be improved by logging the exception.

-    except UserAlreadyExists:
-        print(f"User {email} already exists")
+    except UserAlreadyExists as e:
+        print(f"User {email} already exists: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def create_user_method(email: str, password: str, is_superuser: bool = False):
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
user = await user_manager.create(
UserCreate(email=email, password=password, is_superuser=is_superuser)
)
print(f"User created: {user.email}")
except UserAlreadyExists:
print(f"User {email} already exists")
async def create_user_method(email: str, password: str, is_superuser: bool = False):
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
user = await user_manager.create(
UserCreate(email=email, password=password, is_superuser=is_superuser)
)
print(f"User created: {user.email}")
except UserAlreadyExists as e:
print(f"User {email} already exists: {e}")

Comment on lines 1 to 15
import hashlib
import uuid
from typing import AsyncGenerator, Generator, Optional

from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
from sqlalchemy import Column, String, ForeignKey, Table, Integer
from contextlib import asynccontextmanager
from fastapi_users.exceptions import UserAlreadyExists
from fastapi_users.authentication import JWTStrategy
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean up redundant and redefined imports.

The imports DeclarativeBase, sessionmaker, and Session are redefined on line 15. Additionally, there are redundant imports that can be removed to improve readability and maintainability.

- from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
- from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
+ from sqlalchemy.orm import relationship
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import hashlib
import uuid
from typing import AsyncGenerator, Generator, Optional
from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
from sqlalchemy import Column, String, ForeignKey, Table, Integer
from contextlib import asynccontextmanager
from fastapi_users.exceptions import UserAlreadyExists
from fastapi_users.authentication import JWTStrategy
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
import hashlib
import uuid
from typing import AsyncGenerator, Generator, Optional
from fastapi import Depends
from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from cognee.infrastructure.databases.relational import get_relationaldb_config, Base
from sqlalchemy import Column, String, ForeignKey, Table, Integer
from contextlib import asynccontextmanager
from fastapi_users.exceptions import UserAlreadyExists
from fastapi_users.authentication import JWTStrategy
from sqlalchemy.orm import relationship
Tools
Ruff

15-15: Redefinition of unused DeclarativeBase from line 9

Remove definition: DeclarativeBase

(F811)


15-15: Redefinition of unused sessionmaker from line 9

Remove definition: sessionmaker

(F811)


15-15: Redefinition of unused Session from line 9

Remove definition: Session

(F811)

Comment on lines 145 to 159
async def user_create_token(user: User) -> Optional[str]:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
if user is None:
return None
strategy = get_jwt_strategy()
token = await strategy.write_token(user)
if token is not None:
return token
else:
return None
except:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused variable and improve exception handling.

The function user_create_token is correctly defined, but the unused user_manager variable should be removed. Additionally, the bare except should be replaced with a specific exception.

-                async with get_user_manager_context(user_db) as user_manager:
+                async with get_user_manager_context(user_db):
...
-    except:
+    except Exception as e:
+        print(f"Error during token creation: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def user_create_token(user: User) -> Optional[str]:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
if user is None:
return None
strategy = get_jwt_strategy()
token = await strategy.write_token(user)
if token is not None:
return token
else:
return None
except:
return None
async def user_create_token(user: User) -> Optional[str]:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db):
if user is None:
return None
strategy = get_jwt_strategy()
token = await strategy.write_token(user)
if token is not None:
return token
else:
return None
except Exception as e:
print(f"Error during token creation: {e}")
return None
Tools
Ruff

149-149: Local variable user_manager is assigned to but never used

Remove assignment to unused variable user_manager

(F841)


158-158: Do not use bare except

(E722)

Comment on lines 161 to 175
async def user_check_token(token: str) -> bool:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
if token is None:
return False
strategy = get_jwt_strategy()
user = await strategy.read_token(token, user_manager)
if user is None or not user.is_active:
return False
else:
return True
except:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve exception handling.

The function user_check_token is correctly defined, but the bare except should be replaced with a specific exception.

-    except:
-        return False
+    except Exception as e:
+        print(f"Error during token validation: {e}")
+        return False
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def user_check_token(token: str) -> bool:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
if token is None:
return False
strategy = get_jwt_strategy()
user = await strategy.read_token(token, user_manager)
if user is None or not user.is_active:
return False
else:
return True
except:
return False
async def user_check_token(token: str) -> bool:
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
if token is None:
return False
strategy = get_jwt_strategy()
user = await strategy.read_token(token, user_manager)
if user is None or not user.is_active:
return False
else:
return True
except Exception as e:
print(f"Error during token validation: {e}")
return False
Tools
Ruff

174-174: Do not use bare except

(E722)

Comment on lines 5 to 7
async def create_user(email: str, password: str, is_superuser: bool = False):
output = await create_user_method(email=email, password=password, is_superuser=is_superuser)
return output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling and a docstring.

The function lacks error handling and documentation. Consider adding a try-except block and a docstring.

+ """
+ Create a new user.
+
+ Args:
+     email (str): The email of the user.
+     password (str): The password of the user.
+     is_superuser (bool): Whether the user is a superuser. Defaults to False.
+
+ Returns:
+     output: The result of the user creation.
+ """
async def create_user(email: str, password: str, is_superuser: bool = False):
+    try:
        output = await create_user_method(email=email, password=password, is_superuser=is_superuser)
+    except Exception as e:
+        # Handle the exception (e.g., log the error, raise a custom exception, etc.)
+        raise e
    return output
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def create_user(email: str, password: str, is_superuser: bool = False):
output = await create_user_method(email=email, password=password, is_superuser=is_superuser)
return output
"""
Create a new user.
Args:
email (str): The email of the user.
password (str): The password of the user.
is_superuser (bool): Whether the user is a superuser. Defaults to False.
Returns:
output: The result of the user creation.
"""
async def create_user(email: str, password: str, is_superuser: bool = False):
try:
output = await create_user_method(email=email, password=password, is_superuser=is_superuser)
except Exception as e:
# Handle the exception (e.g., log the error, raise a custom exception, etc.)
raise e
return output

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 9616545 and 218d322.

Files selected for processing (7)
  • cognee/api/v1/authenticate_user/authenticate_user.py (1 hunks)
  • cognee/api/v1/cognify/cognify_v2.py (3 hunks)
  • cognee/api/v1/create_user/create_user.py (1 hunks)
  • cognee/api/v1/verify_user_token/verify_user_token.py (1 hunks)
  • cognee/infrastructure/data/models/DatasetData.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (1 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • cognee/api/v1/authenticate_user/authenticate_user.py
  • cognee/api/v1/create_user/create_user.py
  • cognee/api/v1/verify_user_token/verify_user_token.py
Additional context used
Ruff
cognee/infrastructure/databases/relational/user_authentication/authentication_db.py

15-15: Redefinition of unused DeclarativeBase from line 9

Remove definition: DeclarativeBase

(F811)


15-15: Redefinition of unused sessionmaker from line 9

Remove definition: sessionmaker

(F811)


15-15: Redefinition of unused Session from line 9

Remove definition: Session

(F811)

cognee/api/v1/cognify/cognify_v2.py

83-83: Undefined name hashed_user_id

(F821)


83-83: Undefined name user_permissions

(F821)

cognee/infrastructure/databases/relational/user_authentication/users.py

14-14: Redefinition of unused Depends from line 5

Remove definition: Depends

(F811)


19-19: Redefinition of unused JWTStrategy from line 10

Remove definition: JWTStrategy

(F811)


150-150: Local variable user_manager is assigned to but never used

Remove assignment to unused variable user_manager

(F841)


159-159: Do not use bare except

(E722)


175-175: Do not use bare except

(E722)

Additional comments not posted (17)
cognee/infrastructure/data/models/DatasetData.py (4)

8-8: Ensure consistency with ORM framework.

The class now inherits from Base instead of ModelBase. Verify that this change aligns with the ORM framework being used.


13-14: Primary key adjustment.

The primary key definitions for dataset_id and data_id have been removed. Ensure that the new primary key setup with id is consistent with the overall data model.


15-17: Unique constraint added.

The unique constraint on dataset_id and data_id ensures data integrity. This is a good practice to prevent duplicate entries.


19-19: Relationship with ACL model.

The relationship with the ACL model establishes a back-reference. Ensure that the ACL model has the corresponding relationship defined.

cognee/infrastructure/databases/relational/user_authentication/authentication_db.py (4)

36-42: Ensure relationships are correctly defined.

The Group model defines several relationships. Ensure that the corresponding relationships are defined in the related models.


43-47: Ensure relationships are correctly defined.

The Permission model defines a relationship with the Group model. Ensure that the corresponding relationship is defined in the Group model.


49-58: Ensure relationships are correctly defined.

The ACL model defines relationships with DatasetData, User, and Group. Ensure that the corresponding relationships are defined in the related models.


31-34: Add id column to the User model.

The User model is missing the id column, which is necessary for primary key identification.

+    id = Column(UUID, primary_key=True, index=True)

Likely invalid or redundant comment.

cognee/api/v1/cognify/cognify_v2.py (2)

34-37: LGTM!

The PermissionDeniedException class is correctly implemented.


39-47: Verify the integration of user permissions.

The function now retrieves user permissions and checks for the required "write" permission. Ensure that the get_user_permissions function and the fastapi_users manager are correctly implemented and integrated.

cognee/infrastructure/databases/relational/user_authentication/users.py (7)

26-41: Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

import logging

logger = logging.getLogger(__name__)

class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
    reset_password_token_secret = SECRET
    verification_token_secret = SECRET

    async def on_after_register(self, user: User, request: Optional[Request] = None):
        logger.info(f"User {user.id} has registered.")

    async def on_after_forgot_password(self, user: User, token: str, request: Optional[Request] = None):
        logger.info(f"User {user.id} has forgot their password. Reset token: {token}")

    async def on_after_request_verify(self, user: User, token: str, request: Optional[Request] = None):
        logger.info(f"Verification requested for user {user.id}. Verification token: {token}")

44-45: LGTM!

The get_user_manager function correctly yields the UserManager instance.


51-52: LGTM!

The get_jwt_strategy function correctly returns the JWT strategy.


66-70: LGTM!

The get_user_permissions function correctly retrieves and returns the user's permissions.


72-78: LGTM!

The has_permission function correctly checks the user's permissions and raises an HTTP exception if the user lacks the required permission.


81-82: LGTM!

The hash_password function correctly hashes the password using SHA-256.


115-123: LGTM!

The reset_user_password_method function correctly resets the user's password and updates the database.

from fastapi_users.db import SQLAlchemyBaseUserTableUUID, SQLAlchemyUserDatabase
from sqlalchemy import create_engine, UUID
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean up redundant and redefined imports.

The imports DeclarativeBase, sessionmaker, and Session are redefined. Additionally, there are redundant imports that can be removed to improve readability and maintainability.

- from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
- from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session, relationship
+ from sqlalchemy.orm import relationship

Also applies to: 15-15

Comment on lines 20 to 27
user_group = Table('user_group', Base.metadata,
Column('user_id', UUID, ForeignKey('users.id')),
Column('group_id', UUID, ForeignKey('groups.id')))

# Association table for many-to-many relationship between groups and permissions
group_permission = Table('group_permission', Base.metadata,
Column('group_id', UUID, ForeignKey('groups.id')),
Column('permission_id', UUID, ForeignKey('permissions.id')))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding explicit primary keys for association tables.

While the association tables are correctly defined, adding explicit primary keys can help with performance and data integrity.

- user_group = Table('user_group', Base.metadata,
-                   Column('user_id', UUID, ForeignKey('users.id')),
-                   Column('group_id', UUID, ForeignKey('groups.id')))
+ user_group = Table('user_group', Base.metadata,
+                   Column('id', UUID, primary_key=True, index=True),
+                   Column('user_id', UUID, ForeignKey('users.id')),
+                   Column('group_id', UUID, ForeignKey('groups.id')))

- group_permission = Table('group_permission', Base.metadata,
-                         Column('group_id', UUID, ForeignKey('groups.id')),
-                         Column('permission_id', UUID, ForeignKey('permissions.id')))
+ group_permission = Table('group_permission', Base.metadata,
+                         Column('id', UUID, primary_key=True, index=True),
+                         Column('group_id', UUID, ForeignKey('groups.id')),
+                         Column('permission_id', UUID, ForeignKey('permissions.id')))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
user_group = Table('user_group', Base.metadata,
Column('user_id', UUID, ForeignKey('users.id')),
Column('group_id', UUID, ForeignKey('groups.id')))
# Association table for many-to-many relationship between groups and permissions
group_permission = Table('group_permission', Base.metadata,
Column('group_id', UUID, ForeignKey('groups.id')),
Column('permission_id', UUID, ForeignKey('permissions.id')))
user_group = Table('user_group', Base.metadata,
Column('id', UUID, primary_key=True, index=True),
Column('user_id', UUID, ForeignKey('users.id')),
Column('group_id', UUID, ForeignKey('groups.id')))
# Association table for many-to-many relationship between groups and permissions
group_permission = Table('group_permission', Base.metadata,
Column('id', UUID, primary_key=True, index=True),
Column('group_id', UUID, ForeignKey('groups.id')),
Column('permission_id', UUID, ForeignKey('permissions.id')))

Comment on lines 89 to 99
async def create_user_method(email: str, password: str, is_superuser: bool = False):
try:
async with get_async_session_context() as session:
async with get_user_db_context(session) as user_db:
async with get_user_manager_context(user_db) as user_manager:
user = await user_manager.create(
UserCreate(email=email, password=password, is_superuser=is_superuser)
)
print(f"User created: {user.email}")
except UserAlreadyExists:
print(f"User {email} already exists")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

import logging

logger = logging.getLogger(__name__)

async def create_user_method(email: str, password: str, is_superuser: bool = False):
    try:
        async with get_async_session_context() as session:
            async with get_user_db_context(session) as user_db:
                async with get_user_manager_context(user_db) as user_manager:
                    user = await user_manager.create(
                        UserCreate(email=email, password=password, is_superuser=is_superuser)
                    )
                    logger.info(f"User created: {user.email}")
    except UserAlreadyExists:
        logger.warning(f"User {email} already exists")

@@ -60,7 +80,7 @@ async def run_cognify_pipeline(dataset_name: str, files: list[dict]):
root_node_id = "ROOT"

tasks = [
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undefined variables hashed_user_id and user_permissions.

The variables hashed_user_id and user_permissions are used but not defined in the provided code. Ensure that these variables are correctly defined and passed to the Task instances.

-                Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
+                Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Tools
Ruff

83-83: Undefined name hashed_user_id

(F821)


83-83: Undefined name user_permissions

(F821)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Outside diff range, codebase verification and nitpick comments (3)
cognee/api/v1/add/add.py (2)

89-89: The function call to data_resources does not match the new signature.

  • File: cognee/api/v1/add/add.py
  • Lines: 89-89
  • Issue: The function call to data_resources does not include the user_id parameter, which is now required by the updated function signature.
Analysis chain

LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to data_resources match the new signature.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `data_resources` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'data_resources'

Length of output: 69


Script:

#!/bin/bash
# Description: Verify all function calls to `data_resources` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'data_resources'

Length of output: 920


55-55: Update the function call to match the new signature.

The function call to add_files in cognee/api/v1/add/add.py does not include the user_id parameter. Update this call to match the new function signature.

  • cognee/api/v1/add/add.py: Line with return await add_files(file_paths, dataset_name)
Analysis chain

LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to add_files match the new signature.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `add_files` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'add_files'

Length of output: 64


Script:

#!/bin/bash
# Description: Verify all function calls to `add_files` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'add_files'

Length of output: 573

cognee/api/v1/cognify/cognify_v2.py (1)

43-43: Function calls to cognify need updating to match the new signature.

The cognify function has been updated to include a user_id parameter, but several calls to this function in the codebase do not include this new parameter. Please update these calls to match the new function signature.

  • cognee/tests/test_weaviate.py
  • cognee/tests/test_qdrant.py
  • cognee/tests/test_neo4j.py
  • cognee/tests/test_library.py
Analysis chain

LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to cognify match the new signature.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `cognify` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'cognify'

Length of output: 62


Script:

#!/bin/bash
# Description: Verify all function calls to `cognify` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'cognify'

Length of output: 34467

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 218d322 and b4d1a73.

Files selected for processing (3)
  • cognee/api/v1/add/add.py (3 hunks)
  • cognee/api/v1/cognify/cognify_v2.py (3 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
Additional context used
Ruff
cognee/api/v1/cognify/cognify_v2.py

61-61: Local variable out is assigned to but never used

Remove assignment to unused variable out

(F841)


89-89: Undefined name hashed_user_id

(F821)


89-89: Undefined name user_permissions

(F821)

cognee/infrastructure/databases/relational/user_authentication/users.py

14-14: Redefinition of unused Depends from line 5

Remove definition: Depends

(F811)


20-20: Redefinition of unused JWTStrategy from line 10

Remove definition: JWTStrategy

(F811)


151-151: Local variable user_manager is assigned to but never used

Remove assignment to unused variable user_manager

(F841)


160-160: Do not use bare except

(E722)


176-176: Do not use bare except

(E722)


async with get_async_session_context() as session:

out = await has_permission_document(user_id, file["id"], "write", session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove assignment to unused variable out.

The variable out is assigned but never used.

-                out = await has_permission_document(user_id, file["id"], "write", session)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
out = await has_permission_document(user_id, file["id"], "write", session)
await has_permission_document(user_id, file["id"], "write", session)
Tools
Ruff

61-61: Local variable out is assigned to but never used

Remove assignment to unused variable out

(F841)

Comment on lines 220 to 240
async def give_permission_document(user: Optional[User], document_id: str, permission: str,
session: AsyncSession):

acl_entry = ACL(
document_id=document_id,
user_id=user.id,
permission=permission
)
session.add(acl_entry)
await session.commit()


if user.is_superuser:
permission = 'all_permissions' # Example permission, change as needed
acl_entry = ACL(
document_id=document_id,
user_id=user.id,
permission=permission
)
session.add(acl_entry)
await session.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify code for superuser permissions.

The code for superuser permissions is redundant and can be simplified.

async def give_permission_document(user: Optional[User], document_id: str, permission: str,
                                   session: AsyncSession):

    acl_entry = ACL(
        document_id=document_id,
        user_id=user.id,
        permission=permission
    )
    session.add(acl_entry)
    await session.commit()

    if user.is_superuser:
        acl_entry = ACL(
            document_id=document_id,
            user_id=user.id,
            permission='all_permissions'  # Example permission, change as needed
        )
        session.add(acl_entry)
        await session.commit()
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def give_permission_document(user: Optional[User], document_id: str, permission: str,
session: AsyncSession):
acl_entry = ACL(
document_id=document_id,
user_id=user.id,
permission=permission
)
session.add(acl_entry)
await session.commit()
if user.is_superuser:
permission = 'all_permissions' # Example permission, change as needed
acl_entry = ACL(
document_id=document_id,
user_id=user.id,
permission=permission
)
session.add(acl_entry)
await session.commit()
async def give_permission_document(user: Optional[User], document_id: str, permission: str,
session: AsyncSession):
acl_entry = ACL(
document_id=document_id,
user_id=user.id,
permission=permission
)
session.add(acl_entry)
await session.commit()
if user.is_superuser:
acl_entry = ACL(
document_id=document_id,
user_id=user.id,
permission='all_permissions' # Example permission, change as needed
)
session.add(acl_entry)
await session.commit()

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b4d1a73 and 7930586.

Files selected for processing (2)
  • cognee/api/v1/cognify/cognify_v2.py (3 hunks)
  • cognee/infrastructure/databases/relational/user_authentication/users.py (1 hunks)
Additional context used
Ruff
cognee/api/v1/cognify/cognify_v2.py

62-62: Local variable out is assigned to but never used

Remove assignment to unused variable out

(F841)


90-90: Undefined name hashed_user_id

(F821)


90-90: Undefined name user_permissions

(F821)

cognee/infrastructure/databases/relational/user_authentication/users.py

14-14: Redefinition of unused Depends from line 5

Remove definition: Depends

(F811)


20-20: Redefinition of unused JWTStrategy from line 10

Remove definition: JWTStrategy

(F811)


154-154: Local variable user_manager is assigned to but never used

Remove assignment to unused variable user_manager

(F841)


163-163: Do not use bare except

(E722)


179-179: Do not use bare except

(E722)

Additional comments not posted (11)
cognee/api/v1/cognify/cognify_v2.py (7)

4-13: Imports look good.

The new imports are necessary for the added user management functionalities.


38-41: Exception class implementation looks good.

The PermissionDeniedException class is correctly implemented.


42-43: Function signature update looks good.

The cognify function now includes a user_id parameter, which is necessary for user permission checks.


Line range hint 44-48:
Initialization segment looks good.

The segment initializes configurations and creates a task status table, which is necessary for the function.


Line range hint 49-53:
Dataset handling segment looks good.

The segment handles the datasets parameter and calls cognify recursively if datasets are not provided.

Tools
Ruff

62-62: Local variable out is assigned to but never used

Remove assignment to unused variable out

(F841)


90-90: Undefined name hashed_user_id

(F821)


90-90: Undefined name user_permissions

(F821)


65-73: Task status update segment looks good.

The segment handles task status updates using a lock, which is necessary for synchronization.


Line range hint 127-137:
Existing datasets handling segment looks good.

The segment handles existing datasets and calls run_cognify_pipeline.

Tools
Ruff

62-62: Local variable out is assigned to but never used

Remove assignment to unused variable out

(F841)


90-90: Undefined name hashed_user_id

(F821)


90-90: Undefined name user_permissions

(F821)

cognee/infrastructure/databases/relational/user_authentication/users.py (4)

45-46: Function implementation looks good.

The get_user_manager function is correctly implemented.


52-53: Function implementation looks good.

The get_jwt_strategy function is correctly implemented.


70-74: Function implementation looks good.

The get_user_permissions function is correctly implemented.


76-82: Function implementation looks good.

The has_permission function is correctly implemented.

Comment on lines 54 to 62

for file in files:
file["id"] = str(uuid.uuid4())
file["name"] = file["name"].replace(" ", "_")

async with get_async_session_context() as session:
active_user = await fast_api_users_init()

out = await has_permission_document(active_user.current_user(active=True), file["id"], "write", session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove assignment to unused variable out.

The variable out is assigned but never used.

-                out = await has_permission_document(active_user.current_user(active=True), file["id"], "write", session)
+                await has_permission_document(active_user.current_user(active=True), file["id"], "write", session)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for file in files:
file["id"] = str(uuid.uuid4())
file["name"] = file["name"].replace(" ", "_")
async with get_async_session_context() as session:
active_user = await fast_api_users_init()
out = await has_permission_document(active_user.current_user(active=True), file["id"], "write", session)
for file in files:
file["id"] = str(uuid.uuid4())
file["name"] = file["name"].replace(" ", "_")
async with get_async_session_context() as session:
active_user = await fast_api_users_init()
await has_permission_document(active_user.current_user(active=True), file["id"], "write", session)
Tools
Ruff

62-62: Local variable out is assigned to but never used

Remove assignment to unused variable out

(F841)

Comment on lines 74 to 126
cognee_config = get_cognify_config()
graph_config = get_graph_config()
root_node_id = None

if graph_config.infer_graph_topology and graph_config.graph_topology_task:
from cognee.modules.topology.topology import TopologyEngine
topology_engine = TopologyEngine(infer=graph_config.infer_graph_topology)
root_node_id = await topology_engine.add_graph_topology(files = files)
elif graph_config.infer_graph_topology and not graph_config.infer_graph_topology:
from cognee.modules.topology.topology import TopologyEngine
topology_engine = TopologyEngine(infer=graph_config.infer_graph_topology)
await topology_engine.add_graph_topology(graph_config.topology_file_path)
elif not graph_config.graph_topology_task:
root_node_id = "ROOT"

tasks = [
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(establish_graph_topology, topology_model = KnowledgeGraph), # Set the graph topology for the document chunk data
Task(expand_knowledge_graph, graph_model = KnowledgeGraph), # Generate knowledge graphs from the document chunks and attach it to chunk nodes
Task(filter_affected_chunks, collection_name = "chunks"), # Find all affected chunks, so we don't process unchanged chunks
Task(
save_data_chunks,
collection_name = "chunks",
), # Save the document chunks in vector db and as nodes in graph db (connected to the document node and between each other)
run_tasks_parallel([
Task(
summarize_text_chunks,
summarization_model = cognee_config.summarization_model,
collection_name = "chunk_summaries",
), # Summarize the document chunks
Task(
classify_text_chunks,
classification_model = cognee_config.classification_model,
),
]),
Task(remove_obsolete_chunks), # Remove the obsolete document chunks.
]

pipeline = run_tasks(tasks, [
PdfDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "pdf" else
AudioDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "audio" else
ImageDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"]) if file["extension"] == "image" else
TextDocument(title=f"{file['name']}.{file['extension']}", file_path=file["file_path"])
for file in files
])

async for result in pipeline:
print(result)

update_task_status(dataset_name, "DATASET_PROCESSING_FINISHED")
except Exception as error:
update_task_status(dataset_name, "DATASET_PROCESSING_ERROR")
raise error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve error handling in the task execution segment.

Avoid using bare except statements to catch specific exceptions.

-                except Exception as error:
+                except Exception as e:
+                    logger.error(f"Error during task execution: {e}")

Committable suggestion was skipped due to low confidence.

Tools
Ruff

90-90: Undefined name hashed_user_id

(F821)


90-90: Undefined name user_permissions

(F821)

root_node_id = "ROOT"

tasks = [
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undefined variables hashed_user_id and user_permissions.

The variables hashed_user_id and user_permissions are used but not defined in the provided code. Ensure that these variables are correctly defined and passed to the Task instances.

-                        Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
+                        Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }, user_id = hashed_user_id, user_permissions=user_permissions), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Task(process_documents, parent_node_id = root_node_id, task_config = { "batch_size": 10 }), # Classify documents and save them as a nodes in graph db, extract text chunks based on the document type
Tools
Ruff

90-90: Undefined name hashed_user_id

(F821)


90-90: Undefined name user_permissions

(F821)

from cognee.infrastructure.databases.relational.user_authentication.authentication_db import User, get_user_db, \
get_async_session, ACL
from fastapi.security import OAuth2PasswordRequestForm
from fastapi_users.authentication import JWTStrategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant import.

The JWTStrategy import on line 20 is redundant and should be removed.

- from fastapi_users.authentication import JWTStrategy
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from fastapi_users.authentication import JWTStrategy
Tools
Ruff

20-20: Redefinition of unused JWTStrategy from line 10

Remove definition: JWTStrategy

(F811)

Comment on lines 101 to 103
print(f"User created: {user.email}")
except UserAlreadyExists:
print(f"User {email} already exists")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

-        print(f"User {email} already exists")
+        logger.warning(f"User {email} already exists")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"User created: {user.email}")
except UserAlreadyExists:
print(f"User {email} already exists")
print(f"User created: {user.email}")
except UserAlreadyExists:
logger.warning(f"User {email} already exists")

async def on_after_request_verify(
self, user: User, token: str, request: Optional[Request] = None
):
print(f"Verification requested for user {user.id}. Verification token: {token}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace print statements with proper logging.

Using print statements for logging is not ideal for production code. Consider using the logging module.

-        print(f"Verification requested for user {user.id}. Verification token: {token}")
+        logger.info(f"Verification requested for user {user.id}. Verification token: {token}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"Verification requested for user {user.id}. Verification token: {token}")
logger.info(f"Verification requested for user {user.id}. Verification token: {token}")

return False
else:
return True
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid bare except statements.

Avoid using bare except statements to catch specific exceptions.

-    except:
+    except Exception as e:
+        logger.error(f"Error checking token: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except:
except Exception as e:
logger.error(f"Error checking token: {e}")
Tools
Ruff

179-179: Do not use bare except

(E722)

return token
else:
return None
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid bare except statements.

Avoid using bare except statements to catch specific exceptions.

-    except:
+    except Exception as e:
+        logger.error(f"Error creating token: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except:
except Exception as e:
logger.error(f"Error creating token: {e}")
Tools
Ruff

163-163: Do not use bare except

(E722)

)
from fastapi_users.exceptions import UserAlreadyExists
from fastapi_users.db import SQLAlchemyUserDatabase
from fastapi import Depends, HTTPException, status
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant import.

The Depends import on line 14 is redundant and should be removed.

- from fastapi import Depends, HTTPException, status
+ from fastapi import HTTPException, status
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from fastapi import Depends, HTTPException, status
from fastapi import HTTPException, status
Tools
Ruff

14-14: Redefinition of unused Depends from line 5

Remove definition: Depends

(F811)

return None
return user
except Exception as e:
print(f"Error during authentication: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace print statements with proper logging and avoid bare except statements.

Using print statements for logging is not ideal for production code. Consider using the logging module. Additionally, avoid using bare except statements to catch specific exceptions.

-        print(f"Error during authentication: {e}")
+        logger.error(f"Error during authentication: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"Error during authentication: {e}")
logger.error(f"Error during authentication: {e}")

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 709a10c and 0519986.

Files selected for processing (1)
  • .github/workflows/test_neo4j.yml (3 hunks)
Additional comments not posted (3)
.github/workflows/test_neo4j.yml (3)

31-42: LGTM! PostgreSQL service configuration is correctly added.

The services section is correctly configured to add a PostgreSQL service for the CI workflow.


59-65: LGTM! Waiting for PostgreSQL to be ready is a good practice.

The step ensures that PostgreSQL is ready before running tests, which is crucial for avoiding connection issues.


82-86: LGTM! PostgreSQL environment variables are correctly added.

The environment variables are necessary for the tests to connect to the PostgreSQL instance.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Outside diff range, codebase verification and nitpick comments (3)
cognee/modules/search/graph/search_similarity.py (1)

Line range hint 3-20:
Correct the return type hint and update the docstring.

The return type hint should be list[dict] instead of list[str, str]. Additionally, update the docstring to reflect the changes.

- async def search_similarity(query: str) -> list[str, str]:
+ async def search_similarity(query: str) -> list[dict]:
    """
    Parameters:
    - query (str): The query string to filter nodes by.

    Returns:
-   - list(chunk): A list of objects providing information about the chunks related to query.
+   - list[dict]: A list of dictionaries providing information about the chunks related to query.
    """
cognee/api/v1/cognify/cognify_v2.py (1)

40-50: Update docstring to reflect the new function signature.

The function signature now includes a user parameter. Update the docstring to reflect this change.

"""
Process datasets and manage user permissions.

Args:
    datasets (Union[str, list[str]], optional): The datasets to process.
    user (User, optional): The user performing the operation.
"""
Tools
Ruff

46-46: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

cognee/api/v1/cognify/cognify.py (1)

Line range hint 61-85:
Improve error handling in the task execution segment.

Avoid using bare except statements to catch specific exceptions.

-        except Exception as error:
+        except (SQLAlchemyError, ValueError) as error:
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 0519986 and 4749995.

Files selected for processing (27)
  • cognee-frontend/src/app/page.tsx (1 hunks)
  • cognee-frontend/src/modules/datasets/cognifyDataset.ts (1 hunks)
  • cognee-frontend/src/modules/ingestion/DataView/DataView.module.css (1 hunks)
  • cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (4 hunks)
  • cognee-frontend/src/modules/ingestion/DatasetsView/DatasetsView.tsx (2 hunks)
  • cognee-frontend/src/modules/ingestion/useDatasets.ts (2 hunks)
  • cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2 hunks)
  • cognee/api/client.py (8 hunks)
  • cognee/api/v1/add/add.py (5 hunks)
  • cognee/api/v1/cognify/cognify.py (5 hunks)
  • cognee/api/v1/cognify/cognify_v2.py (4 hunks)
  • cognee/api/v1/datasets/datasets.py (1 hunks)
  • cognee/api/v1/search/search.py (4 hunks)
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1 hunks)
  • cognee/modules/data/extraction/data_summary/summarize_text_chunks.py (1 hunks)
  • cognee/modules/data/models/Data.py (1 hunks)
  • cognee/modules/data/models/Dataset.py (1 hunks)
  • cognee/modules/data/operations/get_dataset_data.py (1 hunks)
  • cognee/modules/data/operations/retrieve_datasets.py (1 hunks)
  • cognee/modules/pipelines/models/PipelineRun.py (1 hunks)
  • cognee/modules/pipelines/operations/get_pipeline_status.py (1 hunks)
  • cognee/modules/pipelines/operations/log_pipeline_status.py (1 hunks)
  • cognee/modules/search/graph/search_similarity.py (1 hunks)
  • cognee/modules/topology/topology.py (2 hunks)
  • cognee/modules/users/models/Permission.py (1 hunks)
  • cognee/modules/users/permissions/methods/check_permissions_on_documents.py (1 hunks)
  • cognee/modules/users/permissions/methods/get_document_ids_for_user.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • cognee-frontend/src/modules/ingestion/DataView/DataView.module.css
  • cognee-frontend/src/modules/ingestion/useDatasets.ts
Files skipped from review as they are similar to previous changes (10)
  • cognee-frontend/src/modules/datasets/cognifyDataset.ts
  • cognee-frontend/src/modules/ingestion/DataView/DataView.tsx
  • cognee/api/v1/datasets/datasets.py
  • cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py
  • cognee/modules/pipelines/models/PipelineRun.py
  • cognee/modules/pipelines/operations/get_pipeline_status.py
  • cognee/modules/pipelines/operations/log_pipeline_status.py
  • cognee/modules/topology/topology.py
  • cognee/modules/users/permissions/methods/check_permissions_on_documents.py
  • cognee/modules/users/permissions/methods/get_document_ids_for_user.py
Additional context used
Ruff
cognee/modules/data/models/Dataset.py

19-19: Undefined name Data

(F821)

cognee/modules/data/models/Data.py

22-22: Undefined name Dataset

(F821)

cognee/api/v1/cognify/cognify_v2.py

46-46: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

cognee/api/client.py

31-31: Module level import not at top of file

(E402)


56-57: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)

Additional comments not posted (51)
cognee/modules/data/operations/retrieve_datasets.py (1)

1-3: Imports look good.

The necessary modules for database operations are correctly imported.

cognee/modules/users/models/Permission.py (1)

1-5: Imports look good.

The necessary modules for defining the SQLAlchemy model are correctly imported.

cognee/modules/search/graph/search_similarity.py (1)

22-27: Function looks good.

The parse_payload function is well-structured and correctly extracts the required fields from the payload.

cognee/modules/data/operations/get_dataset_data.py (3)

1-5: Import statements look good.

The necessary modules are imported, including UUID, SQLAlchemy components, and project-specific modules.


6-8: Good error handling for missing parameters.

The function correctly raises a ValueError if neither dataset_id nor dataset_name is provided.


10-11: Efficient database engine retrieval.

The function retrieves the database engine efficiently using get_relational_engine().

cognee/modules/data/models/Dataset.py (5)

9-11: Correct table name definition.

The __tablename__ attribute correctly defines the table name as "datasets".


12-17: Proper column definitions.

The columns are defined correctly, including UUID primary key and datetime fields with default values.


19-22: Fix undefined name Data.

The type hint for data uses Data, which is not defined in this file. This will cause a runtime error.

- data: Mapped[List["Data"]] = relationship(
+ from .Data import Data
+ data: Mapped[List[Data]] = relationship(
Tools
Ruff

19-19: Undefined name Data

(F821)


24-31: Good JSON serialization method.

The to_json method correctly serializes the Dataset object to a JSON-compatible dictionary.


1-8: Import statements look good, but missing import for Data.

The necessary modules are imported, including UUID, typing, datetime, and SQLAlchemy components. However, the Data model is referenced but not imported.

+ from .Data import Data

Likely invalid or redundant comment.

cognee/modules/data/extraction/data_summary/summarize_text_chunks.py (5)

Line range hint 1-10:
Import statements look good.

The necessary modules are imported, including asyncio, typing, pydantic, and project-specific modules.


Line range hint 12-13:
Good initial check for empty data chunks.

The function correctly returns early if the data_chunks list is empty.


Line range hint 15-17:
Efficient use of asyncio.gather for concurrent processing.

The function uses asyncio.gather to process text chunks concurrently, which is efficient.


Line range hint 19-21:
Efficient vector engine retrieval and collection creation.

The function retrieves the vector engine and creates a collection efficiently.


Line range hint 23-32:
Good addition of document_id to payload.

The addition of document_id to the payload enhances the data structure by including the document identifier, which is crucial for tracking or referencing the source document.

cognee/modules/data/models/Data.py (4)

1-7: Imports look good.

The imports are appropriate and necessary for the functionality of the file.


9-10: Class declaration looks good.

The class declaration and table name definition are correct.


12-20: Column definitions look good.

The columns are appropriately defined with correct data types and default values.


27-37: Address the commented-out datasets field.

The to_json method is well-defined, but the datasets field is commented out. Ensure it is either included or removed based on the requirements.

-            # "datasets": [dataset.to_json() for dataset in self.datasets]
+            "datasets": [dataset.to_json() for dataset in self.datasets]
cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (2)

10-10: Review the change of text property to any.

Changing the text property to any can introduce potential type safety issues. Ensure that this change is necessary and justified.


101-103: Rendering logic looks good.

The conditional rendering logic ensures that message.text is displayed correctly based on its type, enhancing the component's robustness.

cognee/api/v1/search/search.py (4)

Line range hint 1-15:
Imports look good.

The imports are appropriate and necessary for the functionality of the file.


Line range hint 17-39:
Class declarations look good.

The class declarations are well-defined and appropriate for the file's functionality.


45-63: Function changes look good.

The enhancements to the search function improve functionality by integrating user context into the search process.


Line range hint 65-89:
Function looks good.

The specific_search function is well-defined and leverages asyncio for concurrent execution of search tasks.

cognee-frontend/src/modules/ingestion/DatasetsView/DatasetsView.tsx (2)

43-43: Update state variable to include name property.

The dataset state variable has been expanded to include both id and name properties. This change is necessary to support the updated rendering logic.


100-100: Render dataset name instead of id.

The rendering logic within the modal has been updated to display the name of the dataset instead of its id. This improves the user interface by providing more meaningful information.

cognee-frontend/src/app/page.tsx (4)

51-51: Update function parameter to include name property.

The onDatasetCognify function's parameter type has been updated to include both id and name properties. This change is necessary to support the updated notification messages.


52-52: Update notification message to use dataset name.

The notification message has been updated to use the name property of the dataset instead of the id. This improves the clarity of the notifications.


56-56: Update success notification message to use dataset name.

The success notification message has been updated to use the name property of the dataset instead of the id. This improves the clarity of the notifications.


59-59: Update failure notification message to use dataset name.

The failure notification message has been updated to use the name property of the dataset instead of the id. This improves the clarity of the notifications.

cognee/api/v1/add/add.py (6)

18-18: Introduce user parameter in add function.

The user parameter allows for user-specific operations during data processing. This change enhances the functionality by enabling permission checks and assignments.


54-54: Pass user parameter to add_files function.

The user parameter is passed to the add_files function to ensure user-specific operations are handled during file processing.


58-58: Introduce user parameter in add_files function.

The user parameter allows for user-specific operations during file processing. This change enhances the functionality by enabling permission checks and assignments.


107-107: Introduce user parameter in data_resources function.

The user parameter allows for user-specific operations during data processing. This change enhances the functionality by enabling permission checks and assignments.


153-154: Assign read and write permissions to the user.

The give_permission_on_document function is used to assign read and write permissions to the user for the processed data.


157-157: Get or create default user if none is provided.

If no user is provided, the function retrieves or creates a default user. This ensures that user-specific operations can still be performed.

cognee/api/v1/cognify/cognify_v2.py (6)

25-29: New imports for user management.

The imports for user management are correctly added. Ensure that these modules are available and correctly implemented.


35-39: LGTM! The PermissionDeniedException class is correctly defined.

The exception class is well-defined and follows standard practices.


49-50: Ensure default user retrieval is correct.

The default user is retrieved if no user is provided. Ensure that the get_default_user function is correctly implemented and handles any potential errors.


66-70: Ensure permissions are correctly checked.

The check_permissions_on_documents function is used to check user permissions. Ensure that this function is correctly implemented and handles any potential errors.


154-155: LGTM! The generate_dataset_name function is correctly implemented.

The function follows best practices and is correctly implemented.


7-7: Update import statement for get_relational_engine.

The import statement has been updated to import get_relational_engine. Ensure that this change aligns with the new database interaction approach.

Verification successful

Update import statement for get_relational_engine.

The import statement has been updated to import get_relational_engine. This change aligns with the new database interaction approach and is correctly used across the codebase.

  • cognee/api/v1/cognify/cognify_v2.py
  • cognee/modules/users/methods/get_default_user.py
  • cognee/modules/users/permissions/methods/give_permission_on_document.py
  • cognee/modules/pipelines/operations/get_pipeline_status.py
  • cognee/modules/users/permissions/methods/get_document_ids_for_user.py
  • cognee/modules/users/permissions/methods/check_permissions_on_documents.py
  • cognee/modules/pipelines/operations/log_pipeline_status.py
  • cognee/modules/data/deletion/prune_system.py
  • cognee/modules/data/operations/ensure_dataset_exists.py
  • cognee/modules/data/operations/retrieve_datasets.py
  • cognee/modules/users/methods/create_user.py
  • cognee/modules/data/operations/get_dataset_data.py
  • cognee/infrastructure/databases/relational/get_relational_engine.py
  • cognee/infrastructure/databases/relational/create_db_and_tables.py
  • cognee/infrastructure/databases/relational/__init__.py
  • cognee/api/v1/add/add.py
  • cognee/api/v1/datasets/datasets.py
  • cognee/api/v1/cognify/cognify.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `get_relational_engine` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the new function.
rg --type python -A 5 $'get_relational_engine'

Length of output: 76


Script:

#!/bin/bash
# Description: Verify the usage of `get_relational_engine` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the new function.
rg -t py -A 5 'get_relational_engine'

Length of output: 18774

cognee/api/client.py (1)

291-292: LGTM! The get_settings function is correctly implemented.

The function follows best practices and is correctly implemented.

cognee/api/v1/cognify/cognify.py (6)

49-49: Ensure create_task_status_table is correctly awaited.

The create_task_status_table function is now awaited. Ensure that this function is correctly implemented and handles any potential errors.


56-56: Ensure get_datasets is correctly awaited.

The get_datasets function is now awaited. Ensure that this function is correctly implemented and handles any potential errors.


86-94: Ensure get_dataset_data is correctly awaited.

The get_dataset_data function is now awaited. Ensure that this function is correctly implemented and handles any potential errors.


Line range hint 96-138:
Ensure file processing logic is correct.

The function process_batch processes batches of files. Ensure that the file processing logic is correctly implemented and handles any potential errors.


Line range hint 170-229:
Ensure text processing logic is correct.

The function process_text processes text chunks. Ensure that the text processing logic is correctly implemented and handles any potential errors.


33-36: Update import statement for get_relational_engine.

The import statement has been updated to import get_relational_engine. Ensure that this change aligns with the new database interaction approach.

Verification successful

Update import statement for get_relational_engine.

The import statement for get_relational_engine is correctly updated and aligns with the new database interaction approach. The function is widely used across the codebase, indicating its necessity and proper integration.

  • cognee/api/v1/cognify/cognify.py: The import and usage of get_relational_engine are consistent with the rest of the codebase.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `get_relational_engine` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the new function.
rg --type python -A 5 $'get_relational_engine'

Length of output: 76


Script:

#!/bin/bash
# Description: Verify the usage of `get_relational_engine` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the new function.
rg -A 5 'get_relational_engine'

Length of output: 18768

Comment on lines +22 to +25
datasets: Mapped[List["Dataset"]] = relationship(
secondary = DatasetData.__tablename__,
back_populates = "data"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix undefined name Dataset.

The datasets relationship references Dataset, which is not defined or imported. Change it to DatasetData to align with the import.

-    datasets: Mapped[List["Dataset"]] = relationship(
+    datasets: Mapped[List["DatasetData"]] = relationship(
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
datasets: Mapped[List["Dataset"]] = relationship(
secondary = DatasetData.__tablename__,
back_populates = "data"
)
datasets: Mapped[List["DatasetData"]] = relationship(
secondary = DatasetData.__tablename__,
back_populates = "data"
)
Tools
Ruff

22-22: Undefined name Dataset

(F821)

Comment on lines +5 to +13
async def retrieve_datasets(dataset_names: list[str]) -> list[Dataset]:
db_engine = get_relational_engine()

async with db_engine.get_async_session() as session:
datasets = (await session.scalars(
select(Dataset).filter(Dataset.name.in_(dataset_names))
)).all()

return datasets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding error handling.

The function lacks error handling, which is crucial for robust database operations. Consider adding try-except blocks to handle potential exceptions.

from sqlalchemy.exc import SQLAlchemyError

async def retrieve_datasets(dataset_names: List[str]) -> List[Dataset]:
    db_engine = get_relational_engine()

    try:
        async with db_engine.get_async_session() as session:
            datasets = (await session.scalars(
                select(Dataset).filter(Dataset.name.in_(dataset_names))
            )).all()
            return datasets
    except SQLAlchemyError as e:
        # Handle the exception (e.g., log it and return an empty list or re-raise)
        print(f"Database error: {e}")
        return []

from cognee.infrastructure.databases.relational import get_relational_engine
from ..models import Dataset

async def retrieve_datasets(dataset_names: list[str]) -> list[Dataset]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the type hint for the parameter.

Use List[str] instead of list[str] for consistency with Python type hinting conventions.

- async def retrieve_datasets(dataset_names: list[str]) -> list[Dataset]:
+ async def retrieve_datasets(dataset_names: List[str]) -> List[Dataset]:
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def retrieve_datasets(dataset_names: list[str]) -> list[Dataset]:
from typing import List
async def retrieve_datasets(dataset_names: List[str]) -> List[Dataset]:

class Permission(Base):
__tablename__ = "permissions"

id = Column(UUID, primary_key = True, index = True, default = uuid4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the default value for id.

The default value for id should be a callable (i.e., uuid4 without parentheses).

- id = Column(UUID, primary_key = True, index = True, default = uuid4)
+ id = Column(UUID, primary_key = True, index = True, default = uuid4)

Committable suggestion was skipped due to low confidence.

Comment on lines +12 to +13
created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use func.now() and func.current_timestamp() for timestamps.

Replace the lambda functions for created_at and updated_at with func.now() and func.current_timestamp() from SQLAlchemy for better readability and consistency.

- created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
- updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
+ from sqlalchemy.sql import func
+ created_at = Column(DateTime(timezone = True), default = func.now())
+ updated_at = Column(DateTime(timezone = True), onupdate = func.current_timestamp())
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
created_at = Column(DateTime(timezone = True), default = lambda: datetime.now(timezone.utc))
updated_at = Column(DateTime(timezone = True), onupdate = lambda: datetime.now(timezone.utc))
from sqlalchemy.sql import func
created_at = Column(DateTime(timezone = True), default = func.now())
updated_at = Column(DateTime(timezone = True), onupdate = func.current_timestamp())

Comment on lines +33 to +38
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate lifespan context manager.

The lifespan context manager is defined twice, which can cause conflicts.

- @asynccontextmanager
- async def lifespan(app: FastAPI):
-    # Not needed if you setup a migration system like Alembic
-    await create_db_and_tables()
-    yield
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@asynccontextmanager
async def lifespan(app: FastAPI):
# Not needed if you setup a migration system like Alembic
await create_db_and_tables()
yield

Comment on lines +31 to +32
from contextlib import asynccontextmanager

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager

Committable suggestion was skipped due to low confidence.

Tools
Ruff

31-31: Module level import not at top of file

(E402)

from cognee.api.v1.users.routers import get_auth_router, get_register_router,\
get_reset_password_router, get_verify_router, get_users_router

from cognee.api.v1.permissions.get_permissions_router import get_permissions_router
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from cognee.api.v1.permissions.get_permissions_router import get_permissions_router

Committable suggestion was skipped due to low confidence.

Tools
Ruff

59-59: Module level import not at top of file

(E402)

Comment on lines +56 to +57
from cognee.api.v1.users.routers import get_auth_router, get_register_router,\
get_reset_password_router, get_verify_router, get_users_router
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from cognee.api.v1.users.routers import get_auth_router, get_register_router,\
-    get_reset_password_router, get_verify_router, get_users_router

Committable suggestion was skipped due to low confidence.

Tools
Ruff

56-57: Module level import not at top of file

(E402)

Comment on lines +111 to +122
@app.get("/datasets", response_model = list)
async def get_datasets():
from cognee.api.v1.datasets.datasets import datasets
return datasets.list_datasets()
try:
from cognee.api.v1.datasets.datasets import datasets
datasets = await datasets.list_datasets()

return JSONResponse(
status_code = 200,
content = [dataset.to_json() for dataset in datasets],
)
except Exception as error:
raise HTTPException(status_code = 500, detail=f"Error retrieving datasets: {str(error)}") from error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve error handling in the get_datasets function.

Avoid using bare except statements to catch specific exceptions.

-    except Exception as error:
+    except (SQLAlchemyError, ValueError) as error:
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@app.get("/datasets", response_model = list)
async def get_datasets():
from cognee.api.v1.datasets.datasets import datasets
return datasets.list_datasets()
try:
from cognee.api.v1.datasets.datasets import datasets
datasets = await datasets.list_datasets()
return JSONResponse(
status_code = 200,
content = [dataset.to_json() for dataset in datasets],
)
except Exception as error:
raise HTTPException(status_code = 500, detail=f"Error retrieving datasets: {str(error)}") from error
@app.get("/datasets", response_model = list)
async def get_datasets():
try:
from cognee.api.v1.datasets.datasets import datasets
datasets = await datasets.list_datasets()
return JSONResponse(
status_code = 200,
content = [dataset.to_json() for dataset in datasets],
)
except (SQLAlchemyError, ValueError) as error:
raise HTTPException(status_code = 500, detail=f"Error retrieving datasets: {str(error)}") from error

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)
.github/workflows/docker_compose.yml (1)

38-38: Ensure consistent usage of docker compose or docker-compose commands.

There are mixed references to docker compose and docker-compose in the codebase:

  • docker compose is used in docs/blog/posts/from-demo-to-production-3.md.
  • docker-compose is used in README.md.

To avoid confusion or potential errors, ensure consistent usage of either docker compose or docker-compose throughout the codebase.

  • docs/blog/posts/from-demo-to-production-3.md
  • README.md
Analysis chain

Verify compatibility with docker compose command for down.

Ensure that the new command docker compose -f docker-compose.yml down is compatible with your workflow and does not introduce any issues.


Verify consistency of docker compose command usage.

Ensure that the new command docker compose -f docker-compose.yml down is consistently used and correctly referenced in the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify compatibility with `docker compose` command for down.

# Test: Check if `docker compose` command is available. Expect: Command found.
docker compose version

Length of output: 98


Script:

#!/bin/bash
# Search for references to `docker compose` and `docker-compose` in the codebase to verify consistency and correctness.
rg "docker compose" -A 2
rg "docker-compose" -A 2

Length of output: 536

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4749995 and 896a2ce.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (6)
  • .github/workflows/docker_compose.yml (1 hunks)
  • .github/workflows/test_common.yml (2 hunks)
  • .github/workflows/test_neo4j.yml (3 hunks)
  • .github/workflows/test_qdrant.yml (1 hunks)
  • .github/workflows/test_weaviate.yml (1 hunks)
  • pyproject.toml (3 hunks)
Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_neo4j.yml
Additional comments not posted (18)
.github/workflows/docker_compose.yml (3)

24-24: Verify compatibility with docker compose command for build.

Ensure that the new command docker compose -f docker-compose.yml build is compatible with your workflow and does not introduce any issues.


28-28: Verify compatibility with docker compose command for up.

Ensure that the new command docker compose -f docker-compose.yml up -d is compatible with your workflow and does not introduce any issues.


20-20: Verify compatibility with Docker Buildx v3.

Ensure that the new version docker/setup-buildx-action@v3 is compatible with your workflow and does not introduce any breaking changes.

.github/workflows/test_qdrant.yml (4)

Line range hint 50-72:
Verify the run_qdrant_integration_test job steps configuration.

Ensure that the job steps are correctly configured and align with best practices.

Verification successful

The run_qdrant_integration_test job steps are correctly configured and align with best practices.

  • The workflow sets up Python, installs dependencies using Poetry, and runs the integration tests with the necessary environment variables.
  • The recent failures are likely due to issues in the test script or environment, not the workflow configuration.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the `run_qdrant_integration_test` job steps configuration.

# Test: Check the job steps configuration for errors. Expect: No errors.
gh workflow view test_qdrant.yml

Length of output: 804


Script:

#!/bin/bash
# Description: View the YAML configuration of the `test_qdrant.yml` workflow.

# Command to view the YAML configuration of the workflow.
gh workflow view test_qdrant.yml --yaml

Length of output: 1558


38-49: Verify the new PostgreSQL service configuration.

Ensure that the new PostgreSQL service is correctly configured and aligns with best practices.


21-27: Verify the new setup_docker job configuration.

Ensure that the new job setup_docker is correctly configured and aligns with best practices.


32-32: Verify compatibility with ubuntu-latest environment.

Ensure that the new environment ubuntu-latest is compatible with your workflow and does not introduce any issues.

Verification successful

Environment change to ubuntu-latest is not the cause of workflow failures.

The recent failures in the workflow are due to missing configuration fields, specifically the 'username' field in the PostgresCredentials configuration. The change from macos-latest to ubuntu-latest is not the root cause of these failures.

  • Ensure that all necessary configuration values are provided for the workflow to run successfully.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify compatibility with `ubuntu-latest` environment.

# Test: Check the environment configuration for errors. Expect: No errors.
gh workflow view test_qdrant.yml

Length of output: 804


Script:

#!/bin/bash
# Description: Fetch and analyze logs of recent failed runs for the `test_qdrant.yml` workflow.

# Get the list of recent runs for the workflow
gh run list --workflow test_qdrant.yml --limit 5 --json conclusion,workflowDatabaseId,name,headSha,headBranch,url,status

# Fetch logs for the most recent failed run
gh run view 10267916576 --log

Length of output: 218035

.github/workflows/test_weaviate.yml (4)

Line range hint 50-72:
Verify the run_weaviate_integration_test job steps configuration.

Ensure that the job steps are correctly configured and align with best practices.

Verification successful

The run_weaviate_integration_test job steps are correctly configured and align with best practices.

  • The job includes steps for checking out the repository, setting up Python, installing Poetry and dependencies, creating a directory, and running the integration tests with the necessary environment variables.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the `run_weaviate_integration_test` job steps configuration.

# Test: Check the job steps configuration for errors. Expect: No errors.
gh workflow view test_weaviate.yml

Length of output: 824


Script:

#!/bin/bash
# Description: Retrieve the YAML content of the `test_weaviate.yml` workflow file.

# Fetch the YAML content of the workflow file to inspect the job steps configuration.
gh workflow view test_weaviate.yml --yaml

Length of output: 1572


32-32: Verify compatibility with ubuntu-latest environment.

Ensure that the new environment ubuntu-latest is compatible with your workflow and does not introduce any issues.


38-49: Verify the new PostgreSQL service configuration.

Ensure that the new PostgreSQL service is correctly configured and aligns with best practices.


21-27: Verify the new setup_docker job configuration.

Ensure that the new job setup_docker is correctly configured and aligns with best practices.

pyproject.toml (5)

29-29: LGTM! Fixed version for sqlalchemy ensures stability.

Changing the version constraint from ^2.0.21 to 2.0.21 helps avoid potential breaking changes from minor updates.


32-32: LGTM! Updated debugpy version ensures compatibility.

Updating from ^1.8.0 to 1.8.2 likely addresses specific bug fixes or required features.


41-41: LGTM! Updated dlt version with PostgreSQL support enhances functionality.

Updating from 0.5.1 to {extras = ["postgres"], version = "^0.5.2"} adds PostgreSQL support, improving the library's capabilities.


72-72: LGTM! Added fastapi-users for user management.

Introducing fastapi-users with sqlalchemy extras adds essential user management capabilities to the project.


73-73: LGTM! Added asyncpg for asynchronous PostgreSQL interactions.

Introducing asyncpg with version ^0.29.0 enhances the project's capability to handle asynchronous PostgreSQL database operations.

.github/workflows/test_common.yml (2)

21-26: LGTM! Added setup_docker job for multi-platform builds.

The new job sets up Docker Buildx using docker/setup-buildx-action@v3, enhancing the workflow's capability to handle multi-platform builds efficiently.


50-60: LGTM! Added postgres service for PostgreSQL database setup.

The new service uses the latest PostgreSQL image, sets up environment variables, and defines a volume for persistent storage, improving the testing environment's fidelity and functionality.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (1)
cognee/api/v1/search/search.py (1)

45-47: Verify the function usage in the codebase.

The following instances of the search function calls do not include the user parameter, which is required by the new function signature. Please update these calls to match the new signature:

  • cognee/tests/test_weaviate.py
  • cognee/tests/test_qdrant.py
  • cognee/tests/test_neo4j.py
  • cognee/tests/test_library.py
  • cognee/modules/search/vector/search_traverse.py
  • cognee/modules/search/graph/search_adjacent.py
  • cognee/modules/search/graph/search_summary.py
  • cognee/modules/search/graph/search_similarity.py
  • cognee/modules/cognify/llm/resolve_cross_graph_references.py
  • cognee/infrastructure/databases/vector/vector_db_interface.py
  • cognee/infrastructure/databases/vector/falkordb/FalkorDBAdapter.py
  • cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py
  • cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py
  • cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py
  • cognee/api/v1/search/search.py
  • cognee/api/v1/cognify/cognify.py
  • cognee/api/client.py
Analysis chain

Verify the function usage in the codebase.

Ensure that all function calls to search match the new signature.


Let's correct the regex pattern and re-run the search to verify all function calls to search match the new signature.


Let's correct the pattern format and use double quotes to avoid the regex parse error.


Let's try escaping the parentheses in the pattern to resolve the regex parse error.


Let's use fd to find Python files and then search for the function calls using rg.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'search('

Length of output: 98


Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 'search('

Length of output: 98


Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 "search("

Length of output: 98


Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 "search\\("

Length of output: 65


Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Find all Python files and search for the function usage.
fd --extension py | xargs rg -A 5 "search\\("

Length of output: 21025

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 896a2ce and 1c89119.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (8)
  • .github/workflows/test_common.yml (4 hunks)
  • .github/workflows/test_neo4j.yml (3 hunks)
  • .github/workflows/test_qdrant.yml (2 hunks)
  • .github/workflows/test_weaviate.yml (2 hunks)
  • cognee/api/client.py (8 hunks)
  • cognee/api/v1/search/search.py (4 hunks)
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1 hunks)
  • pyproject.toml (4 hunks)
Files skipped from review due to trivial changes (1)
  • pyproject.toml
Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/test_common.yml
  • .github/workflows/test_neo4j.yml
  • .github/workflows/test_qdrant.yml
  • .github/workflows/test_weaviate.yml
Additional context used
Ruff
cognee/api/v1/search/search.py

59-59: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

cognee/api/client.py

31-31: Module level import not at top of file

(E402)


56-57: Module level import not at top of file

(E402)


59-59: Module level import not at top of file

(E402)

Additional comments not posted (10)
cognee/api/v1/search/search.py (1)

Line range hint 64-90:
LGTM!

The specific_search function is well-implemented and efficient.

cognee/api/client.py (5)

33-38: LGTM!

The lifespan context manager is well-implemented and efficient.


Line range hint 39-54:
LGTM!

The FastAPI application setup and middleware configuration are well-implemented and efficient.

Tools
Ruff

31-31: Module level import not at top of file

(E402)


61-95: LGTM!

The router inclusion for authentication and user management is well-implemented and efficient.


111-122: LGTM!

The get_datasets function is well-implemented and efficient.


291-292: LGTM!

The get_settings function and other endpoints are well-implemented and efficient.

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (4)

24-25: LGTM!

The addition of max_connection_lifetime to the AsyncGraphDatabase.driver initialization is well-implemented and efficient.


Line range hint 47-57:
LGTM!

The query method is well-implemented and efficient.


Line range hint 81-111:
LGTM!

The add_node and add_nodes methods are well-implemented and efficient.


Line range hint 211-243:
LGTM!

The add_edge and add_edges methods are well-implemented and efficient.


for search_result in search_results:
document_id = search_result["document_id"] if "document_id" in search_result else None
document_id = UUID(document_id) if type(document_id) == str else document_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use isinstance for type comparison.

Replace type(document_id) == str with isinstance(document_id, str) for better readability and performance.

-        document_id = UUID(document_id) if type(document_id) == str else document_id
+        document_id = UUID(document_id) if isinstance(document_id, str) else document_id
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
document_id = UUID(document_id) if type(document_id) == str else document_id
document_id = UUID(document_id) if isinstance(document_id, str) else document_id
Tools
Ruff

59-59: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

Comment on lines +15 to +16
from cognee.infrastructure.databases.relational import create_db_and_tables

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganize import statements to the top of the file.

Imports should be at the top of the file to comply with PEP 8 standards.

- from contextlib import asynccontextmanager
- from cognee.api.v1.users.routers import get_auth_router, get_register_router,\
-    get_reset_password_router, get_verify_router, get_users_router
- from cognee.api.v1.permissions.get_permissions_router import get_permissions_router
+ from contextlib import asynccontextmanager
+ from cognee.api.v1.users.routers import get_auth_router, get_register_router, get_reset_password_router, get_verify_router, get_users_router
+ from cognee.api.v1.permissions.get_permissions_router import get_permissions_router

Also applies to: 31-32, 56-60

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1c89119 and 6d38bcd.

Files selected for processing (1)
  • .github/workflows/test_common.yml (4 hunks)
Additional comments not posted (3)
.github/workflows/test_common.yml (3)

21-34: LGTM! The setup_docker job is correctly defined.

The job sets up Docker Buildx for multi-platform builds using appropriate actions.


57-67: LGTM! The PostgreSQL service is correctly defined.

The service uses the latest PostgreSQL image, sets up environment variables, defines a volume for persistent storage, and is exposed on port 5432.


108-113: LGTM! The step to wait for PostgreSQL to be ready is correctly implemented.

The step uses the pg_isready command in a loop until the database is ready, ensuring that PostgreSQL is fully initialized before running tests.

@Vasilije1990 Vasilije1990 merged commit 80e4f29 into main Aug 6, 2024
21 of 24 checks passed
@borisarzentar borisarzentar deleted the COG-206 branch October 3, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants