Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix ubuntu 24.04 segmentation fault #422

Merged
merged 4 commits into from
Jan 13, 2025
Merged

Conversation

dexters1
Copy link
Collaborator

@dexters1 dexters1 commented Jan 9, 2025

Current testing seems to suggest issues with running Cognee on Ubuntu 24.04 when using SQLite. When running Cognee on Ubuntu 24.04 with PostgreSQL there are no segmentation fault issues.

Summary by CodeRabbit

  • Infrastructure

    • Updated GitHub Actions workflows to use ubuntu-latest across multiple workflow files
    • Added a new trigger_deployment job in the continuous deployment workflow
  • Code Optimization

    • Updated SQLAlchemy query loading strategy in user retrieval method from joinedload to selectinload

Copy link
Contributor

coderabbitai bot commented Jan 9, 2025

Walkthrough

The pull request encompasses a comprehensive update to GitHub Actions workflow configurations across multiple files. The primary change involves systematically replacing ubuntu-22.04 with ubuntu-latest for job runner environments across various workflow files. Additionally, there are minor modifications in specific workflow files, such as adding a deployment trigger job in cd.yaml and adjusting the output handling in cd_prd.yaml. A singular code change in the get_default_user.py file involves switching the SQLAlchemy loading strategy from joinedload to selectinload.

Changes

File Change Summary
.github/workflows/* Updated runs-on from ubuntu-22.04 to ubuntu-latest across multiple workflow files
.github/workflows/cd.yaml Added trigger_deployment job dependent on publish_docker_to_ecr
.github/workflows/cd_prd.yaml Updated cognee_image_tag output reference
cognee/modules/users/methods/get_default_user.py Changed SQLAlchemy loading strategy from joinedload to selectinload

Poem

🐰 Workflows hopping to the latest beat,
Ubuntu's newest runner, oh so neat!
Runners updated, environments shine,
Code leaps forward, everything's fine!
A rabbit's CI/CD delight! 🚀

Finishing Touches

  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

gitguardian bot commented Jan 9, 2025

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@dexters1 dexters1 marked this pull request as ready for review January 13, 2025 14:14
@dexters1 dexters1 changed the title test: Test ubuntu 24.04 test: Fix ubuntu 24.04 segmentation fault Jan 13, 2025
@dexters1 dexters1 changed the title test: Fix ubuntu 24.04 segmentation fault fix: Fix ubuntu 24.04 segmentation fault Jan 13, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
.github/workflows/py_lint.yml (1)

Remove outdated linting commands as the project uses a different toolchain now

The commented linting commands are obsolete as the project has moved to a different set of tools:

  • mypy is still used but should be uncommented as it's an active dependency
  • flake8 and bandit have been replaced by ruff and pylint
  • black and isort can be removed as ruff handles formatting
🔗 Analysis chain

Line range hint 63-75: Consider cleaning up or restoring commented linting commands.

There are several commented-out linting commands. If these are no longer needed, consider removing them. If they are still relevant, consider restoring them to maintain code quality checks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if these linting tools are still in use elsewhere
# Note: This helps determine if we should restore these checks

echo "Checking for mypy usage..."
rg --type yaml 'mypy' .github/workflows/

echo "Checking for flake8 usage..."
rg --type yaml 'flake8' .github/workflows/

echo "Checking for bandit usage..."
rg --type yaml 'bandit' .github/workflows/

Length of output: 580


Script:

#!/bin/bash
# Check for linting tool configurations and dependencies

echo "Checking for config files..."
fd -t f "mypy.ini|.flake8|pyproject.toml|.bandit"

echo -e "\nChecking content of pyproject.toml if exists..."
if [ -f "pyproject.toml" ]; then
    echo "=== Dependencies section ==="
    rg -A 5 "\[tool.poetry.dependencies\]|\[build-system\]" pyproject.toml
fi

echo -e "\nChecking for requirements files..."
fd "requirements.*\.txt"

Length of output: 791


Script:

#!/bin/bash
# Check dev dependencies and documentation

echo "Checking dev dependencies in pyproject.toml..."
rg -A 10 "\[tool.poetry.dev-dependencies\]|\[tool.poetry.group.dev.dependencies\]" pyproject.toml

echo -e "\nChecking documentation for linting mentions..."
fd -t f "README|CONTRIBUTING" -E "*.pyc"

Length of output: 745

.github/workflows/cd_prd.yaml (1)

Line range hint 89-89: Fix inconsistent step ID in outputs.

The outputs section references steps.export-promethai-docker-tag but the step ID is export-cognee-docker-tag. This mismatch will cause the output to be undefined.

-      cognee_image_tag: ${{ steps.export-promethai-docker-tag.outputs.cognee_image_tag }}
+      cognee_image_tag: ${{ steps.export-cognee-docker-tag.outputs.cognee_image_tag }}
🧹 Nitpick comments (4)
.github/workflows/test_qdrant.yml (1)

20-20: LGTM! Consider adding SQLite verification step.

The update to ubuntu-latest is good. Since this PR addresses SQLite issues, consider adding a verification step to ensure that Qdrant integration tests don't trigger SQLite-related segmentation faults when SQLite is used as the metadata store.

.github/workflows/test_weaviate.yml (2)

20-20: LGTM! Consider adding SQLite verification step.

The update to ubuntu-latest is good. Since this PR addresses SQLite issues, consider adding a verification step to ensure that Weaviate integration tests don't trigger SQLite-related segmentation faults when SQLite is used as the metadata store.


Line range hint 1-1: Consider adding dedicated SQLite integration tests.

To ensure long-term stability with SQLite on Ubuntu 24.04:

  1. Add a dedicated SQLite integration test workflow
  2. Document the SQLite compatibility improvements and known limitations
  3. Consider adding SQLite-specific CI badges to indicate compatibility status

Would you like me to help create a dedicated SQLite integration test workflow or documentation?

.github/workflows/test_pgvector.yml (1)

20-20: Consider adding parallel SQLite testing workflow

Since this workflow successfully tests PostgreSQL (which works on Ubuntu 24.04), consider creating a parallel workflow for SQLite testing with similar structure to:

  1. Isolate and reproduce the reported segmentation faults
  2. Compare behavior between Ubuntu 22.04 and 24.04
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9ddcaf and 32d7b07.

📒 Files selected for processing (23)
  • .github/workflows/cd.yaml (1 hunks)
  • .github/workflows/cd_prd.yaml (1 hunks)
  • .github/workflows/ci.yaml (1 hunks)
  • .github/workflows/community_greetings.yml (1 hunks)
  • .github/workflows/docker_compose.yml (1 hunks)
  • .github/workflows/dockerhub.yml (1 hunks)
  • .github/workflows/profiling.yaml (1 hunks)
  • .github/workflows/py_lint.yml (1 hunks)
  • .github/workflows/release_discord_action.yml (1 hunks)
  • .github/workflows/reusable_notebook.yml (1 hunks)
  • .github/workflows/reusable_python_example.yml (1 hunks)
  • .github/workflows/ruff_format.yaml (1 hunks)
  • .github/workflows/ruff_lint.yaml (1 hunks)
  • .github/workflows/test_deduplication.yml (1 hunks)
  • .github/workflows/test_milvus.yml (1 hunks)
  • .github/workflows/test_neo4j.yml (1 hunks)
  • .github/workflows/test_pgvector.yml (1 hunks)
  • .github/workflows/test_python_3_10.yml (1 hunks)
  • .github/workflows/test_python_3_11.yml (1 hunks)
  • .github/workflows/test_python_3_12.yml (1 hunks)
  • .github/workflows/test_qdrant.yml (1 hunks)
  • .github/workflows/test_weaviate.yml (1 hunks)
  • cognee/modules/users/methods/get_default_user.py (2 hunks)
✅ Files skipped from review due to trivial changes (10)
  • .github/workflows/test_milvus.yml
  • .github/workflows/dockerhub.yml
  • .github/workflows/release_discord_action.yml
  • .github/workflows/ruff_lint.yaml
  • .github/workflows/ruff_format.yaml
  • .github/workflows/test_deduplication.yml
  • .github/workflows/ci.yaml
  • .github/workflows/test_python_3_10.yml
  • .github/workflows/community_greetings.yml
  • .github/workflows/profiling.yaml
🔇 Additional comments (9)
.github/workflows/docker_compose.yml (1)

15-15: Verify Docker Compose tests with SQLite on Ubuntu 24.04.

While updating to ubuntu-latest will help test on Ubuntu 24.04, ensure that the Docker Compose tests specifically validate the SQLite functionality to catch any potential segmentation faults.

Let's check if SQLite is being tested in the Docker setup:

.github/workflows/test_neo4j.yml (1)

18-18: Verify Neo4j compatibility with Ubuntu 24.04

While updating to ubuntu-latest will help test newer Ubuntu versions, ensure that the Neo4j dependencies and Python packages are compatible with Ubuntu 24.04.

Consider adding explicit test coverage for both SQLite and PostgreSQL database backends in this workflow to help identify the reported segmentation faults.

.github/workflows/reusable_python_example.yml (1)

25-25: Review Python 3.12.x compatibility with SQLite

The workflow uses Python 3.12.x, which might affect SQLite compatibility differently than other Python versions. Consider:

  1. Testing with multiple Python versions
  2. Adding specific SQLite tests to verify the reported segmentation faults
.github/workflows/reusable_notebook.yml (1)

25-25: Test database operations in notebooks

Since notebooks might contain database operations, ensure that:

  1. Notebooks using SQLite are tested on both Ubuntu versions
  2. Any segmentation faults during notebook execution are properly caught and reported
.github/workflows/test_python_3_12.yml (1)

20-20: 🛠️ Refactor suggestion

Enhance SQLite testing in comprehensive test suite

This comprehensive test workflow is ideal for investigating the SQLite segmentation faults. Consider:

  1. Adding specific SQLite test cases in both unit and integration tests
  2. Using environment variables to toggle between SQLite and PostgreSQL backends
  3. Adding error handling to capture and report segmentation faults

Run the following script to check for existing SQLite tests:

.github/workflows/test_python_3_11.yml (1)

20-20: 🛠️ Refactor suggestion

Consider using explicit Ubuntu version for reproducing SQLite issues.

Since the PR aims to address SQLite segmentation faults specifically on Ubuntu 24.04, using ubuntu-latest might not consistently reproduce the issue as the latest version may change. Consider explicitly using ubuntu-24.04 to ensure reliable test results.

-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
.github/workflows/cd.yaml (1)

20-20: LGTM! Using latest Ubuntu for CD workflow is appropriate.

For CD workflows that don't involve SQLite testing, using ubuntu-latest is acceptable as it ensures we get the latest security updates and features.

.github/workflows/py_lint.yml (1)

19-19: LGTM! Using latest Ubuntu for linting is appropriate.

For linting workflows, using ubuntu-latest is acceptable as it ensures we get the latest security updates and features.

.github/workflows/cd_prd.yaml (1)

20-20: LGTM! Using latest Ubuntu for production CD workflow is appropriate.

For CD workflows that don't involve SQLite testing, using ubuntu-latest is acceptable as it ensures we get the latest security updates and features.

@@ -1,4 +1,4 @@
from sqlalchemy.orm import joinedload
from sqlalchemy.orm import selectinload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Consider updating other joinedload instances for SQLite compatibility

Similar to the change in get_default_user.py, other files using joinedload might benefit from switching to selectinload to prevent SQLite segmentation faults:

  • cognee/modules/users/methods/get_user.py: Uses joinedload(User.groups) - identical case to the fixed one
  • cognee/modules/users/permissions/methods/check_permission_on_documents.py: joinedload(ACL.resources)
  • cognee/modules/data/methods/create_dataset.py and SqlAlchemyAdapter.py: Both use joinedload(Dataset.data)
🔗 Analysis chain

Good choice using selectinload to address SQLite segmentation faults.

The change from joinedload to selectinload is a good solution for SQLite compatibility issues. While joinedload uses LEFT OUTER JOINs that can trigger SQLite limitations, selectinload uses separate SELECT statements which are more reliable with SQLite.

Let's verify if there are other instances of joinedload that might need similar updates:

Also applies to: 14-14

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other instances of joinedload that might need updating
rg "joinedload" --type py

Length of output: 926

@dexters1
Copy link
Collaborator Author

For some reason unknown to me using the joinedload for getting the default user would cause a segmentation fault in Ubuntu 24.04, problem is solved by using the selectinload option for joining data

@Vasilije1990 Vasilije1990 merged commit a77a87e into dev Jan 13, 2025
25 checks passed
@Vasilije1990 Vasilije1990 deleted the test-ubuntu-24.04 branch January 13, 2025 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants