fix: Fix ubuntu 24.04 segmentation fault #422

dexters1 · 2025-01-09T17:42:43Z

Current testing seems to suggest issues with running Cognee on Ubuntu 24.04 when using SQLite. When running Cognee on Ubuntu 24.04 with PostgreSQL there are no segmentation fault issues.

Summary by CodeRabbit

Infrastructure
- Updated GitHub Actions workflows to use ubuntu-latest across multiple workflow files
- Added a new trigger_deployment job in the continuous deployment workflow
Code Optimization
- Updated SQLAlchemy query loading strategy in user retrieval method from joinedload to selectinload

coderabbitai · 2025-01-09T17:42:50Z

Walkthrough

The pull request encompasses a comprehensive update to GitHub Actions workflow configurations across multiple files. The primary change involves systematically replacing ubuntu-22.04 with ubuntu-latest for job runner environments across various workflow files. Additionally, there are minor modifications in specific workflow files, such as adding a deployment trigger job in cd.yaml and adjusting the output handling in cd_prd.yaml. A singular code change in the get_default_user.py file involves switching the SQLAlchemy loading strategy from joinedload to selectinload.

Changes

File	Change Summary
`.github/workflows/*`	Updated `runs-on` from `ubuntu-22.04` to `ubuntu-latest` across multiple workflow files
`.github/workflows/cd.yaml`	Added `trigger_deployment` job dependent on `publish_docker_to_ecr`
`.github/workflows/cd_prd.yaml`	Updated `cognee_image_tag` output reference
`cognee/modules/users/methods/get_default_user.py`	Changed SQLAlchemy loading strategy from `joinedload` to `selectinload`

Poem

🐰 Workflows hopping to the latest beat,
Ubuntu's newest runner, oh so neat!
Runners updated, environments shine,
Code leaps forward, everything's fine!
A rabbit's CI/CD delight! 🚀

Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

gitguardian · 2025-01-09T17:55:30Z

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.

^{_{🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.}}

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (2)

.github/workflows/py_lint.yml (1)
Remove outdated linting commands as the project uses a different toolchain now

The commented linting commands are obsolete as the project has moved to a different set of tools:

mypy is still used but should be uncommented as it's an active dependency

flake8 and bandit have been replaced by ruff and pylint

black and isort can be removed as ruff handles formatting

🔗 Analysis chain

Line range hint 63-75: Consider cleaning up or restoring commented linting commands.

There are several commented-out linting commands. If these are no longer needed, consider removing them. If they are still relevant, consider restoring them to maintain code quality checks.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check if these linting tools are still in use elsewhere
# Note: This helps determine if we should restore these checks

echo "Checking for mypy usage..."
rg --type yaml 'mypy' .github/workflows/

echo "Checking for flake8 usage..."
rg --type yaml 'flake8' .github/workflows/

echo "Checking for bandit usage..."
rg --type yaml 'bandit' .github/workflows/
Length of output: 580

Script:
#!/bin/bash
# Check for linting tool configurations and dependencies

echo "Checking for config files..."
fd -t f "mypy.ini|.flake8|pyproject.toml|.bandit"

echo -e "\nChecking content of pyproject.toml if exists..."
if [ -f "pyproject.toml" ]; then
    echo "=== Dependencies section ==="
    rg -A 5 "\[tool.poetry.dependencies\]|\[build-system\]" pyproject.toml
fi

echo -e "\nChecking for requirements files..."
fd "requirements.*\.txt"
Length of output: 791

Script:
#!/bin/bash
# Check dev dependencies and documentation

echo "Checking dev dependencies in pyproject.toml..."
rg -A 10 "\[tool.poetry.dev-dependencies\]|\[tool.poetry.group.dev.dependencies\]" pyproject.toml

echo -e "\nChecking documentation for linting mentions..."
fd -t f "README|CONTRIBUTING" -E "*.pyc"
Length of output: 745
.github/workflows/cd_prd.yaml (1)
Line range hint 89-89: Fix inconsistent step ID in outputs.

The outputs section references steps.export-promethai-docker-tag but the step ID is export-cognee-docker-tag. This mismatch will cause the output to be undefined.
-      cognee_image_tag: ${{ steps.export-promethai-docker-tag.outputs.cognee_image_tag }}
+      cognee_image_tag: ${{ steps.export-cognee-docker-tag.outputs.cognee_image_tag }}

🧹 Nitpick comments (4)

.github/workflows/test_qdrant.yml (1)

20-20: LGTM! Consider adding SQLite verification step.

The update to ubuntu-latest is good. Since this PR addresses SQLite issues, consider adding a verification step to ensure that Qdrant integration tests don't trigger SQLite-related segmentation faults when SQLite is used as the metadata store.

.github/workflows/test_weaviate.yml (2)

20-20: LGTM! Consider adding SQLite verification step.

The update to ubuntu-latest is good. Since this PR addresses SQLite issues, consider adding a verification step to ensure that Weaviate integration tests don't trigger SQLite-related segmentation faults when SQLite is used as the metadata store.

Line range hint 1-1: Consider adding dedicated SQLite integration tests.

To ensure long-term stability with SQLite on Ubuntu 24.04:

Add a dedicated SQLite integration test workflow

Document the SQLite compatibility improvements and known limitations

Consider adding SQLite-specific CI badges to indicate compatibility status

Would you like me to help create a dedicated SQLite integration test workflow or documentation?

.github/workflows/test_pgvector.yml (1)

20-20: Consider adding parallel SQLite testing workflow

Since this workflow successfully tests PostgreSQL (which works on Ubuntu 24.04), consider creating a parallel workflow for SQLite testing with similar structure to:

Isolate and reproduce the reported segmentation faults

Compare behavior between Ubuntu 22.04 and 24.04

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9ddcaf and 32d7b07.

📒 Files selected for processing (23)

.github/workflows/cd.yaml (1 hunks)
.github/workflows/cd_prd.yaml (1 hunks)
.github/workflows/ci.yaml (1 hunks)
.github/workflows/community_greetings.yml (1 hunks)
.github/workflows/docker_compose.yml (1 hunks)
.github/workflows/dockerhub.yml (1 hunks)
.github/workflows/profiling.yaml (1 hunks)
.github/workflows/py_lint.yml (1 hunks)
.github/workflows/release_discord_action.yml (1 hunks)
.github/workflows/reusable_notebook.yml (1 hunks)
.github/workflows/reusable_python_example.yml (1 hunks)
.github/workflows/ruff_format.yaml (1 hunks)
.github/workflows/ruff_lint.yaml (1 hunks)
.github/workflows/test_deduplication.yml (1 hunks)
.github/workflows/test_milvus.yml (1 hunks)
.github/workflows/test_neo4j.yml (1 hunks)
.github/workflows/test_pgvector.yml (1 hunks)
.github/workflows/test_python_3_10.yml (1 hunks)
.github/workflows/test_python_3_11.yml (1 hunks)
.github/workflows/test_python_3_12.yml (1 hunks)
.github/workflows/test_qdrant.yml (1 hunks)
.github/workflows/test_weaviate.yml (1 hunks)
cognee/modules/users/methods/get_default_user.py (2 hunks)

✅ Files skipped from review due to trivial changes (10)

.github/workflows/test_milvus.yml
.github/workflows/dockerhub.yml
.github/workflows/release_discord_action.yml
.github/workflows/ruff_lint.yaml
.github/workflows/ruff_format.yaml
.github/workflows/test_deduplication.yml
.github/workflows/ci.yaml
.github/workflows/test_python_3_10.yml
.github/workflows/community_greetings.yml
.github/workflows/profiling.yaml

🔇 Additional comments (9)

.github/workflows/docker_compose.yml (1)

15-15: Verify Docker Compose tests with SQLite on Ubuntu 24.04.

While updating to ubuntu-latest will help test on Ubuntu 24.04, ensure that the Docker Compose tests specifically validate the SQLite functionality to catch any potential segmentation faults.

Let's check if SQLite is being tested in the Docker setup:

.github/workflows/test_neo4j.yml (1)

18-18: Verify Neo4j compatibility with Ubuntu 24.04

While updating to ubuntu-latest will help test newer Ubuntu versions, ensure that the Neo4j dependencies and Python packages are compatible with Ubuntu 24.04.

Consider adding explicit test coverage for both SQLite and PostgreSQL database backends in this workflow to help identify the reported segmentation faults.

.github/workflows/reusable_python_example.yml (1)

25-25: Review Python 3.12.x compatibility with SQLite

The workflow uses Python 3.12.x, which might affect SQLite compatibility differently than other Python versions. Consider:

Testing with multiple Python versions

Adding specific SQLite tests to verify the reported segmentation faults

.github/workflows/reusable_notebook.yml (1)

25-25: Test database operations in notebooks

Since notebooks might contain database operations, ensure that:

Notebooks using SQLite are tested on both Ubuntu versions

Any segmentation faults during notebook execution are properly caught and reported

.github/workflows/test_python_3_12.yml (1)

20-20: 🛠️ Refactor suggestion

Enhance SQLite testing in comprehensive test suite

This comprehensive test workflow is ideal for investigating the SQLite segmentation faults. Consider:

Adding specific SQLite test cases in both unit and integration tests

Using environment variables to toggle between SQLite and PostgreSQL backends

Adding error handling to capture and report segmentation faults

Run the following script to check for existing SQLite tests:
.github/workflows/test_python_3_11.yml (1)

20-20: 🛠️ Refactor suggestion

Consider using explicit Ubuntu version for reproducing SQLite issues.

Since the PR aims to address SQLite segmentation faults specifically on Ubuntu 24.04, using ubuntu-latest might not consistently reproduce the issue as the latest version may change. Consider explicitly using ubuntu-24.04 to ensure reliable test results.
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
.github/workflows/cd.yaml (1)

20-20: LGTM! Using latest Ubuntu for CD workflow is appropriate.

For CD workflows that don't involve SQLite testing, using ubuntu-latest is acceptable as it ensures we get the latest security updates and features.

.github/workflows/py_lint.yml (1)

19-19: LGTM! Using latest Ubuntu for linting is appropriate.

For linting workflows, using ubuntu-latest is acceptable as it ensures we get the latest security updates and features.

.github/workflows/cd_prd.yaml (1)

20-20: LGTM! Using latest Ubuntu for production CD workflow is appropriate.

For CD workflows that don't involve SQLite testing, using ubuntu-latest is acceptable as it ensures we get the latest security updates and features.

coderabbitai · 2025-01-13T14:24:36Z

cognee/modules/users/methods/get_default_user.py

@@ -1,4 +1,4 @@
-from sqlalchemy.orm import joinedload
+from sqlalchemy.orm import selectinload


💡 Codebase verification

Consider updating other joinedload instances for SQLite compatibility

Similar to the change in get_default_user.py, other files using joinedload might benefit from switching to selectinload to prevent SQLite segmentation faults:

cognee/modules/users/methods/get_user.py: Uses joinedload(User.groups) - identical case to the fixed one

cognee/modules/users/permissions/methods/check_permission_on_documents.py: joinedload(ACL.resources)

cognee/modules/data/methods/create_dataset.py and SqlAlchemyAdapter.py: Both use joinedload(Dataset.data)

🔗 Analysis chain

Good choice using selectinload to address SQLite segmentation faults.

The change from joinedload to selectinload is a good solution for SQLite compatibility issues. While joinedload uses LEFT OUTER JOINs that can trigger SQLite limitations, selectinload uses separate SELECT statements which are more reliable with SQLite.

Let's verify if there are other instances of joinedload that might need similar updates:

Also applies to: 14-14

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for other instances of joinedload that might need updating rg "joinedload" --type py

Length of output: 926

dexters1 · 2025-01-13T14:25:34Z

For some reason unknown to me using the joinedload for getting the default user would cause a segmentation fault in Ubuntu 24.04, problem is solved by using the selectinload option for joining data

test: Test ubuntu 24.04

46fec64

dexters1 added run-checks do not merge labels Jan 9, 2025

test: change all actions to ubuntu-latest

ff6d8cb

dexters1 force-pushed the test-ubuntu-24.04 branch from 4c83d8a to ff6d8cb Compare January 13, 2025 13:58

fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault

0ce2339

dexters1 removed the do not merge label Jan 13, 2025

Merge branch 'dev' into test-ubuntu-24.04

32d7b07

dexters1 marked this pull request as ready for review January 13, 2025 14:14

dexters1 requested review from Vasilije1990 and borisarzentar January 13, 2025 14:18

dexters1 changed the title ~~test: Test ubuntu 24.04~~ test: Fix ubuntu 24.04 segmentation fault Jan 13, 2025

dexters1 changed the title ~~test: Fix ubuntu 24.04 segmentation fault~~ fix: Fix ubuntu 24.04 segmentation fault Jan 13, 2025

coderabbitai bot reviewed Jan 13, 2025

View reviewed changes

Vasilije1990 approved these changes Jan 13, 2025

View reviewed changes

Vasilije1990 merged commit a77a87e into dev Jan 13, 2025
25 checks passed

Vasilije1990 deleted the test-ubuntu-24.04 branch January 13, 2025 14:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Fix ubuntu 24.04 segmentation fault #422

fix: Fix ubuntu 24.04 segmentation fault #422

dexters1 commented Jan 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 9, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

gitguardian bot commented Jan 9, 2025 •

edited

Loading

coderabbitai bot left a comment

coderabbitai bot Jan 13, 2025

dexters1 commented Jan 13, 2025

		@@ -1,4 +1,4 @@
		from sqlalchemy.orm import joinedload
		from sqlalchemy.orm import selectinload

fix: Fix ubuntu 24.04 segmentation fault #422

fix: Fix ubuntu 24.04 segmentation fault #422

Conversation

dexters1 commented Jan 9, 2025 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Jan 9, 2025 • edited Loading

Walkthrough

Changes

Poem

Finishing Touches

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

gitguardian bot commented Jan 9, 2025 • edited Loading

️✅ There are no secrets present in this pull request anymore.

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 13, 2025

Choose a reason for hiding this comment

dexters1 commented Jan 13, 2025

dexters1 commented Jan 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 9, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

gitguardian bot commented Jan 9, 2025 •

edited

Loading