sqlite: set busy timeout to 100s #20838

Merged: 1 commit, Nov 29, 2023

Luap99 (Member) commented Nov 29, 2023

Only one process can write to the sqlite db at a time. If another process tries to write while the db is locked, the attempt fails and a `database is locked` error is returned. When this happens sqlite should keep retrying until it can write. To do that we can just set the `_busy_timeout` option. A 100s timeout should be enough even on slower systems, but not too much in case there is a deadlock, so it still returns in a reasonable time.
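The retry behaviour described above can be reproduced outside podman. The following is a minimal sketch using Python's standard `sqlite3` module rather than podman's Go code; the table name `kv`, the function name, and the timings are assumptions made up for the illustration. Python's `timeout` argument sets the same SQLite busy timeout that the `_busy_timeout` option controls.

```python
import os
import sqlite3
import tempfile
import threading
import time


def demo_busy_timeout(hold_seconds=0.3):
    """Return (error seen without a busy timeout, keys present at the end)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit.
        # check_same_thread=False: the helper thread below issues the COMMIT.
        writer = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
        writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

        # First writer takes the write lock and keeps it for now.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO kv VALUES ('a', '1')")

        # Second writer with no busy timeout: the attempt fails right away.
        impatient = sqlite3.connect(path, timeout=0, isolation_level=None)
        error = None
        try:
            impatient.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError as exc:
            error = str(exc)
        impatient.close()

        # Release the lock a little later from another thread.
        def release():
            time.sleep(hold_seconds)
            writer.execute("COMMIT")

        helper = threading.Thread(target=release)
        helper.start()

        # Third writer with a busy timeout: sqlite retries until the lock is free.
        patient = sqlite3.connect(path, timeout=10, isolation_level=None)
        patient.execute("BEGIN IMMEDIATE")
        patient.execute("INSERT INTO kv VALUES ('b', '2')")
        patient.execute("COMMIT")
        helper.join()

        keys = [row[0] for row in patient.execute("SELECT k FROM kv ORDER BY k").fetchall()]
        patient.close()
        writer.close()
        return error, keys
```

Calling `demo_busy_timeout()` returns the error text from the connection without a timeout and the keys written once the lock was released. The timeout only bounds how long a writer waits for the lock; it does not limit how long a transaction may run once it holds the lock.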

[NO NEW TESTS NEEDED] I think we strongly need to consider some form of parallel stress testing to catch bugs like this.

Fixes #20809

Does this PR introduce a user-facing change?

Fix `database is locked` errors with the new sqlite database backend.

Signed-off-by: Paul Holzinger <[email protected]>
openshift-ci bot added the release-note and approved labels Nov 29, 2023

Luap99 (Member, Author) commented Nov 29, 2023

@mheon @vrothberg @edsantiago PTAL

edsantiago (Member) commented Nov 29, 2023

Now testing in #17831

PS thank you

vrothberg (Member) left a comment

I believed that one of the options above took care of setting a default busy value already but I may be wrong.

Changes LGTM. Did you manage to manually reproduce and validate the fix is solving it?
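One way to check whether a connection already has a busy timeout is to query `PRAGMA busy_timeout`, which reports the value in milliseconds. The sketch below uses Python's standard `sqlite3` module, not podman's Go code, and the helper name is made up for the illustration.

```python
import sqlite3


def busy_timeout_ms(conn):
    """Return the busy timeout currently set on conn, in milliseconds."""
    return conn.execute("PRAGMA busy_timeout").fetchone()[0]


# timeout=0 opens the connection without any busy timeout.
conn = sqlite3.connect(":memory:", timeout=0)
before = busy_timeout_ms(conn)

# Set a 100s timeout, mirroring the value chosen in this PR.
conn.execute("PRAGMA busy_timeout = 100000")
after = busy_timeout_ms(conn)
```

Here `before` is the value the connection started with and `after` is the value once the pragma has been applied. The same query can be run on any open connection to confirm what the driver options actually configured.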

giuseppe (Member) left a comment

LGTM

openshift-ci bot (Contributor) commented Nov 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99


Luap99 (Member, Author) commented Nov 29, 2023

> Did you manage to manually reproduce and validate the fix is solving it?

Yes using the reproducer from the issue.

mheon (Member) commented Nov 29, 2023

I'm going to drop a
/lgtm
/hold

Even if this does turn out to not be a complete fix for some reason I see no harm in merging it.

openshift-ci bot added the do-not-merge/hold label Nov 29, 2023
openshift-ci bot added the lgtm label Nov 29, 2023
giuseppe (Member) commented

/hold cancel

openshift-ci bot removed the do-not-merge/hold label Nov 29, 2023
openshift-merge-bot bot merged commit 5da1790 into containers:main Nov 29, 2023
92 checks passed
Luap99 deleted the sqlite-timeout branch November 29, 2023 19:29
jpalus commented Nov 29, 2023

One question though: what does "writing to the database" mean exactly? Is it a single update statement, a single transaction, or a single "connection"? In other words, what needs to fit within those 100s? If I have a big container with lots of files and I run `podman rm` on it, and that takes > 100s, does that mean no other write can occur until it finishes, only afterwards?

mheon (Member) commented Nov 29, 2023

Per the SQLite locking model, this should be a single transaction. Connections are allowed to be concurrent; locks are only taken on write transactions.
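The scope of the lock can be observed with two connections to the same file. This is a minimal sketch in Python's standard `sqlite3` module with SQLite's default rollback-journal mode; the journal mode podman uses is not stated in this thread, and the table `t` and function name are assumptions for the illustration. Both connections use `timeout=0` so a conflict fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def demo_lock_scope():
    """Return (rows a reader sees mid-transaction, second writer blocked?, final rows)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # Two connections are open at the same time; that alone takes no lock.
        a = sqlite3.connect(path, timeout=0, isolation_level=None)
        b = sqlite3.connect(path, timeout=0, isolation_level=None)
        a.execute("CREATE TABLE t (x INTEGER)")
        a.execute("INSERT INTO t VALUES (1)")

        # Connection a opens a write transaction: the write lock is now held.
        a.execute("BEGIN IMMEDIATE")
        a.execute("INSERT INTO t VALUES (2)")

        # Connection b can still read; it sees only committed data.
        seen = b.execute("SELECT count(*) FROM t").fetchall()[0][0]

        # A write from b is refused while a's transaction is open.
        try:
            b.execute("INSERT INTO t VALUES (3)")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True

        # Once a commits, the lock is gone and b's write goes through.
        a.execute("COMMIT")
        b.execute("INSERT INTO t VALUES (3)")
        total = b.execute("SELECT count(*) FROM t").fetchall()[0][0]

        a.close()
        b.close()
        return seen, blocked, total
```

The lock lives from `BEGIN IMMEDIATE` to `COMMIT` on connection `a`, so what a waiting writer needs to outlast is the other transaction, not the other connection.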

Luap99 (Member, Author) commented Nov 30, 2023

> One question though: what does "writing to the database" mean exactly? Is it a single update statement, a single transaction, or a single "connection"? In other words, what needs to fit within those 100s? If I have a big container with lots of files and I run `podman rm` on it, and that takes > 100s, does that mean no other write can occur until it finishes, only afterwards?

To be clear, the lock is only taken for db writes, and those should generally be fast. We do not have the db locked while we delete the container storage, for example. The db stores only "metadata", so the data should not be too big and operations should be fast.
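The pattern described here, slow work outside the transaction and a short metadata write inside it, might look roughly like the sketch below. It is written in Python's standard `sqlite3` module and is not podman's implementation: `remove_container`, the `containers` table, the `delete_storage` callback, and the order of the two steps are all hypothetical.

```python
import sqlite3


def remove_container(conn, container_id, delete_storage):
    """Hypothetical removal flow: slow storage work first, short db write after."""
    # Slow step: no transaction is open, so no database lock is held while
    # the (possibly very large) storage is deleted.
    delete_storage(container_id)

    # Fast step: a short write transaction that only touches metadata.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("DELETE FROM containers WHERE id = ?", (container_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Usage with an in-memory database and a stand-in for the storage deletion.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE containers (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO containers VALUES ('abc123')")

lock_state_during_storage = []
remove_container(
    conn,
    "abc123",
    # Record whether a transaction was open while "storage" was being deleted.
    lambda cid: lock_state_during_storage.append(conn.in_transaction),
)
remaining = conn.execute("SELECT count(*) FROM containers").fetchall()[0][0]
```

With this shape, a `podman rm` that spends minutes on storage would only hold the write lock for the brief metadata update, which is what needs to fit inside the busy timeout of other writers.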

github-actions bot added the locked - please file new issue/PR label Feb 29, 2024
github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Linked issue: save transaction: database is locked on podman exec