sqlite: Error: beginning transaction: database is locked #18356

edsantiago · 2023-04-26T14:31:52Z

First: yes, this is after #18339. Seen just now in f37 rootless:

  podman container restart running container
...
$ podman [options] ps -q
Error: beginning transaction: database is locked

Why is it always podman ps that triggers this flake?

The text was updated successfully, but these errors were encountered:

edsantiago · 2023-05-02T14:43:41Z

Current list of flake instances

debian-12 : int podman debian-12 rootless host sqlite
- 05-01 10:58 in Podman restart [It] podman restart running container
fedora-37 : int podman fedora-37 rootless host sqlite
- 04-26 09:28 in Podman restart [It] podman container restart running container
rawhide : int podman rawhide rootless host sqlite
- 04-26 13:47 in Podman restart [It] podman container restart running container

vrothberg · 2023-05-03T08:27:09Z

@edsantiago, can you try the following patch? It's (again) dropping WAL mode. Let's see if the database is locked flake comes back again.

diff --git a/libpod/sqlite_state.go b/libpod/sqlite_state.go
index d435f10727c6..e433c0c09ca0 100644
--- a/libpod/sqlite_state.go
+++ b/libpod/sqlite_state.go
@@ -44,7 +44,7 @@ const (
        // Assembled sqlite options used when opening the database.
        sqliteOptions = "db.sql?" +
                sqliteOptionLocation +
-               sqliteOptionJournal +
+//             sqliteOptionJournal +
                sqliteOptionSynchronous +
                sqliteOptionForeignKeys +
                sqliteOptionTXLock

edsantiago · 2023-05-03T13:47:07Z

@vrothberg the flake triggers infrequently, but FWIW one pass of #17831 just completed without seeing this flake. I just started another run.

vrothberg · 2023-05-03T13:49:42Z

Thanks a lot, Ed!

edsantiago · 2023-05-04T02:35:02Z

Seven CI runs today on #17831, lots of flakes, but did not see the transaction one. Tests do seem to be running slower, O(40m), it could just be slow VMs.

vrothberg · 2023-05-04T06:48:56Z

@mheon WAL mode seems to play a role in the lock errors. Since you are keen on keeping WAL mode, I hand it over to you.

mheon · 2023-05-04T12:46:58Z

@edsantiago @vrothberg Mind running for a few more days? I'd like to be sure about this before we make changes, we've thought we had this one locked down before and it's still here.

edsantiago · 2023-05-05T02:36:07Z

I think we're up to fourteen CI runs, still haven't seen the locked flake. [EDIT: yes, I'll keep hammering away at it]

edsantiago · 2023-05-08T00:04:03Z

Thirty-five CI runs with @vrothberg's patch: no sqlite-locked failures.

I removed the patch. First CI run passed! Failed on the second:

$ podman [options] ps -q
Error: beginning transaction: database is locked

[EDIT: it failed twice, actually, the first one above was f38 rootless, this second one I didn't notice was rawhide root]

vrothberg · 2023-05-09T06:27:19Z

I think we have two options:

Disable WAL mode.
If possible, address the locking issues.

I let @mheon decide since he owns the sqlite backend.

mheon · 2023-05-09T13:17:54Z

I spent a large portion of friday attempting to identify the specific cause of the bug. Per the docs, it should only occur on concurrent read/write from within a single process to a single table (https://www2.sqlite.org/cvstrac/wiki?p=DatabaseIsLocked) but absolutely none of this explains how a locked error would occur on the start of a transaction... I went to far as to try and trace what was accessing the database (to not much success, on-update hooks are a little complex apparently).

At this point, let's rip it out until we can figure out what is actually going on.

As shown in containers#17831, WAL mode plays a role in causing `database is locked` errors. Those are errors, in theory, should not happen as the DB should busy wait. mattn/go-sqlite3/issues/274 has some comments indicating that the busy handler behaves differently in WAL mode which may be an explanation to the error. For now, let's disable WAL mode and only re-enable it when we have clearer understanding of what's going on. The upstream issue along with the SQLite documentation do not give me the clear guidance that I would need. [NO NEW TESTS NEEDED] - flake is only reproducible in CI. Fixes: containers#18356 Signed-off-by: Valentin Rothberg <[email protected]>

edsantiago · 2023-05-09T23:06:54Z

Ugh. I'm sorry. I'm really, really sorry.

not ok 123 podman start --filter invalid-restart-policy - return error
 ...
# podman run -d quay.io/libpod/testimage:20221018 /bin/true
d3be2e45fb326ee8afa117698395995c2f9fb2961ae347ff315395ccf339d6b1
# podman start --filter restart-policy=fakepolicy d3be2e45fb326ee8afa117698395995c2f9fb2961ae347ff315395ccf339d6b1
Error: beginning transaction: database is locked
[ rc=125 (expected) ]
#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: CID of restart-policy=<not-exists> container
#| expected: 'Error: fakepolicy invalid restart policy'
#|   actual: 'Error: beginning transaction: database is locked'
#\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is f38 aarch64 root, in my hammer-hammer PR, fully rebased on main as of this afternoon, 287a419, which I 100% guarantee includes #18356.

The other failures were in int tests, always in podman ps, and (due to the nature of int tests) on a relatively fresh setup with fresh roots and all that. This is test 123 in system tests, not a freshly-initialized DB by any means. And no contention: sys tests are not parallel. I don't know what this means, and don't have the brains to pursue it at this hour.

Also, #18356 is identical (except for a comment) to what I tested all last week and this weekend. Over forty CI runs, and I never saw this.

mheon · 2023-05-10T01:05:18Z

Next theory: set SQLite max connections to 1.

vrothberg · 2023-05-10T07:20:17Z

I have a love-hate-relationship with your thorough testing, @edsantiago :) Please don't excuse for that. It's invaluable!

There is still something we don't understand and that's making me nervous. In theory, the busy handler should kick in when the database is locked.

edsantiago · 2023-06-07T17:09:43Z

Last seen on May 9 despite many-times-per-day testing on #17831. Does anyone have an explanation for how/why this failure has disappeared?

vrothberg · 2023-06-12T07:17:38Z

The only explanation I could find is bumping the sqlite bindings in commit 92309d9. But that is a month after May 9, so I'd expect the flake to have occurred in between.

github-actions · 2023-08-10T00:07:20Z

A friendly reminder that this issue had no activity for 30 days.

edsantiago · 2023-08-10T14:40:47Z

The last instance was May 9. I haven't been hammering on #17831 as much the past month, but even so we're still talking O(100) runs since then.

I don't want to be the one to close this without understanding the reason, but I will not object to someone else closing it.

vrothberg · 2023-08-10T14:45:22Z

Thank you, Ed! This (i.e., all the data points) is super helpful! I can take the blame for closing. If it flakes again, we can re-open.

edsantiago added flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. priority/high bugweek labels Apr 26, 2023

edsantiago mentioned this issue May 2, 2023

Ed's pet PR with no flake retries #17831

Draft

vrothberg mentioned this issue May 9, 2023

sqlite: disable WAL mode #18519

Merged

openshift-merge-robot closed this as completed in #18519 May 9, 2023

edsantiago reopened this May 9, 2023

github-actions bot added the stale-issue label Aug 10, 2023

vrothberg closed this as completed Aug 10, 2023

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 9, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqlite: Error: beginning transaction: database is locked #18356

sqlite: Error: beginning transaction: database is locked #18356

edsantiago commented Apr 26, 2023

edsantiago commented May 2, 2023

vrothberg commented May 3, 2023

edsantiago commented May 3, 2023

vrothberg commented May 3, 2023

edsantiago commented May 4, 2023

vrothberg commented May 4, 2023

mheon commented May 4, 2023

edsantiago commented May 5, 2023 •

edited

Loading

edsantiago commented May 8, 2023 •

edited

Loading

vrothberg commented May 9, 2023

mheon commented May 9, 2023

edsantiago commented May 9, 2023

mheon commented May 10, 2023

vrothberg commented May 10, 2023 •

edited

Loading

edsantiago commented Jun 7, 2023

vrothberg commented Jun 12, 2023

github-actions bot commented Aug 10, 2023

edsantiago commented Aug 10, 2023

vrothberg commented Aug 10, 2023 •

edited

Loading

sqlite: Error: beginning transaction: database is locked #18356

sqlite: Error: beginning transaction: database is locked #18356

Comments

edsantiago commented Apr 26, 2023

edsantiago commented May 2, 2023

vrothberg commented May 3, 2023

edsantiago commented May 3, 2023

vrothberg commented May 3, 2023

edsantiago commented May 4, 2023

vrothberg commented May 4, 2023

mheon commented May 4, 2023

edsantiago commented May 5, 2023 • edited Loading

edsantiago commented May 8, 2023 • edited Loading

vrothberg commented May 9, 2023

mheon commented May 9, 2023

edsantiago commented May 9, 2023

mheon commented May 10, 2023

vrothberg commented May 10, 2023 • edited Loading

edsantiago commented Jun 7, 2023

vrothberg commented Jun 12, 2023

github-actions bot commented Aug 10, 2023

edsantiago commented Aug 10, 2023

vrothberg commented Aug 10, 2023 • edited Loading

edsantiago commented May 5, 2023 •

edited

Loading

edsantiago commented May 8, 2023 •

edited

Loading

vrothberg commented May 10, 2023 •

edited

Loading

vrothberg commented Aug 10, 2023 •

edited

Loading