Instead of erroring, clean up after dangling IDs in DB #14321

mheon · 2022-05-23T13:55:52Z

For various (mostly legacy) reasons, Podman presently maintains a unified namespace for pods and containers, i.e., we cannot have both a pod and a container named "test" at the same time. To implement this, we use a global database table of every pod and container ID (and another of every pod and container name).

These entries should be added when containers/pods are added, and removed when containers/pods are removed, with the database's transactional integrity providing a guarantee that this is batched with the overall removal and that the DB should remain sane and consistent no matter what. As such, we treat a dangling ID as a hard error that stops the use of Podman.

Unfortunately, someone ran into this last Friday. I'm still not certain how exactly their DB got into this state, but without further clarification there, we can consider removing the error and making Podman instead clean up and remove any dangling IDs, which should restore Podman to a serviceable state. Log an error message if we do this, though, because people should know that the DB is in a bad state.

[NO NEW TESTS NEEDED] it is deliberately impossible to produce a configuration that would test this without hex-editing the DB file.

Fixed a bug where a dangling ID in the database could render Podman unusable.
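
The cleanup described above can be illustrated with a minimal sketch. This is not Podman's actual code: plain Go maps stand in for the database tables, and the `state` type and `pruneDanglingIDs` function are hypothetical names chosen for illustration. The sketch only shows the general idea of collecting dangling IDs during the walk and removing them afterwards, with an error message for each.

```go
package main

import (
	"fmt"
	"sort"
)

// state is a hypothetical in-memory stand-in for the database: allIDs plays
// the role of the global ID registry, while ctrs and pods hold the objects
// that actually exist. None of these names come from Podman itself.
type state struct {
	allIDs map[string]bool
	ctrs   map[string]string
	pods   map[string]string
}

// pruneDanglingIDs removes every registry entry that is neither a container
// nor a pod, and returns the removed IDs in sorted order.
func (s *state) pruneDanglingIDs() []string {
	var toRemove []string
	// First pass: only collect. The registry is not modified while it is
	// being walked; removal happens in a separate step afterwards.
	for id := range s.allIDs {
		_, isCtr := s.ctrs[id]
		_, isPod := s.pods[id]
		if !isCtr && !isPod {
			toRemove = append(toRemove, id)
		}
	}
	sort.Strings(toRemove)
	// Second pass: report each dangling ID, then delete it.
	for _, id := range toRemove {
		fmt.Printf("Database issue: dangling ID %s found (not a pod or container) - removing\n", id)
		delete(s.allIDs, id)
	}
	return toRemove
}

func main() {
	s := &state{
		allIDs: map[string]bool{"ctr1": true, "pod1": true, "stale": true},
		ctrs:   map[string]string{"ctr1": "test"},
		pods:   map[string]string{"pod1": "mypod"},
	}
	removed := s.pruneDanglingIDs()
	fmt.Println("removed:", removed, "remaining IDs:", len(s.allIDs))
}
```

Here the entry `stale` has no matching container or pod, so it is reported and dropped, while the two valid IDs stay in the registry.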

vrothberg

LGTM

openshift-ci · 2022-05-23T14:06:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mheon,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Luap99 · 2022-05-23T14:19:35Z

libpod/boltdb_state.go

+					// remove things from the table during
+					// it.
+					logrus.Errorf("Database issue: dangling ID %s found (not a pod or container) - removing", string(id))
+					toRemoveIDs = append(toRemoveIDs, string(id))


I think you have to add a continue here. Right now the logic will go to the next step which is podBkt.Get(stateKey) and will result in a nil pointer panic.

very good catch! I agree.

Oh, sorry, this is not an actual for loop, just a function, so you have to return instead.
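
The point raised in this thread can be shown with a small sketch. It assumes a callback-style iterator; `forEachID`, `pod`, and `podNames` are hypothetical names, not Podman or BoltDB APIs. Inside a function literal `continue` is not available, so an early `return nil` is what skips the rest of the callback for a dangling ID and avoids dereferencing a nil value.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for a stored pod record.
type pod struct{ name string }

// forEachID is a hypothetical iterator: it calls fn once per ID and stops
// at the first non-nil error.
func forEachID(ids []string, fn func(id string) error) error {
	for _, id := range ids {
		if err := fn(id); err != nil {
			return err
		}
	}
	return nil
}

// podNames collects the names of all known pods and records the IDs that
// have no matching pod.
func podNames(ids []string, pods map[string]*pod) (names, dangling []string, err error) {
	err = forEachID(ids, func(id string) error {
		p := pods[id] // nil when the ID is dangling
		if p == nil {
			dangling = append(dangling, id)
			// This is a function literal, not a loop body, so `continue`
			// would not compile. Returning nil ends this call and lets the
			// iterator move on to the next ID.
			return nil
		}
		// Without the early return above, p.name would panic on a nil p.
		names = append(names, p.name)
		return nil
	})
	return names, dangling, err
}

func main() {
	pods := map[string]*pod{"p1": {name: "web"}, "p2": {name: "db"}}
	names, dangling, err := podNames([]string{"p1", "gone", "p2"}, pods)
	fmt.Println(names, dangling, err)
}
```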


Luap99 · 2022-05-23T16:05:14Z

/lgtm
/hold

mheon · 2022-05-23T19:08:42Z

/hold cancel

mheon · 2022-09-23T14:48:30Z

/cherry-pick v3.0.1-rhel

mheon · 2022-09-23T15:33:43Z

Bot's broken, doing it manual-like

[v3.0.1-rhel] Backport #14321

openshift-ci bot added the release-note and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels May 23, 2022

vrothberg approved these changes May 23, 2022

Luap99 reviewed May 23, 2022

mheon force-pushed the no_error_on_dangling branch from 37803d9 to 68c1035 Compare May 23, 2022 15:07

mheon force-pushed the no_error_on_dangling branch from 68c1035 to b7dbc50 Compare May 23, 2022 15:21

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) May 23, 2022

openshift-ci bot assigned Luap99 May 23, 2022

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) May 23, 2022

openshift-ci bot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) May 23, 2022

openshift-merge-robot merged commit 023fe23 into containers:main May 23, 2022

This was referenced May 23, 2022

Need timeout bump in containerignore-filtering-embedded-etc test #14184

Closed

timeout in sctp? forward expose? #14331

Closed

edsantiago added the kind/bug label (categorizes issue or PR as related to a bug) May 26, 2022

edsantiago mentioned this pull request Jun 2, 2022

CI flake: secret file is not leaked into image #13417

Closed

mheon mentioned this pull request Sep 23, 2022

[v3.0.1-rhel] Backport #14321 #15914

Merged

openshift-merge-robot added a commit that referenced this pull request Oct 6, 2022

Merge pull request #15914 from mheon/backport_14321_301rhel

09c68b5

[v3.0.1-rhel] Backport #14321

github-actions bot added the "locked - please file new issue/PR" label (assist humans wanting to comment on an old issue or PR with locked comments) Sep 20, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
