Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instead of erroring, clean up after dangling IDs in DB #14321

Merged
merged 1 commit into from
May 23, 2022

Conversation

mheon
Copy link
Member

@mheon mheon commented May 23, 2022

For various (mostly legacy) reasons, Podman presently maintains a unified namespace for pods and containers - IE, we cannot have both a pod and a container named "test" at the same time. To implement this, we use a global database table of every pod and container ID (and another of every pod and container name).

These entries should be added when containers/pods are added, and removed when containers/pods are removed, with the database's transactional integrity providing a guarantee that this is batched with the overall removal and that the DB should remain sane and consistent no matter what. As such, we treat a dangling ID as a hard error that stops the use of Podman.

Unfortunately, we have someone run into this last Friday. I'm still not certain how exactly their DB got into this state, but without further clarification there, we can consider removing the error and making Podman instead clean up and remove any dangling
IDs, which should restore Podman to a serviceable state. Drop an error message if we do this, though, because people should know that the DB is in a bad state.

[NO NEW TESTS NEEDED] it is deliberately impossible to produce a configuration that would test this without hex-editing the DB file.

Fixed a bug where a dangling ID in the database could render Podman unusable.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 23, 2022
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 23, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// remove things from the table during
// it.
logrus.Errorf("Database issue: dangling ID %s found (not a pod or container) - removing", string(id))
toRemoveIDs = append(toRemoveIDs, string(id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to add a continue here. Right now the logic will go to the next step which is podBkt.Get(stateKey) and will result in a nil pointer panic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good catch! I agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sry this is not an actual for loop, just a function so you have to return

@mheon mheon force-pushed the no_error_on_dangling branch from 37803d9 to 68c1035 Compare May 23, 2022 15:07
For various (mostly legacy) reasons, Podman presently maintains a
unified namespace for pods and containers - IE, we cannot have
both a pod and a container named "test" at the same time. To
implement this, we use a global database table of every pod and
container ID (and another of every pod and container name).

These entries should be added when containers/pods are added, and
removed when containers/pods are removed, with the database's
transactional integrity providing a guarantee that this is
batched with the overall removal and that the DB should remain
sane and consistent no matter what. As such, we treat a dangling
ID as a hard error that stops the use of Podman.

Unfortunately, we have someone run into this last Friday. I'm
still not certain how exactly their DB got into this state, but
without further clarification there, we can consider removing the
error and making Podman instead clean up and remove any dangling
IDs, which should restore Podman to a serviceable state. Drop an
error message if we do this, though, because people should know
that the DB is in a bad state.

[NO NEW TESTS NEEDED] it is deliberately impossible to produce a
configuration that would test this without hex-editing the DB
file.

Signed-off-by: Matthew Heon <[email protected]>
@mheon mheon force-pushed the no_error_on_dangling branch from 68c1035 to b7dbc50 Compare May 23, 2022 15:21
@Luap99
Copy link
Member

Luap99 commented May 23, 2022

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 23, 2022
@mheon
Copy link
Member Author

mheon commented May 23, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2022
@openshift-merge-robot openshift-merge-robot merged commit 023fe23 into containers:main May 23, 2022
@edsantiago edsantiago added the kind/bug Categorizes issue or PR as related to a bug. label May 26, 2022
@mheon
Copy link
Member Author

mheon commented Sep 23, 2022

/cherry-pick v3.0.1-rhel

@mheon
Copy link
Member Author

mheon commented Sep 23, 2022

Bot's broken, doing it manual-like

openshift-merge-robot added a commit that referenced this pull request Oct 6, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants