Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop container does not exist on removal to debugf #10427

Merged
merged 1 commit into from
May 21, 2021

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented May 21, 2021

We have race conditions where a container can be removed
by two different processes when running podman --remove rm.

It can be cleaned up in the API or by the conmon executing
podman container cleanup.

When we fail to remove a container that does not exists we should
not be printing errors or warnings, we should just debug the fact.

[NO TESTS NEEDED] Since this is a race condition it is difficult to
test.

Signed-off-by: Daniel J Walsh [email protected]

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2021
@rhatdan
Copy link
Member Author

rhatdan commented May 21, 2021

Fixes: #10423

@edsantiago
Copy link
Member

edsantiago commented May 21, 2021

Hold on please, I just saw:

$ while :;do ./bin/podman-remote run --rm alpine true;done
ERRO[0000] Error removing container 63170103bcc92cde7a8c3454f463def82e39196b3e588f75e183725f28433cf0: refusing to remove "63170103bcc92cde7a8c3454f463def82e39196b3e588f75e183725f28433cf0" as it exists in libpod as container 63170103bcc92cde7a8c3454f463def82e39196b3e588f75e183725f28433cf0: container already exists

Looks like

return errors.Wrapf(define.ErrCtrExists, "refusing to remove %q as it exists in libpod as container %s", idOrName, ctr.ID)

@rhatdan
Copy link
Member Author

rhatdan commented May 21, 2021

@edsantiago Try again.

@@ -645,10 +645,14 @@ func (r *Runtime) removeContainer(ctx context.Context, c *Container, force, remo
}
} else {
if err := r.state.RemoveContainer(c); err != nil {
if cleanupErr == nil {
cleanupErr = err
if err == define.ErrNoSuchCtr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still worth an error. If we got this far, we did it holding the lock - there is 0 chance the container was removed from under us. If this happens something very strange and very bad is going on - someone is messing around in the DB without obeying our locks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok some of this was just guessing. I will remove this check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually @mheon @edsantiago reported in the original issue that he was getting.
WARN[0000] Container 6b4d7060b9466bd7c0e806758b3c2366227ff415784541882ea324c0bfb1b9e2 does not exist: container 6b4d7060b9466bd7c0e806758b3c2366227ff415784541882ea324c0bfb1b9e2 does not exist in database: no such container

Which looks like this is happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we have a serious bug, because that should never happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I reverted, let's see if this error is reproducible, with the other fixes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pulled your latest change (6ca721c), rebuilt, and am running tests. No failures in the last two minutes, which is good. I need to be AFK periodically this morning but will check in as time allows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still running with no errors (20+ minutes). I call that good. Gotta head out now, will check back in in 90m or so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no failures. I call this good.

We have race conditions where a container can be removed
by two different processes when running podman --remove rm.

It can be cleaned up in the API or by the conmon executing
podman container cleanup.

When we fail to remove a container that does not exists we should
not be printing errors or warnings, we should just debug the fact.

[NO TESTS NEEDED] Since this is a race condition it is difficult to
test.

Signed-off-by: Daniel J Walsh <[email protected]>
@TomSweeneyRedHat
Copy link
Member

LGTM
but you're still fighting with the tests

@edsantiago
Copy link
Member

One of the flakes looks like #7139

@edsantiago
Copy link
Member

The other flake also looks like #7139

@edsantiago
Copy link
Member

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2021
@mheon
Copy link
Member

mheon commented May 21, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2021
@openshift-merge-robot openshift-merge-robot merged commit 7f4afe4 into containers:master May 21, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants