
Allow force removing containers with unavailable OCI runtime #3502

Closed
marcov wants to merge 2 commits from the runtime-unavailable branch

Conversation

@marcov (Collaborator) commented Jul 5, 2019

When the OCI runtime associated with a container cannot be found in the configuration, removing that container is not possible. Detect when this happens and allow force-removing the container.


This typically happens when you create a container with a specific runtime set in libpod.conf, and the runtime name is later changed in that conf, with this outcome:

$ pd rm -f 06
Error: container 060314dd88dc22cdf6d21b8f8146dcf2481ab158c5a29f79a9ecb81cef2e6d9e was created with OCI runtime my_runtime, but that runtime is not available in the current configuration: internal libpod error

@openshift-ci-robot (Collaborator) commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marcov
To complete the pull request process, please assign rhatdan
You can assign the PR to them by writing /assign @rhatdan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Collaborator) commented:

Hi @marcov. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Jul 5, 2019
marcov added 2 commits July 5, 2019 13:34
When the OCI runtime associated with a container cannot be found
in the configuration, assign the default runtime to the container
and return the ErrRuntimeUnavailable error.

Signed-off-by: Marco Vedovati <[email protected]>
When the OCI runtime associated with a container cannot be found in the
configuration, removing that container is not possible.
Allow force removing the container by detecting when this happens.

Signed-off-by: Marco Vedovati <[email protected]>
@marcov force-pushed the runtime-unavailable branch from f85e755 to 52990c9 on July 5, 2019 at 11:34
@mheon (Member) commented Jul 5, 2019

/ok-to-test

@openshift-ci-robot added the ok-to-test label and removed the needs-ok-to-test label on Jul 5, 2019
Review comment from a Member on the new fallback code:

    -	return errors.Wrapf(define.ErrInternal, "container %s was created with OCI runtime %s, but that runtime is not available in the current configuration", ctr.ID(), ctr.config.OCIRuntime)
    +	err = errors.Wrapf(define.ErrRuntimeUnavailable, "cannot find OCI runtime %q for container %s", ctr.config.OCIRuntime, ctr.ID())
    +	// fall back to the default runtime to allow ctr clean up
    +	ociRuntime = s.runtime.defaultOCIRuntime

This doesn't seem sane; we can't reasonably expect the default runtime to be able to do anything.

Review comment from a Member on the lines that follow the fallback:

            ociRuntime = s.runtime.defaultOCIRuntime
        }
        ctr.ociRuntime = ociRuntime
    }

    ctr.runtime = s.runtime
    ctr.valid = valid

We should never return a valid container if err != nil.

@mheon (Member) commented Jul 5, 2019

I understand the intent here, but I don't know if trying to convince the existing remove API to work with containers which completely lack a valid runtime is sane.

We may want a new API to forcibly evict a container from the DB and storage with minimal checks. Retrieve it from the DB; if there's an error doing so, skip everything else and try removing from the DB. If we can successfully retrieve the container, remove from storage.

@marcov (Collaborator, Author) commented Jul 8, 2019

Thanks @mheon, you're right; I felt my changes were kind of hacky rather than a proper solution. I'll evaluate what you proposed and see if I can come back with something better.

@marcov (Collaborator, Author) commented Jul 8, 2019

@mheon, following your advice, a possible way to handle this is:

  • Add an evict flag to rm (better to keep --force for stopping running containers).
  • Add a DB API to retrieve a container ID / config.
  • Add an Evict API to remove a container without any OCI runtime interaction, using the container config retrieved with the previous API (a rough sketch of these pieces follows this list).
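
For illustration only, a minimal sketch of how those three pieces might fit together. Every name below is a hypothetical placeholder (only the OCIRuntime field echoes the diff above), not the API that was eventually implemented in the follow-up work (#3549):

    // Hypothetical sketch only; names and shapes are illustrative.
    package evictsketch

    import "context"

    // ContainerConfig stands in for the stored per-container configuration,
    // including the OCI runtime name recorded at creation time.
    type ContainerConfig struct {
        ID         string
        OCIRuntime string
    }

    // containerStore is the proposed DB-level API: retrieve just the
    // ID/config without constructing a full, validated Container object.
    type containerStore interface {
        GetContainerConfig(idOrName string) (*ContainerConfig, error)
    }

    // evictor is the proposed runtime-level API: remove a container using
    // only the stored config, with no OCI runtime interaction. The CLI
    // would expose it through a new evict flag on `rm`, keeping --force
    // for its current job of stopping running containers.
    type evictor interface {
        EvictContainer(ctx context.Context, idOrName string) (string, error)
    }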

Any feedback?

@mheon (Member) commented Jul 8, 2019

I have some thoughts about the CLI here (I think allowing --force to be passed more than once might make sense, like multiple --verbose calls for more logging: --force --force or -ff to aggressively remove).

My thought here would be that Evict() (or whatever we choose to name the API) would take a full container ID as an argument, as opposed to a container itself. It would attempt to retrieve the container from the DB, and if that was successful, try to perform a normal removal. If any errors occurred, it would clean up as best it could and make absolutely certain the container was removed from the DB. If retrieving the container from the DB failed, we remove the container from the DB directly, then use the RemoveStorageContainer() API to remove it from c/storage.
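
A rough, self-contained sketch of that flow under assumed names: the state and storage interfaces, RemoveContainerByID, and removeContainer are placeholders invented here, RemoveStorageContainer is taken from the comment above, and the code that eventually landed in #3549 may differ:

    // Hedged sketch of the eviction flow described above; all identifiers
    // are illustrative placeholders except RemoveStorageContainer.
    package evictsketch

    import "context"

    type Container struct{ ID string }

    // state is the minimal DB surface the sketch needs.
    type state interface {
        Container(id string) (*Container, error) // look up a container record
        RemoveContainerByID(id string) error     // delete the record directly
    }

    // storageService wraps the c/storage-backed container store.
    type storageService interface {
        RemoveStorageContainer(id string, force bool) error
    }

    type Runtime struct {
        state state
        store storageService
        // removeContainer is the existing "normal" removal path.
        removeContainer func(ctx context.Context, ctr *Container, force bool) error
    }

    // evictContainer takes a full container ID rather than a *Container.
    func (r *Runtime) evictContainer(ctx context.Context, id string) (string, error) {
        // 1. Try to retrieve the container from the DB.
        if ctr, err := r.state.Container(id); err == nil {
            // 2. Record is readable: attempt a normal removal first.
            if rmErr := r.removeContainer(ctx, ctr, true); rmErr == nil {
                return id, nil
            }
            // Normal removal failed: fall through and clean up as best we can.
        }
        // 3. Make absolutely certain the container is gone from the DB ...
        if err := r.state.RemoveContainerByID(id); err != nil {
            return "", err
        }
        // 4. ... then remove its storage from c/storage.
        if err := r.store.RemoveStorageContainer(id, true); err != nil {
            return "", err
        }
        return id, nil
    }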

@marcov (Collaborator, Author) commented Jul 10, 2019

I'm closing this and continuing in a new PR to keep things cleaner: #3549

@marcov marcov closed this Jul 10, 2019
@github-actions bot added the locked - please file new issue/PR label (assist humans wanting to comment on an old issue or PR with locked comments) on Sep 26, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Sep 26, 2023