
Allow force removing containers with unavailable OCI runtime #3502

Closed
marcov wants to merge 2 commits from the runtime-unavailable branch

Conversation

@marcov (Collaborator) commented Jul 5, 2019

When the OCI runtime associated with a container cannot be found in the configuration, removing that container is not possible. Detect when this happens and allow force-removing the container.


This typically happens when you create a container with a specific runtime set in libpod.conf, and the runtime name is later changed in that conf, with this outcome:

$ pd rm -f 06
Error: container 060314dd88dc22cdf6d21b8f8146dcf2481ab158c5a29f79a9ecb81cef2e6d9e was created with OCI runtime my_runtime, but that runtime is not available in the current configuration: internal libpod error

@openshift-ci-robot (Collaborator) commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marcov
To complete the pull request process, please assign rhatdan
You can assign the PR to them by writing /assign @rhatdan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Collaborator) commented:

Hi @marcov. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Jul 5, 2019
marcov added 2 commits July 5, 2019 13:34
When the OCI runtime associated with a container cannot be found
in the configuration, assign the default runtime to the container
and return the ErrRuntimeUnavailable error.

Signed-off-by: Marco Vedovati <[email protected]>
When the OCI runtime associated with a container cannot be found in the
configuration, removing that container is not possible.
Allow force removing the container by detecting when this happens.

Signed-off-by: Marco Vedovati <[email protected]>
@marcov force-pushed the runtime-unavailable branch from f85e755 to 52990c9 on July 5, 2019 at 11:34
@mheon (Member) commented Jul 5, 2019

/ok-to-test

@openshift-ci-robot added the ok-to-test label and removed the needs-ok-to-test label on Jul 5, 2019
Review comment from a Member on the new fallback code:

    -	return errors.Wrapf(define.ErrInternal, "container %s was created with OCI runtime %s, but that runtime is not available in the current configuration", ctr.ID(), ctr.config.OCIRuntime)
    +	err = errors.Wrapf(define.ErrRuntimeUnavailable, "cannot find OCI runtime %q for container %s", ctr.config.OCIRuntime, ctr.ID())
    +	// fall back to the default runtime to allow ctr clean up
    +	ociRuntime = s.runtime.defaultOCIRuntime

This doesn't seem sane; we can't reasonably expect the default runtime to be able to do anything.

Review comment from a Member on the lines that follow the fallback:

            ociRuntime = s.runtime.defaultOCIRuntime
        }
        ctr.ociRuntime = ociRuntime
    }

    ctr.runtime = s.runtime
    ctr.valid = valid

We should never return a valid container if err != nil.

@mheon (Member) commented Jul 5, 2019

I understand the intent here, but I don't know if trying to convince the existing remove API to work with containers which completely lack a valid runtime is sane.

We may want a new API to forcibly evict a container from the DB and storage with minimal checks. Retrieve it from the DB; if there's an error doing so, skip everything else and try removing from the DB. If we can successfully retrieve the container, remove from storage.

@marcov (Collaborator, Author) commented Jul 8, 2019

Thanks @mheon, you're right; I felt my changes were kind of hacky rather than a proper solution. I'll evaluate what you proposed and see if I can come back with something better.

@marcov (Collaborator, Author) commented Jul 8, 2019

@mheon, following your advice, a possible way to handle this is:

  • Add an evict flag to rm (better to keep --force for stopping running containers).
  • Add a DB API to retrieve a container ID / config.
  • Add an Evict API to remove a container without any OCI runtime interaction, using the container config retrieved with the previous API (a rough sketch of these pieces follows this list).
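
For illustration only, a minimal sketch of how those three pieces might fit together. Every name below is a hypothetical placeholder (only the OCIRuntime field echoes the diff above), not the API that was eventually implemented in the follow-up work (#3549):

    // Hypothetical sketch only; names and shapes are illustrative.
    package evictsketch

    import "context"

    // ContainerConfig stands in for the stored per-container configuration,
    // including the OCI runtime name recorded at creation time.
    type ContainerConfig struct {
        ID         string
        OCIRuntime string
    }

    // containerStore is the proposed DB-level API: retrieve just the
    // ID/config without constructing a full, validated Container object.
    type containerStore interface {
        GetContainerConfig(idOrName string) (*ContainerConfig, error)
    }

    // evictor is the proposed runtime-level API: remove a container using
    // only the stored config, with no OCI runtime interaction. The CLI
    // would expose it through a new evict flag on `rm`, keeping --force
    // for its current job of stopping running containers.
    type evictor interface {
        EvictContainer(ctx context.Context, idOrName string) (string, error)
    }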

Any feedback?

@mheon (Member) commented Jul 8, 2019

I have some thoughts about the CLI here (I think allowing --force to be passed more than once might make sense, like multiple --verbose calls for more logging: --force --force or -ff to aggressively remove).

My thought here would be that Evict() (or whatever we choose to name the API) would take a full container ID as an argument, as opposed to a container itself. It would attempt to retrieve the container from the DB, and if that was successful, try to perform a normal removal. If any errors occurred, it would clean up as best it could and make absolutely certain the container was removed from the DB. If retrieving the container from the DB failed, we remove the container from the DB directly, then use the RemoveStorageContainer() API to remove it from c/storage.
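
A rough, self-contained sketch of that flow under assumed names: the state and storage interfaces, RemoveContainerByID, and removeContainer are placeholders invented here, RemoveStorageContainer is taken from the comment above, and the code that eventually landed in #3549 may differ:

    // Hedged sketch of the eviction flow described above; all identifiers
    // are illustrative placeholders except RemoveStorageContainer.
    package evictsketch

    import "context"

    type Container struct{ ID string }

    // state is the minimal DB surface the sketch needs.
    type state interface {
        Container(id string) (*Container, error) // look up a container record
        RemoveContainerByID(id string) error     // delete the record directly
    }

    // storageService wraps the c/storage-backed container store.
    type storageService interface {
        RemoveStorageContainer(id string, force bool) error
    }

    type Runtime struct {
        state state
        store storageService
        // removeContainer is the existing "normal" removal path.
        removeContainer func(ctx context.Context, ctr *Container, force bool) error
    }

    // evictContainer takes a full container ID rather than a *Container.
    func (r *Runtime) evictContainer(ctx context.Context, id string) (string, error) {
        // 1. Try to retrieve the container from the DB.
        if ctr, err := r.state.Container(id); err == nil {
            // 2. Record is readable: attempt a normal removal first.
            if rmErr := r.removeContainer(ctx, ctr, true); rmErr == nil {
                return id, nil
            }
            // Normal removal failed: fall through and clean up as best we can.
        }
        // 3. Make absolutely certain the container is gone from the DB ...
        if err := r.state.RemoveContainerByID(id); err != nil {
            return "", err
        }
        // 4. ... then remove its storage from c/storage.
        if err := r.store.RemoveStorageContainer(id, true); err != nil {
            return "", err
        }
        return id, nil
    }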

@marcov (Collaborator, Author) commented Jul 10, 2019

I'm closing this and continuing in a new PR to keep things cleaner: #3549

@marcov marcov closed this Jul 10, 2019
@github-actions bot added the locked - please file new issue/PR label (assist humans wanting to comment on an old issue or PR with locked comments) on Sep 26, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Sep 26, 2023