Rarely Error: getting pod's service container: container {ID} not found in DB: no such container
#16964
Comments
Thanks for reaching out. Can you share a reproducer?
Thank you for your response. Sadly, I don't know how it happened each time. In the end, it would be helpful to be able to rm and restart pods that are stuck in the created state forever because a service container with a specific ID is missing.
I assume you have created the pod via
Together with systemd, yes. Here is the file. Maybe it's helpful, but since it only happens rarely, probably because of some kind of race condition, I don't know.
Thanks, @MartinX3. Are you seeing the error when stopping the service via systemctl or when removing the pod manually?
Sadly I don't remember.
Regardless of how we reproduce it I think the first step for podman/libpod/runtime_pod_common.go Lines 320 to 322 in 28d04bc
This will make the pod rm command work AFAICT.
Care to open a PR?
I don't think that's the right approach at the current time. We don't have a reproducer. If a pod was created with a service container and the service container was removed before the pod, then either there is a bug or the user removed it (which they should not). Unless we have a reproducer, I am against patching symptoms.
Ah, but it seems the pod cannot be removed in that case. If so, I think the pod should be removed but the error about the service container should be returned? If that's too intrusive a change, it should at least be logged as an error.
Well in theory the podman process could be killed after podman/libpod/runtime_pod_common.go Line 325 in 28d04bc
Since the next pod rm will always fail because of the missing service container, it will leave the DB in a very bad state. I don't even think we should log this. At this point the user wants to remove the pod, so why should the user care about such a warning? The pod is gone afterwards anyway. You can reproduce this easily by removing the service container manually:
If the service container is that critical to the pod, podman should not allow it to be removed without the pod.
See my other comment. It means either that there's a bug, that a user manually removed the service container (which they should not), or that something got killed. Logging seems the right way to me. Ignoring such errors (I call them symptoms) will also hide them from tests, which can negatively impact quality since we don't see the errors. A service container is only used when executed in systemd. In that case, systemd should manage the deployments (not the user).
To avoid duplicate work: @Luap99, do you want to tackle the issue or shall I?
@vrothberg Please take it.
Do not allow for removing the service container unless all associated pods have been removed. Previously, the service container could be removed when all pods have exited which can lead to a number of issues. Now, the service container is treated like an infra container and can only be removed along with the pods.
Also make sure that a pod is unlinked from the service container once it's being removed.
Fixes: containers#16964
Signed-off-by: Valentin Rothberg <[email protected]>
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Maybe connected to #12034
I can't remove, start, or restart the pod.
It just tells me that it's missing the service container with a specific ID.
My workaround is to edit the database file
~/.local/share/containers/storage/libpod/bolt_state.db
and replace this ID with the ID of a running service container, then remove the pod and start it again with systemd. On removal I get the error
Error: freeing pod 59c9006b753b60141d73d23e1f42ef0fae794a45e3c6315a27faee9db1bf930a lock: no such file or directory
but I can ignore it and just recreate the database pod. The service container I stole the ID from still works fine.
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Removed pod.
Additional information you deem important (e.g. issue happens only occasionally):
happens rarely
Output of podman version:
Output of podman info:
Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
physical