power-loss while creating containers may leave podman (storage) in a broken state #8005
@mheon PTAL |
may be related to #7941 |
This looks like the symlink recreation issue that we have seen elsewhere in
c/storage (it first cropped up in CRI-O).
|
Unfortunately, I don't think we've ever had a good explanation as to the exact cause of this. I know a fix was landed earlier (around a year ago) when CRI-O hit this, but it appears that it did not fully resolve the issue. |
Unlikely to be related to #7941 |
Is there a way to work around this broken state without clearing the whole podman storage? I've several deployments of podman in the field on a low-bandwidth or pay-per-byte connection and would like to keep downloaded images and still recover from this broken state. Any ideas? EDIT: Might be related to #5986 - at least there seems to be a valid work-around using a read-only fs: #5986 (comment) |
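(Not necessarily the workaround from #5986, but one general way to keep downloaded images on storage that a power cut cannot corrupt is c/storage's additional image stores: images live in a separate, read-only store, so only the small writable store needs to be wiped after corruption. A hedged sketch of /etc/containers/storage.conf, with a hypothetical path for the read-only store:)

[storage]
driver = "overlay"

[storage.options]
# Hypothetical read-only store holding the pre-downloaded images; podman can
# start containers from it while all writes go to the normal, wipeable store.
additionalimagestores = [ "/usr/share/containers/storage" ]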
Can you remove the image that caused the issue? |
I can remove all images together, or one by one. I actually don't know which image I would have to try; as far as I can tell right now, it's independent of any particular image. To be fair, I did not test the whole thing with each image one by one. I'll have to see when I'm free to try that. |
We had a similar experience and found that we can edit ./.local/share/containers/storage/overlay-images/images.json to recover. We are looking into making this work with podman rmi IMAGE, but have to figure out how to clean up the image store. |
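(For illustration only, a hedged sketch of what such a manual edit might look like, assuming images.json is a flat JSON array of image records with an "id" field, as in containers/storage. Back the file up and stop all podman processes first; the image ID below is a placeholder:)

# back up the image store metadata before editing it by hand
cp ~/.local/share/containers/storage/overlay-images/images.json \
   ~/.local/share/containers/storage/overlay-images/images.json.bak

# list image IDs and names to find the broken entry
jq -r '.[] | .id + " " + ((.names // []) | join(","))' \
   ~/.local/share/containers/storage/overlay-images/images.json

# drop the broken record by ID (IMAGE_ID_TO_REMOVE is a placeholder)
jq 'map(select(.id != "IMAGE_ID_TO_REMOVE"))' \
   ~/.local/share/containers/storage/overlay-images/images.json.bak \
   > ~/.local/share/containers/storage/overlay-images/images.json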
A friendly reminder that this issue had no activity for 30 days. |
I'd like to contribute to this issue, since I am also experiencing this behavior and would love to help fix it. I can reproduce it consistently on a Raspberry Pi 4 running Arch Linux ARM with Podman 2.2.0-rc2, which I built from source. The Raspberry Pi is currently not needed for anything important, so I can fiddle around with it in any way you want me to in order to help fix this problem.

So let me describe my setup in more detail and give you as much information as possible: I cannot give you the Containerfile with everything else you'd need to recreate my image, but it provides a complete work environment including Qt, OpenCV, different compilers, programming IDEs, browsers etc. The built image is huge, around 15 GB in size. It also uses systemd, so the entrypoint is set accordingly.

To force this issue, I simply run a container based on that image (with a podman run invocation like the one at the end of the transcript below). While that container is running, I unplug the power cable of the Raspberry Pi. In about 1 of 3 times that breaks the Podman environment. Being in that state, I executed the following commands:

[attk@jonny-raspberry ~]$ sudo podman system prune
WARNING! This will remove:
- all stopped containers
- all stopped pods
- all dangling images
- all build cache
Are you sure you want to continue? [y/N] y
Deleted Pods
Deleted Containers
Deleted Images

[attk@jonny-raspberry ~]$ sudo podman image prune
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N] y

[attk@jonny-raspberry ~]$ sudo podman container prune
WARNING! This will remove all non running containers.
Are you sure you want to continue? [y/N] y

[attk@jonny-raspberry ~]$ sudo podman container ls -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES

[attk@jonny-raspberry ~]$ sudo podman image ls
REPOSITORY                                          TAG     IMAGE ID      CREATED            SIZE
localhost/auv                                       latest  7937f0d3891c  About an hour ago  4.92 GB
docker.io/polygamma/archlinux_arm_generic_aarch64   latest  53c4eb44e69e  3 days ago         1.4 GB

(I stripped the Containerfile down a bit, to not have to wait so long during building, which is why the image is only about 5 GB in size here.)

[attk@jonny-raspberry ~]$ sudo podman run -i -t --rm --name auv --privileged --network='host' --ipc='host' --systemd='true' --volume=/lib/modules:/lib/modules:ro auv:latest
Error: readlink /var/lib/containers/storage/overlay/l/74I7Y5UEZAA3GKIJJJSSGEMXNQ: no such file or directory

The Raspberry Pi is still in exactly that state, so if you need more information, I can provide it. @rhatdan Do you have any idea on how to progress with this problem? |
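(A hedged diagnostic sketch for errors like the readlink failure above: the overlay/l directory holds short-name symlinks into the layer directories, and the error suggests that one of those links is missing or dangling. Paths assume root storage under /var/lib/containers/storage:)

# does the short name from the error message still exist?
ls -l /var/lib/containers/storage/overlay/l/ | head

# list any dangling links, i.e. links whose target layer directory is gone
for link in /var/lib/containers/storage/overlay/l/*; do
    [ -e "$link" ] || echo "dangling: $link"
done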
This is the same issue as #8437 |
A friendly reminder that this issue had no activity for 30 days. |
Let's concentrate on #8347 |
To create a corrupted storage, we can use a reproducer like the one sketched below.
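(The actual reproducer script was not preserved in this transcript; the following is a minimal, hypothetical sketch of that kind of stress test. It assumes root podman, a small test image such as docker.io/library/alpine that is already pulled, and a machine you are willing to hard-reset:)

#!/bin/bash
# Hypothetical reproducer sketch (not the original script): keep podman busy
# creating and removing containers so that an abrupt power cut is likely to
# land in the critical window while c/storage is writing layer metadata.
# While the loop runs, cut power to the machine - or, if physical access is
# awkward, force an immediate reboot without sync: echo b > /proc/sysrq-trigger
while true; do
    podman run --rm docker.io/library/alpine true
done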
The machine is powered off. On the next reboot, if it still boots :-) ...
|
If you attempt to run a container on that image, what error do you see? |
|
We have seen these types of errors many times, which gives us a clue about what could cause it. If you just do podman rmi fedora, or better yet podman pull fedora, does that clean up your images? |
I am sporadically seeing this issue as well on my Linux device when I pull power unexpectedly. I have a 'base image' on a separate partition that I spin up a podman container against on boot using systemd. When this failure happens, the container fails to launch and I have to 'podman rmi' my base image and reload it from scratch to use my container. I haven't been able to figure out how to recover the container without removing the image itself. |
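(One possible shape of that recovery sequence, with hypothetical image name, tar path, and unit name; the actual reload step depends on how the base image is stored on the separate partition:)

sudo podman rmi --force localhost/base:latest    # drop the corrupted base image
sudo podman load -i /mnt/images/base.tar         # reload it from the separate partition
sudo systemctl restart my-container.service      # let systemd recreate the container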
I'm also starting to see this issue on some containers run by systemd. Again, unexpected power disconnect. |
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
If a power loss occurs in a small time window while creating containers with podman, the container storage is broken and no containers can be started or created anymore. Only a
podman system prune -a
seems to resolve the issue, while all other prune commands don't.
Steps to reproduce the issue (maybe in general):
Steps to reproduce the issue (specifically):
Following are the specific steps with regard to my actual setup. This might make a difference, since the Raspberry Pi 3B+ has few resources, which causes image pulls and container creation to take some time (especially when starting 5 containers in parallel) and could widen the time window for corruption.
Describe the results you received:
After a successful reboot following the power loss, all podman container units fail to start with the following error message:
Describe the results you expected:
I expect all containers to be created normally. My systemd units remove any left-over containers before attempting to create the new ones. This should work in any case, even on power loss. Podman should not enter a state where I have to manually issue a
podman system prune -a
or other intervention when something fails at container creation.
Additional information you deem important (e.g. issue happens only occasionally):
I'm starting 5 containers in parallel, which slows down container creation quite a bit on a Raspberry Pi 3B+ and could widen a potential time window for corruption.
Output of podman version:

Output of podman info --debug:

Package info (e.g. output of rpm -q podman or apt list podman):

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
I'm using the aarch64 variant on a Raspberry Pi 3B+ (limited resources) running Fedora IoT 32. The containers are created automatically on boot via systemd units. The units first try to remove any existing container via an optional command and then run a podman command with the --systemd flag; a hypothetical unit illustrating this pattern is sketched below.
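(A hypothetical unit, not the reporter's actual file, illustrating the pattern described above: an optional cleanup via ExecStartPre=- followed by podman run with --systemd:)

[Unit]
Description=Example podman-managed container
Wants=network-online.target
After=network-online.target

[Service]
# the leading '-' makes the cleanup optional, so a missing container is not an error
ExecStartPre=-/usr/bin/podman rm -f mycontainer
ExecStart=/usr/bin/podman run --rm --name mycontainer --systemd=true myimage:latest
ExecStop=/usr/bin/podman stop mycontainer
Restart=on-failure

[Install]
WantedBy=multi-user.target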