podman fails to start container after reboot if using volume #4605
Labels: kind/bug, locked - please file new issue/PR
Comments
openshift-ci-robot added the kind/bug label on Dec 1, 2019.
@mheon PTAL
Got it. Damn, damn, damn. Same issue as #4621. On system reboot, we're not correctly reacquiring locks for volumes, so if you delete and recreate a container, it can cause conflicts.
We are definitely going to need a new build for RHEL once I have a patch out for this.
(Just 8.1.1 and 8.2.)
mheon added a commit to mheon/libpod that referenced this issue on Dec 3, 2019:
After a restart, pods and containers both run a refresh() function to prepare to run after a reboot. Until now, volumes have not had a similar function, because they had no per-boot setup to perform. Unfortunately, this was not noticed when in-memory locking was introduced to volumes. The refresh() routine is, among other things, responsible for ensuring that locks are reserved after a reboot, ensuring they cannot be taken by a freshly-created container, pod, or volume. If this reservation is not done, we can end up with two objects using the same lock, potentially needing to lock each other for some operations - classic recipe for deadlocks. Add a refresh() function to volumes to perform lock reservation and ensure it is called as part of overall refresh(). Fixes containers#4605 Fixes containers#4621 Signed-off-by: Matthew Heon <[email protected]>
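The lock-reservation problem described in the commit message can be illustrated with a toy model. The sketch below is a minimal, hypothetical Go simulation, not libpod's lock code: `lockManager`, `allocate`, `reserve`, `refresh` and `newContainerClashes` are invented names. It assumes that lock IDs are persisted in the database, that the in-memory allocator starts empty after a reboot, and that a new object is handed the lowest free ID. Under those assumptions, skipping the volume refresh step lets a freshly created container receive the volume's lock ID.

```go
package main

import (
	"errors"
	"fmt"
)

// lockManager is a HYPOTHETICAL, simplified in-memory lock allocator.
// Its state starts empty after every (simulated) reboot. Not libpod's API.
type lockManager struct {
	used map[uint32]bool
	max  uint32
}

func newLockManager(max uint32) *lockManager {
	return &lockManager{used: make(map[uint32]bool), max: max}
}

// allocate hands out the lowest free lock ID to a newly created object.
func (m *lockManager) allocate() (uint32, error) {
	for id := uint32(0); id < m.max; id++ {
		if !m.used[id] {
			m.used[id] = true
			return id, nil
		}
	}
	return 0, errors.New("no free locks")
}

// reserve marks a specific, previously persisted lock ID as taken again.
func (m *lockManager) reserve(id uint32) error {
	if id >= m.max {
		return fmt.Errorf("lock %d out of range", id)
	}
	if m.used[id] {
		return fmt.Errorf("lock %d already in use", id)
	}
	m.used[id] = true
	return nil
}

// object is any lock-holding entity (container, pod or volume) whose lock ID
// is stored in the database and therefore survives a reboot.
type object struct {
	name   string
	lockID uint32
}

// refresh simulates the post-reboot pass: all locks start free, then every
// persisted object re-reserves its own lock ID. refreshVolumes=false models
// the bug, where volumes had no refresh step.
func refresh(max uint32, ctrs, vols []object, refreshVolumes bool) (*lockManager, error) {
	m := newLockManager(max)
	for _, c := range ctrs {
		if err := m.reserve(c.lockID); err != nil {
			return nil, err
		}
	}
	if refreshVolumes {
		for _, v := range vols {
			if err := m.reserve(v.lockID); err != nil {
				return nil, err
			}
		}
	}
	return m, nil
}

// newContainerClashes reports whether a container created right after the
// simulated reboot is handed the same lock ID as the existing volume.
func newContainerClashes(refreshVolumes bool) bool {
	vol := object{name: "tst-data", lockID: 0}
	var ctrs []object // no surviving containers, e.g. the old one was removed
	m, err := refresh(2048, ctrs, []object{vol}, refreshVolumes)
	if err != nil {
		panic(err)
	}
	id, err := m.allocate()
	if err != nil {
		panic(err)
	}
	return id == vol.lockID
}

func main() {
	fmt.Printf("without volume refresh: clash=%v\n", newContainerClashes(false))
	fmt.Printf("with volume refresh: clash=%v\n", newContainerClashes(true))
}
```

Running it prints `without volume refresh: clash=true` followed by `with volume refresh: clash=false`. In this model, once a container and a volume share one non-reentrant lock, any operation that must hold both (such as a container locking its volume while already holding its own lock) would block on itself, which is consistent with the hang reported in this issue.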
This was referenced Dec 3, 2019
#4624 should have a fix.
mheon added a commit to mheon/libpod that referenced this issue on Dec 10, 2019.
github-actions bot added the "locked - please file new issue/PR" label on Sep 23, 2023.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I create a volume and a container that uses it. I can run the container multiple times without problems. But after I reboot, podman hangs: it does not fully start the container, and no other podman commands run while 'podman run' is starting the container.
Steps to reproduce the issue:
Buy a Raspberry Pi 4 and install Ubuntu 19.10 64-bit (ubuntu-server):
https://ubuntu.com/download/raspberry-pi
Add total_mem=3072 to /boot/firmware/usercfg.txt, according to:
https://ubuntu.com/blog/roadmap-for-official-support-for-the-raspberry-pi-4
Connect a USB SSD and move root and home to it using the rsync method, under LVM.
Update normally. Install haveged (to get more random bytes).
Linux ubuntu 5.3.0-1012-raspi2 #14-Ubuntu SMP Mon Nov 11 10:06:55 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
Build podman locally, as there are no packages available at https://launchpad.net/~projectatomic/+archive/ubuntu/ppa.
Use https://podman.io/getting-started/installation as a basis.
I've used this script:
Now I should have the latest podman built locally, along with the recommended configs, etc.
ubuntu@ubuntu:$ podman --version
podman version 1.6.4-dev
ubuntu@ubuntu:$ runc --version
runc version 1.0.0-rc9+dev
commit: 2186cfa3cd52b8e00b1de76db7859cacdf7b1f94
spec: 1.0.1-dev
Run the container multiple times. The container starts and then stops automatically, and we are back at the shell.
Like this:
Reboot the machine.
Describe the results you received:
I started the script as I did before the reboot, but it gets stuck. I can end the program with Ctrl-C, but while 'podman run' is stuck, other commands such as 'podman ps' are stuck too.
Describe the results you expected:
I'd expect the container to run and then return to the shell.
Additional information you deem important (e.g. issue happens only occasionally):
If I remove '--volume tst-data:/data' from the 'podman container run' parameters, it works as expected. The volume itself is untouched.
If I change the container image from 'debian:stretch-lite' to 'alpine', the issue persists, and switching further to 'fedora' makes no difference either.
The easiest way to check whether this problem is active is to run 'podman ps' in another window: if it hangs, the problem is present.
I've also tried running the container with sudo, and the problem persists.
I guess '--network host' is not needed, but it slightly simplifies testing. I needed to install 'slirp4netns'.
Without --network host, the last line of the problematic run looks like this:
When running under strace, the last action comes from the main thread:
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Not installed by package manager.
Additional environment details (AWS, VirtualBox, physical, etc.):
Raspberry Pi 4
Linux ubuntu 5.3.0-1012-raspi2 #14-Ubuntu SMP Mon Nov 11 10:06:55 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
My current kernel command line is:
But I've also seen this issue with the default command line.
If I remove the volume, it always gives an error about freeing a lock.
After that, the test script works as expected again, until the next reboot.
The volume is just:
I also tried to restore /run/usr/1000 after reboot, but it did not work.