'podman start <toolbx-container>' fails with 'setrlimit RLIMIT_NPROC: Operation not permitted: OCI permission denied' #19634
Comments
Also failed to clone:
acheong@insignificantv5 ~ [125]> podman --log-level debug container clone dev dev1
INFO[0000] podman filtering at log level debug
DEBU[0000] Called clone.PersistentPreRunE(podman --log-level debug container clone dev dev1)
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/home/acheong/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/home/acheong/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/containers
DEBU[0000] Using static dir /var/home/acheong/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /var/home/acheong/.local/share/containers/storage/volumes
DEBU[0000] Using transient store: false
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] Cached value indicated that overlay is supported
DEBU[0000] Cached value indicated that overlay is supported
DEBU[0000] Cached value indicated that metacopy is not being used
DEBU[0000] Cached value indicated that native-diff is usable
DEBU[0000] backingFs=btrfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false
DEBU[0000] Initializing event backend journald
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument
DEBU[0000] Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument
DEBU[0000] Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument
DEBU[0000] Configured OCI runtime runc initialization failed: no valid executable found for OCI runtime runc: invalid argument
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/crun"
INFO[0000] Setting parallel job count to 25
DEBU[0000] Looking up image "997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f" in local containers storage
DEBU[0000] Trying "997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f" ...
DEBU[0000] parsed reference into "[overlay@/var/home/acheong/.local/share/containers/storage+/run/user/1000/containers]@997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f"
DEBU[0000] Found image "997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f" as "997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f" in local containers storage
DEBU[0000] Found image "997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f" as "997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f" in local containers storage ([overlay@/var/home/acheong/.local/share/containers/storage+/run/user/1000/containers]@997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f)
DEBU[0000] Inspecting image 997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f
DEBU[0000] exporting opaque data as blob "sha256:997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f"
DEBU[0000] exporting opaque data as blob "sha256:997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f"
DEBU[0000] exporting opaque data as blob "sha256:997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f"
DEBU[0000] Inspecting image 997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f
DEBU[0000] Inspecting image 997b52ccbf8544c42851a181e80bcd0f081eff8a879256b67d273a7e07f31f6f
Error: invalid config provided: cannot set shmsize when running in the {host } IPC Namespace
DEBU[0000] Shutting down engines
This seems to be the logs from a successful
Looks like #18696; from what I can tell, the only solution is to recreate the container.
Yes, you need to check ulimit -u before and after the reboot. Podman 4.5 and earlier hard-code the ulimit at create time, so if it later changes to a lower value the container will not work and subsequent starts will fail. The fix #18721 makes it so that we now apply the ulimit at start time, so it should work all the time, but that requires the container to be created with 4.6 or newer. Going forward you should not see that issue again.
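For anyone unsure what to compare, this is roughly the check being asked for (a minimal sketch; nproc is the limit that matters for this error):

# maximum number of user processes: soft and hard limits for the current session
$ ulimit -Su
$ ulimit -Hu

If the values printed after the reboot are lower than they were when the container was created, containers created with Podman 4.5 or earlier will fail to start with the setrlimit RLIMIT_NPROC error.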
Sadly, as I mentioned in the report, that's a deal breaker for Toolbx containers. :/ There's a Fedora 39 Change to treat the Toolbx stack as a release blocker. I think one of the test criteria is about preexisting containers continuing to work.
That's not going to work for Toolbx. Can you please re-open?
Backward compatibility with existing data (container instances are data) is important to preserve during upgrade. Podman is effectively asking customers to kill the existing data in order to upgrade. This is pretty bad and should be avoided if possible.
There is nothing we can realistically do here: the ulimits were added to the container spec at create time with 4.5 and earlier, so it is now impossible to tell how they got set (by a user or just the default). Once we start the container, the runtime will try to apply the configured ulimit, and if your limits were higher at create time than they are now, this will fail. So you either recreate the container or make sure the ulimit does not change (e.g. in /etc/security/limits.conf). The fact that it was implemented like that is unfortunate, but it worked that way for years without issue; only recently was the nproc limit lowered, resulting in this bug. As said, going forward this is fixed: containers created with 4.6 should not run into this bug.
Toolbx containers always have the
Asking people to recreate a Toolbx container is a deal breaker. Maybe the
I see that there's a maze of issues and pull requests related to this problem, but I am unable to figure out the root cause for this Podman change. Was there a pressing issue that couldn't be fixed in any other way? I am asking because one way to address this could be to (temporarily?) revert the Podman change and offer a way to create new containers that don't have the ulimits-in-the-container-spec problem. Then, after sufficient time has passed and we can assume that most pet containers out there are new enough to not have this problem, we can restore the Podman change.
I'm by no means an expert here, and trust that someone will correct me if what I say below is wrong.
This is not a Podman change. Clarification: the issue you are seeing has nothing to do with any Podman changes. It has to do with the fact that your system rebooted with
The 4.6 Podman change, IIUC, is that on container creation Podman will no longer store those limits, unless explicitly requested. So this problem shouldn't recur on future reboots. Obviously that does not time-travel back and help already-created containers. Perhaps there's some sort of
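If someone does want a fixed limit recorded in a new container, Podman's --ulimit flag can request it explicitly at create time (a sketch with placeholder names and values, not a recommendation for Toolbx containers):

# explicitly pin the nproc ulimit (soft:hard) for a new container
$ podman create --ulimit nproc=4096:4096 --name example registry.fedoraproject.org/fedora-toolbox:38

Containers created with 4.6 or newer without an explicit --ulimit should simply pick up whatever the limits are at start time.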
I tried a few things that didn't work:
$ cat /etc/security/limits.conf
* hard nproc 62703
* soft nproc 62703
The same thing is in
This worked:
$ podman export $CONTAINER_NAME -o output.tar
$ podman import output.tar $NEW_IMAGE_NAME
$ podman container rm $CONTAINER_NAME
$ toolbox create --image localhost/$NEW_IMAGE_NAME -i $CONTAINER_NAME
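For anyone following the same recipe, a quick sanity check after the import (placeholder names as above; this assumes the container was recreated under its old name):

# confirm the imported image exists locally
$ podman images localhost/$NEW_IMAGE_NAME
# enter the recreated container; the old files should still be there
$ toolbox enter $CONTAINER_NAME

Note that podman export captures the container's filesystem, not its configuration, which is why recreating the container through toolbox create gives it a fresh, working config while keeping the data.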
No data loss
Well, who brokered the deal with whom? Podman containers are certainly no long-term data storage. Toolbx uses (and advertises) them for a specific purpose, and changes quite a few of the standard Podman options when it creates containers. Is there no way Toolbx could reset the ulimit for an existing container? After all, it would be "fair" to ask users to operate on Toolbx containers using Toolbx (rather than podman, if it fails), and Toolbx is able to recognize Toolbx containers as such (i.e. distinguish them from non-Toolbx containers). Maybe, as a middle ground, podman could offer an option to migrate old containers (by resetting the ulimit setting or clearing it), and leave it to Toolbx (or the user) to decide when and if they use it?
Do we know what value changed? Can it be changed back in limits.conf?
At least a warning and prompt should be shown. However, the normal use case should not be to keep long-lived persistent data inside a podman container.
Since no data is lost, I suppose it counts as migration?
After you made those changes, did you fully log out and log back in, to make sure your login process had those settings?
$ ulimit -a
Podman is not doing anything special other than attempting to set the ulimits for the container.
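A narrower check than ulimit -a is to read the limits of the login session directly from the kernel (the "Max processes" row is the one that matters here):

# show the process limits applied to the current session
$ grep -i 'max processes' /proc/self/limits

If this still shows the old, lower value after logging back in, the limits.conf change is not being picked up, and Podman has nothing to work with.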
Yup. It's not getting applied for some reason, but that has nothing to do with podman.
If the ulimit -a call is not showing the change, then there is nothing Podman can fix.
Hmmm, I was hit with this too today, and tried the above @acheong08 trick to export the image and create a new one. I believe the
Is the echo intentional?
Try |
@gptlang no, the echo was for me to see that the parameters are right. I copied the wrong line here; the command was without it. In the meanwhile, I got a tip to specify the ulimit in /etc/security/limits.d/50-podman-ulimits.conf. That didn't help either. Now it says:
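For reference, such a drop-in usually mirrors the limits.conf snippet above; the values here are placeholders and would need to match whatever the limit was before it changed:

$ cat /etc/security/limits.d/50-podman-ulimits.conf
* soft nproc 62703
* hard nproc 62703

As noted above, the file only takes effect for new login sessions, so a full logout (or a reboot) is needed before checking ulimit -u again.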
Okay! Thanks for summarizing that so clearly. To verify, I got rid of
I have no trivial way to find out what the ulimits were on a traditional package-based Linux distribution. I suppose I could use my Fedora Silverblue machine to figure that out. For what it's worth, the current values are:
... and:
I do think there's something Podman can do to avoid the problem for Toolbx containers. See my comment above. It's easy to identify a Toolbx container, and for those Podman could handle the failure to set the ulimits more softly.
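For what it's worth, a sketch of how Toolbx containers could be singled out; this assumes the com.github.containers.toolbox label that recent Toolbx releases set on the containers they create (older containers may carry a different marker):

# list containers that carry the Toolbx label
$ podman ps --all --filter label=com.github.containers.toolbox=true
# or check a single container
$ podman inspect <container> --format '{{index .Config.Labels "com.github.containers.toolbox"}}'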
Sadly, that won't help the (surprisingly large number of) Toolbx users out there. For a lot of people the Toolbx environment is their primary interactive command-line shell. It's unsettling when that stops working suddenly. On top of that, if it's not trivially obvious to people like us, who are paid to work on this full-time, how to restore the old ulimit values or what caused them to change in the first place, then imagine how hard it would be for the random user out there to find the workaround.
I tried to explain this before. However, since folks are trying to, somewhat emphatically, state the opposite, I will risk repeating myself by responding to:
... and this:
Arguing over whether there's data loss or not gets close to playing with semantics. Of course, there's nothing catastrophic like
It's also not about pointing fingers at anyone. We need to find a way forward to recover from this problem. Maybe it requires reverting whatever changed the ulimits? Maybe it requires something else in the stack to handle those changes more gracefully?
Toolbx containers are by definition long-lived pet containers for continued interactive use, not short-lived service containers. Many people use them as their development environment, and some even as their primary interactive command-line shell. It's not fun if, out of the blue, the CLI shell of your choice (e.g., Bash or Z shell) refuses to offer a prompt or your chosen editor (e.g., Emacs or Vim) refuses to start. Note that this problem doesn't just affect unstable development distributions like Fedora Rawhide, but also stable ones where such things are not expected to happen.
Sometimes the loss of a development environment can be a big loss, even if it can be salvaged, because time is a factor. We shouldn't be designing operating systems where users need to factor in the possibility that their CLI shell may suddenly refuse to work. At a time when different groups of people are trying to ship OSTree-based OSes, from Endless OS to the different Fedora variants to GNOME OS, the stability of Toolbx environments is crucial.
Yes, I rebooted after my
I get:
... and:
I wonder why you have
Maybe it shouldn't do that for containers with the |
Umm... I am unsure about what you mean, but maybe:
I don't know of any way to do that other than significantly side-stepping
Toolbx actually uses
Like I said before, we shouldn't be asking users to do anything. We need to find a way where the existing tools are able to sort it out on their own.
If |
@debarshiray it's clear that you have deep concerns about this situation -- but it's equally clear that this is complex and will not be resolved by commenting on a closed GitHub issue. If you think this is critical enough for PM to intervene, please file a BZ and try to escalate. If you think this is something Toolbx can address via a special case, I encourage you to look into that option. Or perhaps, if there is a simple recipe for users to solve this on their own, you could add a solution here, make it the last comment in the thread, and we can lock the issue. Then affected people can websearch, find this, scroll down, and be happy. I'm not sure what other options are available. Thank you for your concern and for understanding.
Sadly, I am also concerned about this approach of closing the issue in a hurry without any meaningful discussion or understanding; and then saying that there's no point commenting on a closed github issue; and that I should escalate. It's odd. I also don't see why this problem is particularly complex. I offered one mechanism that Podman could use to avoid this problem, and I never heard back. Maybe it is complex, but I have no idea why.
I don't know of any way to do that other than significantly side-stepping
There is nothing about semantics. By "no data loss", I meant that nothing (not even packages and stuff) installed in the old podman/toolbox containers is lost. All you do is export the data and import it again with the updated config. I was panicked when toolbox stopped working, because it took some effort to set up all my development stuff (vscode, build tools, miscellaneous utilities), which would take me a few hours to replace. I also had an important podman container where I kept WIP projects not uploaded anywhere. I tried this solution and it got my containers back just fine. As far as I'm aware, not a single file was lost.
It would be nice if Podman provided a way to automatically migrate from a deprecated config to a working one.
What does this have to do with toolbox? It uses podman under the hood, and this issue affects all containers. There is nothing special with toolbox that requires a different solution.
I think everyone has a responsibility here. I am going to stop allowing comments on this issue temporarily and discuss this with the team.
Issue Description
Preexisting Toolbx containers can no longer be started after a dnf update on Fedora 38 Workstation.

Highlights from the dnf update:
crun-1.8.5-1.fc38.x86_64 to crun-1.8.6-1.fc38.x86_64
podman-5:4.5.1-1.fc38.x86_64 to podman-5:4.6.0-1.fc38.x86_64

Toolbx containers are interactive command-line environments that are meant to be long-lasting pet containers. Therefore, it's important that containers created by older versions of the tools can be used with newer versions. If necessary, I am happy to change the configuration with which new Toolbx containers are created, but we would need a sufficient migration window for users with pre-existing older containers.

Here's an attempt to podman start a container created with toolbox create and the older version of the Podman stack:

As far as I can make out, Toolbx containers created with the new version of the Podman stack can be started with it.
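To confirm that a given old container is affected, the ulimits recorded at create time can be compared with the current session (a sketch; the exact inspect field path is an assumption and may differ between Podman versions):

# ulimits stored in the container's config by the older Podman
$ podman inspect <toolbx-container> --format '{{json .HostConfig.Ulimits}}'
# current hard limit on user processes
$ ulimit -Hu

If the recorded nproc value is higher than the current hard limit, crun's setrlimit(RLIMIT_NPROC) call is refused with 'Operation not permitted'.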
Steps to reproduce the issue
1. Create a Toolbx container with toolbox create using crun-1.8.5-1.fc38.x86_64, podman-5:4.5.1-1.fc38.x86_64, etc. on Fedora 38 Workstation
2. dnf update to crun-1.8.6-1.fc38.x86_64 and podman-5:4.6.0-1.fc38.x86_64
3. Reboot
4. Try podman start ...
Describe the results you received
podman start ... fails with:

Describe the results you expected

podman start should succeed.

podman info output