Replies: 38 comments 1 reply
-
Thanks for reaching out. There are a couple of resources online that may help in getting it to work:
There was a recent workshop by AWS that goes into the details of how to run Podman on Slurm, but https://pcluster-podman-gromacs.workshop.aws/ee/verify/slurm.html is not reachable anymore. @ChristianKniep, was it taken offline, or is it temporary?
-
Superseded by containers-on-pcluster.workshop.aws; I have yet to add Podman. ATM it uses Amazon Linux 2 (CentOS 7), as the installation of Cloud9 on CentOS 8 blocked me.
-
@ChristianKniep Yes, the computers in this cluster have CentOS 7. I am wondering if I can somehow run podman in a SLURM job.
-
I've tried to create a GitHub Actions workflow to build Podman for CentOS 7 and CentOS 8. Anyway, maybe the project could be of interest to those who want to build Podman and then run it on a compute cluster.
Another project of mine is
-
Did I understand correctly that no such solution exists yet that would allow clusters with an existing SLURM installation to run podman as a job?
-
Podman also allows for distributed container runs using
-
Thanks a lot, @kniec! I leave the issue open for reference.
-
A friendly reminder that this issue had no activity for 30 days.
-
@vrothberg @eriksjolund What is the goal of this issue? Should it still be open? Should we add more documentation?
-
Documentation would be my goal. Something in the man pages and maybe a more verbose doc in the tutorials?
-
From a user's point of view, it looks like podman does not work under Slurm. Does it work? How? I think this thread has instructions for MPI, which appears not to be the same as running Slurm jobs. Then there are some very specific Slurm setup cases, which might not match what users have. What actually happens when trying to run podman under Slurm is the error 'standard_init_linux.go:211: exec user process caused "permission denied"', with no explanation of what is wrong or how to fix it.
-
I think it would be great if Podman had some instructions targeted at this type of HPC user.
I have access to a CentOS 7 cluster that is also running Slurm. It's configured like this:
/etc/subuid and /etc/subgid are configured to give my user a range of subuids and subgids. As I don't have root permissions, I needed to ask the systems group to set up the files /etc/subuid, /etc/subgid and /etc/sysctl.d/99-rootless.conf. My plan is to see if it is possible to run Podman in a compute job submitted with Slurm. Today I made a minimal test to see if I could run Podman 3.2.2 (without Slurm):
That seems to work. I used this storage.conf
with Podman downloaded from
It seems the graphroot directory is empty (/tmp/erik.sjolund_test3), so I guess only the rootless_storage_path setting is needed. I haven't performed any Slurm tests yet, though. (The cluster is often heavily used by other people; I would need to find a good time to test it.)
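For reference, a rootless storage.conf of this general shape redirects container storage to a node-local scratch directory. This is a sketch, not the exact file from the comment above (which was not preserved); the driver choice and path are assumptions:

```toml
# ~/.config/containers/storage.conf (sketch; values are assumptions)
[storage]
# vfs avoids the fuse-overlayfs dependency on older CentOS 7 kernels.
driver = "vfs"
# For rootless users this path is used instead of graphroot,
# keeping image layers off a (possibly networked) home directory.
rootless_storage_path = "/tmp/erik.sjolund_test3"
```

With this in place, `podman info` should report the chosen path as the store's graph root for the rootless user.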
-
This seems like a rather specific use case to document in general documentation. I think you should blog about your experience, but I don't see where upstream would put something like this.
-
@rhatdan yes, you have a point. My use case is probably a bit too specific. What if there were a more general document about "surviving in the harshest conditions", one that would spell out: what are the minimal requirements for running a container with Podman?
If I understand correctly, this is always required: the value of /proc/sys/user/max_user_namespaces would need to be a positive number (or a positive number that is large enough). Running a container with a single UID mapping might then be possible ( To be able to run a container without a single UID mapping, you would additionally need
I would guess there are only a few dependencies where access to the home directory and the /etc/containers directory is absolutely needed. At least over time, it seems that Podman has become less and less dependent on those directories. By setting environment variables, such as
In my opinion it would be great if it were possible to run a Podman container without any file access in the
How does my discussion relate to the topic of this GitHub issue (Slurm)?
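The minimal requirements above can be checked up front with a small script. This is a sketch; treating a missing /etc/subuid entry as "single-UID mapping only" is my reading of the rootless setup, not an official check:

```shell
#!/bin/sh
# Minimal preflight checks for rootless Podman on a cluster node.
# Assumes a Linux host with procfs mounted.

# 1. User namespaces must be enabled (positive max_user_namespaces).
max_ns=$(cat /proc/sys/user/max_user_namespaces 2>/dev/null || echo 0)
if [ "$max_ns" -gt 0 ]; then
    echo "user namespaces: ok (max_user_namespaces=$max_ns)"
else
    echo "user namespaces: disabled or unavailable"
fi

# 2. Without a subuid/subgid range, only a single-UID mapping is possible.
if grep -q "^$(id -un):" /etc/subuid 2>/dev/null; then
    echo "subuid range: present"
else
    echo "subuid range: missing (single-UID mapping only)"
fi
```

Running this inside an interactive Slurm allocation, before trying podman itself, separates kernel/host problems from Podman configuration problems.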
-
@eriksjolund If you find a way to run podman under a SLURM job, I would be very interested in testing the method / config.
-
Has it been identified which Slurm and Podman configuration options enable Podman to work under a Slurm job? Are there any requirements on Podman versions, for example, is something newer than 1.6.4 needed?
-
Running interactively as a user on the system (submit nodes via ssh), podman runs just fine; under Slurm it falls apart with:
Any hints on workarounds?
-
@caramarc Interesting results! There are some command-line options for
If anyone wants to experiment, maybe something like (Currently I have little time to spare, otherwise I would like to try it out.)
About the lack of cgroups v2 support in Slurm: it looks like there is some ongoing development for introducing cgroups v2 support in Slurm, but it
I also found this quote:
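One knob that is often relevant on hosts without a systemd user session (as is typical inside a Slurm allocation) is the cgroup manager. A hedged containers.conf sketch; whether these settings help on a given cluster is an assumption to verify:

```toml
# ~/.config/containers/containers.conf (sketch)
[engine]
# systemd is the usual default cgroup manager but needs a user session;
# cgroupfs avoids that dependency.
cgroup_manager = "cgroupfs"
# The journald events backend also assumes systemd is available.
events_logger = "file"
```

The same can be tried per invocation with `podman --cgroup-manager=cgroupfs run ...`, or cgroup handling can be skipped entirely with `podman run --cgroups=disabled ...`.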
-
@eriksjolund I did a lot more testing after my post, and this specific error is related to 3.10.x kernels. It works just fine on 4.18.x kernels. Whatever cgroup v1 changes happened between kernels 3.10.x and 4.18.x addressed this issue.
-
@caramarc I see, good news then. I would be interested to hear some more details.
-
@eriksjolund I didn't use/specify any extra command-line arguments and yes
-
Hello, I configured Slurm and Podman on the node and they both work fine on their own. I have some difficulties trying to use them together, in particular the GPU resource filtering. After an srun / sbatch on the node, nvidia-smi correctly displays the resources allocated by Slurm (e.g. 2 out of 4 GPUs), but when we start a container with Podman, all the GPUs of the machine are available (e.g. 4/4). Is there any way to fix this?
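I don't know of a confirmed fix, but one direction to experiment with is forwarding Slurm's GPU selection into the container instead of exposing all devices. A sketch assuming the NVIDIA Container Toolkit's CDI specs are installed on the node; the image name and device selector are assumptions to adapt:

```shell
# Inside an allocation, Slurm typically sets CUDA_VISIBLE_DEVICES to the
# granted GPUs. Pass it through rather than relying on device visibility.
podman run --rm \
  --device nvidia.com/gpu=all \
  --env CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES" \
  docker.io/nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

Note that device indices inside the container may not match the host's, so this needs verification on the actual cluster.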
-
This feels more like a discussion than an issue, so I am moving it to discussions.
-
Maybe cgroups v2 support has arrived in Slurm? "Slurm provides support for systems with Control Group v2." (I haven't tried it out.)
-
Slurm docs about Cgroup v2: https://slurm.schedmd.com/cgroup_v2.html
-
Hi, has there been any update on this issue? Is there a way to run podman (4.4.1) under a Slurm (22.05) job? Is there documentation on how to do this? I have an AlmaLinux 9 environment with Slurm 22.05. If I take an interactive shell as a Slurm job and try podman, I get the following error: $ podman version
So, under a Slurm job there is no /run/user folder, which seems to be required by podman.
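A common workaround for the missing /run/user/&lt;uid&gt; inside a job is to point XDG_RUNTIME_DIR at a directory the job can create itself. A sketch; the /tmp location is an assumption, and node-local scratch may be more appropriate on a real cluster:

```shell
#!/bin/sh
# Create a private runtime directory when /run/user/$(id -u) does not
# exist inside the Slurm allocation, then tell Podman about it.
XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/tmp/$(id -un)-runtime}"
export XDG_RUNTIME_DIR
mkdir -p "$XDG_RUNTIME_DIR"
chmod 700 "$XDG_RUNTIME_DIR"
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
```

After exporting this in the job script (before any podman command), `podman info` should no longer fail trying to create directories under /run/user/&lt;uid&gt;.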
-
This might be the docs you're looking for: https://slurm.schedmd.com/containers.html. You need Slurm 23.02+, cgroups v2 and special settings, I think.
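For what it's worth, the Slurm container support described there revolves around an oci.conf that maps job steps onto an OCI runtime. A rough sketch of its shape; the runtime choice and pattern strings are assumptions, so check the containers.html page for the exact parameters your Slurm version supports:

```
# /etc/slurm/oci.conf (sketch, not a verified configuration)
RunTimeQuery="runc --rootless=true state %n.%u.%j.%s.%t"
RunTimeRun="runc --rootless=true run %n.%u.%j.%s.%t -b %b"
RunTimeKill="runc --rootless=true kill -a %n.%u.%j.%s.%t"
RunTimeDelete="runc --rootless=true delete --force %n.%u.%j.%s.%t"
```

Jobs then request an OCI bundle with something like `srun --container=/path/to/oci-bundle ...`; Podman itself would only be used to build/export such bundles, not invoked directly by Slurm.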
-
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind feature
Description
Steps to reproduce the issue:
$ srun --pty /bin/bash
$ podman run -it ubuntu /bin/bash
Describe the results you received:
Error message saying: 'standard_init_linux.go:211: exec user process caused "permission denied" '
Describe the results you expected:
Entering a container. Or at least an error message stating what is the problem and how to solve it.
Additional information you deem important (e.g. issue happens only occasionally):
Info gives a different error message. It looks like when running in a SLURM session, there is no user folder under /run/user/, but podman still expects it to exist.
$ podman info
Error: could not get runtime: error generating default config from memory: cannot mkdir /run/user/16869/libpod: mkdir /run/user/16869/libpod: no such file or directory
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
No
This is a production environment and podman comes from official production-quality repositories. As far as I know, the only package repository for a more recent podman is the Kubic project's, but it states that it is not preferred for production environments: "These packages haven't been through the official Red Hat QA process and may not be preferable for production environments."
Additional environment details (AWS, VirtualBox, physical, etc.):