Notes
Pros:
- Don't stay in "root"
- Logging
- Restrict commands per user
Cons:
- One password to become root.
- Accidental re-execution of root commands pulled from shell history
- Not in "root thinking" mode
- You effectively stay logged in as root (sudo caches its credential "ticket" unless you remove it manually). It's easier and clearer to use an actual root shell: you just log out when the session is done, and the visual cue that the shell has root access is much clearer.
If you have a no-longer-trusted user who should no longer be an admin, you remove their account completely. You don't just "demote" them by removing sudo. They could have left behind a trojan or daemon, so you need a clean sweep of all systems, even ones where the user didn't have root access but had an account.
As soon as you have the ability to write a file as root, you can overwrite any binary or shell init file for any user. That's all it takes to escalate privileges, so sudo isn't really protecting anything. You have to restrict not only which commands can be run but also their targets, and that gets tricky if the purpose of sudo is to grant access in the first place.
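As a concrete illustration (the username and command list here are hypothetical, not taken from any real policy), consider a sudoers rule that whitelists a command but says nothing about its targets:

```
# Hypothetical /etc/sudoers.d/ops entry: "alice" may only run cp as root.
alice ALL=(root) NOPASSWD: /bin/cp
```

With nothing more than that, "sudo cp rogue.sh /etc/profile.d/x.sh" or "sudo cp rogue_binary /usr/sbin/sshd" plants code that other users, or root itself, will eventually execute, so the per-command restriction buys essentially nothing.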
One important point about the cloud is that it upgrades synchronously, so you don't end up with a heterogeneous mix of old and new hardware. I've been getting the SLURM list for a while, and quite often I see messages like the one appended. The problematic part for me is: "let's say node01 has this: CPUs=4, and Gres=gpu:1; and node02 has this: CPUs=4, and no GPU."
If you use the cloud, you do not have this problem, ever. If a job requires GPUs, then you ask the cloud for nodes with GPUs. If a job requires 16 cores per node, you ask the cloud for that. If a job requires local SSD, again, just ask and ye shall receive.
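As a rough sketch of what "just ask" looks like in practice, assuming boto3 with working credentials (the region, AMI ID, instance types, and counts below are placeholders, not anything from the original discussion):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Job needs GPUs: request GPU-equipped instances.
ec2.run_instances(
    ImageId="ami-12345678",      # placeholder AMI
    InstanceType="p3.2xlarge",   # one NVIDIA GPU per node
    MinCount=2, MaxCount=2,
)

# Job needs 16 cores per node and no GPU: request that instead.
ec2.run_instances(
    ImageId="ami-12345678",
    InstanceType="c5.4xlarge",   # 16 vCPUs per node
    MinCount=4, MaxCount=4,
)
```

There is no scheduler to teach about which nodes have which hardware; each job simply provisions the shape it needs.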
The supercomputer market has many distortions. Apart from the bizarre economics and egos, you see from the above that amortization distorts the market for cluster management software. Amortization also justifies permanent staff, because they are needed to understand and to operate legacy hardware and the complex software required to run it. This is distinct from the rent vs buy economics, which doesn't always apply to supercomputers.
The cloud eliminates complexity. Even without tools like CfnCluster and Star Cluster, you can quite easily create a VPC with a script that runs MPI for a small audience.
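A minimal sketch of that idea, assuming boto3 credentials, instances already launched into the subnet, passwordless SSH between them, and an MPI binary already staged on each node (all names, CIDR blocks, and paths are placeholders):

```python
import subprocess
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an isolated network for the ad hoc cluster.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
subnet = ec2.create_subnet(VpcId=vpc["Vpc"]["VpcId"], CidrBlock="10.0.0.0/24")

# Launching instances into the subnet is omitted; once they exist,
# collect their private IPs and write an MPI hostfile.
resp = ec2.describe_instances(
    Filters=[{"Name": "subnet-id", "Values": [subnet["Subnet"]["SubnetId"]]}]
)
ips = [inst["PrivateIpAddress"]
       for res in resp["Reservations"]
       for inst in res["Instances"]]
with open("hostfile", "w") as f:
    f.write("\n".join(f"{ip} slots=4" for ip in ips) + "\n")

# Launch the MPI job across all nodes.
subprocess.run(["mpirun", "-np", str(4 * len(ips)),
                "-hostfile", "hostfile", "./my_mpi_binary"], check=True)
```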
The above thoughts are not exactly new to me, but the volume of complexity on the SLURM and OpenMPI lists has raised the cost to a new level. SLURM, Torque, SGE, IT departments, etc. will probably disappear as soon as more tools for invoking MPI-based codes on AWS, GCP, etc. become available.
Rob
---------- Forwarded message ----------
From: Xyz
Subject: [slurm-dev] How should I configure a cluster to reserve GPUs
To: slurm-dev <[email protected]>
What's the, let's say, most optimized way of configuring Slurm to manage
a very small cluster (around 20 nodes), with nodes having the following
characteristics:
- A few nodes with:
Intel Core i7
NVIDIA GeForce GTX 680
8 GB RAM
- Some other nodes with:
Intel Core i7
8 GB RAM
- And some other nodes with:
Intel QuadCore
4 GB RAM
My main goal is to set priorities like this:
- The nodes with the best configuration are the first to get jobs, but it's VERY important that the GPUs in these nodes remain usable for jobs that require GPUs.
Example:
1. Let's say node01 has this: CPUs=4, and Gres=gpu:1; and node02 has this: CPUs=4, and no GPU.
2. A user submits a job that does not use a GPU, but uses 5 CPUs.
3. My idea in that case is to make the job use only 3 CPUs on node01 and 2 CPUs on node02, because node01 has a GPU and the job does not use a GPU, so I have to reserve a CPU and a GPU on node01 for jobs that require a GPU.
4. Another user submits a job with a GPU as a requirement, and the job gets executed on node01 because the GPU and (at least) 1 CPU are reserved there.
- Nodes with slower performance are the last to receive jobs
Note: MPI support would be great.
I don't know if that's possible, and if it isn't, how do you suggest I configure this cluster to allocate the GPUs properly?
Thanks.
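For reference, the question boils down to a handful of slurm.conf lines like the sketch below. The node names and memory sizes come from the post; everything else is an assumption, and the "hold back one CPU on the GPU node for GPU-only jobs" part is exactly the piece that plain node weights don't solve.

```
GresTypes=gpu
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

# Lower Weight = scheduled first, so the fastest nodes fill up before the rest.
NodeName=node01 CPUs=4 RealMemory=8000 Gres=gpu:1 Weight=1
NodeName=node02 CPUs=4 RealMemory=8000 Weight=2
NodeName=node03 CPUs=4 RealMemory=4000 Weight=3
PartitionName=main Nodes=node[01-03] Default=YES State=UP

# gres.conf on node01 (device path is an assumption):
# Name=gpu File=/dev/nvidia0
```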