Slurm Source Code Install | Cluster Deployment - Day 4: Methods of Restricting Resources in Slurm
- Support for Multi-core/Multi-thread Architectures (using srun to control resource usage)
- Consumable Resources in Slurm
- Resource Binding
- CPU Management User and Administrator Guide
- GRES
- Heterogeneous Job Support
- Containers Guide
Support for Multi-core/Multi-thread Architectures (using srun to control resource usage)
Some key concepts behind the srun parameters:
- BaseBoard
- LDom
- Socket/Core/Thread
- CPU
- Affinity/Affinity Mask/Fat Masks
The user must pass srun options such as
--cpu-bind=... --sockets-per-node=S --cores-per-socket=C --threads-per-core=T
and other advanced options to specify how the job's tasks are bound to the CPUs of its allocated nodes, as in the sketch below.
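A minimal sketch, assuming a node with 2 sockets of 4 single-threaded cores each (the topology values and the ./app payload are assumptions; match them to your hardware):

srun -N 1 -n 8 --sockets-per-node=2 --cores-per-socket=4 --threads-per-core=1 --cpu-bind=cores ./app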
Testing this plugin in Slurm, --cpu-bind accepts:
- map_cpu:[list] specify a CPU ID binding for each task where [list] is [cpuid1],[cpuid2],...[cpuidN]
The CPU IDs within a node, in block numbering, can be found in the /proc/cpuinfo file on the system.
- map_ldom:[list] specify a NUMA locality domain ID binding for each task where [list] is [ldom1],[ldom2],...[ldomN]
- The remaining entries in the --cpu-bind help output are:
- boards - auto-generated masks bind to boards
- ldoms - auto-generated masks bind to NUMA locality domains
- sockets - auto-generated masks bind to sockets
- cores - auto-generated masks bind to cores
- threads - auto-generated masks bind to threads
- help - show this help message

As we can see, the six options above rely on auto-generated masks. When we want to bind tasks to specific CPU IDs of our own choosing, map_cpu is the option to use, as in the sketch below.
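A minimal sketch of explicit binding, assuming CPU IDs 0 and 4 sit on different cores (check /proc/cpuinfo for the real layout; ./app is a placeholder):

srun -N 1 -n 2 --cpu-bind=map_cpu:0,4 ./app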
Consumable Resources in Slurm
Slurm, using the default node allocation plug-in, allocates nodes to jobs in exclusive mode.
Default: exclusive node allocation. The alternative is the consumable resources plugin, enabled with SelectType=select/cons_res (a configuration sketch follows the CR_* list below).
Consumable resources have been enhanced with several new resource types, namely CPU (the same as in previous versions), Socket, Core, and Memory, as well as any combination of the logical processors with Memory:
CPU (CR_CPU): CPU as a consumable resource.
- No notion of sockets, cores, or threads.
- On a multi-core system CPUs will be cores.
- On a multi-core/hyperthread system CPUs will be threads.
- On single-core systems CPUs are CPUs. ;-)
Board (CR_Board): Baseboard as a consumable resource.
Socket (CR_Socket): Socket as a consumable resource.
Core (CR_Core): Core as a consumable resource.
Memory (CR_Memory): Memory only as a consumable resource. Note: CR_Memory assumes OverSubscribe=Yes
Socket and Memory (CR_Socket_Memory): Socket and Memory as consumable resources.
Core and Memory (CR_Core_Memory): Core and Memory as consumable resources.
CPU and Memory (CR_CPU_Memory): CPU and Memory as consumable resources.
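A minimal slurm.conf sketch enabling one of the combinations above (CR_Core_Memory is just an example choice):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory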
Example: srun -N 5 -n 20 --mem=10 sleep 100 &   (a running job requesting 5 nodes, 20 tasks, and 10 MB of memory per node)
On many systems with large processor counts, jobs typically run one fewer task than there are processors, to minimize interference from the kernel and daemons.
Resource Binding
CPU binding is resolved in priority order:
- Highest priority: the binding specified with the srun --cpu-bind option.
- Next: the node-specific binding, if any node in the job allocation has a CpuBind configuration parameter and all other nodes in the allocation either have the same CpuBind value or none at all.
- Next: the partition-specific CpuBind configuration parameter (if any).
- Lowest priority: the binding specified by the TaskPluginParam configuration parameter.
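A hedged slurm.conf sketch of where the node- and partition-level bindings come from (the node name, CPU count, and partition name are assumptions):

NodeName=node01 CPUs=16 CpuBind=cores
PartitionName=debug Nodes=node01 CpuBind=threads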
Cgroups
- Cgroup - a container for a set of processes;
- Subsystem - a module, typically a resource controller, that applies a set of parameters to the cgroups in a hierarchy.
- Hierarchy - a set of cgroups organized in a tree structure, with one or more associated subsystems.
- State Objects - pseudofiles that represent the state of a cgroup or apply controls to a cgroup:
- tasks - identifies the processes (PIDs) in the cgroup
- additional state objects specific to each subsystem
Cgroup functionality in Slurm:
- The ability to confine jobs and steps to their allocated cpuset.
- The ability to bind tasks to sockets, cores and threads within their step's allocated cpuset on a node.
- Supports block and cyclic distribution of allocated cpus to tasks for binding.
- The ability to confine jobs and steps to specific memory resources.
- The ability to confine jobs to their allocated set of generic resources (gres devices).
Note that all these structures apply to a specific compute node. Jobs that use more than one node will have a cgroup structure on each node.
It seems that we need to edit cgroup.conf ourselves to restrict users' resource usage; a sketch follows.
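A minimal cgroup.conf sketch that confines jobs to their allocated cores, memory, and GRES devices (TaskPlugin=task/cgroup must also be set in slurm.conf for the cpuset confinement to apply):

ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes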
CPU Management User and Administrator Guide
GRES
Generic resource (GRES) scheduling is supported through a flexible plugin mechanism. Support is currently provided for Graphics Processing Units (GPUs), CUDA Multi-Process Service (MPS), and Intel® Many Integrated Core (MIC) processors.
We can use gres.conf together with slurm.conf to specify the GPUs (and other GRES) available on each node.
GresTypes=gpu,mps,bandwidth   (in slurm.conf)
AutoDetect=nvml   (in gres.conf)
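A hedged sketch of matching entries (the node names, CPU count, GPU count, and device paths are assumptions for illustration):

In slurm.conf:
GresTypes=gpu
NodeName=node[01-02] CPUs=16 Gres=gpu:4

In gres.conf:
NodeName=node[01-02] Name=gpu File=/dev/nvidia[0-3]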
Slurm supports GPU scheduling.
Jobs will not be allocated any generic resources unless specifically requested at job submit time using the options:
- --gres: Generic resources required per node
- --gpus: GPUs required per job
- --gpus-per-node: GPUs required per node. Equivalent to the --gres option for GPUs.
- --gpus-per-socket: GPUs required per socket. Requires the job to specify a task socket.
- --gpus-per-task: GPUs required per task. Requires the job to specify a task count.
In summary, we can define gpu, mps, mic, and bandwidth GRES types, and request them with --gres, --gpus, --gpus-per-node, --gpus-per-socket, and --gpus-per-task.
GRES Scheduling
We can request these GRES at submit time with the options listed above; a minimal sketch follows.
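A hedged request sketch (the GPU count and the nvidia-smi payload are illustrations):

srun -N 1 --gpus-per-node=2 nvidia-smi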
Heterogeneous Job Support
For example, part of a job might require four cores and 4 GB for each of 128 tasks while another part of the job would require 16 GB of memory and one CPU.
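A hedged sketch of submitting that example as a heterogeneous job, using srun's ":" separator between components (./simulate and ./analyze are placeholder programs; 4 cores at 1 GB per CPU gives each of the 128 tasks its 4 GB):

srun -n 128 -c 4 --mem-per-cpu=1G ./simulate : -n 1 --mem=16G ./analyze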
Containers Guide
- DOCKER - The issue that usually stops most sites from using Docker is the requirement that "only trusted users should be allowed to control your Docker daemon" [Docker Security], which is not acceptable to most HPC systems.
Sites with trusted users can add them to the docker Unix group and allow them to control Docker directly from inside of jobs. There is currently no direct support for starting or stopping docker containers in Slurm.
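As a sketch, a site could grant a trusted user direct Docker control like this (alice is a placeholder username):

sudo usermod -aG docker alice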
- Charliecloud - a user namespace container system sponsored by LANL to provide HPC containers. Charliecloud supports:
- direct invocation by users via user namespace support
- direct Slurm support (currently in development)
- OCI image support (via a wrapper)
- Shifter - a container project out of NERSC to provide HPC containers with full scheduler integration.
- Singularity - a hybrid container system that supports both privileged (setuid) and unprivileged (user namespace) execution.
- ENROOT - a user-space container utility from NVIDIA.
We can see that Slurm lets us specify the number of GPUs a job uses, and that we can restrict a task's resources through the command line, with the exception of memory and NIC, which are constrained through configuration instead.