# Useful Slurm commands

## Some example commands
You can get information on each command by consulting its man page, e.g. for `sinfo`:
```
man sinfo
```
The following table lists some example commands.
| Command | Description |
|---|---|
| sinfo | view information about nodes and partitions; see sinfo --help for more options |
| sinfo -o "%C %P" | report CPU counts (allocated/idle/other/total) per partition |
| squeue | view information about jobs in the scheduling queue |
| scontrol show jobid JobID | show the status of job JobID |
| scontrol show jobid -dd JobID | show detailed job information, helpful for troubleshooting |
| sstat -j JobID | display status information (e.g. memory usage) of the running job JobID |
| scancel JobID | cancel job JobID |
| scancel -n JobName | cancel all jobs with the job name JobName |
| sprio -l | show the scheduling priority of your pending jobs |
| sshare -a | show fair-share information for all users |
| sacct -j JobID -o 'JobID,State,MaxVMSize,MaxRSS,Elapsed' | show accounting information for completed job JobID |
| sacct --helpformat | list the available format options for sacct |
| sacctmgr show user -s | show user account information |
| sreport -tminper cluster utilization --tres="cpu,gres/gpu" start=2019-12-01 | report cluster CPU and GPU utilisation since the given start date |
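As a quick illustration, a typical monitoring session might chain a few of these commands. This is a minimal sketch; the job ID 123456 is a placeholder for one of your own jobs:

```
# check which partitions have idle CPUs
sinfo -o "%C %P"

# list your own jobs in the queue
squeue -u $USER

# inspect one job in detail, then cancel it if it misbehaves
scontrol show jobid -dd 123456
scancel 123456
```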
## How to check your past and current jobs' memory requirements
To compose sensible job memory requests, it is important to understand the memory behavior of your jobs. The critical metric is the job's maximal resident set size (MaxRSS), i.e. the maximum amount of memory that a job occupies in the physical RAM of a node. This is the value you need to specify in Slurm request flags like `--mem-per-cpu`.
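As an illustration, here is a minimal batch-script sketch showing where such a request goes; the 2G value, job name, and program path are placeholders you would adapt to your measured MaxRSS and your cluster:

```
#!/bin/bash
#SBATCH --job-name=memdemo       # placeholder job name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G         # RAM per CPU; base this on MaxRSS (placeholder value)
#SBATCH --time=01:00:00

srun ./my_program                # placeholder executable
```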
You can use sacct with a format string like the following to find out about your past and current jobs:
```
sacct --format="JobID%16,User%12,State%16,Partition,Timelimit,Elapsed,ReqMem,MaxRSS,MaxVMSize,NCPUS,NNodes,ReqCPUS,ReqNodes,Start,End,NodeList"
```
By default, sacct only shows jobs from the current day. If you want to see older jobs, add a start time such as `-S 2021-05-25`. You can also list specific jobs by adding the job ID after the `-j` flag:

```
sacct --format="JobID%16,User%12,State%16,Partition,Timelimit,Elapsed,ReqMem,MaxRSS,MaxVMSize,NCPUS,NNodes,ReqCPUS,ReqNodes,Start,End,NodeList" -j $YOUR_JOB_ID
```
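Once you know a job's MaxRSS, a common rule of thumb (an assumption, not an official policy) is to request that amount plus a safety margin, divided by the number of CPUs if you use `--mem-per-cpu`. A hypothetical worked example:

```
# suppose sacct reported MaxRSS of about 7 GB for a job run with 4 CPUs
maxrss_gb=7
ncpus=4
# add ~20% safety margin and divide by the CPU count
echo "scale=1; $maxrss_gb * 1.2 / $ncpus" | bc   # -> 2.1, so request e.g. --mem-per-cpu=2200M
```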
The total maximal memory consumed by your job may be larger, but this does not matter as long as most of it can be kept in virtual memory that is staged out to disk and does not need to be accessed frequently. The situation changes if that staged-out memory also needs to be continually read back, which leads to swapping: the node is then so busy staging your virtual memory in and out that it can do almost no work for you in "user space", spending most of its time in "kernel space" instead. If you look at such jobs with tools like top, they usually appear in the D state.
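If your site allows shell access to the compute node where the job runs (policies vary; this is an assumption), you can check for this condition directly with top:

```
# show only your own processes on the node
top -u $USER
# in top's "S" column, processes stuck in state "D" (uninterruptible
# sleep, typically waiting on disk I/O) indicate heavy swapping
```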