===== Cheat Sheet =====
=== Submit a job ===
This command submits a job to run "in the background". Output is written to a file named slurm-NNNNN.out by default, where NNNNN is the job ID SLURM assigns to your job.
sbatch [-p partition] [-c ncores] [--mem=NNNG] [--exclusive] scriptname
"--exclusive" requests all cores on a node. Use it only when you need to. "-c" specifies how many cores your job will use. (Use only one of "-c" and "--exclusive".) "scriptname" must be the name of a shell script (but see the "--wrap" option.) "--mem=0" requests all memory on a node.
For submitting many similar jobs, look at the "--array" option.
sbatch --array=0-99%5 scriptname
This will run 100 copies of **scriptname** in total, but allow only 5 to be running at any one time. The script itself must work out how to do something different in each instance (using the SLURM_ARRAY_TASK_ID environment variable).
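As a sketch, an array job script might use the task ID to pick a different input file for each instance (the program and file naming here are hypothetical):
#!/bin/bash
#SBATCH --array=0-99%5
# SLURM_ARRAY_TASK_ID is 0..99 here, different for each instance.
./my_program input_${SLURM_ARRAY_TASK_ID}.dat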
On the new cluster you may need to use the "--mem" or "--mem-per-cpu" options to make sure sufficient memory is allocated to your job. The default is 8 GB per CPU core.
=== Check the queue ===
squeue
squeue -u USERNAME
squeue -w NODENAME
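The second command shows only USERNAME's jobs, and the third only the jobs on NODENAME. If the default columns truncate job names, you can choose your own output format with "-o"; for example (these are standard squeue format codes):
squeue -u USERNAME -o "%.10i %.9P %.25j %.8T %.10M %R"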
=== Check node status ===
sinfo
sinfo -p standard -N -O "partition,nodelist,cpus,memory,cpusload"
use_by_user
"use_by_user" is a script that runs "scontrol" to get the information it reports.
=== Kill jobs ===
scancel JOBID
scancel -u USERNAME
**Not "skill" - which does exist, but isn't part of SLURM.**
The second of these commands would kill **all** of your slurm jobs.
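scancel can also filter by job state, which is handy if you want to clear out jobs that haven't started yet without touching the running ones; for example:
scancel -u USERNAME -t PENDING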
=== Report Job Details ===
This works for running (or very recently completed) jobs.
scontrol show job JOBID
=== Check on the Resources a Job Used ===
This command will show how much (elapsed) time and memory a job used. This information is kept in the SLURM accounting database, so the command can be used long after the job has completed. (The ".batch" suffix selects the batch step of the job, which is where the script's resource usage is recorded.)
sacct -o elapsed,maxrss -j NNNN.batch
Check on resources used by all of your jobs since a specific date:
sacct --user=chris -S 2020-01-01 -o elapsed,maxrss
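Other accounting fields can be added to the "-o" list; for example, including the job ID, name and final state makes it easier to match jobs up (these are standard sacct field names):
sacct --user=chris -S 2020-01-01 -o jobid,jobname,elapsed,maxrss,state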
=== Get Information about a Node ===
To get information about a node including how many cores and how much memory it has:
scontrol show node node62
=== Interactive Jobs ===
srun [-p partition] [-c ncores] [--exclusive] program
srun --pty bash -i
The second command above will get you a command line on a node; use the "-w" option to target a specific node. (Note that you will only get the command line if there is a free core on that node.) You can use this to check on your job's status, e.g. how much memory and how many cores it is using. The same checks can be done more programmatically in your scripts, or with sstat (for memory use), but this command-line technique is sometimes useful.
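For example, to check the memory use of a running batch job with sstat (the job ID here is a placeholder):
sstat -j 12345.batch -o jobid,maxrss,avecpu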
As a matter of etiquette, please don't start up a shell on a node and just leave it running when you aren't using it. This tends to reduce the number of nodes available for exclusive use by users who need one.
=== Checking Disk Space ===
Check how much space is left on your home volume:
chris@node0:~$ cd
chris@node0:~$ pwd
/home3/chris
chris@node0:~$ df -H /home3
Filesystem Size Used Avail Use% Mounted on
fs2:/srv/storage_2/node-home3 105T 91T 14T 88% /home3
chris@node0:~$
**You should check this before you add a lot more data to your home directory, or run jobs that generate a lot of output.** If you need more space than is available on your home volume, please talk to the system administrators: we may be able to give you space on a different volume. Consider using /scratch for data that can easily be replaced (e.g. data downloaded from NCBI).
See space remaining on all home volumes:
chris@node0:~$ df -H /home*
Filesystem Size Used Avail Use% Mounted on
fs1:/srv/storage_1/node-home 40T 33T 7.5T 82% /home
fs2:/srv/storage_1/node-home 105T 96T 8.6T 92% /home2
fs2:/srv/storage_2/node-home3 105T 91T 14T 88% /home3
fs3:/srv/storage_1/node-home 81T 51T 30T 63% /home4
fs4:/srv/storage_0/node-home5 118T 39T 79T 33% /home5
fs4:/srv/storage_1/node-home6 98T 0 98T 0% /home6
To check how much disk space a directory is using:
chris@node0:~$ du -sh torch
3.6G torch
(This can take a long time if there are many files in the directory.)
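To find which subdirectories are using the most space, one common idiom is to sort du's human-readable output:
chris@node0:~$ du -sh * | sort -h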