===== Enforcing Memory Allocation =====
On the cluster, Slurm is configured to treat memory as a "consumable resource". That means you should be careful to specify how much memory your job needs. By default, your job is allocated 8GB per core it has requested. A job gets 1 core unless it asks for more, so a job submitted with default settings is allocated 8GB of memory. If your job attempts to use more than its memory allocation **it will be killed by the operating system**.
You can change the amount of memory your job is allocated with either the "--mem" option (memory per node) or the "--mem-per-cpu" option to **sbatch**. For example, both of the following request 20GB in total:
<code>
sbatch --mem=20G my_script.bash
</code>
or
<code>
sbatch -c 2 --mem-per-cpu=10G my_script.bash
</code>
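The same options can also be written as **#SBATCH** lines at the top of the job script, so you don't have to retype them on every submission. Options given on the **sbatch** command line override the **#SBATCH** lines. The script below is a minimal sketch. It requests 2 cores at 10G each. The **report_allocation** helper is a hypothetical name, not part of Slurm. It prints what was granted, using the **SLURM_MEM_PER_NODE** and **SLURM_MEM_PER_CPU** environment variables that sbatch exports. As far as we know, both are in MB. The helper assumes a single task.

<code bash>
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=10G
# sbatch reads the #SBATCH lines above (same as "-c 2 --mem-per-cpu=10G");
# bash itself treats them as ordinary comments.

# Hypothetical helper: print the memory Slurm reports for this job, in MB.
# Assumes a single task. SLURM_MEM_PER_NODE is set when --mem was used,
# SLURM_MEM_PER_CPU when --mem-per-cpu was used.
report_allocation() {
  if [ -n "${SLURM_MEM_PER_NODE:-}" ]; then
    echo "${SLURM_MEM_PER_NODE} MB"
  elif [ -n "${SLURM_MEM_PER_CPU:-}" ]; then
    # per-CPU memory times the number of cores requested (default 1)
    echo "$(( SLURM_MEM_PER_CPU * ${SLURM_CPUS_PER_TASK:-1} )) MB"
  else
    echo "not running under Slurm"
  fi
}

report_allocation
# ... the actual work of the job goes here ...
</code>

Logging the allocation at the start of the job output makes it easy to compare against what **sacct** later reports as used.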
You can request all memory on a node (whatever amount that is) using:
<code>
sbatch --mem=0 my_script.bash
</code>
If you have no idea how much memory your job will use, but are convinced that it will use more than the default amount (8GB per core allocated to the job), you should run at least one test case using the **--exclusive** option together with the "--mem=0" trick to **sbatch**. Together these give you a whole node and all of the memory on it. When your job completes, you can use the **sacct** command to find out how much memory it used.
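A test run like this can be scripted. The sketch below is one way to do it, assuming a bash shell with Slurm's **sbatch** and **sacct** on your PATH. **--parsable** makes sbatch print only the job ID (followed by ";" and the cluster name on multi-cluster setups). **--wait** makes sbatch block until the job ends. The function names **profile_memory** and **job_id_from_parsable** are hypothetical.

<code bash>
# Hypothetical helper: sbatch --parsable prints "jobid" or "jobid;cluster".
job_id_from_parsable() {
  printf '%s\n' "${1%%;*}"
}

# Hypothetical helper: run one whole-node test job and report its memory use.
# Usage: profile_memory my_script.bash
profile_memory() {
  local script=$1 out jobid
  # --exclusive + --mem=0: a whole node and all of its memory.
  # --wait: sbatch returns only when the job has finished, with the job's
  # exit status; "|| true" keeps a failed test job from stopping us here.
  out=$(sbatch --parsable --wait --exclusive --mem=0 "$script") || true
  jobid=$(job_id_from_parsable "$out")
  if [ -z "$jobid" ]; then
    echo "profile_memory: submission of $script failed" >&2
    return 1
  fi
  # MaxRSS is recorded per job step (e.g. the ".batch" line), so read the
  # step lines rather than the top-level job line.
  sacct -j "$jobid" -o JobID,Elapsed,MaxRSS
}
</code>

Running **profile_memory my_script.bash** from a login node blocks until the test job has finished, then prints its elapsed time and peak memory.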
If you have previously run jobs on the old cluster (before the 2021 update), the **sacct** command can also tell you how much memory they used. You can either find the job ID of a completed job on the old cluster (perhaps from the slurm-NNNNNN.out file name) and then run a command like this:
<code>
sacct -j NNNNNN -o elapsed,MaxRSS
</code>
Or you can use your user name and the job name (which defaults to the name of the script you submitted) to list all jobs with that name, as follows. The "-S 2020-01-01" option makes it look for jobs starting from January 1st, 2020. Without it, **sacct** only searches from midnight of the current day, so you will usually want to set it.
<code>
sacct --user=chris -S 2020-01-01 --name=myscript.bash --format=JobID,AllocCPUS,elapsed,MaxRSS
</code>
The output of this command could be parsed to get the maximum amount of memory used by this type of job. (If you use the same script name for several different jobs, their results will be mixed together.)
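One way to do that parsing is sketched below as a hypothetical **max_rss_gib** shell function built on **awk**. It asks **sacct** for machine-readable output with **--parsable2** (fields separated by "|", no trailing separator) and **--noheader**. It then converts each MaxRSS value to a common unit and keeps the largest. It assumes sacct's K/M/G/T suffixes are powers of 1024 and that a bare number (normally just 0) is in K. Lines with an empty MaxRSS, usually the top-level job line, are skipped.

<code bash>
# Hypothetical helper: read "JobID|MaxRSS" lines (sacct --parsable2 --noheader)
# on stdin and print the largest MaxRSS in GiB with two decimals.
# Example (user, date and script name are placeholders):
#   sacct --user=chris -S 2020-01-01 --name=myscript.bash \
#         --parsable2 --noheader -o JobID,MaxRSS | max_rss_gib
max_rss_gib() {
  awk -F'|' '
    BEGIN { max = 0 }
    {
      v = $NF                      # MaxRSS is the last field requested
      if (v == "") next            # top-level job lines often leave it empty
      num  = v + 0                 # numeric prefix, e.g. "3.50G" -> 3.5
      unit = substr(v, length(v))  # trailing unit letter, if any
      if      (unit == "K") kib = num
      else if (unit == "M") kib = num * 1024
      else if (unit == "G") kib = num * 1024 * 1024
      else if (unit == "T") kib = num * 1024 * 1024 * 1024
      else                  kib = num   # assumption: a bare number is in K
      if (kib > max) max = kib
    }
    END { printf "%.2f\n", max / (1024 * 1024) }
  '
}
</code>

Because the job name is the only filter, the same caveat applies: if one script name covers several different kinds of job, this reports the largest of all of them.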