GPU Computing
SLURM partition
The GPUs are accessible through compute nodes in the gpu SLURM partition. You can send a script to this queue using a command like sbatch -p gpu myscript.sh.
Hardware
The gpu compute node has two NVIDIA K20 GPUs:
$ srun -p gpu lspci | grep -i nvidia
05:00.0 3D controller: NVIDIA Corporation Device 1028 (rev a1)
42:00.0 3D controller: NVIDIA Corporation Device 1028 (rev a1)
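For more detail than lspci provides, the devices can also be queried with NVIDIA's nvidia-smi utility, assuming it is installed on the compute node:

$ srun -p gpu nvidia-smi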
Directory structure
The development tools for GPU computing, including libraries and compilers, have been installed in /usr/local/cuda-5.0/. Of particular interest are nvcc in /usr/local/cuda-5.0/bin/ and the shared object files, such as libcudart.so and libcurand.so, in /usr/local/cuda-5.0/lib64/.
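For example, a program that calls the cuRAND library can be built against these shared objects. This is a sketch: the source file rng.cu is a hypothetical placeholder, and the -L flag is shown only to make the library location explicit (nvcc normally knows its own installation paths):

$ nvcc -o rng rng.cu -L/usr/local/cuda-5.0/lib64 -lcurand

If the runtime linker cannot find the shared objects when the program runs, adding the directory to LD_LIBRARY_PATH is a common fix:

$ export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib64:$LD_LIBRARY_PATH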
Hardware-specific nvcc build options
Because we are using K20 GPUs, you will want to tell the nvcc compiler that it can use their modern features, such as double precision floating point arithmetic. To do this, use -arch=sm_35 to specify a 'compute capability' of 3.5. Other options that you will probably want for scientific computing are --prec-div=true and --prec-sqrt=true, which favor IEEE-compliant division and square root over faster approximations. For more information see the nvcc options page.
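Putting these together, a typical compile line for this hardware might look like the following (using the hello_world.cu sample from the section below):

$ nvcc -arch=sm_35 --prec-div=true --prec-sqrt=true -o hello_world hello_world.cu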
Headless GPGPU profiling
The NVIDIA Visual Profiler nvvp (packaged as nvidia-visual-profiler for Ubuntu) is a cross-platform way to profile GPGPU applications, but it is less useful when we can run the applications only on the command line through the SLURM resource manager. In this case we can use nvprof (packaged as nvidia-profiler) to get a log file which can be downloaded and viewed on a desktop using nvvp.
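For example, a profile could be collected in batch mode and then copied to a desktop machine for viewing. This is a sketch assuming the installed nvprof supports the --output-profile option; the output file name hello_world.nvprof is arbitrary:

$ srun -p gpu nvprof --output-profile hello_world.nvprof ./hello_world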
Running a Sample Program
1) Interactive Shell
Below are the contents of a sample CUDA program, hello_world.cu, which prints Hello, CUDA! when executed.
#include <stdio.h>

// Kernel that runs on the GPU and prints from the device.
__global__ void helloCUDA()
{
    printf("Hello, CUDA!\n");
}

int main()
{
    // Launch the kernel with one block of one thread.
    helloCUDA<<<1, 1>>>();
    // Wait for the kernel to finish so its output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
You can now compile this code with nvcc (the NVIDIA CUDA compiler) and run it through srun. The srun parameter -p specifies the gpu partition; --nodes=1 --ntasks-per-node=1 requests one node with one task per node; --time=01:00:00 sets a time limit of one hour for the job; and --pty bash -i requests an interactive Bash shell on the allocated node, which allows for command-line interaction. You can then execute the code on the allocated node with ./hello_world.
> nvcc -o hello_world hello_world.cu
> srun -p gpu --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
> ./hello_world
Hello, CUDA!
> exit
2) Job Script
You can submit the same CUDA program with a job script. Below are the contents of script.sh, which specifies the job parameters and captures the program output.
#!/bin/bash
#SBATCH --job-name CudaJob
#SBATCH --output result.out
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00

## Run the script
srun $HOME/hello_world
You can now submit the job with this script using sbatch and expect the output in result.out.
> sbatch script.sh
Submitted batch job 2432982
> cat result.out
Hello, CUDA!
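While the job is waiting or running, its status can be checked with the standard SLURM squeue command, for example:

> squeue -u $USER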