GPU Computing
SLURM partition
The GPUs are accessible through compute nodes in the gpu SLURM partition. You can send a script to this queue using a command like sbatch -p gpu myscript.sh.
Hardware
The gpu compute node has two NVIDIA K20 GPUs:
$ srun -p gpu lspci | grep -i nvidia
05:00.0 3D controller: NVIDIA Corporation Device 1028 (rev a1)
42:00.0 3D controller: NVIDIA Corporation Device 1028 (rev a1)
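For more detail than lspci provides, the devices can also be queried with NVIDIA's nvidia-smi utility, assuming it is installed on the compute node:

$ srun -p gpu nvidia-smi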
Directory structure
The development tools for GPU computing, including libraries and compilers, have been installed in /usr/local/cuda-5.0/. Of particular interest are nvcc in /usr/local/cuda-5.0/bin/ and the shared object files, such as libcudart.so and libcurand.so, in /usr/local/cuda-5.0/lib64/.
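For example, a program that calls the cuRAND library can be built against these shared objects. This is a sketch: the source file rng.cu is a hypothetical placeholder, and the -L flag is shown only to make the library location explicit (nvcc normally knows its own installation paths):

$ nvcc -o rng rng.cu -L/usr/local/cuda-5.0/lib64 -lcurand

If the runtime linker cannot find the shared objects when the program runs, adding the directory to LD_LIBRARY_PATH is a common fix:

$ export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib64:$LD_LIBRARY_PATH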
Hardware-specific nvcc build options
Because we are using K20 GPUs, you will want to tell the nvcc compiler that it can use their modern features, such as double precision floating point arithmetic. To do this, use -arch=sm_35 to specify a 'compute capability' of 3.5. Other options that you will probably want for scientific computing are --prec-div=true and --prec-sqrt=true, which favor IEEE-compliant division and square root over faster approximations. For more information see the nvcc options page.
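Putting these together, a typical compile line for this hardware might look like the following (using the hello_world.cu sample from the section below):

$ nvcc -arch=sm_35 --prec-div=true --prec-sqrt=true -o hello_world hello_world.cu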
Headless GPGPU profiling
The NVIDIA Visual Profiler nvvp (packaged as nvidia-visual-profiler for Ubuntu) is a cross-platform way to profile GPGPU applications, but it is less useful when we can run the applications only on the command line through the SLURM resource manager. In this case we can use nvprof (packaged as nvidia-profiler) to get a log file which can be downloaded and viewed on a desktop using nvvp.
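For example, a profile could be collected in batch mode and then copied to a desktop machine for viewing. This is a sketch assuming the installed nvprof supports the --output-profile option; the output file name hello_world.nvprof is arbitrary:

$ srun -p gpu nvprof --output-profile hello_world.nvprof ./hello_world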
Running a Sample Program
1) Interactive Shell
Below are the contents of a sample CUDA program, hello_world.cu, which prints Hello, CUDA! when executed.
#include <stdio.h>

// Kernel that runs on the GPU and prints from the device.
__global__ void helloCUDA()
{
    printf("Hello, CUDA!\n");
}

int main()
{
    // Launch the kernel with one block of one thread.
    helloCUDA<<<1, 1>>>();
    // Wait for the kernel to finish so its output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
You can now compile this code with nvcc (the NVIDIA CUDA compiler) and run it through srun. The srun parameter -p specifies the gpu partition; --nodes=1 --ntasks-per-node=1 requests one node with one task per node; --time=01:00:00 sets a time limit of one hour for the job; and --pty bash -i requests an interactive Bash shell on the allocated node, which allows for command-line interaction. You can then execute the code on the allocated node with ./hello_world.
> nvcc -o hello_world hello_world.cu
> srun -p gpu --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
> ./hello_world
Hello, CUDA!
> exit
2) Job Script
You can submit the same CUDA program with a job script. Below are the contents of script.sh, which specifies the job parameters and captures the program output.
#!/bin/bash
#SBATCH --job-name CudaJob
#SBATCH --output result.out
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00

## Run the script
srun $HOME/hello_world
You can now submit the job with this script using sbatch and expect the output in result.out.
> sbatch script.sh
Submitted batch job 2432982
> cat result.out
Hello, CUDA!
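While the job is waiting or running, its status can be checked with the standard SLURM squeue command, for example:

> squeue -u $USER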