Evan Schneider edited this page May 9, 2018 · 29 revisions

Requirements

  • NVIDIA graphics card
  • C/C++ compiler, e.g. g++
  • NVIDIA CUDA compiler, nvcc (included in the CUDA Toolkit)
  • HDF5 library (recommended)

Downloading the code

The public version of the code can be found at https://github.com/cholla-hydro/cholla. To download it, you can either clone the main repository directly

git clone https://github.com/cholla-hydro/cholla

or create your own fork on GitHub and clone that (recommended if you plan to contribute).

Compiling the code

The main repository contains two makefiles, one for Linux and one for macOS. Once you have installed the required compilers and libraries, you should be able to compile the code by selecting the relevant makefile, for example on Linux:

make -f makefile.linux

If successful, this should create an executable called "cholla" in the main directory. You may have to edit the makefile to point it to the libraries on your system. If you are running Cholla on a cluster, you may also need to load the relevant modules before compiling the code.

Note: It is important that the code be compiled for the correct GPU architecture. This is specified in the makefile via the -arch flag. The relevant architecture can be found by running the "deviceQuery" sample program that accompanies the NVIDIA CUDA toolkit. Two common examples are -arch=sm_35 for Tesla K20s and -arch=sm_60 for Tesla P100s.
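
As a rough illustration of how the flag relates to the hardware, the number after sm_ is the GPU's compute capability with the dot removed (3.5 for a K20, 6.0 for a P100). The sketch below uses a hypothetical helper, cap_to_arch, which is not part of Cholla. The commented nvidia-smi query is an assumption: it only works with drivers recent enough to support the compute_cap field.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cholla): turn a compute capability such as
# "6.0" into the matching nvcc flag, "-arch=sm_60".
cap_to_arch() {
    # Delete the dot: 3.5 -> 35, 6.0 -> 60.
    digits=$(printf '%s' "$1" | tr -d '.')
    printf '%s\n' "-arch=sm_${digits}"
}

cap_to_arch 3.5   # Tesla K20
cap_to_arch 6.0   # Tesla P100

# Assumption: newer NVIDIA drivers can report the capability directly, e.g.
#   cap_to_arch "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
```

The resulting value would then replace the existing -arch setting in the makefile before compiling.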

Running the code (serial mode)

To run Cholla on a single GPU, you must execute the binary and provide it with an input parameter file. For example, to run a 1D Sod shock tube test, you would run:

./cholla tests/1D/Sod.txt

The code will write some information about the input parameters to the terminal:

Parameter values:  nx = 256, ny = 1, nz = 1, tout = 0.200000, init = Riemann, boundaries = 3 3 0 0 0 0
Output directory:  ./
Local number of grid cells: 256 1 1 262

followed by some text indicating the code is initializing:

Setting initial conditions...
Setting boundary conditions...
Boundary conditions set.
Dimensions of each cell: dx = 0.003906 dy = 0.003906 dz = 0.003906
Ratio of specific heats gamma = 1.400000
Nstep = 0  Timestep = 0.000000  Simulation time = 0.000000
Writing initial conditions to file...
Starting calculations.

After this, the code will print out a line for every time step it takes, indicating the step it is on, the total time elapsed in the simulation, the size of the timestep taken, the wall-time elapsed during the timestep, and the total wall-time of the simulation:

n_step: 1   sim time:  0.0009904   sim timestep: 9.9042e-04  timestep time =   301.153 ms   total time =    0.3060 s
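
Because each time step produces one such line, the terminal output is easy to post-process. The sketch below assumes the whitespace-separated layout of the sample line above stays fixed, and that the output was saved to a file (the name run.log is hypothetical). The helper parse_steps is not part of Cholla; it prints the step number, simulation time, timestep, per-step wall time in ms, and total wall time in s.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cholla): pull the numeric columns out of
# the per-step lines, assuming the whitespace-separated layout shown above.
parse_steps() {
    # Fields: $2 = step, $5 = sim time, $8 = sim timestep,
    #         $12 = wall time of this step (ms), $17 = total wall time (s).
    awk '$1 == "n_step:" { print $2, $5, $8, $12, $17 }'
}

# Example with the sample line from above:
echo 'n_step: 1   sim time:  0.0009904   sim timestep: 9.9042e-04  timestep time =   301.153 ms   total time =    0.3060 s' | parse_steps

# With a saved log (the file name is an assumption):
#   parse_steps < run.log
```

Lines that do not begin with n_step: (such as the initialization messages) are skipped, so the whole log can be passed through unchanged.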

The code will stop running when it reaches the final time specified in the input file. By then it will have created at least two output files in the output directory specified in the parameter file (in this case, the directory where we ran the code): one for the initial conditions and one for the final output. Additional files may have been created depending on the output interval chosen.

Running the code (parallel mode)

Cholla can also be run on multiple GPUs when it is compiled with the Message Passing Interface (MPI). Running in parallel mode requires an MPI compiler. We recommend OpenMPI, but other implementations should work. Once the appropriate MPI compiler is installed or loaded, uncomment the relevant line in the makefile:

MPI_FLAGS = -DMPI_CHOLLA

and compile the code. Once the code is compiled, you can run it using as many processes as you have available GPUs (Cholla assumes there is one GPU per MPI process). For example, if you have 4 GPUs, you could run a 3D sound wave test via:

mpirun -np 4 ./cholla tests/3D/sound_wave.txt
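
Since Cholla assumes one GPU per MPI process, it can help to confirm how many GPUs are visible before choosing the value for -np. The following is a minimal sketch, assuming nvidia-smi is installed and that its -L option prints one line per device beginning with "GPU". The helper name count_gpus is hypothetical and not part of Cholla.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cholla): count the device lines in
# `nvidia-smi -L` output read from standard input.
count_gpus() {
    grep -c '^GPU [0-9]'
}

# Usage sketch (requires nvidia-smi and mpirun on the machine):
#   ngpu=$(nvidia-smi -L | count_gpus)
#   mpirun -np "$ngpu" ./cholla tests/3D/sound_wave.txt
```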

The code will automatically split up the simulation domain among the GPUs. If you are running on a cluster, you may have to specify additional information about the number of GPUs per node in the batch submission script (e.g. PBS, Slurm, LSF).
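
For clusters that use Slurm, a batch script might look roughly like the one printed by the sketch below. The function name make_slurm_script is hypothetical, and the resource requests are assumptions about a single node that has the requested number of GPUs. Sites differ in the options, modules, and launcher (mpirun versus srun) they expect, so treat this as a starting point and check the cluster's own documentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cholla): print a Slurm batch script that
# runs one MPI process per GPU on a single node.
make_slurm_script() {
    ngpu=$1       # number of GPUs, and therefore MPI processes
    params=$2     # Cholla input parameter file
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=${ngpu}
#SBATCH --gres=gpu:${ngpu}
#SBATCH --time=00:10:00

mpirun -np ${ngpu} ./cholla ${params}
EOF
}

make_slurm_script 4 tests/3D/sound_wave.txt
```

Redirecting the output to a file gives a script that could be adapted and submitted with sbatch.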
