
QUDA Quick Start Guide

mikeaclark edited this page May 12, 2015 · 16 revisions

Multi-GPU emulation

To aid performance modelling and debugging, communication can be switched on in a given dimension even when that dimension is actually local to a single GPU. The command line flag --partition N controls this feature, where N is a 4-bit number whose bits 0, 1, 2, 3 switch communication on or off in dimensions x, y, z, t, respectively. For example:

dslash_test --partition 1     ## enable x dimension communication
dslash_test --partition 6     ## enable y and z dimension communication
dslash_test --partition 15    ## enable full communication
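
To make the bit encoding concrete, here is a minimal Python sketch that converts between a --partition value and the dimensions it enables. The helper names partitioned_dims and partition_value are hypothetical and not part of QUDA; the only thing taken from the text above is the bit-to-dimension mapping (bit 0 is x, bit 1 is y, bit 2 is z, bit 3 is t).

```python
# Hypothetical helpers (not part of QUDA) illustrating the --partition bitmask.
DIMENSIONS = ("x", "y", "z", "t")  # bit 0 -> x, bit 1 -> y, bit 2 -> z, bit 3 -> t


def partitioned_dims(n):
    """Return the dimensions whose communication bit is set in n."""
    if not 0 <= n <= 15:
        raise ValueError("partition value must fit in 4 bits (0-15)")
    return [name for bit, name in enumerate(DIMENSIONS) if (n >> bit) & 1]


def partition_value(dims):
    """Build the --partition value that enables the given dimensions."""
    value = 0
    for name in dims:
        value |= 1 << DIMENSIONS.index(name)
    return value


if __name__ == "__main__":
    # The three values used in the examples above.
    for n in (1, 6, 15):
        print(n, format(n, "04b"), partitioned_dims(n))
```

Going the other way, partition_value(["z", "t"]) gives 12, so under this encoding dslash_test --partition 12 would enable communication in the z and t dimensions only.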

Debugging

QUDA has two specific debugging modes: HOST_DEBUG and DEVICE_DEBUG.

  • HOST_DEBUG compiles all host code using the -g flag and ensures that all CUDA error reporting is done synchronously (i.e., the GPU and CPU are synchronized before the error state is fetched). For most debugging, HOST_DEBUG is all that should be needed, since most bugs tend to be in CPU code. Enabling HOST_DEBUG has a noticeable performance impact, at the 20-50% level, with the penalty being greater at smaller local volumes.
  • DEVICE_DEBUG compiles all GPU kernels using the -G flag. This provides accurate line reporting in cuda-gdb and cuda-memcheck. Enabling it incurs a huge performance penalty, at the 100x level.