QUDA Environment Variables

Below is a list of QUDA-specific environment variables. A list of useful CUDA-specific variables is included at the bottom of the page.

Variable name	Function
`QUDA_RESOURCE_PATH`	Path where tune cache and profile files will be output
`QUDA_PROFILE_OUTPUT_BASE`	Filename prefix for profile output. Setting this results in the files `$(QUDA_PROFILE_OUTPUT_BASE).tsv` and `$(QUDA_PROFILE_OUTPUT_BASE)_async.tsv` being written out (the defaults are simply `profile.tsv` and `profile_async.tsv`)
`QUDA_ENABLE_P2P`	Control peer-to-peer (P2P) transfers: `QUDA_ENABLE_P2P=0` disables all P2P transfers; `QUDA_ENABLE_P2P=1` enables only copy engines; `QUDA_ENABLE_P2P=2` enables only remote writing; `QUDA_ENABLE_P2P=3` enables both copy engines and remote writing. Default is both copy engines and remote writing (3)
`QUDA_ENABLE_P2P_MAX_ACCESS_RANK`	Set limit on which GPUs are peer-to-peer connected (use to disable low-bandwidth connections), e.g., `QUDA_ENABLE_P2P_MAX_ACCESS_RANK=0` would limit to only highest bandwidth connections
`QUDA_ENABLE_TUNING`	Enable / disable kernel autotuning. Default is enabled, disable with `QUDA_ENABLE_TUNING=0`
`QUDA_REORDER_LOCATION`	Set where data should be reordered when transferring CPU<->GPU (default is GPU)
`QUDA_RANK_VERBOSITY`	Set which global ranks are active in `printfQuda` calls (default is rank 0)
`QUDA_ENABLE_DEVICE_MEMORY_POOL`	Enable / disable device memory allocator (default is enabled, disable with `QUDA_ENABLE_DEVICE_MEMORY_POOL=0`)
`QUDA_ENABLE_PINNED_MEMORY_POOL`	Enable / disable pinned memory allocator (default is enabled, disable with `QUDA_ENABLE_PINNED_MEMORY_POOL=0`)
`QUDA_ENABLE_MANAGED_MEMORY`	Enable / disable using managed memory for allocations (default is disabled, enable with `QUDA_ENABLE_MANAGED_MEMORY=1`). Note: managed memory has some limitations for pre-Pascal architectures.
`QUDA_ENABLE_MANAGED_PREFETCH`	Enable / disable explicit managed memory prefetching calls (default is disabled, enable with `QUDA_ENABLE_MANAGED_PREFETCH=1`). Does nothing if `QUDA_ENABLE_MANAGED_MEMORY` isn't enabled.
`QUDA_ENABLE_NUMA`	Enable NUMA placement. Default is enabled if NUMA has been enabled in CMake; disable with `QUDA_ENABLE_NUMA=0`
`QUDA_MILC_HISQ_RECONSTRUCT`	Set the reconstruct type in the MILC interface used for the long links in the HISQ solver. Allowed values are 9/13/18 with 18 the default
`QUDA_MILC_HISQ_RECONSTRUCT_SLOPPY`	Set the sloppy reconstruct type in the MILC interface used for the long links in the HISQ solver. Allowed values are 9/13/18 with 18 the default
`QUDA_ENABLE_GDR`	Enable GPU-Direct RDMA. Default is disabled, enable with `QUDA_ENABLE_GDR=1`
`QUDA_ENABLE_ZERO_COPY`	Enable zero-copy policies (can be beneficial on systems without performant GDR). Default is disabled, enable with `QUDA_ENABLE_ZERO_COPY=1`
`QUDA_ENABLE_NVSHMEM`	Enable NVSHMEM communication policies if QUDA is built with NVSHMEM support. Default is enabled, disable with `QUDA_ENABLE_NVSHMEM=0`
`QUDA_TEST_GRID_SIZE`	Set the process geometry for the unit tests. Overrides the `--gridsize` parameter if set.
`QUDA_TEST_GRID_PARTITION`	Set the process grid partition geometry for the unit tests (for split grid). Overrides the `--grid-partition` parameter if set.
`QUDA_ENABLE_MPS`	Enable support for MPS in QUDA. Generally not recommended except for testing purposes. Default is disabled, enable with `QUDA_ENABLE_MPS=1`
`QUDA_DEVICE_RESET`	Call `cudaDeviceReset` in `endQuda` - this legacy behavior can be useful for profiling, but destroys the CUDA context of other CUDA libraries outside of QUDA (e.g., GPU-aware MPI). Default is disabled, enable with `QUDA_DEVICE_RESET=1`
`QUDA_DETERMINISTIC_REDUCE`	Perform all MPI reductions deterministically: setting this flag means that post-tuning or no tuning, QUDA will run completely deterministically regardless of the rank order. Default is disabled, enable with `QUDA_DETERMINISTIC_REDUCE=1`
`QUDA_TUNE_VERSION_CHECK`	Set `QUDA_TUNE_VERSION_CHECK=0` to disable the check that prevents using a tunecache.tsv file from a different QUDA version
`QUDA_ENABLE_TUNING_SHARED`	Enable / disable shared memory autotuning. Set `QUDA_ENABLE_TUNING_SHARED=0` to disable it, which is useful for checking the effect of shared memory tuning.
`QUDA_TUNING_RANK`	Set the global default rank for doing kernel autotuning (default is rank 0)
`QUDA_MAX_MULTI_RHS`	Set the maximum number of RHS per kernel. Default is 64 with large kernel arguments, and 16 otherwise.
`QUDA_ENABLE_MONITOR`	Set `QUDA_ENABLE_MONITOR=1` to enable device monitoring during execution. The monitor log is dumped to `QUDA_RESOURCE_PATH` when `endQuda` is called.
`QUDA_ENABLE_MONITOR_PERIOD`	Set the monitor period in microseconds (default is 1000 microseconds = 1 millisecond)
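
As a minimal sketch of how a few of the variables above might be set in a job script before launching a QUDA application: the resource directory name and the commented-out launch line (`mpirun -np 4 ./my_quda_app`) are placeholders chosen for illustration, not part of QUDA. Creating the directory up front is a precaution rather than a documented requirement.

```shell
# Hypothetical location for the tune cache and profile files; any writable path works.
export QUDA_RESOURCE_PATH="${TMPDIR:-/tmp}/quda_resources"
mkdir -p "$QUDA_RESOURCE_PATH"

# Keep kernel autotuning on (the default) and allow both P2P mechanisms (also the default).
export QUDA_ENABLE_TUNING=1
export QUDA_ENABLE_P2P=3

# Opt in to GPU-Direct RDMA, which is disabled by default.
export QUDA_ENABLE_GDR=1

# Turn on device monitoring with the default 1000 microsecond period.
export QUDA_ENABLE_MONITOR=1
export QUDA_ENABLE_MONITOR_PERIOD=1000

# Placeholder launch line; the launcher, rank count and executable are assumptions.
# mpirun -np 4 ./my_quda_app
```

Variables that are left unset fall back to the defaults listed in the table, so a script only needs to export the ones it wants to change.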

CUDA environment variables

Variable name	Function
`CUDA_LAUNCH_BLOCKING`	If set to 1, all kernels are launched synchronously, which ensures that error messages pertain to precisely the last kernel called. If set to 0 (default behaviour), kernels are launched asynchronously.
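
When chasing a kernel error, it can help to force synchronous kernel launches for a single debugging session and then return to the default. A minimal sketch, in which `./my_quda_app` is a hypothetical executable name:

```shell
# Force synchronous kernel launches for the commands that follow.
export CUDA_LAUNCH_BLOCKING=1

# Placeholder launch line; replace with the application being debugged.
# ./my_quda_app

# Remove the variable again so later runs use the default asynchronous launches.
unset CUDA_LAUNCH_BLOCKING
```

Synchronous launches usually slow execution down, so this setting is best kept out of production job scripts.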

QUDA calls
