on an Ubuntu system with NVIDIA Quattro GV100 GPUs and CUDA SDK version 11 installed:
sudo apt install -y zlib1g zlib1g-dev
git clone https://github.com/muellan/metacache.git
cd metacache
git submodule update --init --recursive
make gpu CUDA_ARCH=sm_70
See below for more details.
The GPU version of MetaCache requires a CUDA-capable device of the Pascal generation or newer.
In order to be able to build the default database (based on NCBI RefSeq Release 97) with default settings your system will need a total of 120 GB of GPU memory (e.g. 4x GPUs with 32 GB each). If you don't have enough GPU memory, you can use database partitioning.
-
CUDA SDK
- CUDA >= 11
-
Hashtable library warpcore (github) (HIPC '20 paper) and sorting library bb_segsort (github). Both repositories are included as submodules and need to be checked out in addition to MetaCache itself. You can do so by calling
git submodule update --init --recursive
-
Support for gzipped FASTA/FASTQ files requires the zlib compression library to be installed on your system. On Debian/Ubuntu zlib can be installed with
sudo apt install -y zlib1g zlib1g-dev
. If you don't have zlib installed or cannot do so you can compile withmake MC_ZLIB=NO
which will remove the zlib dependency and disables support for gzipped input files.
Run make
in the directory containing the Makefile and set the GPU generation with the CUDA_ARCH
flag (e.g. CUDA_ARCH=sm_70
for Quadro GV100):
make gpu CUDA_ARCH=sm_70
If you don't supply additional parameters MetaCache will be compiled with the default data type settings which support databases with
- up to 4,294,967,295 targets (= reference sequences)
- targets with a length of up to 4,294,967,295 windows (which corresponds to approximately 485.3 billion nucleotides with the default window size of 112)
- kmers with a lengths of up to 16
A database built by the GPU version can be queried by the corresponding CPU version and vice versa. The only restriction is the available (GPU) memory.
MetaCache-GPU allows to build distributed databases across multiple GPUs. By default, it will use all available GPUs on the system. In difference to the database partitioning approach, the reference genomes are automatically distributed across multiple GPUs in a single run. Due to the dynamic distribution scheme and the concurrent execution on the GPUs, two database builds for the same input files will most likely differ. However, this should only have a negligible impact on classification performance.
By default, the query mode will use one GPU per database part. It is also possible to utilize more GPUs by replicating the database (see below), which may increase classification speed.
Since building databases is significantly faster on the GPU than on the CPU and will often take less than a minute, the build+query mode can be used to build and directly query a database without writing the database to disk.
The command line options of the GPU version are similar to the CPU version with a few notable exceptions:
-parts <#>
sets the number of GPUs to use (default: all available GPUs).
-replicate <#>
enables multiple GPU pipelines (default: 1). Each pipeline occupies one GPU per database part.
-kmerlen
kmer length is limited to 16 (default: 16).-sketchlen
sketch length is limited to 16 (default: 16).-winstride
window stride has to be a multiple of 4 (default: 112).-remove-overpopulated-features
is not supported.-remove-ambig-features
is not supported.
- submode
locations
is not available. - submode
featurecounts
is not available.
Merging multiple result files will not be performed on the GPU and will fall back to the CPU.