GPU Benchmarks

Several GPU benchmark programs are provided. Each one compresses and then decompresses input data, and reports the compression throughput, decompression throughput, and compression ratio. You can run the benchmark executables on your own files or on the standard benchmark data sets described below.

Running GPU Benchmarks

The GPU benchmark executables are:

benchmark_ans_chunked {-f|--input_file} <input_file>

benchmark_bitcomp_chunked {-f|--input_file} <input_file>
                          [{-t|--type} {char|uchar|short|ushort|int|uint|longlong|ulonglong}]
                          [{-a|--algorithm} {0|1}]

benchmark_cascaded_chunked {-f|--input_file} <input_file>
                           [{-t|--type} {char|uchar|short|ushort|int|uint|longlong|ulonglong}]
                           [{-r|--num_rles} <num_RLE_passes>]
                           [{-d|--num_deltas} <num_delta_passes>]
                           [{-b|--num_bps} <do_bitpack_0_or_1>]

benchmark_gdeflate_chunked {-f|--input_file} <input_file>
                           [{-a|--algorithm} {0|1|2}]

benchmark_deflate_chunked {-f|--input_file} <input_file>
                           [{-a|--algorithm} {0|1|2}]

benchmark_lz4_chunked {-f|--input_file} <input_file>
                      [{-t|--type} {bits|char|uchar|short|ushort|int|uint}]

benchmark_snappy_chunked {-f|--input_file} <input_file>

benchmark_hlif {ans|bitcomp|cascaded|gdeflate|lz4|snappy}
               {-f|--input_file} <input_file>
               [{-t|--type} {char|short|int|longlong}]
               [{-m|--memory}]

benchmark_lz4_synth

benchmark_snappy_synth [{-m|--max_byte} <max_random_byte_value>] [{-b|--batch_size} <num_chunks>]

Most of the benchmark executables also support:

{-g|--gpu} <gpu_num>                       GPU device number to use for benchmarking
{-w|--warmup_count} <num_iterations>       The number of warmup (unrecorded) iterations to perform
{-i|--iteration_count} <num_iterations>    The number of recorded iterations to perform
{-x|--duplicate_data} <num_copies>         The number of copies to make of the input data before compressing
{-c|--csv_output} {false|true}             When true, the output is in comma-separated values (CSV) format
{-e|--tab_separator} {false|true}          When true and --csv_output is true, tabs are used to separate values,
                                           instead of commas
{-s|--file_with_page_sizes} {false|true}   When true, the input file must consist of pages, each prefixed with its size as an int64
{-p|--chunk_size} <num_bytes>              Chunk size when splitting uncompressed data
{-?|--help}                                Show help text for the benchmark
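The --file_with_page_sizes option expects the input to be a sequence of pages, each preceded by an int64 size. The following is a minimal Python sketch for producing and reading back such a file. It assumes the prefix is the page length in bytes, stored as a little-endian signed 64-bit integer; the byte order and units are assumptions not stated here, and `write_paged_file` and `read_paged_file` are hypothetical helper names, not part of the benchmarks.

```python
import os
import struct
import tempfile
from typing import Iterable, List

# Assumption: each page is prefixed with its length in bytes, stored as a
# little-endian signed 64-bit integer ("<q"). Verify against the benchmark source.
SIZE_PREFIX = struct.Struct("<q")


def write_paged_file(path: str, pages: Iterable[bytes]) -> None:
    """Write each page preceded by its int64 size prefix (hypothetical helper)."""
    with open(path, "wb") as f:
        for page in pages:
            f.write(SIZE_PREFIX.pack(len(page)))
            f.write(page)


def read_paged_file(path: str) -> List[bytes]:
    """Read back the pages written by write_paged_file."""
    pages = []
    with open(path, "rb") as f:
        while True:
            header = f.read(SIZE_PREFIX.size)
            if not header:
                break  # clean end of file
            if len(header) != SIZE_PREFIX.size:
                raise ValueError("truncated size prefix")
            (size,) = SIZE_PREFIX.unpack(header)
            if size < 0:
                raise ValueError("negative page size")
            page = f.read(size)
            if len(page) != size:
                raise ValueError("truncated page")
            pages.append(page)
    return pages


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "paged_input.bin")
    write_paged_file(path, [b"a" * 100, b"b" * 4096])
    print([len(p) for p in read_paged_file(path)])  # [100, 4096]
```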

For compressors that accept a data type option, input data consisting entirely of values of the specified type will usually compress better than arbitrary data. The type sizes are 1 byte for char/uchar/bits, 2 bytes for short/ushort, 4 bytes for int/uint, and 8 bytes for longlong/ulonglong. Input files whose sizes aren't multiples of the data type size are unsupported.
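Because of this restriction, it can be worth checking a file's size before benchmarking it with a --type option. A minimal Python sketch follows; the helper names are hypothetical and the sizes are the ones listed above.

```python
import os

# Element sizes in bytes for the --type values listed above.
TYPE_SIZES = {
    "bits": 1, "char": 1, "uchar": 1,
    "short": 2, "ushort": 2,
    "int": 4, "uint": 4,
    "longlong": 8, "ulonglong": 8,
}


def is_valid_input_size(num_bytes: int, data_type: str) -> bool:
    """True if num_bytes holds a whole number of data_type elements."""
    if data_type not in TYPE_SIZES:
        raise ValueError(f"unknown type: {data_type}")
    return num_bytes % TYPE_SIZES[data_type] == 0


def check_input_file(path: str, data_type: str) -> bool:
    """Check an input file's size before passing it with --type."""
    return is_valid_input_size(os.path.getsize(path), data_type)


if __name__ == "__main__":
    # 329055928 bytes (the size of the Mortgage column in the example results)
    # is exactly 41131991 eight-byte values.
    print(is_valid_input_size(329055928, "longlong"))  # True
    print(is_valid_input_size(329055929, "int"))       # False
```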

Two standard benchmark data sets are described here: "TPC-H" and "Mortgage". Both are text tables, so before benchmarking you first need to extract a column and convert it to binary data using the benchmarks/text_to_binary.py script.

To obtain TPC-H data, a randomly generated, simulated database table of purchases:

Clone and compile https://github.com/electrum/tpch-dbgen
Run ./dbgen -s <scale factor> to generate data in the file lineitem.tbl. A larger scale factor will result in a larger generated table.

To obtain Fannie Mae's Single-Family Loan Performance Data (the "Mortgage" data set):

Download any of the archives from https://docs.rapids.ai/datasets/mortgage-data
Unpack perf/Performance_<year><quarter>.txt, e.g. Performance_2000Q4.txt

The benchmarks/text_to_binary.py script reads a text file (e.g. CSV) containing a table of data and writes a specified column of that data to a binary file. Both the TPC-H and Mortgage data sets use the vertical pipe character | as the column separator (delimiter) and store one row per text line. Usage:

python benchmarks/text_to_binary.py <input_text_file> <column_number> {int|long|float|double|string} <output_binary_file> [<column_separator>]

For example, to extract column 10 (the 11th column) from lineitem.tbl, where columns are separated by |, and write it to binary file shipdate_column.bin as a sequence of 8-byte integers, run:

python benchmarks/text_to_binary.py lineitem.tbl 10 long shipdate_column.bin '|'

If no column separator is specified, the default is a comma. The string data type converts the text to UTF-16 and concatenates all of it in the output file. float is single-precision floating point (4 bytes), and double is double-precision floating point (8 bytes).
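To illustrate the kind of conversion text_to_binary.py performs, here is a rough Python approximation. It is not the script itself: the function name `extract_column`, the little-endian byte order, the 4-byte width for int, and the UTF-16 little-endian encoding without a byte order mark are all assumptions. The 8-byte long, 4-byte float, and 8-byte double widths follow the text above.

```python
import os
import struct
import tempfile
from typing import Optional

# Assumed little-endian struct formats for each numeric output type.
# long (8 bytes), float (4 bytes) and double (8 bytes) follow the text above;
# a 4-byte int is an assumption.
FORMATS = {"int": "<i", "long": "<q", "float": "<f", "double": "<d"}


def extract_column(input_text_file: str, column_number: int, data_type: str,
                   output_binary_file: str,
                   column_separator: Optional[str] = None) -> int:
    """Write one column of a delimited text table to a binary file.

    Hypothetical sketch; returns the number of rows written.
    """
    sep = "," if column_separator is None else column_separator
    rows = 0
    with open(input_text_file, "r", encoding="utf-8") as src, \
            open(output_binary_file, "wb") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            field = line.split(sep)[column_number]  # columns numbered from 0
            if data_type == "string":
                # Concatenate all text as UTF-16 (little-endian, no BOM: assumption).
                dst.write(field.encode("utf-16-le"))
            elif data_type in ("int", "long"):
                dst.write(struct.pack(FORMATS[data_type], int(field)))
            elif data_type in ("float", "double"):
                dst.write(struct.pack(FORMATS[data_type], float(field)))
            else:
                raise ValueError(f"unsupported type: {data_type}")
            rows += 1
    return rows


if __name__ == "__main__":
    # Demo on a small temporary pipe-delimited table.
    workdir = tempfile.mkdtemp()
    table = os.path.join(workdir, "demo.tbl")
    with open(table, "w", encoding="utf-8") as f:
        f.write("1|100|abc|\n2|200|de|\n")
    out = os.path.join(workdir, "demo_col1.bin")
    n = extract_column(table, 1, "long", out, "|")
    print(n, os.path.getsize(out))  # 2 16
```

Writing each value with struct.pack keeps the sketch dependency-free; a real converter for large tables would more likely buffer values and write them in bulk.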

Below are some example benchmark results from running the LZ4 compressor via the high-level interface (hlif) and the low-level interface (chunked) on an A100, using column 0 of the Mortgage 2009Q2 data:

./bin/benchmark_hlif lz4 -f /data/nvcomp/benchmark/mortgage-2009Q2-col0-long.bin
----------
uncompressed (B): 329055928
comp_size: 8582564, compressed ratio: 38.34
compression throughput (GB/s): 90.48
decompression throughput (GB/s): 312.81

./bin/benchmark_lz4_chunked -f /data/nvcomp/benchmark/mortgage-2009Q2-col0-long.bin
----------
uncompressed (B): 329055928
comp_size: 8461988, compressed ratio: 38.89
compression throughput (GB/s): 95.87
decompression throughput (GB/s): 320.70
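When collecting results from many runs, it can help to parse the default text output shown above. The sketch below uses a hypothetical helper, `parse_benchmark_output`, which is not one of the provided benchmark programs and relies on the exact field labels printed in these examples. It also checks that the reported compression ratio equals the uncompressed size divided by the compressed size (329055928 / 8582564 ≈ 38.34). For bulk collection, the --csv_output option may be more convenient.

```python
import re

# Fields printed in the default (non-CSV) output, as in the examples above:
# name -> (regex, type). Anchors (^) keep "compression throughput" from
# matching inside the "decompression throughput" line.
FIELDS = {
    "uncompressed_bytes": (r"^uncompressed \(B\): (\d+)", int),
    "comp_size": (r"^comp_size: (\d+)", int),
    "ratio": (r"compressed ratio: ([\d.]+)", float),
    "comp_gbps": (r"^compression throughput \(GB/s\): ([\d.]+)", float),
    "decomp_gbps": (r"^decompression throughput \(GB/s\): ([\d.]+)", float),
}

SAMPLE = """\
----------
uncompressed (B): 329055928
comp_size: 8582564, compressed ratio: 38.34
compression throughput (GB/s): 90.48
decompression throughput (GB/s): 312.81
"""


def parse_benchmark_output(text: str) -> dict:
    """Extract the reported metrics from one benchmark run's output."""
    result = {}
    for key, (pattern, cast) in FIELDS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found: {key}")
        result[key] = cast(match.group(1))
    return result


if __name__ == "__main__":
    metrics = parse_benchmark_output(SAMPLE)
    # The compression ratio is the uncompressed size / compressed size.
    recomputed = metrics["uncompressed_bytes"] / metrics["comp_size"]
    print(metrics["ratio"], round(recomputed, 2))  # 38.34 38.34
```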
