Skip to content

Repository for EE 5351 Applied Parallel Programming final project. This project is to implement RAPIDS Groupby function in CUDA.

Notifications You must be signed in to change notification settings

chrispypatt/RAPIDS-Groupby

Repository files navigation

RAPIDS-Groupby

Repository for EE 5351 Applied Parallel Programming final project on sorting based Groupby and is being updated for EE 5355 Algorithmic Techniques for Scalable Many-core Computing final project on hashed based Groupby. This project is to implement RAPIDS Groupby function in CUDA.

Team members EE 5351: Aaron Nightingale, Christopher Patterson, Jersin Nguetio, Menglu Liang, Tonglin Chen

Team members EE 5355: Tonglin Chen, Tianming Cui, Christopher Patterson, Yadu Kiran

Compiling Options:

  • dbg: Compile target using debug mode, 512 thread per block

  • NOPRINT: Target will not print the result from groupby

  • PRIV: Experimental privatization without relaunches

  • TESLA: Explicitly use 32KB shared memory per block. Should only be used with PRIV

  • CPU_SAMPLE: Target will run CPU sampling to predict the number of unique keys

  • GPU_SAMPLE: Target will run GPU sampling to predict the number of unique keys

Command line usage for sorting based GroupBy:

make
./groupby                                                           # Data Entries:  100k, key_columns: 2, row_columns: 3, unique keys: 4
./groupby <num_rows>                                                # Data Entries: num_rows, key_columns: 2, row_columns: 3, unique keys: 4
./groupby <num_rows> <key_cols> <val_cols>                          # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys: 4
./groupby <num_rows> <key_cols> <val_cols> <distinct_keys>  # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys : distinct_keys

Command line usage for hashed based GroupBy:

make groupby_hash
./groupby_hash                                                           # Data Entries:  100k, key_columns: 2, row_columns: 3, unique keys: 4, hashtable rows: 1003
./groupby_hash <num_rows>                                                # Data Entries: num_rows, key_columns: 2, row_columns: 3, unique keys: 4, hashtable rows: 1003
./groupby_hash <num_rows> <key_cols> <val_cols>                          # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys: 4, hashtable rows: 1003
./groupby_hash <num_rows> <key_cols> <val_cols> <distinct_keys> # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys : distinct_keys, hashtable rows: 1003
./groupby_hash <num_rows> <key_cols> <val_cols> <distinct_keys> <hash_table_size> # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys : distinct_keys, hashtable rows: hash_table_size

Command line usage for hashed based GroupBy with hash table size prediction:

make groupby_hash <mode>												 # mode: CPU_SAMPLE=1 for cpu sampling, GPU_SAMPLE=1 for gpu sampling
./groupby_hash                                                           # Data Entries:  100k, key_columns: 2, row_columns: 3, unique keys: 4
./groupby_hash <num_rows>                                                # Data Entries: num_rows, key_columns: 2, row_columns: 3, unique keys: 4
./groupby_hash <num_rows> <key_cols> <val_cols>                          # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys: 4
./groupby_hash <num_rows> <key_cols> <val_cols> <distinct_keys>  # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys : distinct_keys

The program will populate random data, compute on CPU and GPU then validate the results.

All rights reserved.

About

Repository for EE 5351 Applied Parallel Programming final project. This project is to implement RAPIDS Groupby function in CUDA.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •