ericrife/NDSv2

TPCDS-Official

This is for the official TPCDS runs.

https://github.com/cloudera/impala-tpcds-kit

This command creates the dataset. Modify the -p option to accommodate your cluster's CPU count.

hadoop jar target/tpcds-gen-1.0-SNAPSHOT.jar -d /data_raw -p 200 -s 3000
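One way to keep the -p value in step with the cluster is to build the command from the cluster shape. The following is a minimal sketch, not part of this repository: the helper name `build_datagen_command`, the node and core counts, and the choice of setting -p to the total core count are all assumptions made for illustration.

```python
import shlex


def build_datagen_command(parallelism, scale, out_dir="/data_raw"):
    """Assemble the hadoop jar invocation shown above as an argument list."""
    if parallelism < 1 or scale < 1:
        raise ValueError("parallelism and scale must be positive integers")
    return [
        "hadoop", "jar", "target/tpcds-gen-1.0-SNAPSHOT.jar",
        "-d", out_dir,           # HDFS directory for the raw data
        "-p", str(parallelism),  # degree of parallelism
        "-s", str(scale),        # scale factor
    ]


# Hypothetical cluster shape; replace with your own values.
worker_nodes = 8
cores_per_node = 25

# Assumption: one generator task per core. Tune this for your cluster.
cmd = build_datagen_command(worker_nodes * cores_per_node, 3000)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where `hadoop` is on the PATH.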

Update the --input-prefix and --output-prefix values to the correct paths in your HDFS infrastructure.

spark-submit --driver-memory 32G --conf spark.rapids.sql.enabled=false --conf spark.executor.cores=30 --conf spark.executor.memory=100G --conf spark.task.cpus=30 ./ds-convert.py --input-prefix="hdfs://tpcds-2gpu-8node-t4-cluster-m/data_raw/" --output-prefix="hdfs://tpcds-2gpu-8node-t4-cluster-m/data_parquet/" --input-suffix=""
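Two things in the command above tend to need adjusting per cluster: the HDFS prefixes and the core settings. Spark schedules at most spark.executor.cores divided by spark.task.cpus tasks at a time on each executor, so the values shown (30 and 30) give one task per executor. The sketch below works through that arithmetic and derives both prefixes from a single namenode host; the function names are hypothetical and not part of ds-convert.py.

```python
def concurrent_tasks_per_executor(executor_cores, task_cpus):
    """Number of tasks one executor can run at the same time."""
    if task_cpus < 1:
        raise ValueError("task_cpus must be at least 1")
    if executor_cores < task_cpus:
        # A task needs task_cpus cores, so nothing could be scheduled.
        raise ValueError("executor_cores must be >= task_cpus")
    return executor_cores // task_cpus


def convert_prefixes(namenode):
    """Build the two prefix arguments from one namenode host name."""
    base = f"hdfs://{namenode}"
    return {
        "--input-prefix": f"{base}/data_raw/",
        "--output-prefix": f"{base}/data_parquet/",
    }


# Values taken from the command above.
print(concurrent_tasks_per_executor(30, 30))
for flag, value in convert_prefixes("tpcds-2gpu-8node-t4-cluster-m").items():
    print(f'{flag}="{value}"')
```

Lowering spark.task.cpus would let each executor run more tasks concurrently, at the cost of less memory per task.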

From inside the ./DSGen-software-code-3.2.0rc1/tools directory, run the following command to generate the queries. Adjust the output directory as needed.

./dsqgen -DIRECTORY ../query_templates -INPUT ../query_templates/templates.lst -SCALE 3000 -DIALECT netezza -OUTPUT_DIR /home/erife/queries/streams -streams 10
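After dsqgen finishes, it can be worth checking that one file was written per requested stream. The sketch below assumes the files are named query_0.sql through query_9.sql for -streams 10. That naming pattern is an assumption, so adjust `name_pattern` if your dsqgen build writes different names. The helper `missing_stream_files` is hypothetical and not part of the TPC-DS toolkit.

```python
from pathlib import Path


def missing_stream_files(output_dir, streams, name_pattern="query_{}.sql"):
    """Return the stream numbers whose query file is absent from output_dir."""
    out = Path(output_dir)
    return [
        n for n in range(streams)
        if not (out / name_pattern.format(n)).is_file()
    ]


if __name__ == "__main__":
    # Path and stream count taken from the command above.
    missing = missing_stream_files("/home/erife/queries/streams", 10)
    if missing:
        print("missing streams:", missing)
    else:
        print("all stream files present")
```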
