tpch-spark

TPC-H queries implemented in Spark using the DataFrames API. Tested under Spark 2.4.0.

Savvas Savvides

Generating tables

Under the dbgen directory do:

make

This should generate an executable called dbgen.

./dbgen -h

gives you the various options for generating the tables. The simplest case is running:

./dbgen

which generates tables with the extension .tbl at scale factor 1 (the default), for a total of roughly 1GB across all tables. For tables of a different size, use the -s option:

./dbgen -s 10

will generate roughly 10GB of input data.
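Before moving on, it can help to confirm that dbgen actually wrote every table. The sketch below assumes the standard eight TPC-H table names and that the .tbl files sit in a single directory; the helper name check_tbl_files is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): check that all eight
# TPC-H tables exist as non-empty .tbl files in the given directory.
# Usage: check_tbl_files dbgen && echo "all tables present"
check_tbl_files() {
  local dir="${1:-.}" missing=0 t
  for t in customer lineitem nation orders part partsupp region supplier; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$t.tbl" ]; then
      echo "missing or empty: $dir/$t.tbl"
      missing=1
    fi
  done
  return "$missing"
}
```

The function only defines the check; it returns a non-zero status if any table is missing, so it can gate later steps in a script.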

You can then either upload your data to HDFS or read it locally.
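For the HDFS route, one possible approach is to copy each .tbl file with the standard hdfs dfs commands. This is a minimal sketch, not something the repository provides: the helper name upload_tbl and the DRY_RUN switch are assumptions, and the destination path is whatever you later set as INPUT_DIR. By default the commands are only printed so you can review them first.

```shell
# Hypothetical helper (not part of this repository): copy every .tbl file
# from a local directory into an HDFS directory.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
upload_tbl() {
  local src="$1" dest="$2" run="echo" f
  [ "${DRY_RUN:-1}" = "0" ] && run=""
  # Create the target directory (and any parents) if it does not exist
  $run hdfs dfs -mkdir -p "$dest"
  for f in "$src"/*.tbl; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when no files match
    # -f overwrites a file of the same name already in HDFS
    $run hdfs dfs -put -f "$f" "$dest/"
  done
}
```

For example, `upload_tbl dbgen /tpch/input` prints one mkdir command followed by one put command per table, where /tpch/input is a placeholder path.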

Running

First compile using:

sbt package

Make sure you set INPUT_DIR and OUTPUT_DIR in the TpchQuery class before compiling, so that they point to the location of the input data and to where the output should be saved.

You can then run a query using:

spark-submit --class "main.scala.TpchQuery" --master MASTER target/scala-2.11/spark-tpc-h-queries_2.11-1.0.jar ##

where ## is the number of the query to run (e.g. 1, 2, ..., 22) and MASTER specifies the Spark mode (e.g. local, yarn, standalone).
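To run several queries in a row, the spark-submit command above can be wrapped in a loop. The following is a sketch under a few assumptions: the helper name run_queries and the DRY_RUN switch are hypothetical, and the jar path is the one shown in the command above. By default the commands are only printed.

```shell
# Jar path as given in the spark-submit command in this README
JAR="target/scala-2.11/spark-tpc-h-queries_2.11-1.0.jar"

# Hypothetical helper (not part of this repository): submit a range of
# TPC-H queries one after another. Arguments: MASTER [FIRST] [LAST].
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
run_queries() {
  local master="$1" first="${2:-1}" last="${3:-22}" run="echo" q
  [ "${DRY_RUN:-1}" = "0" ] && run=""
  for q in $(seq "$first" "$last"); do
    $run spark-submit --class "main.scala.TpchQuery" --master "$master" "$JAR" "$q"
  done
}
```

For example, `run_queries local 1 3` prints the three spark-submit commands for queries 1 to 3, and `DRY_RUN=0 run_queries yarn` would submit all 22 queries to YARN in sequence.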

Other Implementations

Data generator (http://www.tpc.org/tpch/)
TPC-H for Hive (https://issues.apache.org/jira/browse/hive-600)
TPC-H for PIG (https://github.com/ssavvides/tpch-pig)
