README.md

IMDB - Analytics

In this example we demonstrate Dozer's capabilities to find meaningful insights by processing large amounts of data.

The dataset used here is taken from CMU 15-445/645 Coursework which also has many interesting ideas for SQL queries.

For this example we are loading the data into a MySQL database. Steps to do the same can be found here.

The dataset has six tables: akas , crew , episodes , people , ratings , title, with roughly 17 million data records in total.

CPU	Cores	Memory
Ryzen7 4800H	8	16GB

It is also important to note, the experiments are being run on a NVMe SSD which offers a higher read speed than conventional SATA storage systems.

In this section we perform various experiments with different configuration files and analyze their performance results.
Experiments 2,3,4 are essentially the same, done via different ways.

Sr.no	Experiment	Aggregations	JOINs	CTEs	Description
1	No ops	0	0	0	Running directly from source to cache
2	Double JOINs	1	2	0	Running with two JOIN operations
3	CTEs & JOINs	1	2	2	Running with CTE & JOIN operations
4	Sub Queries	1	2	0	Running with sub queires & JOINs