IMDB - Analytics

Experiment 3

Finding the actors and actresses with the most action movies.

Dozer SQL provides us with CTEs which can improve readability, as well as the option to reuse the same query multiple times.

We can use a similar approach as used in the previous example, or use CTEs as described. The config file can be found here.

 with acting_crew as (
  select person_id, category
  from crew
  where category = 'actor' or category = 'actress'
 ),
 
 action_titles as (
  select title_id
  from titles
  where genres like '%Action%'
 )
 
 select p.name, r.category, count(1) as titles
 into out_table
 from people p 
 join acting_crew r on p.person_id = r.person_id  
 join action_titles a on a.title_id = r.title_id  
 group by p.name,r.category;

Instructions

dozer clean -c exp3-config.yaml
dozer build -c exp3-config.yaml
dozer run -c exp3-config.yaml

Dozer should start running after executing the commands, but ordering the data is still left! Dozer API provides the option to add query such as order by, limit and many more... to improve the visibility of the data on the endpoints.

The query described can be passed to the REST endpoints GET:localhost:8080/endpoint1 to order the data in descending order w.r.t titles.

{
  "order_by":{"titles":"desc"}
}

Findings

Source

Stores

Roughly took 2 mins to process all the records.
Took about the same time as Exp2 since the same data was ingested.
Pipeline latency stayed under 0.25s.

Start Time	End Time	Elapsed
11:04:40 PM	11:06:34 PM	~ 2 mins

Click here to go back to homepage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

experiment3.md

experiment3.md

IMDB - Analytics

Experiment 3

Instructions

Findings

Source

Stores

Files

experiment3.md

Latest commit

History

experiment3.md

File metadata and controls

IMDB - Analytics

Experiment 3

Instructions

Findings

Source

Stores