Skip to content

Composing joins

Roy Brondgeest edited this page Apr 25, 2018 · 7 revisions

Introduction

The DSL allows for automatic "merging" of complex joins, doing all the hard work of passing on the columns for you.

This is especially useful, because Clickhouse does not pass on the columns of the right hand table in a join trough the * or all() operator, and therefore they have to be explicitly added to the wrapping SELECT statement.

While Clickhouse has no issues handling multiple joins (we use up to 5 joins in production on HUGE tables) those queries can become quite huge.

The merge operator expects that you provide an on operator following it up, for which you provide one or more columns that are present in both queries (same name and type). The table results will be joined on these columns.

Merges are done recursively, and can be nested. Theoretically allowing you to write queries that perform an endless number of joins.

Explaining the joining mechanism

The merge...on functionality by default assumes you are using only both grouped and aggregated columns so the default on operator then helps by adding the columns provided to it, to the groupBy and orderBy statements.

If this is not desired, you can use onUngrouped and provide your own groupBy (and order by) statement.

Example

val today = new DateTime()
val qry1 = select(sum(order_value), order_customer_id as "customer_id", date)
  .from(ordersTable)
  .where(date >= today)
val qry2 = select(customer_id, customer_city, date)
  .from(customer)
  .where(date >= today and customer_country.isEq("Netherlands"))
  .groupBy(customer_city)

val merge = qry1 merge qry2 on(cust_id, date)

Result would be something like: (note the customer_name in the wrapping select)

SELECT *, customer_city 
FROM (
  SELECT order_id, order_customer_id as customer_id, sum(order_value), date 
  FROM orders 
  WHERE date >= 154984763000
  GROUP BY customer_id, date
) ALL LEFT JOIN (
  SELECT customer_id, customer_city, date 
  FROM customers 
  WHERE date >= 154984763000 AND customer_country = "Netherlands"
  GROUP BY customer_city, customer_id, date
) 
USING (customer_id, date)

Todo pages:

Clone this wiki locally