-
Notifications
You must be signed in to change notification settings - Fork 331
Types of equi joins (JOIN Simplification and Acceleration Series 2)
Let’s look at three types of equi-joins:
Two tables are associated by matching a certain field in table A with the primary key of table B (By field association, as we explained in the previous essay, it is the equivalence of corresponding values in fields specified by the joining condition). Table A is called the fact table, and table B the dimension table. The field in table A that will match table B’s primary key is table A’s foreign key that points to table B; table B is thus table A’s foreign key table.
In this context, the primary key is a logical one, which is the field whose values are unique and can uniquely identify records, and which is not necessarily the one already set on the database table.
The relationship between a table and its foreign key table is many-to-one, which supports JOIN and LEFT JOIN only. The FULL JOIN is extremely rare.
A typical example is the relationship between Production transaction table and Product information table.
The foreign-key-based association is asymmetrical. And positions of fact table and dimension table cannot be swapped.
Two tables are associated by matching table A’s primary key and table B’s primary key. The matching is a one-to-one relationship, and each table is the other’s homo-dimension table. The join between homo-dimension tables supports JOIN, LEFT JOIN and FULL JOIN. But for most of the data structure designs, FULL JOIN is rare.
A typical example is the relationship between Employees table and Managers table.
The relationship between two homo-dimension tables is symmetrical and their positions are equal. Homo-dimension tables can also form an equivalence relationship. Suppose A and B are homo-dimension tables, B and C are homo-dimension tables, then A and C are also homo-dimension tables.
Two tables are associated by matching table A’s primary key and one or several of table B’s primary key fields. Table A is the primary table, and table B is the sub table. It is a one-to-many relationship from table A to table B. The join between them only involves JOIN and LEFT JOIN.
A typical example is the relationship between Orders table and Order details table.
The relationship between the primary table and the sub table is asymmetrical and the matching has a fixed direction.
SQL does not differentiate the concept of foreign key join and the primary-sub join. In the context of SQL, the many-to-one relationship and the one-to-many relationship are essentially the same thing except for the direction in which tables are associated. Indeed, an orders table can be considered the foreign key table of the order details table. The purpose of distinguishing them is to use different methods to simplify their syntax and optimize their performance.
The three types of joins cover most of the equi-join scenarios. Almost all equi-joins that have business significance fall into the three types. To divide equi-joins into the three types will not decrease the range of its application scenarios.
All the three types of equi-joins involve the primary key. There is no many-to-many relationship. But, is it unnecessary to take the relationship into account on all occasions?
Yes. The relationship almost does not have any meaning in real-world business practice.
A many-to-many correspondence occurs when the associative fields for JOINing two tables do not contain the primary key. In that case on almost all occasions, there exists a larger table that correlates the two tables with each other by using them as dimension tables. When the Student table and the Subject table are JOINed, there will be a Score table that uses them as dimension tables. It makes no sense in real-world business practice to purely JOIN such two tables.
So, it is almost certain that a SQL statement is wrong or that the data is bad if a many-to-many relationship occurs. The relationship is useful for checking mistakes in JOIN operations.
But we do not deny the existence of an exception by using “almost”. On rare occasions, the many-to-many relationship has business significance. One example is performing matrix multiplication in SQL, where an equi-join with many-to-many relationship occurs. Try to code the query by yourselves.
SQL’s JOIN definition, which is the filtered Cartesian product, is indeed very simple. The simple definition covers more variety of joins, including the many-to-many equi-join as well as the non-equi-join. On the other hand, it is too simple to fully reflect the characteristics of equi-join, which is the most commonly-seen join operation, depriving SQL of opportunities to better code a query for achieving a computing goal according to the characteristics and bringing great difficulty in expressing and optimizing complex queries (joins involving several or more tables or with a nested query). But by making good use of the characteristics, we can design simple syntax to make computations more performant, which will be discussed in detail in subsequent essays.
In a word, it is more sensible to define the rare JOIN scenarios as a special type of operations than to include them in a universal definition.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code