feat: add custom equality behavior to the hash/merge join (#585)
This PR (hopefully) concludes various discussions around flags such as `null_equals_null` (DataFusion) and `null_aware` (Velox). The goal of these flags is to slightly tweak the definition of "equality" in an equijoin relation. This PR introduces a new EquiJoinKey message that physical join relations can use to define how keys should be compared. These custom equality functions are needed in a variety of scenarios:

## Optimizing set operations

Set operations (e.g. set difference) can sometimes be satisfied by an equi-join. When this happens, the user typically wants the equality comparison to be "is not distinct from", so that two null keys are treated as equal.

## Flattening correlated subqueries

Some kinds of correlated subqueries can be removed during optimization and replaced with an anti-join. Depending on the original query ("not in" vs "where not exists"), there may be slightly different behaviors with respect to null, and we may want to use "might equals" as the comparison.

## String collations

Collations define the ordering and equality of a column. Different columns can have different collations. The equi-join must use the comparison function defined by the collation.
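
## Illustration

The comparison behaviors described in the scenarios above can be sketched in Python. This is a minimal illustration and not the message introduced by this PR: the names `KeyComparison`, `keys_match`, and `join_rows` are hypothetical, nulls are modeled as `None`, and "might equals" is assumed to mean that a null on either side counts as a possible match. The `normalize` parameter stands in for a collation (here, case-insensitive comparison via `str.casefold`). A nested loop is used instead of a real hash or merge join to keep the sketch short.

```python
from enum import Enum
from typing import Any, Callable, List, Optional, Tuple

Row = Tuple[Any, str]  # (join key, payload); hypothetical row shape


class KeyComparison(Enum):
    """Hypothetical names for the equality behaviors discussed above."""
    EQ = "eq"                                      # plain equality: null never matches
    IS_NOT_DISTINCT_FROM = "is_not_distinct_from"  # null matches null
    MIGHT_EQUAL = "might_equal"                    # assumption: null matches anything


def keys_match(
    left: Any,
    right: Any,
    mode: KeyComparison,
    normalize: Optional[Callable[[Any], Any]] = None,
) -> bool:
    """Compare two join keys under the given mode, treating None as null."""
    # Apply the collation-like normalization to non-null values only.
    if normalize is not None:
        left = normalize(left) if left is not None else None
        right = normalize(right) if right is not None else None

    either_null = left is None or right is None
    if mode is KeyComparison.EQ:
        return not either_null and left == right
    if mode is KeyComparison.IS_NOT_DISTINCT_FROM:
        if either_null:
            return left is None and right is None
        return left == right
    if mode is KeyComparison.MIGHT_EQUAL:
        return either_null or left == right
    raise ValueError(f"unsupported comparison: {mode}")


def join_rows(left: List[Row], right: List[Row], mode: KeyComparison) -> List[Tuple[Row, Row]]:
    """Inner join on the first field of each row (nested loop, for clarity only)."""
    return [(l, r) for l in left for r in right if keys_match(l[0], r[0], mode)]


left_rows: List[Row] = [(1, "a"), (None, "b")]
right_rows: List[Row] = [(1, "x"), (None, "y")]

eq_result = join_rows(left_rows, right_rows, KeyComparison.EQ)
indf_result = join_rows(left_rows, right_rows, KeyComparison.IS_NOT_DISTINCT_FROM)
might_result = join_rows(left_rows, right_rows, KeyComparison.MIGHT_EQUAL)
```

With these inputs, plain equality matches only the two rows with key `1`, "is not distinct from" additionally pairs the two null-keyed rows, and the assumed "might equals" semantics pairs every row with every other row because each pair either shares a key or involves a null.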