Implement inequality joins #701

wmalpica · 2020-05-26T22:33:36Z

This feature is waiting on the following cuDF issues:
rapidsai/cudf#2792
rapidsai/cudf#3628

felipeblazing · 2021-04-19T16:10:42Z

If we want inequality joins, we may not be able to wait for these primitives to exist on the cuDF side. One of the issues above is closed, and the other is just a request that has been open for a long time.

In the Simplicity engine (very old BlazingDB), we had an approach that required significant materialization but scaled relatively well on a single node. We stable-sorted the data by the columns that were part of the inequality join, created an RLE (run-length encoded) representation of the larger table, and then performed bounded lookups from the smaller table into the larger table to get a series of ranges of joined rows for each row. So for each row you end up with something like 0-5, 100-112, 300-340. If there were any equality conditions, you perform those first to reduce the number of rows to be analyzed during the inequality phase.
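
To make the idea concrete, here is a minimal single-column sketch in plain Python, not the original Simplicity code. It assumes a single condition `small.key < large.key`, so each row of the smaller table gets exactly one range instead of several; the function names `rle_encode` and `less_than_join_ranges` are hypothetical.

```python
from bisect import bisect_right


def rle_encode(sorted_keys):
    """Run-length encode sorted keys as (values, starts).

    starts[i] is the offset of the first row of run i; a final sentinel
    equal to len(sorted_keys) is appended so lookups past the last run work.
    """
    values, starts = [], []
    for i, key in enumerate(sorted_keys):
        if not values or values[-1] != key:
            values.append(key)
            starts.append(i)
    starts.append(len(sorted_keys))
    return values, starts


def less_than_join_ranges(small_keys, large_keys):
    """For each small key k, find the rows of the sorted larger table with key > k.

    Returns (order, ranges): order maps sorted positions back to original
    row ids of the larger table, and ranges[i] is a half-open (begin, end)
    range of sorted positions joined to small row i.
    """
    # sorted() is stable, so rows with equal keys keep their original order.
    order = sorted(range(len(large_keys)), key=lambda i: large_keys[i])
    sorted_keys = [large_keys[i] for i in order]
    values, starts = rle_encode(sorted_keys)

    ranges = []
    for k in small_keys:
        run = bisect_right(values, k)  # first run whose value is > k
        ranges.append((starts[run], len(sorted_keys)))
    return order, ranges
```

The binary search runs over the distinct values only, which is where the RLE representation helps when the larger table has many duplicate keys. The output stays as ranges and is only expanded into row pairs if a later step needs them.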

Another option that has been floated is using an AST to evaluate the join condition, but this is somewhat akin to performing a cartesian join (though not necessarily one that has to be materialized) followed by a filter step, which seems like it's begging to be improved.
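
As a rough illustration of why this amounts to a cartesian join plus a filter, here is a small Python sketch. It is an assumption-laden stand-in: a plain Python callable takes the place of the AST, and `nested_loop_join` is a hypothetical name, not a cuDF function.

```python
from itertools import product


def nested_loop_join(left, right, predicate):
    """Yield (left_index, right_index) for every pair satisfying predicate.

    The cross product is never materialized because product() and the
    generator are lazy, but the predicate is still evaluated
    len(left) * len(right) times.
    """
    for (i, l_row), (j, r_row) in product(enumerate(left), enumerate(right)):
        if predicate(l_row, r_row):
            yield i, j


# Example: join on left < right.
pairs = list(nested_loop_join([1, 4], [2, 3, 5], lambda l, r: l < r))
```

The work is quadratic in the input sizes regardless of how few pairs survive the filter, which is the part that seems worth improving.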

wmalpica · 2021-04-19T20:30:09Z

I think we will want to do a combination of both approaches. For inequality joins, we will want to distribute data based on an ordering as opposed to hashes.
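
A minimal sketch of what order-based distribution could look like, assuming the splitter values are already known (for example from sampling); `range_partition` is a hypothetical helper, not an existing BlazingSQL or cuDF function.

```python
from bisect import bisect_right


def range_partition(keys, splitters):
    """Assign each key to a partition by comparing it with sorted splitters.

    Partition 0 holds keys < splitters[0], the last partition holds
    keys >= splitters[-1], and the others hold the ranges in between.
    """
    parts = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        parts[bisect_right(splitters, key)].append(key)
    return parts
```

With hash partitioning, rows that satisfy `a < b` can land on any pair of nodes. With range partitioning, every key in partition p is less than or equal to every key in partition p + 1, so for `a < b` a left partition only needs to meet right partitions with the same or a higher index.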

felipeblazing · 2021-04-19T20:48:16Z

I'm not sure a combination is the right way to go here. I think we need to pick, hopefully, one strategy for the first implementation. I am currently reviewing the literature on the subject to see what else I can find.
