[FEAT] Outer joins for native executor (#2860)
Implement outer joins for Swordfish.

(Yes, this PR is a little big. But:
1. At least tests run in CI now, so you don't need to just take my word for it.
2. A lot of the diff is because I moved left/right joins to be together with the outer join operator. Therefore the HashJoinProbe operator is now just InnerHashJoinProbeOperator.)

Outer join probes (and now left/right joins as well) are implemented as a Streaming Sink.
- During the `execute` phase of the streaming sink, probing is done concurrently via workers (this is the same implementation as all the other join types). The only difference is that during probing, workers save the indices on the left side that have matches (using a mutable bitmap).
- During the `finalize` phase, we merge all the bitmaps across the concurrent workers (via a bitwise OR) to get a global view of all the indices that had matches. Then, we take all the indices that _**didn't**_ get a match and return them (with nulls for the right side). This is the same logic we currently use for the Python runner.
- Why is left/right with outer joins now? In the future, we may want to choose the build side for left/right/outer joins based on cardinality. This means that we may need the `used_indices` bitmaps for left/right joins as well.

Note: I had to make Streaming Sink concurrency-aware to allow this. The changes in particular are:
- Streaming Sinks can specify `max concurrency`; currently only LIMIT sets this to 1.
- `execute` accepts some `mut state`, and `finalize` consolidates all of the state, i.e. `Vec<Box<dyn State>>`.
- To make sure that all the workers are done, they are spawned on a Worker Set and return their state when done. This ensures that the `finalize` method doesn't get called before the workers are done with their `execute` calls.

---------

Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
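For readers who want the shape of the `execute`/`finalize` flow described above, here is a minimal, standalone sketch. It is not the code in this PR. `ProbeState`, `probe_chunk`, `finalize`, and `unmatched_left` are hypothetical names, a plain `Vec<bool>` stands in for the mutable bitmap, and scoped threads from the standard library stand in for the Worker Set. Each worker probes its own chunk and returns its state when done, so the merge only runs after every worker has been joined.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical per-worker state: one flag per build-side (left) row.
/// A Vec<bool> is an assumption standing in for a real mutable bitmap.
struct ProbeState {
    used: Vec<bool>,
}

/// "execute": probe one chunk of right-side keys against the build-side index,
/// marking every left row that finds a match. Returns (left, right) index pairs.
fn probe_chunk(
    index: &HashMap<i64, Vec<usize>>,
    right_chunk: &[(usize, i64)],
    state: &mut ProbeState,
) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for &(right_idx, key) in right_chunk {
        if let Some(left_rows) = index.get(&key) {
            for &left_idx in left_rows {
                state.used[left_idx] = true;
                pairs.push((left_idx, right_idx));
            }
        }
    }
    pairs
}

/// "finalize": OR the per-worker flags together, then return the left indices
/// that no worker matched (these rows would be emitted with nulls on the right).
fn finalize(states: Vec<ProbeState>, num_left: usize) -> Vec<usize> {
    let mut merged = vec![false; num_left];
    for state in states {
        for (m, u) in merged.iter_mut().zip(state.used) {
            *m |= u;
        }
    }
    merged
        .into_iter()
        .enumerate()
        .filter_map(|(i, matched)| (!matched).then_some(i))
        .collect()
}

/// Drive the two phases: build an index on the left keys, probe right-side
/// chunks concurrently, collect each worker's state, then finalize.
fn unmatched_left(left_keys: &[i64], right_keys: &[i64], workers: usize) -> Vec<usize> {
    let mut index: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, &k) in left_keys.iter().enumerate() {
        index.entry(k).or_default().push(i);
    }
    let right: Vec<(usize, i64)> = right_keys.iter().copied().enumerate().collect();
    let workers = workers.max(1);
    let chunk_size = ((right.len() + workers - 1) / workers).max(1);

    let states: Vec<ProbeState> = thread::scope(|s| {
        let handles: Vec<_> = right
            .chunks(chunk_size)
            .map(|chunk| {
                let index = &index;
                s.spawn(move || {
                    let mut state = ProbeState { used: vec![false; left_keys.len()] };
                    // Matched pairs are discarded in this sketch; a real
                    // operator would stream them downstream.
                    let _pairs = probe_chunk(index, chunk, &mut state);
                    state
                })
            })
            .collect();
        // Joining every handle guarantees all probing has finished.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    finalize(states, left_keys.len())
}
```

For example, `unmatched_left(&[1, 2, 3, 4], &[2, 4, 4, 7], 2)` splits the right side into two chunks, marks left rows 1 and 3 as used, and returns the remaining indices `[0, 2]`. Because the merge is a plain OR, the order in which workers return their state does not matter.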