Filter nulls from joins where possible to improve performance #754
This fixes #569
I pulled out the performance fix #594 yesterday because I wanted to be sure that all of the queries were correct for the 0.2 release and I wasn't sure I could find the root cause of the issue before the release.
I found it. In a join, each incoming table first has its join keys extracted, and then the join is performed. The null filtering was intended to go after the keys were extracted and before the actual join. On the build side I had done that, but on the stream side I had inserted the null filtering before the keys were extracted. In some cases that ended up removing all of the nulls from an unrelated column. fa1215b has the fix: I moved the null filtering to after the project on the stream side, which also cleans up the code a bit more.
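To illustrate why the ordering matters, here is a minimal Python sketch of the idea, not the plugin code itself. All names (`project_keys`, `filter_null_keys`, `inner_join`, `inner_join_buggy`) are hypothetical, rows are modeled as plain tuples, and the specific failure mechanism shown (filtering on post-projection key ordinals against the unprojected rows) is an assumption about how a pre-projection filter could hit an unrelated column.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]
Keyed = Tuple[Row, Row]  # (extracted join key, original row)


def project_keys(rows: Sequence[Row], key_columns: Sequence[int]) -> List[Keyed]:
    """Extract the join key from each row (the "project" step)."""
    return [(tuple(row[i] for i in key_columns), row) for row in rows]


def filter_null_keys(pairs: Sequence[Keyed]) -> List[Keyed]:
    """Drop rows whose extracted key contains a null; they can never match."""
    return [(k, r) for k, r in pairs if all(v is not None for v in k)]


def build_table(build: Sequence[Row], build_keys: Sequence[int]) -> Dict[Row, List[Row]]:
    """Build side: project keys, filter nulls, then index rows by key."""
    table: Dict[Row, List[Row]] = {}
    for k, r in filter_null_keys(project_keys(build, build_keys)):
        table.setdefault(k, []).append(r)
    return table


def inner_join(build, stream, build_keys, stream_keys) -> List[Row]:
    """Intended order on the stream side: project keys, then filter nulls."""
    table = build_table(build, build_keys)
    stream_side = filter_null_keys(project_keys(stream, stream_keys))
    return [s + b for k, s in stream_side for b in table.get(k, [])]


def inner_join_buggy(build, stream, build_keys, stream_keys) -> List[Row]:
    """Assumed bug shape: null filter runs BEFORE the stream-side projection.

    The filter uses the ordinals the keys would have after projection
    (0..n-1), so on unprojected rows it inspects whatever columns happen
    to sit at those positions.
    """
    table = build_table(build, build_keys)
    post_projection_ordinals = range(len(stream_keys))
    prefiltered = [
        r for r in stream
        if all(r[i] is not None for i in post_projection_ordinals)
    ]
    stream_side = project_keys(prefiltered, stream_keys)
    return [s + b for k, s in stream_side for b in table.get(k, [])]


# Stream rows are (unrelated, key); the join key is column 1.
stream_rows = [(None, 1), (2, None), (3, 3)]
build_rows = [(1,), (3,)]

correct = inner_join(build_rows, stream_rows, [0], [1])
buggy = inner_join_buggy(build_rows, stream_rows, [0], [1])
```

In this toy data the row `(None, 1)` has a valid key but a null in the unrelated column, so the buggy ordering drops it while the corrected ordering keeps it.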
There are a lot of code changes in here, but some of them are repeated 3 times because of the shim layer.