sql: implement filter propagation. #12617
cc @nvanbenschoten @petermattis @cuongdo. Just in time for Q4 2016! |
Nice! A quick scan of this code reveals that I've almost completely forgotten the |
7e5ce13 to e9b61e3
generally although I don't have a full understanding of everything that's going on here. Reviewed 2 of 2 files at r1, 34 of 34 files at r2, 8 of 8 files at r3. pkg/sql/filter_opt.go, line 68 at r3 (raw file):
This seems like an easy win - make a todo issue? pkg/sql/filter_opt.go, line 93 at r3 (raw file):
This smells a bit inefficient - how bad would it be to implement a mergeConjExprs directly? If that's too annoying I think this is fine. pkg/sql/filter_opt.go, line 295 at r3 (raw file):
grammar - you might mean refers? pkg/sql/filter_opt.go, line 312 at r3 (raw file):
Maybe remove pkg/sql/filter_opt.go, line 483 at r3 (raw file):
I suggest adding a comment explaining the inputs and output of this function. pkg/sql/filter_opt.go, line 555 at r3 (raw file):
These following three functions seem like great candidates for some quick unit testing! pkg/sql/filter_opt.go, line 570 at r3 (raw file):
Tests? pkg/sql/filter_opt.go, line 584 at r3 (raw file):
Tests? Comments from Reviewable |
TFYR Review status: 6 of 9 files reviewed at latest revision, 8 unresolved discussions. pkg/sql/filter_opt.go, line 68 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done - added to #12618. pkg/sql/filter_opt.go, line 93 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
It's bad -- pkg/sql/filter_opt.go, line 295 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done. pkg/sql/filter_opt.go, line 312 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done. pkg/sql/filter_opt.go, line 483 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done. pkg/sql/filter_opt.go, line 555 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done. pkg/sql/filter_opt.go, line 570 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done. pkg/sql/filter_opt.go, line 584 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Done. Comments from Reviewable |
6422d59 to 6b4dc5a
@andreimatei any chance you could review this while radu is away? this way we could make this week's beta. |
Reviewed 6 of 6 files at r4. Comments from Reviewable |
Review status: 7 of 32 files reviewed at latest revision, 16 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 27 at r4 (raw file):
how about we get rid of this function so that we don't have to explain the 2-way interaction between it and pkg/sql/filter_opt.go, line 43 at r4 (raw file):
how about we call this
Otherwise, the calls to pkg/sql/filter_opt.go, line 51 at r4 (raw file):
This sounds to me like:
:P pkg/sql/filter_opt.go, line 54 at r4 (raw file):
should we start listing such methods near the pkg/sql/filter_opt.go, line 67 at r4 (raw file):
only if the filter is "pure", right? pkg/sql/filter_opt.go, line 70 at r4 (raw file):
nit: "fallthrough" in the context of a pkg/sql/filter_opt.go, line 81 at r4 (raw file):
mind documenting what this is about? As you've told me, it's about interacting with the "needed cols" infrastructure which needs to know that some columns might not be needed any more, if we got rid of parts of the filter. It might be worth mentioning that in the pkg/sql/filter_opt.go, line 92 at r4 (raw file):
explain that the filter is guaranteed to be... "simple" and fully acceptable by the scanNode (guaranteed by So here we don't need pkg/sql/filter_opt.go, line 122 at r4 (raw file):
as we discussed, let's just panic in this unexpected case pkg/sql/filter_opt.go, line 156 at r4 (raw file):
can we propagate the pkg/sql/filter_opt.go, line 298 at r4 (raw file):
why is that? What if we have a pkg/sql/filter_opt.go, line 310 at r4 (raw file):
I'm confused :( I guess your comment above has something to do with this:
Maybe you could expand it and make it more technical. pkg/sql/filter_opt.go, line 369 at r4 (raw file):
can you explain what the assumed layout is here? The merged columns are the first in pkg/sql/filter_opt.go, line 394 at r4 (raw file):
what's this pkg/sql/filter_opt.go, line 500 at r4 (raw file):
perhaps you can hang more hints here: do we represent expressions as a series of conjunctions? (I think) it's because conjunctions can be independently "pushed down" on our trees pkg/sql/filter_opt.go, line 505 at r4 (raw file):
can this be, say, a disjunction? I think the name "predicate" might suggest very narrow expressions, maybe change it or expand more. Comments from Reviewable |
Review status: 5 of 14 files reviewed at latest revision, 16 unresolved discussions. pkg/sql/filter_opt.go, line 27 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/filter_opt.go, line 43 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/filter_opt.go, line 51 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/filter_opt.go, line 54 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Excellent idea. Done. (plan.go) pkg/sql/filter_opt.go, line 67 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Right. Extended the comment accordingly. pkg/sql/filter_opt.go, line 70 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Thanks for explaining. Done. pkg/sql/filter_opt.go, line 81 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Good idea! I changed the interface of Rebind as suggested. pkg/sql/filter_opt.go, line 92 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Actually, for consistency a Reset is needed here. Well spotted! What do you mean by "simple"? pkg/sql/filter_opt.go, line 122 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/filter_opt.go, line 156 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I added the explanatory comment. Thanks for the suggestion. Also filed as #12762. pkg/sql/filter_opt.go, line 298 at r4 (raw file):
Added a paragraph with examples in the comment to clarify.
That's a separate optimization. Filed as #12763.
No. There are no more complex filters. This code can propagate arbitrarily complex filters already.
This doesn't make sense (to me). pkg/sql/filter_opt.go, line 310 at r4 (raw file):
Happens to the best of us :)
We don't! And the beauty of it: it doesn't matter! Only two things matter: 1) that the values of their Idx fields are in the proper range, which has been taken care of by the parent node; and 2) that the node that eventually accepts the filter does a Rebind() to ensure that the container is attached properly. The container doesn't matter during the transformation (i.e. until Rebind) because it is only used for three things: to retrieve the type of the node, in which case a "wrong" (previous) container will do just as well because the type of an IndexedVar never changes across nodes; to render it to a string, which never happens during this transform; or to evaluate it to a Datum, which also never happens here.
Done - added an explanatory comment at the beginning of the file. pkg/sql/filter_opt.go, line 369 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/filter_opt.go, line 394 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
That's the definition of the results produced by USING or NATURAL. It's not equivalent to either operand because of NULLs. Blame the SQL standard. pkg/sql/filter_opt.go, line 500 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Yes, see the new long comment at the beginning. pkg/sql/filter_opt.go, line 505 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
No, the word "predicate" is general in logic theory, it really can be any boolean expression. Comments from Reviewable |
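To make the "series of conjunctions" representation discussed above concrete, here is a minimal sketch in Go. It is not the code from `filter_opt.go`: the `Expr`, `Leaf`, `And`, `Or`, `splitConj` and `mergeConj` names are hypothetical stand-ins. It only illustrates the idea that a filter is flattened along its top-level ANDs, and that each resulting predicate can be any boolean expression, including a disjunction.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified boolean expression tree.
type Expr interface{ String() string }

// Leaf is an opaque predicate such as "a.x = 10".
type Leaf string

func (l Leaf) String() string { return string(l) }

// And and Or are binary boolean operators.
type And struct{ L, R Expr }
type Or struct{ L, R Expr }

func (a And) String() string { return "(" + a.L.String() + " AND " + a.R.String() + ")" }
func (o Or) String() string  { return "(" + o.L.String() + " OR " + o.R.String() + ")" }

// splitConj flattens the top-level ANDs of e into a list of predicates.
// Anything that is not an AND (a leaf, a disjunction, ...) is kept whole.
func splitConj(e Expr) []Expr {
	if a, ok := e.(And); ok {
		return append(splitConj(a.L), splitConj(a.R)...)
	}
	return []Expr{e}
}

// mergeConj rebuilds a single expression from a list of predicates.
// An empty list means "no filter", rendered here as the constant true.
func mergeConj(preds []Expr) Expr {
	if len(preds) == 0 {
		return Leaf("true")
	}
	res := preds[0]
	for _, p := range preds[1:] {
		res = And{L: res, R: p}
	}
	return res
}

func main() {
	filter := And{
		L: Leaf("a.x = 10"),
		R: And{L: Or{L: Leaf("a.y = 1"), R: Leaf("b.y = 2")}, R: Leaf("b.z = 3")},
	}
	for i, p := range splitConj(filter) {
		fmt.Printf("predicate %d: %s\n", i, p)
	}
	fmt.Println("merged:", mergeConj(splitConj(filter)))
}
```

Each element of the list can then be pushed down (or not) independently of the others, which is what makes the list form convenient for this kind of transformation.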
pkg/sql/filter_opt.go, line 542 at r5 (raw file):
wouldn't a constant boolean expression, after normalization, be either true or false? or are there cases where we can't simplify even though it's constant? Comments from Reviewable |
Review status: 5 of 14 files reviewed at latest revision, 30 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 394 at r4 (raw file): Previously, knz (kena) wrote…
Actually it is equivalent for inner joins (which is the case here)... pkg/sql/filter_opt.go, line 40 at r5 (raw file):
I had a lot of trouble understanding these definitions. I think it would be very helpful to have a brief (if hand-wavy) explanation of each function. pkg/sql/filter_opt.go, line 64 at r5 (raw file):
Shouldn't this be pkg/sql/filter_opt.go, line 68 at r5 (raw file):
block filter propagation (not filtering) pkg/sql/filter_opt.go, line 101 at r5 (raw file):
Is there really a big benefit to using lists of predicates vs just an expression? I understand the ease of merging but that isn't hard to do with expressions either (just expand any top-level pkg/sql/filter_opt.go, line 138 at r5 (raw file):
In most situations any remainingFilter would just end up in a It seems to me that the benefit of returning a pkg/sql/filter_opt.go, line 175 at r5 (raw file):
As mentioned before, I think we should propagateOrWrap here (and below for sortNode). pkg/sql/filter_opt.go, line 177 at r5 (raw file):
Why not pkg/sql/filter_opt.go, line 182 at r5 (raw file):
filtering pkg/sql/filter_opt.go, line 321 at r5 (raw file):
Should this be pkg/sql/filter_opt.go, line 381 at r5 (raw file):
This query doesn't actually work ( This would be worth mentioning in the big comment around "f propagates filters through render nodes". pkg/sql/filter_opt.go, line 527 at r5 (raw file):
I am surprised we don't use pkg/sql/filter_opt.go, line 530 at r5 (raw file):
need pkg/sql/testdata/join, line 685 at r3 (raw file):
Are these tabs? I think it's better to use spaces in these test files (plus it's inconsistent across this query) Comments from Reviewable |
Review status: 5 of 14 files reviewed at latest revision, 30 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 394 at r4 (raw file): Previously, RaduBerinde wrote…
Aw, I had forgotten that! Nice. Now we can propagate those filters too! pkg/sql/filter_opt.go, line 40 at r5 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 64 at r5 (raw file): Previously, RaduBerinde wrote…
Yeah, that's smarter! Thanks for finding this out. Done. pkg/sql/filter_opt.go, line 68 at r5 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 101 at r5 (raw file): Previously, RaduBerinde wrote…
I found the code both easier to read and to test this way. pkg/sql/filter_opt.go, line 138 at r5 (raw file): Previously, RaduBerinde wrote…
pkg/sql/filter_opt.go, line 175 at r5 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 177 at r5 (raw file): Previously, RaduBerinde wrote…
It doesn't rely on results, but to avoid extra temp assignments I use e.g. pkg/sql/filter_opt.go, line 182 at r5 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 321 at r5 (raw file): Previously, RaduBerinde wrote…
No, it's never possible that we get more filters remaining after propagate than what we put in. (check the formulas) In your example the filterNode will absorb the remaining filter from the (Although there was an error in the case for pkg/sql/filter_opt.go, line 381 at r5 (raw file): Previously, RaduBerinde wrote…
I had indeed made a mistake in the example, which is now fixed. What do you mean by "this would be worth mentioning"? What does "this" refer to in this context? pkg/sql/filter_opt.go, line 527 at r5 (raw file): Previously, RaduBerinde wrote…
Good idea. I changed to use splitFilter. It seems to work. pkg/sql/filter_opt.go, line 530 at r5 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 542 at r5 (raw file): Previously, RaduBerinde wrote…
pkg/sql/testdata/join, line 685 at r3 (raw file): Previously, RaduBerinde wrote…
Done. Comments from Reviewable |
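The exchange above about line 394 (merged columns produced by USING or NATURAL) can be illustrated with a small sketch. It assumes the standard SQL definition of a merged column as COALESCE(left, right); the `NullInt`, `coalesce` and `innerJoinKeeps` names are hypothetical and not taken from the patch. The point is that an inner join only keeps rows where the equality is true, so both operands are non-NULL and equal, and the merged value coincides with either operand. With an outer join that no longer holds.

```go
package main

import "fmt"

// NullInt is a hypothetical nullable integer; Valid == false means SQL NULL.
type NullInt struct {
	V     int
	Valid bool
}

// coalesce returns the first non-NULL operand, like COALESCE in SQL.
func coalesce(l, r NullInt) NullInt {
	if l.Valid {
		return l
	}
	return r
}

// innerJoinKeeps reports whether an inner join on l = r keeps this pair.
// The comparison must evaluate to true, so neither side may be NULL.
func innerJoinKeeps(l, r NullInt) bool {
	return l.Valid && r.Valid && l.V == r.V
}

func main() {
	null := NullInt{}
	one := NullInt{V: 1, Valid: true}
	two := NullInt{V: 2, Valid: true}

	pairs := [][2]NullInt{{one, one}, {one, two}, {null, one}, {one, null}, {null, null}}
	for _, p := range pairs {
		l, r := p[0], p[1]
		merged := coalesce(l, r)
		fmt.Printf("l=%+v r=%+v kept by inner join=%t merged==l:%t merged==r:%t\n",
			l, r, innerJoinKeeps(l, r), merged == l, merged == r)
	}
}
```

For every pair the inner join keeps, the merged value equals both operands; the pairs involving NULL show why the same rewrite would be wrong for outer joins.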
Review status: 2 of 12 files reviewed at latest revision, 14 unresolved discussions. pkg/sql/filter_opt.go, line 92 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Wow, your comment tells me either you are envisioning several major potential problems here which I have not even thought about, or you are very confused. Let me reiterate what I know to be true, and feel free to extend this with your analysis:
pkg/sql/filter_opt.go, line 29 at r6 (raw file): Previously, andreimatei (Andrei Matei) wrote…
T and F are the usual shorthand notations for true and false, do you foresee a readability issue here? pkg/sql/filter_opt.go, line 35 at r6 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I tried and it makes the long formulas below harder to read. pkg/sql/filter_opt.go, line 46 at r6 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Don't laugh. This makes me unhappy: I spent half a day troubleshooting an issue in my code that wasn't in the functional specification. If the former were automatically derived from the latter, this company would have saved several hundred dollars. pkg/sql/parser/indexed_vars.go, line 188 at r6 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. Comments from Reviewable |
Gtg? |
@andreimatei I just extended the renderNode comment as suggested. PTAL |
Review status: 2 of 12 files reviewed at latest revision, 11 unresolved discussions. pkg/sql/filter_opt.go, line 36 at r7 (raw file):
Maybe mention that by definition of pkg/sql/filter_opt.go, line 51 at r7 (raw file):
I think this should be first (since the others reference it) pkg/sql/filter_opt.go, line 149 at r7 (raw file):
👍 pkg/sql/filter_opt.go, line 506 at r7 (raw file):
This name confused me for a bit, doesn't really suggest it's the boundary between left and right.. maybe pkg/sql/filter_opt.go, line 540 at r7 (raw file):
I understand what we're doing (and it's pretty neat TBH), but I don't see why we can't simply add a case in the callbacks to the "main" splitFilter calls below to accept the merged columns, eg:
pkg/sql/filter_opt.go, line 579 at r7 (raw file):
Aren't we letting through vars that refer to merged columns (leading to panics later in shiftConj)? Also, it's a bit odd that instead of just reindexing the vars here, we do it after the fact. pkg/sql/filter_opt.go, line 612 at r7 (raw file):
Nice! Comments from Reviewable |
gtg gg thanks Review status: 2 of 12 files reviewed at latest revision, 11 unresolved discussions. pkg/sql/filter_opt.go, line 92 at r4 (raw file): Previously, knz (kena) wrote…
Ok now I understand what's going on, thank you. Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 9 unresolved discussions. pkg/sql/filter_opt.go, line 36 at r7 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 51 at r7 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 506 at r7 (raw file): Previously, RaduBerinde wrote…
Done. pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, RaduBerinde wrote…
oh, interesting. I'll try it out! pkg/sql/filter_opt.go, line 579 at r7 (raw file): Previously, RaduBerinde wrote…
I was counting on the fact that the USING, NATURAL and ON syntaxes are mutually exclusive. So there cannot be an ON filter using merged columns in the input. Although now that you mention it, if we propagate the filters twice, the second pass can run into this situation. I'll think about it more. Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending. pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, knz (kena) wrote…
So I tried and that doesn't work: if we do what you suggest, then the part of the predicate that refers to the merged columns will always migrate to the left join operand, and will not remain in the "remainder" expr (the one subsequently split into right/combined exprs). We really need to duplicate them ahead of the split. pkg/sql/filter_opt.go, line 579 at r7 (raw file): Previously, knz (kena) wrote…
I checked and by construction there can never be a filter that refers to the merged columns (because they are all rewritten by the code at the beginning). So this is fine as-is. Comments from Reviewable |
pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, knz (kena) wrote…
The second call doesn't need to be run on the remaining expression, it can be run on the same expression as the first call.. Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending. pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, RaduBerinde wrote…
But how do you compute the combinedExpr then? We have to split both the left and right sub-exprs away from it don't we? Comments from Reviewable |
pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, knz (kena) wrote…
Ah, but then we don't get the smallest "final" remaining expression (combinedExpr).. Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending. pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, RaduBerinde wrote…
Yes that's the point. :-) Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 540 at r7 (raw file): Previously, RaduBerinde wrote…
Sorry, typed at the same time. Yes, makes sense! pkg/sql/filter_opt.go, line 579 at r7 (raw file): Previously, knz (kena) wrote…
You're right. Could you just add a comment here explaining that there should be no references to merged columns left? Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 579 at r7 (raw file): Previously, RaduBerinde wrote…
Done for the comment. I tried to inline shiftConj, but that makes a big blob of code that is better kept self-contained in a separate function. Comments from Reviewable |
pkg/sql/filter_opt.go, line 579 at r7 (raw file): Previously, knz (kena) wrote…
I don't understand, isn't it just replacing Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 5 unresolved discussions. pkg/sql/filter_opt.go, line 579 at r7 (raw file): Previously, RaduBerinde wrote…
Oh now I understand. Thanks for the hint. Done. Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 573 at r9 (raw file):
A suggestion (possibly for a future change) - it wouldn't be too hard to also duplicate any vars that refer to right columns which are constrained to be equal to a left column (and vice-versa). This can happen regardless of whether merged columns exist. For example Comments from Reviewable |
Review status: 2 of 12 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful. pkg/sql/filter_opt.go, line 573 at r9 (raw file): Previously, RaduBerinde wrote…
Yes I initially thought about that and I was kinda hoping that analyzeExpr (simplifyExpr) was doing it, but it doesn't :) Good idea nonetheless, filed as #12892. Comments from Reviewable |
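The suggestion above (filed as #12892, not implemented in this PR) could look roughly like the following sketch. Everything here is hypothetical: the `ColConst` and `Equality` types and the `duplicateAcross` function are illustrative names, and the sketch assumes an inner join whose condition includes the column equalities, with filters restricted to the simple form "column op constant".

```go
package main

import "fmt"

// ColConst is a hypothetical filter of the form `<column> <op> <constant>`.
type ColConst struct {
	Col, Op, Const string
}

func (p ColConst) String() string { return p.Col + " " + p.Op + " " + p.Const }

// Equality records that the join constrains two columns to be equal.
type Equality struct{ Left, Right string }

// duplicateAcross returns the original filters followed by a copy of each
// filter rewritten onto the column at the other end of every equality that
// mentions the filter's column. For an inner join this adds no new
// restriction, but it gives the other operand a filter it can absorb.
func duplicateAcross(filters []ColConst, eqs []Equality) []ColConst {
	out := append([]ColConst(nil), filters...)
	for _, f := range filters {
		for _, eq := range eqs {
			switch f.Col {
			case eq.Left:
				out = append(out, ColConst{Col: eq.Right, Op: f.Op, Const: f.Const})
			case eq.Right:
				out = append(out, ColConst{Col: eq.Left, Op: f.Op, Const: f.Const})
			}
		}
	}
	return out
}

func main() {
	// Models: SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x = 10
	filters := []ColConst{{Col: "a.x", Op: "=", Const: "10"}}
	eqs := []Equality{{Left: "a.x", Right: "b.x"}}
	for _, f := range duplicateAcross(filters, eqs) {
		fmt.Println(f)
	}
}
```

In this example the derived filter `b.x = 10` refers only to the right operand, so it becomes a candidate for propagation into that side of the join.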
Ok I'm going to merge this now. Makes me happy! |
tl;dr: this patch implements a primitive form of filter propagation: queries of the form
`SELECT * FROM a, b WHERE a.x = 10 AND b.y = 20`
are transformed to
`SELECT * FROM (SELECT * FROM a WHERE x = 10), (SELECT * FROM b WHERE y = 20)`.
Long explanation: pick up a book on relational algebra. Filtering commutes with many relational operators, so a filter can often be moved closer to the data source without changing the result.
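As a rough illustration of the transformation described in the tl;dr, here is a minimal sketch in Go. It is not the planner code from this patch: the `Pred` type and the `pushDown`, `refersOnlyTo` and `render` functions are hypothetical, and predicates are modeled as SQL text plus the set of tables they reference. The sketch splits the conjuncts of a WHERE clause over a cross join into filters for each operand and a remainder that must stay above the join.

```go
package main

import (
	"fmt"
	"strings"
)

// Pred is a hypothetical, simplified predicate: a SQL fragment plus the
// tables whose columns it references.
type Pred struct {
	SQL    string
	Tables []string
}

// refersOnlyTo reports whether p references at least one table and every
// table it references is `table`.
func refersOnlyTo(p Pred, table string) bool {
	for _, t := range p.Tables {
		if t != table {
			return false
		}
	}
	return len(p.Tables) > 0
}

// pushDown splits the conjuncts of a WHERE clause over the cross join
// `left, right` into filters that can move into each operand, plus the
// conjuncts that must remain above the join.
func pushDown(conjuncts []Pred, left, right string) (l, r, rest []Pred) {
	for _, p := range conjuncts {
		switch {
		case refersOnlyTo(p, left):
			l = append(l, p)
		case refersOnlyTo(p, right):
			r = append(r, p)
		default:
			rest = append(rest, p)
		}
	}
	return l, r, rest
}

// render joins conjuncts with AND; an empty list renders as "true".
func render(ps []Pred) string {
	if len(ps) == 0 {
		return "true"
	}
	parts := make([]string, len(ps))
	for i, p := range ps {
		parts[i] = p.SQL
	}
	return strings.Join(parts, " AND ")
}

func main() {
	where := []Pred{
		{SQL: "a.x = 10", Tables: []string{"a"}},
		{SQL: "b.y = 20", Tables: []string{"b"}},
		{SQL: "a.z < b.z", Tables: []string{"a", "b"}},
	}
	l, r, rest := pushDown(where, "a", "b")
	fmt.Printf("SELECT * FROM (SELECT * FROM a WHERE %s) AS a, (SELECT * FROM b WHERE %s) AS b WHERE %s\n",
		render(l), render(r), render(rest))
}
```

The third conjunct references both tables, so it stays in the outer WHERE clause; only the single-table conjuncts move into the operands.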
Note: this PR is based off the commits from #12616. I will rebase when #12616 is merged. Only the last two commits are specific to this PR.
Fixes #8566.
Fixes #10632.
Fixes #10633.
Fixes #11192.
Fixes #11723.