v3: normalizer: smarter dedup #3262

Merged (2 commits) on Sep 30, 2017
Conversation

sougou (Contributor) commented on Sep 30, 2017

Based on real-life workloads, we found that it may not be a good
idea to dedup all values. Specifically, values used in DMLs like
INSERT should not be deduped; they end up polluting the plan cache
with all kinds of combinations.

With this change, values are deduped only if they appear within
SELECTs. The deduping still happens for subqueries within DMLs,
while the DML parts themselves are not deduped.

Additionally, it doesn't make sense to spend the effort deduping
values that are too long. So I've added a check: if a value is
longer than 256 bytes, we blindly create a new bind var.
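
Roughly, the new policy can be sketched like this (illustrative only, not
the actual diff in this PR; the inSelect flag and the convertSQLValNoDedup
helper are assumed names):

// Sketch: dedup is attempted only for values inside a SELECT, and only
// when the value is short enough for the lookup to be worth it.
func (nz *normalizer) convertSQLVal(node *SQLVal, inSelect bool) {
	// Values longer than 256 bytes always get a fresh bind var.
	if !inSelect || len(node.Val) > 256 {
		nz.convertSQLValNoDedup(node) // assumed helper: always creates a new bind var
		return
	}
	nz.convertSQLValDedup(node) // reuses an existing bind var when the value repeats
}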

sougou requested a review from demmer on September 30, 2017 05:16
demmer (Member) left a comment:

Overall this looks good -- I have a few comments on style and one general point about propagating errors out of the normalizer that applies both before and after this change.

// Modify RHS to be a list bindvar.
node.Right = ListArg(append([]byte("::"), bvname...))
nz := newNormalizer(stmt, bindVars, prefix)
_ = Walk(nz.WalkStatement, stmt)
demmer (Member):

Unrelated to this particular change, but it seems like we should be propagating errors out of the Normalize traversal instead of silently swallowing them?

For PII reasons we actually do depend on normalize working properly, so it would be better for us to fail the query with an error than let it go through unnormalized.

sougou (Contributor, Author):

I looked at the error conditions. They are all for invalid inputs, which will be caught downstream. So, it's better that the normalizer ignores them and completes everything else. That way, any other PII would still have been converted to bind vars.
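
For reference, a minimal sketch of the propagation being suggested, assuming
Normalize wraps the Walk call shown in the snippet above and keeps its current
parameters (this is the proposed alternative, not what this PR does):

// Sketch: return the traversal error so the caller can fail the query
// instead of letting it go through unnormalized.
func Normalize(stmt Statement, bindVars map[string]*querypb.BindVariable, prefix string) error {
	nz := newNormalizer(stmt, bindVars, prefix)
	return Walk(nz.WalkStatement, stmt)
}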

}

-func sqlToBindvar(node SQLNode) *querypb.BindVariable {
+func (nz *normalizer) sqlToBindvar(node SQLNode) *querypb.BindVariable {
demmer (Member):

Following up on the above comment re error propagation, this should also return an error if the value should have been converted to a bind var but failed for some reason.
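
A hedged sketch of that shape (convertLiteral is a hypothetical stand-in for
whatever conversion step can fail, and the fmt import is omitted):

// Sketch: report a failed conversion instead of silently skipping it.
func (nz *normalizer) sqlToBindvar(node SQLNode) (*querypb.BindVariable, error) {
	val, ok := node.(*SQLVal)
	if !ok {
		// Not a literal: nothing to convert, and not an error.
		return nil, nil
	}
	bv, err := convertLiteral(val) // hypothetical conversion step that may fail
	if err != nil {
		// A value that should have been normalized but wasn't: surface it.
		return nil, fmt.Errorf("normalizer: cannot convert %s: %v", String(val), err)
	}
	return bv, nil
}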

}

func (nz *normalizer) convertSQLValDedup(node *SQLVal) {
	// If value is too long, don't dedup.
demmer (Member):

Is the reason for this that the comparison is likely to be CPU-expensive? If so, we should add an additional comment indicating why.

sougou (Contributor, Author):

Added.

// and iterate on converting each individual value into separate
// bind vars.
func (nz *normalizer) convertComparison(node *ComparisonExpr) {
	switch node.Operator {
demmer (Member):

I'm not sure what the standard Go style is, but I would find this particular bit easier to read as:

if node.Operator != InStr && node.Operator != NotInStr {
	return
}

sougou (Contributor, Author):

Changed. It might have been my bias against double negations.

	default:
		return
	}
	// It's either IN or NOT IN.
demmer (Member):

If you make the above change, then this comment would be less necessary.
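
Roughly how the early-return shape reads after that change (sketch only, with
the body elided and the operator constants named as in the suggestion above):

// Sketch: with the guard clause, the operator is known to be IN or NOT IN
// for the rest of the function, so the trailing comment becomes redundant.
func (nz *normalizer) convertComparison(node *ComparisonExpr) {
	if node.Operator != InStr && node.Operator != NotInStr {
		return
	}
	// ... rest of the conversion (see the diff context above) ...
}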

sougou merged commit 326d4bc into vitessio:master on Sep 30, 2017
sougou deleted the normalizer branch on October 13, 2017 04:07