perf: sqlparser faster formatting #7710

vmg · 2021-03-18T15:35:46Z

Description

Happy Friday! (national holiday tomorrow so it's Friday for me right now).

This week I'm bringing another important optimization to the sqlparser code. We're tackling the performance of TrackedBuffer, the data structure that lets us format SQL ASTs into their textual representation.

The existing implementation for formatting AST nodes implements a Format(buf *TrackedBuffer) method on every AST node. Inside these methods, the node serializes itself into the TrackedBuffer by using a very helpful TrackedBuffer.astPrintf method. This lets us developers implement the formatting of all nodes in a very convenient way, because we can use printf-like syntax to generate the SQL output, but it has terrible performance implications.

All calls to astPrintf allocate; a printf interface like func (buf *TrackedBuffer) astPrintf(currentNode SQLNode, format string, values ...interface{}) must necessarily use variable arguments. Varargs are not cheap in Go, because they must be passed as interface{}, and we already know (or we learnt last week) that moving objects into interface{} allocates in most cases.
All calls to astPrintf must parse the input format string, which is not free. This is wasteful because, for a given node, its format string is always the same and doesn't change between calls.
All calls to astPrintf lose type information: when we call astPrintf from one of our SQLNode structs, we obviously know the type of our node, and we know if this kind of node would need special semantic handling (e.g. whether it requires being wrapped with parens) when serializing it. When we pass it through an interface{}, this information is lost, and we need to typecast the interface to figure out this semantic information -- we're doing that for every single astPrintf call.
Most importantly: all calls to astPrintf cannot be inlined. AST serialization is a highly recursive operation, and there's a very significant amount of performance to be gained by inlining the recursive calls when serializing the fields of any given node. The Go compiler cannot inline through interface{} callsites.
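
As a rough illustration of the allocation point (this is not code from the PR), the sketch below compares a printf-style helper that takes ...interface{} with plain writes into a pre-sized buffer. The update type and the printfStyle, direct and measure functions are hypothetical names invented for this example; fmt.Fprintf stands in for astPrintf, which it only loosely resembles.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// update is a hypothetical stand-in for an AST node, not a Vitess type.
type update struct{ table, column string }

// printfStyle mimics a printf-like helper: every value is boxed into interface{}.
func printfStyle(buf *bytes.Buffer, format string, values ...interface{}) {
	fmt.Fprintf(buf, format, values...)
}

// direct produces the same text with plain writes and no format string.
func direct(buf *bytes.Buffer, n *update) {
	buf.WriteString("update ")
	buf.WriteString(n.table)
	buf.WriteString(" set ")
	buf.WriteString(n.column)
}

// measure returns the average number of allocations per call for both styles.
func measure() (viaPrintf, viaDirect float64) {
	n := &update{table: "users", column: "name"}
	var buf bytes.Buffer
	buf.Grow(64) // pre-size so that buffer growth is not counted

	viaPrintf = testing.AllocsPerRun(1000, func() {
		buf.Reset()
		printfStyle(&buf, "update %s set %s", n.table, n.column)
	})
	viaDirect = testing.AllocsPerRun(1000, func() {
		buf.Reset()
		direct(&buf, n)
	})
	return viaPrintf, viaDirect
}

func main() {
	p, d := measure()
	fmt.Printf("printf-style: %.0f allocs/op, direct: %.0f allocs/op\n", p, d)
}
```

The exact counts depend on the Go version and on escape analysis, so the program prints them instead of assuming them.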

So, how do we fix all this? These are all problems that could be trivially solved by verbosely and manually removing all calls to astPrintf and just writing the code to write directly into the TrackedBuffer, without any format strings. This results in very fast code, but the Format functions for our SQL nodes become essentially a maintenance nightmare; the printf syntax is very convenient to make this code manageable.

Because of this, I've come up with an alternative solution: a code rewriter that picks up all the formatting code for SQL nodes, finds all astPrintf calls, parses their static printf format strings, and statically replaces them with their decomposed forms. The resulting code is written into a separate method for every SQL node:

Before rewrite:

func (node *Update) Format(buf *TrackedBuffer) {
	buf.astPrintf(node, "update %v%s%v set %v%v%v%v",
		node.Comments, node.Ignore.ToString(), node.TableExprs,
		node.Exprs, node.Where, node.OrderBy, node.Limit)
}

After rewrite:

func (node *Update) formatFast(buf *TrackedBuffer) {
	buf.WriteString("update ")
	node.Comments.formatFast(buf)
	buf.WriteString(node.Ignore.ToString())
	node.TableExprs.formatFast(buf)
	buf.WriteString(" set ")
	node.Exprs.formatFast(buf)
	node.Where.formatFast(buf)
	node.OrderBy.formatFast(buf)
	node.Limit.formatFast(buf)
}
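
The decomposition step can be sketched in isolation. The toy below works on plain strings and only understands %v and %s, so it is far simpler than a syntax- and type-aware rewriter; decompose is a hypothetical helper written for this example, and the mapping of %v to a formatFast call and %s to WriteString is inferred from the before/after pair above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decompose turns a static printf-style format and the source expressions of
// its arguments into the statements that would replace the printf call.
func decompose(format string, args []string) ([]string, error) {
	var out []string
	var lit strings.Builder
	// flush emits the pending literal text, if any, as a WriteString call.
	flush := func() {
		if lit.Len() > 0 {
			out = append(out, "buf.WriteString("+strconv.Quote(lit.String())+")")
			lit.Reset()
		}
	}
	next := 0 // index of the next unused argument
	for i := 0; i < len(format); i++ {
		if format[i] != '%' {
			lit.WriteByte(format[i])
			continue
		}
		i++
		if i == len(format) {
			return nil, fmt.Errorf("trailing %% in format %q", format)
		}
		if next >= len(args) {
			return nil, fmt.Errorf("not enough arguments for format %q", format)
		}
		flush()
		switch format[i] {
		case 'v': // a node: let it format itself
			out = append(out, args[next]+".formatFast(buf)")
		case 's': // a string expression: write it as-is
			out = append(out, "buf.WriteString("+args[next]+")")
		default:
			return nil, fmt.Errorf("unsupported verb %%%c", format[i])
		}
		next++
	}
	flush()
	return out, nil
}

func main() {
	stmts, err := decompose("update %v%s%v set %v%v%v%v", []string{
		"node.Comments", "node.Ignore.ToString()", "node.TableExprs",
		"node.Exprs", "node.Where", "node.OrderBy", "node.Limit",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(stmts, "\n"))
}
```

Run on the Update format string, this prints the same nine statements as the body of formatFast shown above.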

The rewriter is not a naive regexp replacement (I tried implementing that at first; it didn't work out). It is a fully syntax- and type-aware rewrite that handles statically, at compile time, many of the calculations that astPrintf was doing at runtime, particularly when it comes to handling expression precedence and grouping. The rewriter knows whether any of the fields in a node can contain expressions, and whether the expressions need special handling based on syntactic precedence:

func (node *AndExpr) Format(buf *TrackedBuffer) {
	buf.astPrintf(node, "%l and %r", node.Left, node.Right)
}

func (node *AndExpr) formatFast(buf *TrackedBuffer) {
	buf.printExpr(node, node.Left, true) // handled as left-expr
	buf.WriteString(" and ")
	buf.printExpr(node, node.Right, false) // handled as right-expr
}
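
For readers unfamiliar with precedence-aware printing, here is a minimal sketch of the general technique. It does not reproduce the printExpr in this PR: the expr interface, the numeric precedence levels and the parenthesization rule (wrap a child that binds less tightly than its parent, or equally tightly on the right-hand side) are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// expr is a hypothetical expression node with a numeric precedence.
type expr interface {
	prec() int
	format(sb *strings.Builder)
}

// col is a leaf; it binds tighter than any operator.
type col string

func (c col) prec() int                  { return 100 }
func (c col) format(sb *strings.Builder) { sb.WriteString(string(c)) }

// binary is a left-associative binary operator.
type binary struct {
	op          string
	p           int
	left, right expr
}

func (b *binary) prec() int { return b.p }
func (b *binary) format(sb *strings.Builder) {
	printExpr(sb, b, b.left, true)
	sb.WriteString(" " + b.op + " ")
	printExpr(sb, b, b.right, false)
}

// printExpr writes child, adding parentheses only when dropping them would
// change how the output parses.
func printExpr(sb *strings.Builder, parent, child expr, left bool) {
	needParens := child.prec() < parent.prec() ||
		(!left && child.prec() == parent.prec())
	if needParens {
		sb.WriteByte('(')
	}
	child.format(sb)
	if needParens {
		sb.WriteByte(')')
	}
}

func or(l, r expr) expr  { return &binary{op: "or", p: 1, left: l, right: r} }
func and(l, r expr) expr { return &binary{op: "and", p: 2, left: l, right: r} }

// render formats an expression tree into a string.
func render(e expr) string {
	var sb strings.Builder
	e.format(&sb)
	return sb.String()
}

func main() {
	fmt.Println(render(and(or(col("a"), col("b")), col("c"))))
	fmt.Println(render(or(col("a"), and(col("b"), col("c")))))
}
```

Because each call site knows statically whether it is printing a left or a right operand, that flag can be a constant in generated code instead of something recovered at runtime.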

The resulting formatFast code is used transparently by default when users create a TrackedBuffer without a custom formatter; the existing Format code is used when users use a custom formatter callback (the fast formatter doesn't support custom callbacks because they prevent inlining), so the system is fully backwards compatible.

The results are very exciting. Inlining and removing allocations is a very significant optimization in Go:

name                                old time/op    new time/op    delta
StringTraces/django_queries.txt-16    1.09ms ± 2%    0.43ms ± 2%  -60.72%  (p=0.008 n=5+5)
StringTraces/lobsters.sql.gz-16       45.4ms ± 1%    16.2ms ± 2%  -64.26%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
StringTraces/django_queries.txt-16     220kB ± 0%     124kB ± 0%  -43.83%  (p=0.008 n=5+5)
StringTraces/lobsters.sql.gz-16       11.1MB ± 0%     6.3MB ± 0%  -43.04%  (p=0.008 n=5+5)

name                                old allocs/op  new allocs/op  delta
StringTraces/django_queries.txt-16     6.82k ± 0%     2.85k ± 0%  -58.29%  (p=0.008 n=5+5)
StringTraces/lobsters.sql.gz-16         310k ± 0%      105k ± 0%  -66.18%  (p=0.008 n=5+5)

More than twice as fast for our dataset of realistic queries. A normal vtgate query formats incoming queries into strings at least once (often several times), so I expect this to have a measurable impact on total query latency.

Related Issue(s)

Checklist

[] Should this PR be backported?
Tests were added or are not required
Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

Signed-off-by: Vicent Marti <[email protected]>

go/vt/sqlparser/ast_format_fast.go

go/tools/astfmtgen/main.go

systay

Wow. This is really, really cool.

The only thing I'm missing is a verify mode that we can use as a commit hook, but we can add that as an issue and someone else can work on that, if that saves you time & focus.

vmg added 3 commits March 18, 2021 16:10

sqlparser: vtgate parsing benchmark

cbfd9b2

Signed-off-by: Vicent Marti <[email protected]>

sqlparser: string formatting benchmark

cde7ca6

Signed-off-by: Vicent Marti <[email protected]>

sqlparser: generate faster AST formatting code

15d1461

Signed-off-by: Vicent Marti <[email protected]>

vmg requested review from GuptaManan100, harshit-gangal and systay as code owners March 18, 2021 15:35

systay reviewed Mar 18, 2021

go/vt/sqlparser/ast_format_fast.go

systay reviewed Mar 18, 2021

go/tools/astfmtgen/main.go

vmg added 4 commits March 18, 2021 17:37

sqlparser: add generated header

3209faa

Signed-off-by: Vicent Marti <[email protected]>

Merge branch 'master' into vmg/ast-str

b5c8f20

sqltypes: optimize EncodeStringSQL

8b1070a

Signed-off-by: Vicent Marti <[email protected]>

sqlparser: fix compilation

d427f15

Signed-off-by: Vicent Marti <[email protected]>

vmg mentioned this pull request Mar 18, 2021

Performance Improvements #7674

Open

sqlparser: fix flush formatting

5ca7db9

Signed-off-by: Vicent Marti <[email protected]>

harshit-gangal approved these changes Mar 18, 2021

Merge branch 'master' into vmg/ast-str

1449e99

Signed-off-by: Vicent Marti <[email protected]>

systay approved these changes Mar 22, 2021

systay merged commit 38d5661 into vitessio:master Mar 22, 2021

askdba added the Component: Query Serving label Mar 23, 2021

askdba added this to the v10.0 milestone Mar 23, 2021

ajm188 mentioned this pull request Jul 15, 2021

slack vitess v10.pre tinyspeck/vitess#228

Merged
