perf: sqlparser yacc codegen #7669
Signed-off-by: Vicent Marti <[email protected]>
Looks great. I have not looked at the `goyacc.go` file - trusting the benchmarks and unit tests we have.
🚀
I just had one thought, before merging this. How do we track the upstream That would also make it possible to see what you changed in the
@systay: Upstream
@vmg is it worth pulling this commit into our copy? golang/tools@59f1f2c
Description
Following up on last week's zero-copy optimization for the Tokenizer, today we're picking up where we left off with some performance tuning for the generated Yacc parser.
This is the underlying idea: YACC is a compiler generator that outputs stack-based LALR parsers from an input grammar. Each action in a YACC grammar receives its input from one or more stack frames, and is supposed to push its output into a brand new stack frame. These are not native stack frames; they're abstracted by YACC itself into an array of "stack structs". These structs are supposed to contain any of the possible outputs for any of the possible actions in a grammar. This is cheap to implement in C (the language for which YACC was originally designed), because C supports `union` types.
A `union` type in C can have any number of fields, but only one of the fields can be in use at any given time. Unlike a `struct`, all the fields in a `union` share the same underlying memory, so the size of the `union` is the size of the largest of all its fields, instead of the sum of all its fields like in a `struct`. This is an important memory optimization that can be performed in YACC parsers in C, and that doesn't work at all in Go: the Go programming language does not support `union` types because they're not memory-safe (if you write into a pointer field in the `union` and then read from e.g. a float field, undefined behavior happens).
The fact that Go doesn't support unions is a massive performance drawback for our YACC parser, because it means that every stack frame in our parser must be a Go `struct` with a field for every possible output of a grammar action. In our SQL grammar, this is 94 fields totalling 1424 bytes per stack frame. Because of this, the performance of our parser is overwhelmingly dominated by simply copying and clearing stack frames, as opposed to parsing.
This problem is immediately obvious when running a profiler on the parser, and in fact @sougou and @GuptaManan100 already tried fixing this in a previous PR that didn't get merged. The design behind the PR is compelling: it tries to re-implement unions in Go by storing all the fields in the original struct into a single `interface{}` field, the closest thing that the language has to a union. A Go `interface{}` can hold any Go type, so it's something that could behave like a `union` (and hence reduce the memory usage of the parser) while being memory safe.
Why didn't the original PR get merged? It was a combination of two factors:
- `goyacc` (the yacc generator we're using) and making changes to our `sql.y` because some of the idioms used by the fork were not supported after the optimizations
The new `goyacc` generator
This PR attempts to address the two major shortcomings in the previous iteration of these changes so that we don't have to make a memory/speed tradeoff nor any changes to our grammar to obtain very significant performance improvement from the parser. How do we accomplish this?
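To make the stack-frame cost described above more concrete, here is a minimal sketch comparing a frame with one field per output against a frame that shares a single `interface{}`. All type and field names are hypothetical stand-ins (the real grammar has 94 fields), and the printed sizes depend on the platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical stand-ins for a few of the grammar's output types.
type Statement interface{ isStatement() }

type TableSpec struct{ Name string }

type ColumnType struct {
	Type    string
	Length  int64
	NotNull bool
}

// wideFrame mimics the struct-per-frame layout: one field per possible output,
// so its size grows with every type the grammar can produce.
type wideFrame struct {
	str        string
	statement  Statement
	tableSpec  *TableSpec
	columnType ColumnType
	strs       []string
	boolVal    bool
}

// unionFrame keeps one hot field inline and shares one interface{} for the rest,
// so its size stays constant no matter how many types the grammar produces.
type unionFrame struct {
	str   string
	union interface{}
}

// frameSizes reports the in-memory size of both layouts.
func frameSizes() (wide, union uintptr) {
	return unsafe.Sizeof(wideFrame{}), unsafe.Sizeof(unionFrame{})
}

func main() {
	wide, union := frameSizes()
	fmt.Printf("wide frame: %d bytes, union frame: %d bytes\n", wide, union)
}
```

Every push, pop and clear of a parser stack frame copies or zeroes that many bytes, which is why shrinking the frame matters more than the cost of the occasional type assertion.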
I've fixed the increased memory consumption by implementing optional union semantics. To understand the problem in the previous implementation: most of the time, writing to an `interface{}` in Go allocates. An `interface{}` is a tuple of pointers: the first one points at the metadata for the type being held in the interface (the equivalent of a VTable); the second one points to the actual data contained in the interface. The advantage that our old `struct` has over interfaces is that when we assign a `struct`, or a primitive, to one of the fields, it's just copying memory into the parent struct. But when we assign a `struct` to our `interface{}` field, we're actually allocating space for the `struct` in the heap, because we need to store a pointer to it.
The key to fixing the increase in allocations is storing in the union `interface{}` field only the types that would not allocate. To accomplish this, the new generator lets us separate all the fields in our grammar between unionized fields, which will be stored in an `interface{}`, and `struct` fields, which will be stored like they were before.
We're now storing in an `interface{}` union:
- Interfaces (e.g. `Statement`): These fields were already allocating memory, because all interfaces allocate. Moving them to a shared `interface{}` greatly reduces the size of the stack at no penalty whatsoever.
- Pointers (e.g. `*TableSpec`): These pointers were already heap allocations, so moving them into an `interface{}` does not allocate either.
- Slices (e.g. `[]*When`): This is actually a memory usage regression, which I've mitigated by using codegen tricks; see the following section for details.
- Numeric literals (e.g. `Scope`, where `type Scope int8`): Back in the day, in the early versions of Go, literals could be stored inline in place of the pointer field, preventing an allocation. However, after some GC improvements in the Go runtime, this optimization was removed because it was much simpler to assume that the pointer field of an `interface{}` was always a pointer, so literals are always stored as a pointer to a heap allocation that contains the literal. So why are we placing them in an `interface{}`, then? It turns out that our literals are small enough that they're always opted into another optimization: the Go runtime has a static array with the numbers from 0 to 255; when storing a number literal in that range in an `interface{}`, the compiler doesn't create a heap allocation. Instead it stores a pointer into the static array, which is not an allocation.
We're keeping outside of the union:
- The `string` field: a string is a tuple of two words. The first word is a pointer to a heap allocation with the contents of the string, the second word is the length of the string. When storing a `string` in an `interface{}`, we're creating an extra heap allocation for those two words, so suddenly every string is two heap allocations (the tuple, and the contents of the string). This was accounting for almost 50% of all the extra allocations because we write into the string field for every tokenization step in the parser; simply moving it outside of the union is a massive performance improvement.
- The `strs []string` field: this slice field is kept out of the union because it's used very frequently by SQL comments.
- `struct`s in our stack frame: these fields are usually very hot and that's why they were being copied by value before. By keeping them outside of the `interface{}` we prevent them from allocating.
I've fixed the requirement for grammar changes by making the new generator aware of Go syntax. Most notably, for any `$$` variable access, it can tell whether the variable acts as an lvalue or rvalue (if you need a refresher on C-language semantics, this article is nice) and generate code accordingly. This was the major shortcoming of the previous version of `goyacc` that was forcing us to rewrite chunks of our grammar: when the current stack frame `$$` is accessed as an rvalue, its unionized type needs to be cast into the expected type for the frame; when accessed as an lvalue, the assignment to the lvalue needs to be typechecked and then it must be assigned into the unionized field. The new generator supports both cases.
For reference, here are some examples of what the new codegen looks like:
Codegen sample: Typesafe insert into union
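The generated code for this sample is not reproduced on this page. As a rough illustration only, the sketch below shows one way a type-checked store into an `interface{}` union could look. Every name in it (`symType`, `statementUnion`, `reduceSelect`, `Select`) is a hypothetical stand-in, not the generator's actual output.

```go
package main

import "fmt"

// Statement and Select are hypothetical stand-ins for AST types.
type Statement interface{ isStatement() }

type Select struct{ Table string }

func (*Select) isStatement() {}

// symType is a hypothetical stack frame: one inline string, one shared union.
type symType struct {
	str   string
	union interface{}
}

// statementUnion reads the union back as a Statement
// (it returns nil if the union holds something else).
func (st *symType) statementUnion() Statement {
	v, _ := st.union.(Statement)
	return v
}

// reduceSelect sketches the code that could be emitted for an action such as
//
//	$$ = &Select{Table: $1}
//
// in a rule whose declared type is Statement.
func reduceSelect(val *symType, dollar []symType) {
	var local Statement                   // typed temporary for $$
	local = &Select{Table: dollar[0].str} // the compiler rejects this line if the value is not a Statement
	val.union = local                     // only then is the value stored in the untyped union
}

func main() {
	var val symType
	reduceSelect(&val, []symType{{str: "users"}})
	fmt.Printf("%T %+v\n", val.statementUnion(), val.statementUnion())
}
```

Routing the assignment through a typed temporary keeps the compile-time check that a plain struct field used to provide, even though the final destination is an `interface{}`.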
Codegen sample: In-place access for $$
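This sample's generated code is likewise not shown here. The sketch below is a hypothetical illustration of the lvalue/rvalue distinction described earlier: one action assigns to `$$`, another only reads through it. The names `symType`, `selectUnion`, `actionAssign` and `actionMutate` are assumptions made for the example.

```go
package main

import "fmt"

// Select is a hypothetical AST node.
type Select struct {
	Table string
	Limit int
}

// symType is a hypothetical stack frame holding only the shared union.
type symType struct {
	union interface{}
}

// selectUnion casts the union back to the rule's declared type.
// It returns nil when the union is empty or holds another type.
func (st *symType) selectUnion() *Select {
	v, _ := st.union.(*Select)
	return v
}

// actionAssign sketches `$$ = &Select{Table: "t"}`: $$ is an lvalue, so the
// value goes through a typed temporary before being stored in the union.
func actionAssign(val *symType) {
	var local *Select = &Select{Table: "t"}
	val.union = local
}

// actionMutate sketches `$$.Limit = 10`: $$ is an rvalue here, so the union
// is cast to *Select and the node is updated through the pointer, in place.
// It assumes the frame already holds a *Select; otherwise it would panic.
func actionMutate(val *symType) {
	val.selectUnion().Limit = 10
}

func main() {
	var val symType
	actionAssign(&val)
	actionMutate(&val)
	fmt.Printf("%+v\n", *val.selectUnion())
}
```

Because the two uses need different code, a generator that cannot tell them apart has to push that distinction back onto the grammar author.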
Lastly, I've reduced memory allocations further by adding support for updating slices in-place. As explained earlier, roughly 20% of all the memory allocations in the previous generator were caused by calling `append` on the interface field. Yacc code that looked like `$$ = append($$, 1)` was being rewritten into `yyVAL.union = append(yyVAL.union.([]int), 1)`. In theory appending to a slice with enough capacity should not allocate, but this operation does. Why? Similarly to the `string` case discussed earlier, a slice is a triplet of words (pointer to the actual contents; length of the slice; and capacity of the slice). Hence, storing a slice in an `interface{}` is causing two allocations (one for the triplet, one for the contents). We have enough different slices in the parser that it makes sense to union them all, and this would be fine if the extra allocation for the triplet was happening only when we create the slice. But the allocation also happens every time we append to it, because the assignment to the interface forces an allocation: the return of the `append` function is a slice header, which as we know is a triplet of words; assigning this slice header to a local variable is a no-op that the compiler optimizes away (the slice is "appended" in-place), but assigning it to an `interface{}` means allocating 24 new bytes for the header to replace the existing header. The result of the `append` reuses the underlying array for the slice, but allocates a brand new header every time.
This is a big enough problem (since the parser appends for basically every grammar rule) that I designed a specific fix for it: a rewrite step that detects cases where we're appending to the same slice and rewrites the append so it can be performed directly on the slice header, by acquiring a pointer to it. This code uses `unsafe`, although in practice it's fully safe. I am not madly in love with it, but the improvement in benchmarks is very noticeable; I've configured the new `goyacc` with a `-fast-append` flag that lets us turn the optimization on/off.
Codegen sample: Slice creation and in-place update
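The generated code for this sample is not shown on this page. The sketch below illustrates the general technique under stated assumptions. It relies on the standard Go toolchain's two-word interface layout (type word, data word), which is an implementation detail and not a language guarantee. The helpers `ifaceData`, `appendBoxed` and `appendInPlace` are hypothetical names.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

// ifaceData returns the data word of an interface value.
// ASSUMPTION: an interface{} is laid out as two pointers (type, data).
func ifaceData(v interface{}) unsafe.Pointer {
	type eface struct {
		typ  unsafe.Pointer
		data unsafe.Pointer
	}
	return (*eface)(unsafe.Pointer(&v)).data
}

// frame is a hypothetical stack frame with a single union field.
type frame struct{ union interface{} }

// appendBoxed is the plain rewrite of `$$ = append($$, x)`:
// the returned slice header is boxed again on every call.
func appendBoxed(f *frame, x int) {
	f.union = append(f.union.([]int), x)
}

// appendInPlace updates the slice header that is already stored behind the
// interface. ASSUMPTION: f.union holds a non-nil []int.
func appendInPlace(f *frame, x int) {
	s := (*[]int)(ifaceData(f.union))
	*s = append(*s, x)
}

// measure reports average allocations per append for both strategies.
// The capacity is large enough that the backing array never grows.
func measure() (boxed, inPlace float64) {
	a := &frame{union: make([]int, 0, 4096)}
	b := &frame{union: make([]int, 0, 4096)}
	boxed = testing.AllocsPerRun(1000, func() { appendBoxed(a, 1) })
	inPlace = testing.AllocsPerRun(1000, func() { appendInPlace(b, 1) })
	return boxed, inPlace
}

func main() {
	boxed, inPlace := measure()
	fmt.Printf("allocs per append: boxed=%v in-place=%v\n", boxed, inPlace)
}
```

Writing through the data word mutates a value that Go otherwise treats as immutable once boxed, so the trick only holds while nothing else shares that interface value. That is a reasonable fit for a parser stack frame, and a reason to keep the behaviour behind a flag.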
Benchmark results
We're comparing the relative change for performance, memory allocations (in bytes) and memory allocations (by count) of 4 different implementations:
- `single_iface` is @sougou and @GuptaManan100's original PR, which I've rebased on top of the current master and based my work on top of; it includes several changes to the grammars that are not needed by the other implementations. It stores every field in the grammar into a single `interface{}`.
- `multi_union` is a too-complex version of the generator that allowed configuring many different union fields (for pointers, for primitives, for interfaces, etc). I discarded it because it also required changes to the grammar.
- `optimal_union` is the final version of the code that only separates between `struct` fields and an `interface{}` union.
- `optimal_union_fast_append` is like above, but with the in-place append optimization enabled for slices.
For all graphs, the Y axis is percentage change, so smaller is always better.
Performance-wise, the results are very good. The improvement was already impressive in the original PR after applying the grammar changes (despite the increase in memory consumption), but after unionizing selectively it goes through the roof: over 40% faster parsing in all benchmarks (a 40% reduction in time per operation works out to roughly 1.7x the throughput) except in the pathological ones, which were designed as tokenization benchmarks so they're not particularly interesting here. This is all with no changes to the idioms in the grammar.
When it comes to memory usage, that's where we see the effort in this PR: we've taken down the increase in allocations from +200% to +25% in the realistic benchmarks. Memory usage in stress benchmarks only increases by 10%. Total bytes allocated only increase by 10%.
Obviously this is not ideal -- we'd want the parser to be both faster and to allocate less memory, but sometimes we need to trade memory for speed because computers aren't magical.
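The per-type allocation behaviour that drives these numbers can be checked directly. The sketch below is not part of the PR's benchmarks; it assumes the standard Go toolchain, and the counts may differ across Go versions. It measures how many heap allocations it takes to box a string, a slice, a pointer and a small integer into an `interface{}`. `Scope` and `TableSpec` are redeclared locally as stand-ins.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Local stand-ins for the grammar types mentioned in the description.
type Scope int8

type TableSpec struct{ Name string }

// sink is package-level so that every boxed value escapes to the heap.
var sink interface{}

// boxingAllocs reports average allocations per store into an interface{}.
func boxingAllocs() (str, slice, ptr, scope float64) {
	s := strings.Repeat("x", 8) // built at run time, so it is not a constant
	sl := make([]string, 0, 4)
	p := &TableSpec{Name: "t"}
	sc := Scope(3)

	str = testing.AllocsPerRun(1000, func() { sink = s })    // boxes the 2-word string header
	slice = testing.AllocsPerRun(1000, func() { sink = sl }) // boxes the 3-word slice header
	ptr = testing.AllocsPerRun(1000, func() { sink = p })    // the pointer itself is the data word
	scope = testing.AllocsPerRun(1000, func() { sink = sc }) // small values point into a static table
	return str, slice, ptr, scope
}

func main() {
	str, slice, ptr, scope := boxingAllocs()
	fmt.Printf("string=%v slice=%v pointer=%v scope=%v\n", str, slice, ptr, scope)
}
```

Running a check like this against each candidate field type is a cheap way to decide which fields belong in the union and which should stay inline in the frame.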
Final benchmark table: master vs optimal_union_fast_append
Conclusions
I believe that the performance benefits we're getting from this changeset, and the fact that it involves no actual changes to the grammar itself, make it worth carrying our own fork of `goyacc` in the codebase (which I've suitably placed in the `sqlparser` directory to make generation trivial).
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: