Build `Vec`s without repeated re-allocation and copies #35
Comments
I realised a chunked linked list could also be a more compact data representation. Currently, 4 statements are stored in 64 bytes; stored in chunks, the same 4 elements would take 40 bytes (10 bytes per element). To be fair, chunking isn't required for this optimization. We could also pack a contiguous list using the same technique.
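As a rough illustration of the packing described above (the layouts here are assumptions, since the original diagrams aren't reproduced): `Statement` is an enum of boxed variants, so each element today is a 1-byte discriminant padded to 8 bytes plus an 8-byte pointer = 16 bytes. Splitting discriminants and pointers into separate arrays removes the per-element padding:

```rust
use std::ptr::NonNull;

// Current representation: 4 × 16-byte enum = 64 bytes.
// Packed representation (hypothetical `PackedStatements4`): 40 bytes.
#[repr(C)]
struct PackedStatements4 {
    tags: [u8; 4],          // enum discriminants, one byte each
    _pad: [u8; 4],          // padding to align the pointer array
    ptrs: [NonNull<u8>; 4], // pointers to the boxed variant payloads in the arena
}
// 4 + 4 + 4 × 8 = 40 bytes, i.e. 10 bytes per element instead of 16.
```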
Using a chunked linked list, I think it could also be possible to mutate the list during traversal. The key would be to ensure that the node which is currently on the visitation "branch" does not move. You may shuffle items up/down before or after the node on the current visitation branch, or insert/delete chunks before/after the current chunk, but the current node stays put, so its pointer remains valid during visitation. This would require:
This would work particularly well with chunks of constant size, e.g. 8, 16 or 32 slots. If we used the scheme of splitting enum discriminants and pointers apart described in the previous comment, you could find an empty slot with a single SIMD instruction - the same way Swiss Table / hashbrown does.

Layout of an 8-item chunk: 88 bytes = 11 bytes per element. A 16-item chunk: 152 bytes = 9.5 bytes per element.

Or aligned to cache line sizes: a 5-item chunk in 64 bytes = 12.8 bytes per item; a 12-item chunk in 128 bytes = 10.7 bytes per item. The latter seems a pretty good fit for the top-level statements.
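A sketch of what the cache-line-aligned 12-item chunk might look like (the field names, the doubly-linked chunk pointers, and the empty-slot convention are assumptions, not the layout from the original comment):

```rust
use std::ptr::NonNull;

// 12-slot chunk occupying two 64-byte cache lines (128 bytes total).
#[repr(C, align(64))]
struct Chunk {
    // One discriminant byte per slot; 0 could mean "empty slot".
    // A single SIMD compare over these bytes can locate an empty slot,
    // the same way Swiss Table / hashbrown probes its control bytes.
    tags: [u8; 12],
    _pad: [u8; 4],
    // Links to neighbouring chunks. Chunks can be inserted or removed around
    // the chunk currently being visited without moving its slots.
    prev: Option<NonNull<Chunk>>,
    next: Option<NonNull<Chunk>>,
    // One pointer per slot to the boxed node payload in the arena.
    slots: [Option<NonNull<u8>>; 12],
}
// 12 + 4 + 8 + 8 + 12 × 8 = 128 bytes, ~10.7 bytes per slot.
```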
The problem with `Vec`s

Bumpalo packs data tightly and grows downwards. This is generally a good thing, but when building `Vec`s it's a disadvantage, because when a `Vec` needs to grow beyond its capacity, it always reallocates and its contents are copied.

For example, while parsing a program, the top-level statements `Vec` will initially have capacity 4 (I think that's the default). When a 5th statement is found:

- `Vec<Statement>` grows to capacity 8.
- A new `[Statement; 8]` is allocated into the arena.
- The 4 existing statements are copied into the new allocation.

When more than 8 statements are found, the process repeats again.
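A small demonstration of this behaviour using bumpalo directly (requires bumpalo's "collections" feature; this is just an illustration, not parser code):

```rust
use bumpalo::{collections::Vec, Bump};

fn main() {
    let bump = Bump::new();
    let mut stmts = Vec::with_capacity_in(4, &bump);
    for i in 1u64..=4 {
        stmts.push(i);
    }
    let before = stmts.as_ptr();

    // The 5th push exceeds capacity 4: new backing storage is obtained from
    // the arena and the existing elements are copied across.
    stmts.push(5);
    let after = stmts.as_ptr();

    println!("capacity grew to {}", stmts.capacity());
    println!("backing storage moved: {}", before != after);
}
```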
This causes 3 different problems:

- Where the `Vec`'s backing storage ends up being located is unpredictable - could be before the node data, or after, or in the middle.

This problem is most acute for the top-level `Vec<Statement>`, as programs can be many, many statements.

Possible solutions
1. Estimate number of statements from source text size
For the top-level `Vec<Statement>`, make a guess at the eventual size based on source text size and allocate a `Vec` with (hopefully) sufficient capacity.

If we guess correctly, this will reduce copying a lot. However, it's hard to guess accurately. A module which is entirely wrapped in an IIFE could be massive in terms of source text size, but is only a single statement. Webpack-bundled files will also have a small number of statements.
Over-allocation is probably not so bad. It will leave large chunks of memory unused in the arena, but that empty memory is never accessed, so it shouldn't affect caching. If it's a large stretch of memory, the OS may not even allocate memory pages for it.
It's also hard to guess the number of statements in functions ahead of time. We could probably take good guesses at the size of `Vec`s for other things, e.g. most functions don't have more than 4 arguments.
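A sketch of what the guess could look like, assuming bumpalo's `Vec` (the constant and function name are made up for illustration and would need tuning against real-world code):

```rust
use bumpalo::{collections::Vec, Bump};

// Made-up heuristic: assume roughly one top-level statement per 64 bytes of
// source text.
const EST_BYTES_PER_STATEMENT: usize = 64;

fn with_estimated_capacity<'alloc, T>(bump: &'alloc Bump, source_text: &str) -> Vec<'alloc, T> {
    // Never guess below the current default of 4.
    let estimate = (source_text.len() / EST_BYTES_PER_STATEMENT).max(4);
    Vec::with_capacity_in(estimate, bump)
}
```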
2. Small vec optimization
Use a "small vec" type for `Vec`s which often have only 1 member. There are a few good candidates for this:

- `Vec<Argument>` in `CallExpression` (`foo(bar)`)
- `Vec<Directive>` in `FunctionBody` (`function foo() { 'use strict'; }`)
- `Vec<AssignmentTargetProperty>` in `ObjectAssignmentTarget` (`const {foo} = bar;`)
- `Vec<ImportDeclarationSpecifier>` in `ImportDeclaration` (`import foo from 'bar';`, `import {foo} from 'bar';`)
- `Vec<Expression>` in `ImportExpression` (`import('./foo.js')`)

If we could reduce the size of `enum`s to 8 bytes with pointer compression/pointer tagging, we could fit 2 elements inline, which would make it suitable for a few more types. But regardless, many types wouldn't be suitable, and it still doesn't help with the biggest problem, `Vec<Statement>`.
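A hedged sketch of what an arena-aware small vec could look like: one element stored inline, spilling into a bumpalo `Vec` only on the second push. The type and method names are made up; crates like `smallvec` implement the inline-storage idea more completely, though not against an arena.

```rust
use bumpalo::{collections::Vec, Bump};

// Hypothetical small vec: 0 or 1 elements need no arena allocation at all.
enum SmallVec1<'a, T> {
    Empty,
    One(T),
    Spilled(Vec<'a, T>),
}

impl<'a, T> SmallVec1<'a, T> {
    fn push(&mut self, value: T, bump: &'a Bump) {
        match std::mem::replace(self, SmallVec1::Empty) {
            // First element lives inline.
            SmallVec1::Empty => *self = SmallVec1::One(value),
            // Second element: spill both elements into the arena.
            SmallVec1::One(first) => {
                let mut vec = Vec::with_capacity_in(4, bump);
                vec.push(first);
                vec.push(value);
                *self = SmallVec1::Spilled(vec);
            }
            // Already spilled: behave like a normal `Vec`.
            SmallVec1::Spilled(mut vec) => {
                vec.push(value);
                *self = SmallVec1::Spilled(vec);
            }
        }
    }
}
```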
3. `Vec<Box<T>>`
For `Vec`s where the content is not already an enum with boxed variants: box the `Vec`'s elements. So when the `Vec` grows, you're just copying `Box<T>`s rather than `T`s. In some cases `T` is quite large, e.g. `FormalParameter` is 80 bytes, whereas `Box<FormalParameter>` is 8 bytes.

Trade-off of read speed vs write speed: reads pay for an extra pointer dereference, while growing copies far less data per element when growing `Vec`s.
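A quick illustration of the size difference (the `FormalParameter` here is a stand-in struct sized to match the figure above, not the real AST type):

```rust
// Stand-in for the real AST type; assume it is 80 bytes.
struct FormalParameter {
    _fields: [u64; 10],
}

fn main() {
    assert_eq!(std::mem::size_of::<FormalParameter>(), 80);
    assert_eq!(std::mem::size_of::<Box<FormalParameter>>(), 8);
    // Growing a Vec<Box<FormalParameter>> copies 8 bytes per element,
    // versus 80 bytes per element for Vec<FormalParameter>.
}
```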
4. Replace `Vec` with chunked linked list

Chunked linked list
As the list grows, chunks would be added to the end of the linked list in sizes of increasing powers of 2 (the same growth strategy as `Vec`). This keeps the memory overhead of forward pointers low, compared to a standard linked list.
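A minimal, arena-free sketch of the idea (plain `Box` and `Vec` are used to keep it self-contained; in practice the chunks would be allocated in the bump arena, and the type names are made up):

```rust
// Each chunk has a fixed capacity and is never reallocated; new chunks double
// the previous capacity, mirroring `Vec`'s growth, but elements never move.
struct Chunk<T> {
    items: Vec<T>,
    next: Option<Box<Chunk<T>>>,
}

impl<T> Chunk<T> {
    fn push(&mut self, value: T) {
        // Not the last chunk: recurse towards the tail (O(log n) chunks).
        if let Some(next) = &mut self.next {
            return next.push(value);
        }
        if self.items.len() < self.items.capacity() {
            self.items.push(value);
        } else {
            // Last chunk is full: append a new chunk with double capacity.
            let mut chunk = Chunk {
                items: Vec::with_capacity(self.items.capacity() * 2),
                next: None,
            };
            chunk.items.push(value);
            self.next = Some(Box::new(chunk));
        }
    }
}

pub struct ChunkedList<T> {
    head: Option<Box<Chunk<T>>>,
}

impl<T> ChunkedList<T> {
    pub fn new() -> Self {
        ChunkedList { head: None }
    }

    pub fn push(&mut self, value: T) {
        match &mut self.head {
            Some(head) => head.push(value),
            None => {
                let mut chunk = Chunk { items: Vec::with_capacity(4), next: None };
                chunk.items.push(value);
                self.head = Some(Box::new(chunk));
            }
        }
    }

    /// Iterate in insertion order, chunk by chunk.
    pub fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        std::iter::successors(self.head.as_deref(), |chunk| chunk.next.as_deref())
            .flat_map(|chunk| chunk.items.iter())
    }
}
```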
Advantages:

- No copying of existing elements as the list grows, unlike `Vec` = faster in parser.
- Better locality than `Vec` for large lists of enum types, as memory for a list chunk is close to the memory containing the `Box<T>` content.

Disadvantages:
- Indexing: I don't think we often need to index into `Vec`s. If we want to provide a `Path` API (#30 (comment)), that would require indexing. We could design a type which does allow indexing by using `index.trailing_zeros()` to calculate which chunk. This type would be large, so only suitable for top-level statements.
5. Build `Vec`s in scratch space

Add a scratch space arena to the allocator. In the parser, build `Vec`s initially in the scratch space, and then copy the contents into the main arena once the `Vec` is complete.

In the parser, you're only building 1 x `Vec` at a time, so the scratch space can be treated as a stack, growing upwards.
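A minimal sketch of this approach, using a reusable std `Vec` as the scratch buffer for simplicity (a second bump arena used as a stack would also work); the function and parameter names are illustrative only:

```rust
use bumpalo::Bump;

fn build_list<'main, T>(
    main_arena: &'main Bump,
    scratch: &mut Vec<T>,              // reused, stack-like scratch buffer
    items: impl IntoIterator<Item = T>,
) -> &'main mut [T] {
    scratch.clear();
    scratch.extend(items);             // may reallocate, but only in scratch space
    // One copy of exactly `scratch.len()` elements into the main arena,
    // done in a single pass once the final length is known.
    main_arena.alloc_slice_fill_iter(scratch.drain(..))
}
```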
Advantages:

- No need for a custom `Vec` type.
- The end result is a normal `Vec`.
Number of copies: Currently building a
Vec
with between 129 and 255 elements results in 252 elements being copied in total (4 + 8 + 16 + 32 + 64 + 128). Scratch space method copies the number of elements only once (129 to 255 copied elements) - so reduced copying in almost all cases. Copy happens only once in a single go, which may be faster (compiler will use SIMD ops and loop unrolling).Conclusion
Conclusion

All the above could be used in combination.
If we don't need indexing, chunked linked list is probably the best solution. Would need to benchmark it though.