Cached Size Implementation #7387
Merged
vmg requested review from harshit-gangal, shlomi-noach, sougou and systay as code owners (January 26, 2021 17:50)
systay reviewed (Jan 26, 2021)
vmg force-pushed the sizegen-tool branch 2 times, most recently from 993721d to 7ac2a45 (January 27, 2021 15:37)
@vmg you would need to fix the DCO
Signed-off-by: Vicent Marti <[email protected]>
These two features are now handled exceptionally:
- Generated files that don't attempt to perform interface casts no longer have a local interface definition for `cachedObject`
- Generated functions that use `unsafe.Pointer` for hashmap size calculations are now properly flagged with a pragma to prevent a false positive in the compiler `checkptr` pass.
- Abstracted the output filesystem to make things easier to test

Signed-off-by: Vicent Marti <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Vicent Marti <[email protected]>
Signed-off-by: Vicent Marti <[email protected]>
systay approved these changes (Feb 1, 2021)
Description
Here's a hard problem in a garbage collected language like Go: we need to know the exact size that an object occupies in memory, so we can perform accurate memory tracking for our caches. This issue in the official Go repo provides many examples of why this is exceedingly complicated and, as evidenced by the fact that the maintainers have closed it, the authors of Go don't seem to be looking to implement a way to reliably do this in the language runtime.
It is then our time to get dirty. After many discussions with @systay, we agreed on an approach that I've implemented in this PR: a code generator, very similar to the one we use to automatically generate the walker code for our AST, but that instead generates a helper method for all the structures that can possibly fit in cache.
This PR is composed of two commits: the first one checks in the code generator and the second one checks in all the generated code.
My goals, in order of importance, are as follows:
Notes on the codegen
As you can see, the generated code is rather comprehensive. The output that has been checked in has been generated from running the following command:
The inputs to the `sizegen` tool are the base types that are going to be stored in the cache. In our case, these are `engine.Plan` and `tabletserver.TabletPlan`. From there, the generator builds a full AST of Vitess, typechecks all our packages and dependencies, and starts traversing the type information, starting from just the `Plan` and `TabletPlan` structs, to find every single field that can be reached from our root structs and generate a `CachedSize` helper for each one of them.

The generated code is written separately to every package in `go/vt` that contains cacheable data structures, in a file called `cached_size.go`. This is required so we can obtain fully accurate type information even for private fields in the structs.

The generator is smart and outputs optimal code: structs that are PODs do not get helper methods because their size is known a priori. Pointers are always checked before being dereferenced. Slices take into account the capacity of their backing arrays, and maps use implementation internals to get a down-to-the-byte count of how much memory the map implementation uses in its backing data structures. For all the interfaces that are stored in our cacheable structs, we use type inference to find all the structs in the codebase that could possibly implement such an interface, and generate helper methods for all of them. Lastly, the generator takes special care to distinguish between fields that are embedded in structs and fields that are pointed at, to ensure that non-allocated data is not double counted.
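The rules above for pointers, slices and embedded versus pointed-at fields can be sketched by hand. The code below is not the generated output: the `Column` and `Plan` types are hypothetical, and the `CachedSize(alloc bool) int64` signature is an assumption made for this example. Maps and interfaces are left out because they need runtime internals and type discovery, and allocator size-class rounding is ignored.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Column and Plan are hypothetical types used only for this illustration.
type Column struct {
	Name string
	Type int32
}

type Plan struct {
	Query   string
	Columns []Column // elements live inside the slice's backing array
	Next    *Plan    // pointed-at: a separate allocation
}

// CachedSize returns the bytes retained by c. When alloc is false, the struct
// itself has already been counted by its container, so only the data it
// points at is added.
func (c *Column) CachedSize(alloc bool) int64 {
	if c == nil {
		return 0
	}
	size := int64(0)
	if alloc {
		size += int64(unsafe.Sizeof(*c))
	}
	size += int64(len(c.Name))
	return size
}

// CachedSize returns the bytes retained by p, following the same convention.
func (p *Plan) CachedSize(alloc bool) int64 {
	if p == nil { // pointers are checked before being dereferenced
		return 0
	}
	size := int64(0)
	if alloc {
		size += int64(unsafe.Sizeof(*p))
	}
	size += int64(len(p.Query))
	// The backing array is sized by capacity, not length.
	size += int64(cap(p.Columns)) * int64(unsafe.Sizeof(Column{}))
	for i := range p.Columns {
		// Each element is already counted in the backing array above, so
		// alloc=false avoids double counting the struct itself.
		size += p.Columns[i].CachedSize(false)
	}
	// A pointed-at Plan is its own allocation, so alloc=true.
	size += p.Next.CachedSize(true)
	return size
}

func main() {
	p := &Plan{Query: "select id from t", Columns: make([]Column, 1, 4)}
	p.Columns[0].Name = "id"
	fmt.Println(p.CachedSize(true))
}
```

The `alloc` flag is the piece that keeps embedded data from being counted twice: a container that already paid for a value's bytes asks only for what that value points at.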
The generator is also fully reproducible (particularly with regard to ordering), so that successive runs of the codegen always output the same files unless the underlying code has changed.
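The PR does not spell out how the ordering is made stable. One common technique in Go code generators, shown here purely as an assumption, is to sort map keys before emitting anything, since Go deliberately randomizes map iteration order. The `emitHeader` helper and its output format are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// emitHeader renders one comment line per type, in sorted name order, so the
// output does not depend on Go's randomized map iteration.
func emitHeader(fieldCounts map[string]int) string {
	names := make([]string, 0, len(fieldCounts))
	for name := range fieldCounts {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "// %s: %d fields\n", name, fieldCounts[name])
	}
	return b.String()
}

func main() {
	fmt.Print(emitHeader(map[string]int{"Plan": 3, "Column": 2}))
}
```

With this approach, two runs over unchanged input produce byte-identical files, which keeps generated code out of unrelated diffs.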
As far as I can tell, there are no outstanding issues or missing features in the codegen, but there may be correctness bugs, since there is a significant amount of complexity in the data structures we're storing in the cache. I look forward to your feedback.
Related Issue(s)
#7304 is the WIP PR for the new cache implementation, where I explain in detail why accurate memory usage assessments for the cache are important.
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect:
cc @systay -- would you mind CCing whoever else you think would be interested in this?