exact slice allocation for fixed size repeated packed fields #437
Conversation
This is a regression test for issue 436: gogo#436
Suppose we have the protobuf definition

```proto
message Foo {
    repeated type Stuff = 1;
}
```

that we deserialize in a fairly common way:

```go
f := &Foo{}
f.Unmarshal(blob)
```

Before the call to Unmarshal, `f.Stuff` will be a slice of length 0, so the Unmarshal operation will more or less be:

```go
for _, x := range xs {
    f.Stuff = append(f.Stuff, x)
}
```

If we don't know beforehand how many elements we're going to deserialize, this is the best we can do. Suppose, however, that we know we're going to deserialize n elements. If k is such that 2^k < n <= 2^{k+1}, then the Go runtime's exponential doubling strategy for resizing the arrays that back slices will cause us to allocate memory for 1 + 2 + ... + 2^{k+1} = 2^{k+2} - 1 elements in total across the resizes, which is usually more than double what we actually need.

When we deserialize packed fields, we know how many bytes we're going to deserialize before we start the default append loop. If we furthermore know how many elements those bytes correspond to, which we do when the protobuf wire type corresponding to `type` has fixed length [1], we can precede the default append loop with

```go
f.Stuff = make([]type, 0, n)
```

and ask for exactly the memory we're going to use. This results in considerable memory savings, between 50 and 80 percent, compared with the default strategy. These savings matter to people who use protobuf to communicate things like time series between services, which consist almost entirely of large arrays of floats and doubles.

This fixes gogo#436.

It's conceivable to implement something similar for packed types of non-fixed length. They're encoded with varints, and we _could_ run through the byte stream we're going to deserialize and count how many bytes don't have the most significant bit set, but the performance implications of that seem less predictable than those of the simple division we can perform here.

[1] https://developers.google.com/protocol-buffers/docs/encoding#structure
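To make the mechanics concrete, here is a minimal, self-contained sketch of the fixed-size case. The function name and error handling are illustrative, not the generated gogo/protobuf code: for a packed `repeated double` field, each element occupies exactly 8 bytes of little-endian fixed64 data on the wire, so the element count is just `len(payload) / 8`.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// unmarshalPackedDoubles decodes the payload of a packed repeated double
// field. Because fixed64 elements are exactly 8 bytes on the wire, we can
// compute the element count up front and allocate the slice exactly once.
func unmarshalPackedDoubles(payload []byte) ([]float64, error) {
	if len(payload)%8 != 0 {
		return nil, fmt.Errorf("payload length %d is not a multiple of 8", len(payload))
	}
	n := len(payload) / 8
	out := make([]float64, 0, n) // exact allocation instead of append-driven doubling
	for i := 0; i < len(payload); i += 8 {
		bits := binary.LittleEndian.Uint64(payload[i : i+8])
		out = append(out, math.Float64frombits(bits))
	}
	return out, nil
}

func main() {
	// Encode three doubles the way they appear inside a packed field.
	var payload []byte
	for _, v := range []float64{1.5, 2.5, 3.5} {
		var b [8]byte
		binary.LittleEndian.PutUint64(b[:], math.Float64bits(v))
		payload = append(payload, b[:]...)
	}
	vals, err := unmarshalPackedDoubles(payload)
	fmt.Println(vals, err) // [1.5 2.5 3.5] <nil>
}
```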
Travis CI is unhappy with me.
I thought a bit more about the throwaway comment about packed types of non-fixed length. I think it's worth a shot to do that, and tune it in such a way that maybe it only kicks in if we're sure there are enough varints to deserialize to be worth it. It sounds like a nice project for junior developers, so I'm fishing for some at $work to see if anyone wants to have a go at this. It might take a little while to organize, so I think we'll do a separate PR for that.
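For reference, the counting pass itself is simple in isolation; the open question is whether the extra scan over the payload pays for itself. A sketch, assuming the payload contains only well-formed varints:

```go
// countVarints returns the number of varint-encoded values in payload.
// Every varint ends at, and only at, a byte whose most significant bit
// is clear, so counting those bytes counts the elements.
func countVarints(payload []byte) int {
	n := 0
	for _, b := range payload {
		if b&0x80 == 0 {
			n++
		}
	}
	return n
}
```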
Thank you so much @gunnihinn, this looks very promising. @donaldgraham would you mind reviewing this?
This looks great. I like the way the test expresses the expectation that we allocate less than half of the memory we had allocated before...
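The shape of that expectation is easy to reproduce outside the generated code. The following self-contained comparison (not the PR's actual test) measures append-driven doubling against an exact `make` using `testing.Benchmark`; on a typical 64-bit Go runtime the doubling variant allocates well over twice the bytes per operation:

```go
package main

import (
	"fmt"
	"testing"
)

func main() {
	const n = 100000 // element count, known up front in the packed fixed-size case

	// Doubling: start from a nil slice and let append grow the backing array.
	grow := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var s []float64
			for j := 0; j < n; j++ {
				s = append(s, float64(j))
			}
		}
	})

	// Exact: allocate capacity n once, the strategy this PR generates.
	exact := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			s := make([]float64, 0, n)
			for j := 0; j < n; j++ {
				s = append(s, float64(j))
			}
		}
	})

	fmt.Printf("doubling: %d bytes/op\nexact:    %d bytes/op\n",
		grow.AllocedBytesPerOp(), exact.AllocedBytesPerOp())
}
```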
Thank you so much @gunnihinn and @donaldgraham
... and generate the protobuf files with the latest version. We really want to run at least commit 5440baf of gogo/protobuf, because it contains a pull request we got merged upstream: gogo/protobuf#437 The effect of the PR is to reduce the memory allocated by the carbonapi_v3_pb.MultiFetchResponse.Unmarshal method by at least 50% and up to 80%. This is very relevant to Graphite's interests.
See comments in commit message and issue 436.