exact slice allocation for fixed size repeated packed fields #437

Merged

Conversation

gunnihinn
Contributor

See comments in commit message and issue 436.

Gunnar Þór Magnússon added 3 commits July 25, 2018 15:22
This is a regression test for issue 436:

    gogo#436
Suppose we have the protobuf definition

    message Foo {
        repeated type Stuff = 1;
    }

that we deserialize in a fairly common way:

    f := &Foo{}
    f.Unmarshal(blob)

Before the call to Unmarshal, `f.Stuff` will be a slice of length 0, so
the Unmarshal operation will more or less be:

    for _, x := range xs {
        f.Stuff = append(f.Stuff, x)
    }

If we don't know how many elements we're going to deserialize
beforehand, this is the best we can do.
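The cost of that append-only loop can be observed directly. The sketch below (not part of the PR; the helper name is ours) counts how many times the backing array gets reallocated while appending one element at a time. The exact growth factors are a Go runtime implementation detail, but for small slices each reallocation roughly doubles the capacity:

```go
package main

import "fmt"

// countReallocs appends n elements one at a time to an initially empty
// slice and reports the final capacity and how many times the backing
// array was reallocated along the way.
func countReallocs(n int) (finalCap, reallocs int) {
	var s []float64
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, float64(i))
		if cap(s) != before {
			reallocs++
		}
	}
	return cap(s), reallocs
}

func main() {
	c, r := countReallocs(100)
	// Final capacity is at least 100; the runtime reallocated a
	// handful of times (one per capacity doubling) along the way.
	fmt.Println(c >= 100, r) // true, followed by a small realloc count
}
```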

Suppose, however, that we know that we're going to deserialize n
elements. If k is such that 2^k < n <= 2^{k+1}, then the Go runtime's
exponential doubling strategy for resizing the arrays that back slices
will cause us to allocate memory for at least

    1 + 2 + ... + 2^{k+1} = 2^{k+2} - 1 ≈ 2 * 2^{k+1}

elements, which is usually more than double what we actually need.
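To make the arithmetic concrete, the sketch below (our own illustration, not code from the PR) models a pure doubling strategy and compares its total allocation against an exact one. The real Go runtime uses more nuanced growth factors for large slices, so this is the idealized worst case the text describes:

```go
package main

import "fmt"

// totalDoubling returns the total number of elements allocated when a
// slice grows from capacity 1 by doubling until it can hold n elements:
// 1 + 2 + 4 + ... + 2^{k+1} where 2^k < n <= 2^{k+1}.
func totalDoubling(n int) int {
	total := 0
	for c := 1; ; c *= 2 {
		total += c
		if c >= n {
			return total
		}
	}
}

func main() {
	// With n just above a power of two, doubling allocates close to
	// 4x what is actually needed; exact allocation needs only n.
	n := 1025 // 2^10 < n <= 2^11
	fmt.Println(totalDoubling(n)) // 1 + 2 + ... + 2048 = 4095
	fmt.Println(n)                // exact allocation: 1025
}
```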

When we deserialize packed fields, we know how many bytes we're going to
deserialize before we start the default append loop. If we furthermore
know how many elements those bytes correspond to, which we do when the
protobuf wire type corresponding to `type` has fixed length [1], we can
prepend the default append loop with

    f.Stuff = make([]type, 0, n)

and ask for exactly the memory we're going to use.

This results in considerable memory savings, between 50 and 80 percent,
compared with the default strategy. These savings are important to
people who use protobuf to communicate things like time series between
services, which consist almost entirely of large arrays of floats and
doubles.

This fixes gogo#436.

It's conceivable to implement similar things for packed types of
non-fixed length. They're encoded with varints, and we _could_ run
through the byte stream we're going to deserialize and count how many
bytes don't have the most significant bit set, but the performance
implications of that seem less predictable than of the simple division
we can perform here.

[1] https://developers.google.com/protocol-buffers/docs/encoding#structure
Travis CI is unhappy with me.
@gunnihinn
Contributor Author

I thought a bit more about the throwaway comment on packed types of non-fixed length. I think it's worth a shot, tuned so that it only kicks in when we're sure there are enough varints to deserialize to make it worthwhile. It sounds like a nice project for a junior developer, so I'm fishing for volunteers at $work to see if anyone wants to have a go at it. That might take a little while to organize, so we'll do a separate PR for it.

@awalterschulze
Member

Thank you so much @gunnihinn this looks very promising.

@donaldgraham would you mind reviewing this?

@donaldgraham
Contributor

This looks great. I like the way the test expresses the expectation that we allocate less than half of the memory we had allocated before...

@awalterschulze awalterschulze changed the title Gmagnusson/exact slice allocation exact slice allocation for fixed size repeated packed fields Jul 31, 2018
@awalterschulze awalterschulze merged commit 5440baf into gogo:master Jul 31, 2018
@awalterschulze
Member

Thank you so much @gunnihinn and @donaldgraham

gunnihinn pushed a commit to go-graphite/protocol that referenced this pull request Jul 31, 2018
... and generate the protobuf files with the latest version.

We really want to run at least commit 5440baf of gogo/protobuf, because
it contains a pull request we got merged upstream:

    gogo/protobuf#437

The effect of the PR is to reduce the memory allocated by the

    carbonapi_v3_pb.MultiFetchResponse.Unmarshal

method by at least 50% and up to 80%. This is very relevant to
Graphite's interests.