Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not generate XXX fields in protobufs #1725

Merged
merged 2 commits into from
Nov 13, 2019

Conversation

ppanyukov
Copy link
Contributor

@ppanyukov ppanyukov commented Nov 6, 2019

Changes

Use options in .proto files to not generate XXX fields in Go structs.

We don't need XXX fields - the interface is so stable it is VERY unlikely we'd ever need these at all, ever. By getting rid of them we achieve:

  • smaller memory footprint
  • smaller payloads on the wire (edit: XXX are not put on the wire it seems)
  • and most importantly same in-memory layout with core Prometheus structs opening path for all sorts of optimisations such as unsafe casts, leading to better memory use and faster :)
  • see related issue: Unify labels packages prometheus/prometheus#6029

Why this matters from memory perspective:

  • size of original storepb.Label type (with XXX fields): 64 bytes
  • size of new storepb.Label type (without XXX): 32 bytes
  • that's 2x reduction
  • considering we allocate massive slices of storepb.Label this matters

Why this matters to have same in-memory layout with core Prometheus structs:

  • we do tons of conversions between the prom structs and our own storepb types because of these XXX fields
  • this includes arrays of the same
  • this is not only slow, it allocates tons of memory, unnecessarily so
  • since now we don't have XXX fields, we can cast from type to type without allocating anything new
  • this is especially true for slices

The tests show Store Gateway memory allocations dropped 5% for query execution:

  • Showing nodes accounting for -359.20MB, 6.19% of 5798.93MB total

From code perspective, this is example of how much less memory is used for one query in blockSeries func:

    683            .          .           		s := seriesEntry{ 
    684    -122.54MB  -122.54MB           			lset: make([]storepb.Label, 0, len(lset)+len(extLset)), 
    685       2.50MB     2.50MB           			refs: make([]uint64, 0, len(chks)), 
    686    -102.04MB  -102.04MB           			chks: make([]storepb.AggrChunk, 0, len(chks)), 
    687            .          .           		} 

Verification

Verification

  • Sample data generated with thanosbench using realistic-k8s-1w-small profile
  • Query executed: count({__name__=~".+"}) by (__name__)
  • Instrumented BucketStore.Series to dump heap profile at the start and end of the function call.
  • Compared the profiles using pprof:
go tool pprof -sample_index=alloc_space -edgefraction=0 -lines -base heap-series-1-after-base.pb.gz  -http=:8080 heap-series-1-after-change.pb.gz
  • PNG attached

  • Heap profiles attached

profile005

heap-series-1-after-base.pb.gz
heap-series-1-after-change.pb.gz

EDIT 1: Notes and test on compatibility

I've done some tests on the compatibility to make sure this is not a breaking change for anything:

  • Store Gateway: using XXX fields; Querier: not using XXX fields
  • Store Gateway: not using XXX fields; Querier: using XXX fields

Everything seems to work just fine between the two.

Which of course makes sense since I also discovered that XXX fields are not put on the wire :)

  • smaller payloads on the wire (edit: XXX are not put on the wire it seems)

Basically this means that individual components can use or not use XXX fields and the comms between them are still OK.

Hopefully this should give people more confidence that this PR can be merged without ill effects.

We don't need them. By gettig rid of them we achieve:

- smaller memory footprint
- smaller payloads on the wire
- same in-memory layout with core Prometheus structs opening path for all sorts of optimisations such as unsafe casts, leading to better memory use and faster :)
- see related issue: prometheus/prometheus#6029

The tests show SG memory allocations dropped 5% for query execution:
- Showing nodes accounting for -359.20MB, 6.19% of 5798.93MB total

Signed-off-by: Philip Panyukov <[email protected]>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is amazing! I think that opens many doors for improvements!

Will review later as we prepare for PromCon.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this possibly hinder the upgrade paths in the future for our users if these types will ever be extended? I guess @bwplotka could chime in here about the actual stability of these things (:

We already extended this not such a long time ago:
#1284
#750

But I guess this trade-off could be acceptable given that we reduce the memory usage by ~5%.

@ppanyukov
Copy link
Contributor Author

ppanyukov commented Nov 7, 2019

So, after a bit of digging on these XXX fields, it's a touchy subject apparently. They came and gone and came back again. Apparently there was a lot of upset when these fields disappeared on one release of proto3 after Google decided they don't need them internally and therefore let's ditch them for everyone. So they made it back.

To give some idea what they are and what they are used for:

https://github.com/golang/protobuf/blob/master/README.md

we reserve the right to ... adding, removing, or changing ... fields that start with XXX. These ... should not be considered part of the public API.

https://groups.google.com/forum/#!msg/golang-nuts/F5xFHTfwRnY/sPv5nTVXBQAJ

When tested inside Google, we discovered that these changes were more disruptive than expected. The cause of the issues are mostly due to additional fields added to generated messages by protoc-gen-go:

  • An XXX_NoUnkeyedLiteral field that the generator now creates to force users to use keyed literals (e.g., foopb.Message{Name: "Golang", Age: 8} as opposed to foopb.Message{"Golang", 8}) to ensure forward compatible usage of messages.
  • An XXX_unrecognized field that is necessary for proto3 to always preserve unknown fields. This breaks users that assume comparability of proto3 messages. Since XXX_unrecognized is of type []byte, this means that proto3 messages cannot be used as map keys nor directly compared using the == operator.
  • An XXX_sizecache field that is part of the internal implementation of the table-driven serializer. The presence of this field can break tests that use reflect.DeepEqual to compare some received messages with some golden message. It is recommended that proto.Equal be used instead for this situation.

I think we can certainly ditch XXX_NoUnkeyedLiteral.
Also XXX_unrecognized unless someone thinks we do want to preserve unrecognised fields in messages? Do we ever want this?

The XXX_sizecache is use for their table-driven serialiser which they claim made things "1.3x to 2.1x faster when tested inside Google". So hmm. I don't even know if that is a thing anymore these days or whether it's of any significance for us.

Any thoughts from anyone?

@ppanyukov
Copy link
Contributor Author

Also, this produces same result as using protoc-gen-gogofaster plugin instead of protoc-gen-gogofast we are using now.

@bwplotka
Copy link
Member

The XXX_sizecache is use for their table-driven serialiser which they claim made things "1.3x to 2.1x faster when tested inside Google". So hmm. I don't even know if that is a thing anymore these days or whether it's of any significance for us.

Yes. It seems to help in parsing latency. I think we can use tradeoff of parsing latency vs the memory used

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just small comment to fill.

@@ -11,6 +11,11 @@ option (gogoproto.marshaler_all) = true;
option (gogoproto.unmarshaler_all) = true;
option (gogoproto.goproto_getters_all) = false;

// Do not generate XXX fields. As per https://stackoverflow.com/a/58088012
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's comment why (:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done?

// Do not generate XXX fields to reduce memory footprint and opening a door
// for zero-copy casts to/from prometheus data types.

@ppanyukov ppanyukov requested a review from bwplotka November 13, 2019 11:46
@bwplotka bwplotka merged commit 5eb43c1 into thanos-io:master Nov 13, 2019
@ppanyukov
Copy link
Contributor Author

More detailed and precise measurements of the effect of this change.

Memory profile deltas for start/end of each of these functions:


Querier, func (s *ProxyStore) Series(), in pkg/store/proxy.go

BEFORE:

MEM STATS DIFF:   	q-Series 	q-Series - AFTER 	-> Delta
    HeapAlloc  : 	5.25M 		1.50G 			-> 1.49G
    HeapObjects: 	16.95K 		15.76M 			-> 15.75M

MEM PROF DIFF:    	q-Series 	q-Series - AFTER 	-> Delta
    InUseBytes  : 	3.76M 		993.00M 		-> 989.24M      <===!
    InUseObjects: 	141 		26.54K 			-> 26.40K       
    AllocBytes  : 	6.95M 		2.25G 			-> 2.25G
    AllocObjects: 	344 		26.36K 			-> 26.01K

AFTER:

MEM STATS DIFF:   	q-Series 	q-Series - AFTER 	-> Delta
    HeapAlloc  : 	5.28M 		853.92M 		-> 848.64M
    HeapObjects: 	17.01K 		13.27M 			-> 13.25M

MEM PROF DIFF:    	q-Series 	q-Series - AFTER 	-> Delta
    InUseBytes  : 	3.07M 		635.79M 		-> 632.72M      <===!
    InUseObjects: 	18 		14.10K 			-> 14.09K
    AllocBytes  : 	7.20M 		1.61G 			-> 1.60G
    AllocObjects: 	29 		17.93K 			-> 17.90K

RESULT: 36% reduction for InUseBytes.


StoreGateway, func (s *BucketStore) Series(), in pkg/store/bucket.go

BEFORE:

MEM STATS DIFF:   	sg-Series 	sg-Series - AFTER 	-> Delta
    HeapAlloc  : 	5.28M 		1.47G 			-> 1.47G
    HeapObjects: 	18.67K 		12.70M 			-> 12.69M

MEM PROF DIFF:    	sg-Series 	sg-Series - AFTER 	-> Delta
    InUseBytes  : 	3.14M 		1.27G 			-> 1.27G      <===!
    InUseObjects: 	6 		4.42K 			-> 4.41K
    AllocBytes  : 	7.80M 		2.05G 			-> 2.05G
    AllocObjects: 	33 		4.69K 			-> 4.66K

AFTER:

MEM STATS DIFF:   	sg-Series 	sg-Series - AFTER 	-> Delta
    HeapAlloc  : 	5.26M 		1.20G 			-> 1.19G
    HeapObjects: 	18.66K 		12.70M 			-> 12.69M

MEM PROF DIFF:    	sg-Series 	sg-Series - AFTER 	-> Delta
    InUseBytes  : 	1.99M 		985.38M 		-> 983.39M      <===!
    InUseObjects: 	7 		2.41K 			-> 2.40K
    AllocBytes  : 	6.74M 		1.76G 			-> 1.75G
    AllocObjects: 	40 		3.11K 			-> 3.07K

RESULT: 22% reduction for InUseBytes.

@ppanyukov ppanyukov deleted the feature/remove-pb-overhead-pr branch November 13, 2019 18:05
tianyuansun pushed a commit to tianyuansun/thanos that referenced this pull request Nov 19, 2019
* Do not generate XXX fields in protobufs

We don't need them. By gettig rid of them we achieve:

- smaller memory footprint
- smaller payloads on the wire
- same in-memory layout with core Prometheus structs opening path for all sorts of optimisations such as unsafe casts, leading to better memory use and faster :)
- see related issue: prometheus/prometheus#6029

The tests show SG memory allocations dropped 5% for query execution:
- Showing nodes accounting for -359.20MB, 6.19% of 5798.93MB total

Signed-off-by: Philip Panyukov <[email protected]>

* comment to explain why do don't generate XXX fields in protobufs

Signed-off-by: Philip Panyukov <[email protected]>
Signed-off-by: suntianyuan <[email protected]>
IKSIN pushed a commit to monitoring-tools/thanos that referenced this pull request Nov 26, 2019
* Do not generate XXX fields in protobufs

We don't need them. By gettig rid of them we achieve:

- smaller memory footprint
- smaller payloads on the wire
- same in-memory layout with core Prometheus structs opening path for all sorts of optimisations such as unsafe casts, leading to better memory use and faster :)
- see related issue: prometheus/prometheus#6029

The tests show SG memory allocations dropped 5% for query execution:
- Showing nodes accounting for -359.20MB, 6.19% of 5798.93MB total

Signed-off-by: Philip Panyukov <[email protected]>

* comment to explain why do don't generate XXX fields in protobufs

Signed-off-by: Philip Panyukov <[email protected]>
Signed-off-by: Aleksey Sin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants