add EOF() method to proto.Buffer #296

jhump · 2017-02-23T17:57:43Z

The Buffer type has a wide API that is very useful for writing my own message marshaller. However, what it does not have is a way to check whether EOF has been reached. The various Decode* methods return io.ErrUnexpectedEOF when they reach EOF, but that's not sufficient. For example, when reading the next tag+value pair, there is no way to distinguish end-of-message (i.e. no bytes left) from a malformed message (e.g. trying to read the next varint and/or field contents and reaching EOF in the middle).

This small change seems in line with the spirit of this type, since much of its API is exported. With it, it is possible to write a function that unmarshals a message and properly distinguishes end-of-message from an erroneous/unexpected EOF.

jhump · 2017-02-28T17:33:01Z

I added a ReaderIndex accessor and a SetReaderIndex mutator to this PR so that a custom marshaller can skip past part of the buffer's contents (for example, when skipping over an unrecognized field).

Any chance of getting this small change approved and merged?

jhump · 2017-03-07T16:22:34Z

Ping?

jhump · 2017-03-14T13:15:02Z

Ping. Any chance that anyone will look at this, even if to say "no thanks"?

bcmills · 2017-03-14T20:29:21Z

The fact that Buffer already has such a wide API is IMO a reason to avoid adding to it, not a reason to add to it.

Can you explain a bit more what your use-case for EOF is? Perhaps there's a simpler way to address it with the existing API.

jhump · 2017-03-14T20:44:43Z

If I am using a Buffer, there is no way to tell when I hit EOF on an expected boundary, as opposed to hitting it erroneously in the middle of a field. As is, anything that tries to consume the contents of the buffer would simply have to ignore malformed messages where EOF occurs in the middle of something (like an incompletely encoded varint or an incomplete length-delimited value).

The internal code that uses buffer can directly examine unexported fields: https://github.com/golang/protobuf/blob/master/proto/decode.go#L467
(Also see my change to said line in this PR.)

> The fact that Buffer already has such a wide API is IMO a reason to avoid adding to it, not a reason to add to it.

Hmmm. I guess it depends on the purpose of the existing wide API. It appears to be for parsing a stream that is a serialized proto, and it seems to be exported for the benefit of code outside of the proto package. However, realistic parsing code cannot actually be written without exposing the buffer's reader index somehow. So I think this change remedies an omission rather than just "widens the API surface area".

EOF() isn't strictly necessary, nor is SetReaderIndex(int) (they really are niceties). It would suffice to simply expose a getter, like ReaderIndex(). Would that narrowing of this change make the maintainers more receptive to it?

bcmills · 2017-03-14T21:27:31Z

Honestly I'm not entirely sure why proto.Buffer has all those methods exported in the first place — I don't think they're intended for general use, and they've been present since the initial public release of the package.

The functions in the standard encoding/binary package will get you efficient varint and fixed-width decoding on an arbitrary slice or ByteReader, and the other integer decoding functions are fairly trivial in terms of those.
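As a rough sketch of that suggestion: `binary.Uvarint` and `binary.LittleEndian` from the standard library cover the varint and fixed-width cases (protobuf fixed-width values are little-endian), and zigzag decoding for sint32/sint64 is a one-liner on top. The helper names `decodeZigZag64`, `decodeZigZag32`, and `decodeFixed32` are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeZigZag64 maps a zigzag-encoded varint back to a signed 64-bit value,
// as used for sint64 fields.
func decodeZigZag64(v uint64) int64 {
	return int64(v>>1) ^ -int64(v&1)
}

// decodeZigZag32 does the same for sint32 fields.
func decodeZigZag32(v uint64) int32 {
	u := uint32(v)
	return int32(u>>1) ^ -int32(u&1)
}

// decodeFixed32 reads a little-endian 32-bit value, reporting false if fewer
// than four bytes are available.
func decodeFixed32(b []byte) (uint32, bool) {
	if len(b) < 4 {
		return 0, false
	}
	return binary.LittleEndian.Uint32(b), true
}

func main() {
	// binary.Uvarint returns the value and the number of bytes consumed;
	// n == 0 means the slice ended before the varint did.
	v, n := binary.Uvarint([]byte{0x96, 0x01})
	fmt.Println(v, n)

	fmt.Println(decodeZigZag64(3), decodeZigZag32(2))

	f, ok := decodeFixed32([]byte{0x01, 0x00, 0x00, 0x00})
	fmt.Println(f, ok)
}
```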

Beyond that, it's not clear to me why you would want to use Buffer to implement a custom Unmarshaler: if you're looking for better performance, I would expect the Buffer methods to perform similarly to proto.Unmarshal in the first place, and we're already investigating other approaches for addressing the known performance issues. (See also #280.)

jhump · 2017-03-14T21:45:45Z

I was using proto.Buffer because its API looked like almost exactly what I need (other than the absence of an accessor for the reader index). So without it, I'll end up forking all of the logic, which I'm usually loath to do when there's an open-source library that is already so close.

I was basically looking for something like Java's CodedInputStream and CodedOutputStream, except in Go.

My interest in an alternate parser isn't related to the performance of the current one. Right now, the Go protobuf library only supports serializing from/de-serializing to protoc-generated structs. I was working on an implementation of a reflection-based message that uses descriptor protos to parse and marshal messages. jhump/protoreflect@61443d9#diff-ac6f0c5d2bd348d0d1e29c20576fa9d1R305

bcmills · 2017-03-15T15:48:54Z

I agree that it would be nice to have an equivalent to CodedInputStream, but proto.Buffer is not that. (An API targeted to that use-case would probably wrap an io.Reader or io.ByteReader rather than a fixed-length byte slice.) Perhaps that's worth revisiting once the ongoing performance work lands.

Going in the other direction, I think if we were to make proto.Buffer more suitable for general use, the right way to do that would be to have the Decode* methods return io.EOF instead of io.ErrUnexpectedEOF for calls that occur exactly at the end of the input. Unfortunately, I think that's technically a breaking change.

jhump · 2017-03-15T16:19:04Z

> An API targeted to that use-case would probably wrap an io.Reader or io.ByteReader rather than a fixed-length byte slice.

That would be great.

So do you recommend that I just close this pull request and (sigh...) fork the parts of proto.Buffer that I need?

bcmills · 2017-03-15T16:35:46Z

> So do you recommend that I just close this pull request and (sigh...) fork the parts of proto.Buffer that I need?

I think that's probably best for now. You could always fork them into a proper reusable CodedReader yourself, perhaps with an eye toward upstreaming it once you've worked the kinks out of the API...

zellyn · 2017-03-15T17:10:29Z

@bcmills I just want to make sure that it's obvious that while us folks using protobufs outside of Google understand the concerns, discussion, cautious curation etc. on issues like this one, and appreciate that there are long-term protobuf arcs of work inside Google that have months- and years-worth of sliding-block-puzzle dependency chains to work out, and that "working on protobuf" teams inside Google are understaffed and overburdened, and that there is a constant, wearying flood of terrible pull requests and suggestions that fail to comprehend even large design themes let alone wrinkles and intricacies… (whew)… It nevertheless feels almost impossible to contribute to protobufs in a meaningful way, even with the best of intentions and a lot of work trying to upstream things. And that feels bad, especially with protobufs being central to gRPC.

bcmills · 2017-03-15T17:18:28Z

@zellyn Agreed. We are aware that the current situation w.r.t. protobuf and pull requests is a significant problem, and we're trying to find a more sustainable way forward. I can't make any promises at the moment, but we feel your pain.

jhump · 2017-03-16T15:35:03Z

@bcmills: slightly related ask -- what would you say about changing the mkeyprop and mvalprop fields of proto.Properties to be exported?

Right now, any code that uses reflection to crawl a generated struct has to re-parse the "protobuf_key" and "protobuf_val" struct tags each time it encounters a map field, because these properties are currently unexported. All other properties are exposed via proto.GetProperties(reflect.Type) and the Prop and OneofTypes exported fields of StructProperties.

I'm happy to open a pull request if there's a chance it could be merged. WDYT?

bcmills · 2017-03-16T18:43:33Z

@jhump
I don't really know much about why the Properties struct is the way it is, so I'm not sure how the Map stuff fits into that. (I'd suggest opening an issue for discussion before you get too far into a pull request.)

jhump added 2 commits February 23, 2017 11:52

add EOF() method to proto.Buffer

ce9a17c

add Buffer.ReaderIndex and Buffer.SetReaderIndex

5e78c91

jhump mentioned this pull request Mar 1, 2017

Dynamic messages jhump/protoreflect#1

Merged

bcmills closed this Mar 15, 2017

jhump mentioned this pull request Apr 26, 2019

[PERF] Further performance improvements discussion jhump/protoreflect#189

Open

golang locked and limited conversation to collaborators Jun 26, 2020
