Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add EOF() method to proto.Buffer #296

Closed
wants to merge 2 commits into from
Closed

Conversation

jhump
Copy link
Contributor

@jhump jhump commented Feb 23, 2017

The Buffer type has a wide API that is very useful for writing my own message marshaller. However, what is does not have is a way to check whether EOF has been reached. The various Decode* methods return io.ErrUnexpectedEOF when they reach EOF, but that's not sufficient. For example, when reading the next tag+value pair, there was no way to distinguish end-of-message (e.g. no bytes left) from a malformed message (e.g. try to read next varint and/or field contents and reach EOF in the middle).

This small change seems inline with the spirit of this type since much of its API is exported. With it, it is possible to write a function that unmarshals a message and properly distinguishes end-of-message from an erroneous/unexpected EOF.

@jhump
Copy link
Contributor Author

jhump commented Feb 28, 2017

I added ReaderIndex accessor and SetReaderIndex mutator to this PR so that a custom marshaller can skip past part of the buffer's contents (like when skipping over an unrecognized field).

Any chance of getting this small change approved and merged?

@jhump
Copy link
Contributor Author

jhump commented Mar 7, 2017

Ping?

@jhump
Copy link
Contributor Author

jhump commented Mar 14, 2017

Ping. Any chance that anyone will look at this, even if to say "no thanks"?

@bcmills
Copy link
Contributor

bcmills commented Mar 14, 2017

The fact that Buffer already has such a wide API is IMO a reason to avoid adding to it, not a reason to add to it.

Can you explain a bit more what your use-case for EOF is? Perhaps there's a simpler way to address it with the existing API.

@jhump
Copy link
Contributor Author

jhump commented Mar 14, 2017

If I am using a buffer, there is no way to tell when I hit EOF on an expected boundary, as opposed to hitting it erroneously in the middle of a field. As is, anything that tries to consume the contents of the buffer would simply have to ignore malformed messages where EOF occurs in the middle of something (like incomplete encoded varint or incomplete length-delimited value).

The internal code that uses buffer can directly examine unexported fields: https://github.com/golang/protobuf/blob/master/proto/decode.go#L467
(Also see my change to said line in this PR.)

The fact that Buffer already has such a wide API is IMO a reason to avoid adding to it, not a reason to add to it.

Hmmm. I guess it depends on what is the purpose of the existing wide API. It appears to be for parsing a stream that is a serialized proto. And it seems to be exported for the value of code outside of the proto package. However, realistic parsing code cannot actually be written without exposing the buffer's reader index somehow. So I think this change remedies an omission rather than just "widens the API surface area".

EOF() isn't strictly necessary, nor is SetReaderIndex(int) (they really are niceties). It would suffice to simply expose a getter, like ReaderIndex(). Would that narrowing of this change make the maintainers more receptive to it?

@bcmills
Copy link
Contributor

bcmills commented Mar 14, 2017

Honestly I'm not entirely sure why proto.Buffer has all those methods exported in the first place — I don't think they're intended for general use, and they've been present since the initial public release of the package.

The functions in the standard encoding/binary package will get you efficient varint and fixed-width decoding on an arbitrary slice or ByteReader, and the other integer decoding functions are fairly trivial in terms of those.

Beyond that, it's not clear to me why you would want to use Buffer to implement a custom Unmarshaler: if you're looking for better performance, I would expect the Buffer methods to perform similarly to proto.Unmarshal in the first place, and we're already investigating other approaches for addressing the known performance issues. (See also #280.)

@jhump
Copy link
Contributor Author

jhump commented Mar 14, 2017

I was using proto.Buffer because its API looked like almost exactly what I need (other than absence of accessor for reader index). So without it, I'll end up forking all of the logic, which I'm usually loathe to do when there's an open-source library that is already so close.

I was basically looking for something like Java's CodedInputStream and CodedOutputStream, except in Go.

My interest in an alternate parser isn't related to performance of the current one. Right now, the Go protobuf library only supports serializing from/de-serializing to protoc-generated structs. I was working on an implementation of a reflection-based message, that uses descriptor protos to parse and marshal messages. jhump/protoreflect@61443d9#diff-ac6f0c5d2bd348d0d1e29c20576fa9d1R305

@bcmills
Copy link
Contributor

bcmills commented Mar 15, 2017

I agree that it would be nice to have an equivalent to CodedInputStream, but proto.Buffer is not that. (An API targeted to that use-case would probably wrap an io.Reader or io.ByteReader rather than a fixed-length byte slice.) Perhaps that's worth revisiting once the ongoing performance work lands.

Going in the other direction, I think if we were to make proto.Buffer more suitable for general use, the right way to do that would be to have the Decode* methods return io.EOF instead of io.UnexpectedEOF for calls that occur exactly at the end of the input. Unfortunately, I think that's technically a breaking change.

@jhump
Copy link
Contributor Author

jhump commented Mar 15, 2017

An API targeted to that use-case would probably wrap an io.Reader or io.ByteReader rather than a fixed-length byte slice.

That would be great.

So do you recommend that I just close this pull request and (sigh...) fork the parts of proto.Buffer that I need?

@bcmills
Copy link
Contributor

bcmills commented Mar 15, 2017

So do you recommend that I just close this pull request and (sigh...) fork the parts of proto.Buffer that I need?

I think that's probably best for now. You could always fork them into a proper reusable CodedReader yourself, perhaps with an eye toward upstreaming it once you've worked the kinks out of the API...

@bcmills bcmills closed this Mar 15, 2017
@zellyn
Copy link
Contributor

zellyn commented Mar 15, 2017

@bcmills I just want to make sure that it's obvious that while us folks using protobufs outside of Google understand the concerns, discussion, cautious curation etc. on issues like this one, and appreciate that there are long-term protobuf arcs of work inside Google that have months- and years-worth of sliding-block-puzzle dependency chains to work out, and that "working on protobuf" teams inside Google are understaffed and overburdened, and that there is a constant, wearying flood of terrible pull requests and suggestions that fail to comprehend even large design themes let alone wrinkles and intricacies… (whew)… It nevertheless feels almost impossible to contribute to protobufs in a meaningful way, even with the best of intentions and a lot of work trying to upstream things. And that feels bad, especially with protobufs being central to GRPC.

@bcmills
Copy link
Contributor

bcmills commented Mar 15, 2017

@zellyn Agreed. We are aware that the current situation w.r.t. protobuf and pull requests is a significant problem, and we're trying to find a more sustainable way forward. I can't make any promises at the moment, but we feel your pain.

@jhump
Copy link
Contributor Author

jhump commented Mar 16, 2017

@bcmills: slightly related ask -- what would you say about changing mkeyprop and mvalprop fields of proto.Properties to be exported?

Right now, any code that is using reflection to crawl a generated struct has to re-parse the "protobuf_key" and "protobuf_val" struct tags on each encounter of a map field because these are currently unexported. All other properties are exposed, via proto.getProperties(reflect.Type) and the Prop and OneofTypes exported fields of StructProperties.

I'm happy to open a pull request if there's a chance it could be merged. WDYT?

@bcmills
Copy link
Contributor

bcmills commented Mar 16, 2017

@jhump
I don't really know much about why the Properties struct is the way it is, so I'm not sure how the Map stuff fits into that. (I'd suggest opening an issue for discussion before you get too far into a pull request.)

@golang golang locked and limited conversation to collaborators Jun 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants