Serialization API - efficiency #30
As noted in #21 (comment), this would not work when using other serialization formats.
I think Grpc.AspNetCore should not have a dependency on Google.Protobuf, as:
We should make it possible to use other serialization formats and expect users to do that at some point. I'd follow the pattern used by Grpc.Core: The only thing that Grpc.Core currently assumes is that a serialized message will be an instance of a class and that one defines a "Marshaller" that knows how to serialize/deserialize (invoking the protobuf-specific serialization/deserialization is part of the generated code right now: https://github.com/grpc/grpc/blob/b6d8957b0d78f768b858698ba6a79296db448bb2/src/csharp/Grpc.Examples/MathGrpc.cs#L30)
https://github.com/grpc/grpc/blob/master/src/csharp/Grpc.Core/Marshaller.cs
An issue with the current Marshaller pattern is that it operates on byte arrays. We'd want a pattern that doesn't require allocating an array and buffering the message in memory; for best performance we'd ideally read directly from the request and write directly to the response. (A sketch of the current byte[]-based marshalling follows the BindService example below.)
For serialization extensibility, do you expect compatible serializer libraries to always generate a BindService method? Would you recommend BindService as the way Grpc.AspNetCore.Server discovers services and acquires marshallers? If the answer is yes, I think there are two changes we would want to make:
/// <summary>Register service method implementations with a service binder. Useful when customizing the service binding logic.
/// Note: this method is part of an experimental API that can change or be removed without any prior notice.</summary>
/// <param name="serviceBinder">Service methods will be bound by calling <c>AddMethod</c> on this object.</param>
public static void BindService(grpc::ServiceBinderBase serviceBinder)
{
serviceBinder.AddMethod(__Method_Div);
serviceBinder.AddMethod(__Method_DivMany);
serviceBinder.AddMethod(__Method_Fib);
serviceBinder.AddMethod(__Method_Sum);
}
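For reference, this is roughly what the current byte[]-based marshalling looks like in the generated code (a sketch based on the Math example; the Marshallers.Create overload used here takes a Func<T, byte[]> serializer and a Func<byte[], T> deserializer):
using Google.Protobuf;
using Grpc.Core;
internal static class MathMarshallers
{
    // Every message is serialized to a byte[] and parsed back from a byte[],
    // which forces a full allocation and copy per message.
    internal static readonly Marshaller<global::Math.DivArgs> DivArgs =
        Marshallers.Create(
            arg => MessageExtensions.ToByteArray(arg),  // T -> byte[]
            global::Math.DivArgs.Parser.ParseFrom);     // byte[] -> T
}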
(1) is something to figure out now. It will be the main way Grpc.AspNetCore.Server integrates with Grpc.Core, so it is something to lock down. (2) is largely an API detail and doesn't need to be figured out right away. I'd like to learn more about how gRPC and ASP.NET Core interact before deciding what the best signature would be. ASP.NET Core will soon expose System.IO.Pipelines for HTTP/2, and that will provide the best opportunity to maximize performance.
I am aware of that limitation and I've had a solution in progress for a while: Grpc.Core already uses contextual serializers internally (but they delegate to the old-style byte array conversions for now). SerializationContext/DeserializationContext is not yet populated with the advanced members, but I have a PR in progress. A taste of the new DeserializationContext (not yet merged): Btw, this effort started before we came up with the plan for Grpc.AspNetCore, so it would be good to review whether this design works well with the planned APIs for ASP.NET Core pipeline access to requests/responses. Let's create a separate issue for this review?
I think the answer is Yes.
I think we can add that soon. If you want, you can also create a PR yourself; here is an example PR that added ServiceBinderBase: https://github.com/grpc/grpc/pull/17157/commits
This is what ContextualSerializer/ContextualDeserializer can help with (see above). Let's review in a separate issue to make sure the (De)SerializationContext has methods that play well with both Grpc.Core and Grpc.AspNetCore (which will use System.IO.Pipelines).
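As a sketch of how a contextual marshaller could look once the contexts expose buffer-based members (hedged: the member names used below, such as GetBufferWriter, SetPayloadLength, Complete and PayloadAsReadOnlySequence, and the buffer-based WriteTo/ParseFrom overloads on Google.Protobuf, reflect the in-progress design and could change):
using Google.Protobuf;
using Grpc.Core;
internal static class ContextualMarshallers
{
    // Writes straight into the transport's buffer writer and parses straight
    // from a ReadOnlySequence<byte>, avoiding the intermediate byte[].
    public static Marshaller<T> Create<T>(MessageParser<T> parser) where T : IMessage<T>
    {
        return Marshallers.Create(
            (message, context) =>
            {
                context.SetPayloadLength(message.CalculateSize());
                message.WriteTo(context.GetBufferWriter());
                context.Complete();
            },
            context => parser.ParseFrom(context.PayloadAsReadOnlySequence()));
    }
}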
Sure, we'll create a PR for it.
Great. ASP.NET Core currently doesn't expose pipelines, but it will come in the next preview of ASP.NET Core 3.0. That shouldn't be too far away. We'll look at whether the contextual serializers are the right shape then. FYI @davidfowl
We'll need another implementation to vet the abstraction. We can try using the other popular .NET protobuf library as a test: https://www.nuget.org/packages/protobuf-net/.
I'm not sure that Protobuf.NET supports proto3 (which is the recommended version of the .proto format to use and the only one supported by Google.Protobuf for C#). But in terms of vetting the abstraction, I think just the ability to expose a "generic" service (e.g. a method handler that gives access to the actual binary payloads instead of serialized/deserialized messages) would be good enough. E.g. here: https://github.com/grpc/grpc/blob/1064c6f663a2735fa2f4230dd8d278ae66d344b7/src/csharp/Grpc.IntegrationTesting/ServerRunners.cs#L92
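The "generic" service approach is essentially a pass-through marshaller, along the lines of the linked ServerRunners.cs code (sketch):
using Grpc.Core;
internal static class RawMarshaller
{
    // Hands the handler the raw binary payload without any protobuf parsing,
    // which is enough to exercise the serialization abstraction end to end.
    internal static readonly Marshaller<byte[]> Instance =
        new Marshaller<byte[]>(payload => payload, payload => payload);
}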
I've renamed this issue. Grpc.AspNetCore should use Grpc.Core's abstraction.
I think we should have a meeting about this: @jtattermusch, @davidfowl, @JunTaoLuo and I. My high-level thoughts after some research:
A bit more detail here:
Let's go over this in our next weekly sync.
Hmm, yeah I looked at the implementation of CodedInputStream and CodedOutputStream. They are built around byte arrays, either as the final place for data or as a buffer to then write to a stream. Reading and writing directly with pipelines isn't possible with the current Google.Protobuf API. Unfortunately, those types are both sealed.
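Concretely, that limitation means both directions have to go through a byte[] today (sketch of the current buffered approach):
using Google.Protobuf;
internal static class BufferedSerialization
{
    // Deserialize: the request payload must first be copied into a byte[]
    // before MessageParser can see it.
    public static T Parse<T>(MessageParser<T> parser, byte[] requestPayload)
        where T : IMessage<T> => parser.ParseFrom(requestPayload);

    // Serialize: ToByteArray() allocates the whole message up front before it
    // can be copied out to the response stream.
    public static byte[] Serialize(IMessage responseMessage) => responseMessage.ToByteArray();
}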
@JamesNK https://github.com/mkosieradzki/protobuf/tree/spans-pr-rebased - as far as I remember, this specific branch in my forked repo contains an implementation that is compatible with pipelines and is fairly well optimized, but it modifies both the codegen and the SDK. It should also be significantly faster, because it uses aggressive inlining instead of virtual calls.
Thanks for the link. The CodedOutputStream changes are understandable just from reading the code. Nice job! CodedInputStream is more complicated; I want to debug it at runtime to understand it fully. +1 on discussing this at the sync next week. Getting performance to 100% is a lower priority than functionality for the first preview, but a plan forward would be good 👍
After playing with this for a little bit, we can continue to use the marshaller for now but we need some changes to it. Instead of taking a byte[] it needs to take a
If you look at the new pipelines code you can see the exact points where the marshaller could slot in and use those types seamlessly. |
We need to consider targeting for this feature. Because Grpc.Tools supports older versions of .NET, codegen will still continue to output serialization logic for
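For backwards compatibility, a byte[]-based marshaller produced by older codegen could be adapted to the contextual shape, e.g. (sketch; the Complete(byte[]) and PayloadAsNewBuffer() context members used here are assumptions about the contextual API's fallback path):
using Grpc.Core;
internal static class LegacyMarshallerAdapter
{
    // Wraps an old-style byte[] marshaller so the server can always work
    // against the contextual serializer/deserializer delegates.
    public static Marshaller<T> Adapt<T>(Marshaller<T> legacy)
    {
        return Marshallers.Create(
            (message, context) => context.Complete(legacy.Serializer(message)),
            context => legacy.Deserializer(context.PayloadAsNewBuffer()));
    }
}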
Interesting project. I'm also the author of MessagePack-CSharp; @AArnott is currently working on Span support and other changes: AArnott/MessagePack-CSharp#9
// byte[] based formatter (ver 1.x) to span based formatter (ver 2.x)
public interface IMessagePackFormatter<T>
{
void Serialize(IBufferWriter<byte> writer, T value, IFormatterResolver resolver);
T Deserialize(ref ReadOnlySequence<byte> byteSequence, IFormatterResolver resolver);
}
If you change the marshaller, this can be supported directly.
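For example, a formatter with the draft shape above could be bridged to a contextual gRPC marshaller roughly like this (sketch; the MessagePack namespaces, the resolver parameter and the gRPC context members are assumptions here):
using Grpc.Core;
using MessagePack;             // IFormatterResolver (namespace assumed)
using MessagePack.Formatters;  // IMessagePackFormatter<T> (namespace assumed)
internal static class MessagePackMarshallers
{
    // Serializes into the transport's IBufferWriter<byte> and deserializes from
    // the payload's ReadOnlySequence<byte>, so no intermediate byte[] is needed.
    public static Marshaller<T> Create<T>(IMessagePackFormatter<T> formatter, IFormatterResolver resolver)
    {
        return Marshallers.Create(
            (value, context) =>
            {
                formatter.Serialize(context.GetBufferWriter(), value, resolver);
                context.Complete();
            },
            context =>
            {
                var sequence = context.PayloadAsReadOnlySequence();
                return formatter.Deserialize(ref sequence, resolver);
            });
    }
}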
@mkosieradzki Are you still willing to help out here? |
@davidfowl yes. What kind of help is needed? My protobuf fork should be compatible with those requirements. |
We're going to look at perf some more now that a lot of basic functionality is working. @mkosieradzki I think your PR goes a long way to adding the right features to protobuf. Are you able to rebase it on the latest version of google/protobuf? |
@JamesNK I will try to revisit and rebase it soon. If I fail to find some time during week, I will definitely do this on the upcoming Saturday. |
That would be great. I'm going to put together some microbenchmarks for serialization/deserialization performance. I'm also going to see about getting benchmarks in the ASP.NET Core and protobuf perf environments. That should track performance over time.
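A starting point could be a small BenchmarkDotNet harness comparing the existing byte[] parse path with the buffer-based path under discussion (sketch; SampleMessage stands in for a generated protobuf message, and the ReadOnlySequence<byte> overload is the proposed API rather than something Google.Protobuf has today):
using System.Buffers;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Google.Protobuf;

public class ParseBenchmarks
{
    // SampleMessage is a placeholder for any generated protobuf message type.
    private static readonly byte[] Payload = new SampleMessage { Name = "test" }.ToByteArray();

    [Benchmark(Baseline = true)]
    public SampleMessage ParseFromByteArray() =>
        SampleMessage.Parser.ParseFrom(Payload);

    // The sequence-based overload represents the buffer-friendly path being added.
    [Benchmark]
    public SampleMessage ParseFromSequence() =>
        SampleMessage.Parser.ParseFrom(new ReadOnlySequence<byte>(Payload));
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ParseBenchmarks>();
}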
@JamesNK I started going through the merge and it's not that easy: it seems legacy proto2 support was added to the C# version after I created the initial PR, so it's not only a merge but also re-implementing parts of proto2 support. Please let me know whether you need my code updated to the latest branch because you need features like proto2 support (I see gRPC-related scenarios where this will be useful), or just to have the latest version; that will let me prioritize my work. I also need to know whether the intention is to have this code merged back into protobuf (in which case we need to go back with @jtattermusch to the painful PR process), or whether a span-friendly fork that works is enough. For PoC purposes, this branch should be good enough: https://github.com/mkosieradzki/protobuf/tree/spans-pr-rebased
Before proceeding to implementation, let's agree on the API design (the actual implementation is a lot of code and that can easily obscure the important concepts). Here's a proposal for the parsing API (to be iterated upon):
// TODO: class name TBD (CodedInputReader, CodedInputContext, CodedInputByRefReader, WireFormatReader)
public ref struct CodedInputReader
{
// to avoid accessing first slice from inputSequence all the time,
// also to allow reading from single span.
ReadOnlySpan<byte> immediateBuffer;
ReadOnlySequence<byte> inputSequence; // struct!
// maybe we need bufferPos to avoid incrementing totalBytesRetired all the time..
// int bufferPos;
public CodedInputReader(ReadOnlySpan<byte> buffer)
{
this.immediateBuffer = buffer;
this.inputSequence = default;
}
public CodedInputReader(ReadOnlySequence<byte> input)
{
this.immediateBuffer = default;
this.inputSequence = input;
}
// TODO: add constructor that allows reading from a single span, buts calls a refill callback
// if span has been exhausted to allow supplying the next buffer slice?
// (= same as read only sequence, but allows supplying buffer slices reading lazily)
// internal state of coded input stream
private uint lastTag;
private uint nextTag;
private bool hasNextTag;
private int totalBytesRetired;
private int currentLimit; // initialize to maxInt!
private int recursionDepth;
// parsing limits
private readonly int recursionLimit;
private readonly int sizeLimit;
// exposed methods that copy potentially large amount of data:
// TODO: figure out an opt-in way to avoid copying and/or allocation
// ReadBytes()
// ReadString();
}
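As a usage sketch (assuming the reader ends up mirroring CodedInputStream members such as ReadTag/ReadInt32/ReadString/SkipLastField, which aren't shown in the proposal above):
using System.Buffers;
internal static class CodedInputReaderUsage
{
    // Hypothetical caller: parse a message with fields `int32 id = 1; string name = 2;`
    // straight from a ReadOnlySequence<byte> without copying it into a byte[] first.
    public static (int Id, string Name) ParseSample(ReadOnlySequence<byte> payload)
    {
        var reader = new CodedInputReader(payload);
        int id = 0;
        string name = "";
        uint tag;
        while ((tag = reader.ReadTag()) != 0)
        {
            switch (tag >> 3) // wire-format tag is (field number << 3) | wire type
            {
                case 1: id = reader.ReadInt32(); break;
                case 2: name = reader.ReadString(); break;
                default: reader.SkipLastField(); break;
            }
        }
        return (id, name);
    }
}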
You could, but I wouldn't start with that. We may also want to make it possible to return a state object like the Utf8JsonReader does. @ahsonkhan, can you lend your experience here with the JsonReader? See the other state also.
You could expose the raw input data slices, similar to what the Utf8JsonReader does, and let the caller transcode themselves: And if you are slicing
I am not too familiar with the types of input CodedInputReader would accept, but when it comes to returning/accepting a state for partial reading (and to support re-entrancy), you may also want an
Additionally, you would likely need to know if you are at the end or not:
private bool IsLastSpan => _isFinalBlock && (_isSingleSegment || _isLastSegment);
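For reference, this is the resumable-parsing pattern Utf8JsonReader uses: construct the reader with isFinalBlock and a saved JsonReaderState, and capture CurrentState when the current buffer runs out (sketch):
using System;
using System.Text.Json;
internal static class ChunkedJsonParsing
{
    public static void ParseInChunks(ReadOnlySpan<byte> firstChunk, ReadOnlySpan<byte> secondChunk)
    {
        var reader = new Utf8JsonReader(firstChunk, isFinalBlock: false, state: default);
        while (reader.Read())
        {
            // process tokens from the first chunk...
        }

        // Capture the state and continue with the next chunk. Note: any unconsumed
        // tail of firstChunk (after reader.BytesConsumed) must be prepended to
        // secondChunk by the caller if a token is split across the boundary.
        JsonReaderState state = reader.CurrentState;
        reader = new Utf8JsonReader(secondChunk, isFinalBlock: true, state);
        while (reader.Read())
        {
            // process the remaining tokens...
        }
    }
}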
Is this the number of bytes read from the input so far?
That sounds like a good start, but in our case the caller will be the Google.Protobuf assembly itself, so we would also need to figure out an alternative way of exposing the ByteString in the message API (this would be an advanced use case for very performance-oriented users, I assume).
I like the idea of being able to save the parsing state using
See comment here: |
We'd also need to prototype the solution for maintaining backwards compatibility.
CC @jskeet |
When used with gRPC, the protobuf serializer doesn't need to be re-entrant. gRPC prefixes the message with its size in bytes, so it is simple to read from the request pipeline until we have all the necessary data to deserialize the message.
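For context, each gRPC message on the wire is prefixed with a 1-byte compressed flag and a 4-byte big-endian length, so the server can buffer from the pipe until the whole payload has arrived before invoking the deserializer (sketch; a real implementation would also enforce the configured maximum message size):
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

internal static class GrpcFraming
{
    // Reads one length-prefixed gRPC message and hands the complete payload to the
    // deserializer, so the deserializer never has to suspend/resume mid-parse.
    public static async Task<T> ReadMessageAsync<T>(PipeReader input, Func<ReadOnlySequence<byte>, T> deserializer)
    {
        while (true)
        {
            ReadResult result = await input.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            if (TryReadHeader(buffer, out int messageLength) && buffer.Length >= 5 + messageLength)
            {
                ReadOnlySequence<byte> payload = buffer.Slice(5, messageLength);
                T message = deserializer(payload);
                input.AdvanceTo(payload.End);
                return message;
            }

            if (result.IsCompleted)
            {
                throw new InvalidOperationException("Incomplete gRPC message.");
            }

            // Not enough data yet: keep everything buffered and wait for more.
            input.AdvanceTo(buffer.Start, buffer.End);
        }
    }

    private static bool TryReadHeader(ReadOnlySequence<byte> buffer, out int messageLength)
    {
        messageLength = 0;
        if (buffer.Length < 5)
        {
            return false;
        }
        Span<byte> header = stackalloc byte[5];
        buffer.Slice(0, 5).CopyTo(header);
        // header[0] is the compressed flag; bytes 1-4 are the big-endian payload length.
        messageLength = (int)BinaryPrimitives.ReadUInt32BigEndian(header.Slice(1));
        return true;
    }
}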
Which is why we have a max buffer size 😄 |
This is DONE with gRPC 2.32.0 and Protobuf 3.13.0 🎉 🥳 🎈 🍰
@JamesNK What's a good entry point for grokking the final design (e.g. docs, examples, tests, etc.)? |
Consider providing an extension point allowing users to configure serialization libraries other than Google.Protobuf. Grpc.Core already offers an abstraction layer that we can use.
This issue is for discussing improving it and Google.Protobuf (the default serialization layer).
Start from #30 (comment)