Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-031: reconsider type url format for service msgs encoded as google.Protobuf.Any #9063

Closed
fdymylja opened this issue Apr 7, 2021 · 20 comments · Fixed by #9139
Closed
Assignees
Milestone

Comments

@fdymylja
Copy link
Contributor

fdymylja commented Apr 7, 2021

Problem definition

ADR-031 states the following: In order to encode service methods in transactions, we encode them as Anys in the same TxBody.messages field as other Msgs. We simply set Any.type_url to the full-qualified method name (ex. /cosmos.gov.Msg/SubmitProposal) and set Any.value to the protobuf encoding of the request message (MsgSubmitProposal in this case).

Whilst this allows us to route Msgs to their respective handlers, it breaks the google.Protobuf.Any spec which states the following: The last segment of the URL's path must represent the fully qualified name of the type (as in path/google.protobuf.Duration). The name should be in a canonical form (e.g., leading "." is not accepted).

This aside from breaking the spec, it also causes problems in different google.protobuf.Any implementations across several languages, for example:

Testing the spec breakage:
grpcurl -plaintext -d '{"hash":"txhash"}' grpcendpoint:port cosmos.tx.v1beta1.Service/GetTx

Is going to return the following (truncated):

 "messages": [
        {"@error":"Send is not recognized; see @value for raw binary message data","@type":"/cosmos.bank.v1beta1.Msg/Send","@value":"Ci1jb3Ntb3MxdHFwd25seDU1OHIzNnBtZjVoOHh4Y3Q5NHFjcTAwaDJwNWV2YWcSLWNvc21vczFtcjU3eDRuMzlzczVsYzI2eHB4MGVwYzV0OTJhN3B5MmxzcDg4eRoLCgVzdGFrZRICMTA="}
      ]

This causes protobuf language compatibility to be unavailable for all the ServiceMsgs which are wrapped as Any's as every client would need to write its own type url resolver. Aside from this proto implementations which rely on Any implementations might still end up being broken as the message type would end up being resolved to something invalid.

This needs to be fixed as it propagates across every endpoint which contains service msgs, transactions etc (in and out of the cosmos-sdk).

Proposed solutions

mitigation 1

In order to mitigate the proposed solution would be to form a type URL like this /method_fullname/input_fullname, for example:

taken the rpc: Msg.SubmitProposal which lays in package cosmos.v1beta1.gov and accepts as input MsgSubmitProposal the type URL would be the following: /cosmos.v1beta1.gov.Msg.SubmitProposal/cosmos.v1beta1.gov.MsgSubmitProposal, this would not break google.protobuf.Any spec as the last path segment is a fully qualified type name (message fullname).

This would still be not optimal because consumers would still need to be aware of the service definitions (or type URL definitions from a reflection point of view) to correctly form and send messages.

mitigation 2

Do not encode the method in the any type at all, and use the correct expected type URL /cosmos.v1beta1.gov.MsgSubmitProposal, then map internally in our ServiceMsg router Input to Method.

Only limitation: there can only be one RPC which accepts the same proto.Message as Input (which is good practice by the way).

mitigation 3

Make ServiceMsg its own type defined as:

message ServiceMsg {
  string method = 1;
  google.Protobuf.Any msg = 2;
}

But this would probably force us in Tx to do a double any wrapping.

cc @AmauryM, @aaronc

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

My preference lies towards solution 2, it has one limitation, which in my opinion is just a good practice enforcement. Externally clients would not need to bother with ServiceMsg, ServiceMsg would remain something internal to the sdk's runtime.

@amaury1093
Copy link
Contributor

amaury1093 commented Apr 7, 2021

The last segment of the URL's path must represent the fully qualified name of the type (as in path/google.protobuf.Duration). The name should be in a canonical form (e.g., leading "." is not accepted).

You raise a good point: if it's spec-breaking, then our "service method fq name as type_url" is maybe a deal breaker. Where did you read this spec? When I was reviewing, I remember reading this one, where afaiu type_url and value can both be arbitrary.

If going with one of the proposed mitigations, (2) also sounds the best for me. However, how do we create a mapping proto Message fq_name -> service method fq_name? Do we just add a "/" char right after the the "Msg" string in the proto message type_url? (The way the SDK currently handles proto message sdk.Msgs is via legacy handlers, but I would prefer to remove it along with sdk.Msg, cf #8864).

Externally clients would not need to bother with ServiceMsg, ServiceMsg would remain something internal to the sdk's runtime.

Just to confirm, we will still keep the service Msg{...} in each module's proto files, right? Just that their implementation would return a correct Any instead of an abused Any. service Msg{...} are still useful for clients imo.

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

@AmauryM

You raise a good point: if it's spec-breaking, then our "service method fq name as type_url" is maybe a deal breaker. Where did you read this spec? When I was reviewing, I remember reading this one, where afaiu type_url and value can both be arbitrary.

https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto#L128

If going with one of the proposed mitigations, (2) also sounds the best for me. However, how do we create a mapping proto Message fq_name -> service method fq_name? Do we just add a "/" char right after the the "Msg" string in the proto message type_url? (The way the SDK currently handles proto message sdk.Msgs is via legacy handlers, but I would prefer to remove it along

Yes this is the intention, based on how we handle the Msg service , clients would deal with ServiceMsgs as they were legacy Msgs, meaning that the type url would be "/" + proto.MessageName(msg). IDK if this could clash and cause conflicts with legacy msgs handlers, it'd need to be investigated (from client perspective it wouldn't, from module development perspective I think yes).

Just to confirm, we will still keep the service Msg{...} in each module's proto files, right? Just that their implementation would return a correct Any instead of an abused Any. service Msg{...} are still useful for clients imo.

Yes the service Msg remains, just type URL formation for routing towards the correct service implementer handler would change. When I meant servicemsg would remain internal to the sdk I intended that the client would not need to understand how to send a ServiceMsg (as it needs custom type_url formation) but just make sure the type it is sending towards the SendTx endpoint implements the sdk.ServiceMsg interface.

@tac0turtle
Copy link
Member

tac0turtle commented Apr 7, 2021

I would say this is pretty urgent, the rust team is working on proto txs and running into issues because we are not spec compliant. I would guess teams that want to use proto txs outside cosmjs are running into similar issues

@amaury1093
Copy link
Contributor

@fdymylja I think I might not have phrased my question well. Let me try again, it's related to "However, how do we create a mapping proto Message fq_name -> service method fq_name?"

Baseapp has a MsgServiceRouter, which maps service method fq_name -> grpc handler. When baseapp receives (e.g. from a client) a tx containing a Msg with "/" + proto.MessageName(msg), how does it know to which service method to route to (i.e. to which grpc handler)?

Apart from this question, overall I'm okay with mitigation (2).

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

@fdymylja I think I might not have phrased my question well. Let me try again, it's related to "However, how do we create a mapping proto Message fq_name -> service method fq_name?"

From my perspective the following would be the action points:

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

I would say this is pretty urgent, the rust team is working on proto txs and running into issues because we are not spec compliant. I would guess teams that want to use proto txs outside cosmjs are running into similar issues

Marko IDK what work is being done ATM, but if it includes json transcoding - I would put it temporarly on halt because ATM in the SDK we're not compliant with protojson spec too and it wouldn't be worth it IMHO to write a customization for rust too which would need to be dropped eventually.

@jgimeno
Copy link
Contributor

jgimeno commented Apr 7, 2021

I agree, this is an urgent issue that needs to be addressed with high priority for next release.

@amaury1093 amaury1093 added this to the v0.43 milestone Apr 7, 2021
@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

I will take a shoot on this following solution n2, as it is currently blocking me a little on rosetta for multinetwork support.

We can see how the fix shapes up and discuss it from there.

@fdymylja fdymylja self-assigned this Apr 7, 2021
@fdymylja fdymylja added the T: API Breaking Breaking changes that impact APIs and the SDK only (not state machine). label Apr 7, 2021
@aaronc
Copy link
Member

aaronc commented Apr 7, 2021

I think approach 3 is correct. We were aware of the possible spec breakage and as I recall the spec we found indicated that the type urls could be arbitrary. I don't think approach 2 is right and basically breaks the intention of ADR 031 altogether. We did consider taking approach 3 initially but thought we were not violating the spec in any major ways.

@aaronc
Copy link
Member

aaronc commented Apr 7, 2021

Also with approach 3 we don't need to do double Any wrapping. The request type is deterministic and we can serialization as bytes.

I would add that before we make a big change I would ask they we evaluate if this really is that big a deal. Maybe but I'm not 100% convinced. Btw last I checked rust will have other problems because they don't even have proto3 json. The intention with service messages is to do custom code generation anyway and when I looked at proto implementations most of them handled Any's pretty basically. There is no type resolver at all in js for instance and the expectation is IMHO generally that type resolvers need to be custom. For instance they preprend all type URLs in the default case with the non existent google APIs type server with the intention tha clients will resolve types over HTTP. So I think the expectation that all types should be registered via code generation is not realistic or aligned with the spec at all.

@tac0turtle
Copy link
Member

I would add that before we make a big change I would ask they we evaluate if this really is that big a deal. Maybe but I'm not 100% convinced. Btw last I checked rust will have other problems because they don't even have proto3 json. The intention with service messages is to do custom code generation anyway and when I looked at proto implementations most of them handled Any's pretty basically. There is no type resolver at all in js for instance and the expectation is IMHO generally that type resolvers need to be custom. For instance they preprend all type URLs in the default case with the non existent google APIs type server with the intention tha clients will resolve types over HTTP. So I think the expectation that all types should be registered via code generation is not realistic or aligned with the spec at all.

I would think being spec compliant is crucial and a large reason we migrated away from amino. We didn't want to have to write custom code for many languages. On top of this reason we also migrated away with the idea we would need to maintain less code and custom code generation is starting to smell like another item that will have a low bus factor. This is my 2cents.

@aaronc
Copy link
Member

aaronc commented Apr 7, 2021

Well maybe @fdymylja's approach 2 is a reasonable way to approach this. It can probably be done without changing how we register MsgServer's. I think it would basically make using the service approach optional for better or worse.

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

I think it would basically make using the service approach optional for better or worse.

It depends on the "module" interface, if a module needs to register a service msg desc to implement the module interface then service becomes mandatory. Which is something that I support by the way.

For instance they preprend all type URLs in the default case with the non existent google APIs type server with the intention tha clients will resolve types over HTTP.

The problem does not lay with building custom type resolvers, at least from golang's API this is quite easy and straightforward to do. The issue lays in Any's implementations in different languages as I have highlighted here:

(Go) https://github.com/protocolbuffers/protobuf-go/blob/v1.26.0/types/known/anypb/any.pb.go#L326 any.IsMessage(m proto.Message) is going to always evaluate to false, even if we try to compare it with the correct message.
(Python) https://github.com/protocolbuffers/protobuf/blob/master/python/google/protobuf/internal/well_known_types.py#L92 same as above.

this can behave in unpredictable ways, if you want for example to handle dynamic protobuf to support multiple chains without relying on app specific codecs it is gonna give you problems, but aside from this really, just try to build the proto files in python and query a transaction, it's not going to work without hacking the protobuf implementation or implementing your own codec, which IMHO defeats the purpose of protobuf if you have to do customizations to decode/encode stuff.

@aaronc
Copy link
Member

aaronc commented Apr 7, 2021

Alright, well I'm fine with approach 2 after considering all this.

Is this something you'll take on or should we?

We might be able to also mostly remove ServiceMsg from user APIs btw and rename MsgRequest to Msg (renaming the existing more involved Msg interface LegacyMsg). What do you think?

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 7, 2021

We might be able to also mostly remove ServiceMsg from user APIs btw and rename MsgRequest to Msg (renaming the existing more involved Msg interface LegacyMsg). What do you think?

Huge thumbs up for me.

I was also wondering if this might be the correct context to also extend the module interface with a method of the likes of:

RegisterMsgServer(msgServerRouter)

This would clear more boilerplate from codec.go, and also identify a clear path forward in which we only support msg handlers via protobuf service definitions.

And deprecate sdk.Handler altogether in favor of MsgServer, since from my investigation after this change (mitigation2) the same Msg cannot be handled by both sdk.Handler and MsgServer.

This would only impact app dev UX, but since we're already deprecating the legacy sdk.Msg this would allow us to kill two birds with one stone.

Is this something you'll take on or should we?

After a talk with @jgimeno and @alessio we would be happy to take care of this

@aaronc
Copy link
Member

aaronc commented Apr 7, 2021

Okay, things that we will want to address from our end is making this approach canonical in x/feegrant and x/authz. It's just some minor renaming but I think it will address some of the concerns in #9037.

@aaronc
Copy link
Member

aaronc commented Apr 12, 2021

When we implement this, let's make sure that the state machine still accepts fully-qualified method names (as in the current ADR 031) for the foreseeable future. We don't want to break any clients that have started to use this.

Also let's make sure ADR 031 and appropriate docs get updated. I can tackle the ADR updates if that helps.

@robert-zaremba
Copy link
Collaborator

What's wrong with mitigation 3 ? I think it's the cleanest one. We have a new type for the RPC method. And as @aaronc mentioned we could avoid double Any packing by using bytes.

@fdymylja
Copy link
Contributor Author

fdymylja commented Apr 13, 2021

What's wrong with mitigation 3 ? I think it's the cleanest one. We have a new type for the RPC method. And as @aaronc mentioned we could avoid double Any packing by using bytes.

It would add 1 more type of Msg, it would be breaking with current implementation.

Then the thing that I don't like about using bytes over Any is that basically you're aliasing Any under the hood, each client would need to do custom decoding to get the concrete type of the Msg. And this propagates to any endpoint which contains a Tx or a ServiceMsg, and it assumes clients to build their own service msg registry and all of that.

I think solution 2 is better, simply because it doesn't break the spec, it allows us to build MsgServer, clients can query endpoints which contain msgs or txs and the proto implementation will resolve automatically for them. This seems stupid but IMHO it's an important feature UX wise, would you rather query a TX and get its raw contents as bytes or would you like to see the actual MsBankSend and so on?

The only limitation which solution 2 has is that you have 1 MSG 1 RPC which is a good practice enforcement. (And also I don't think anyone would like to have bank.Send and bank2.Send to be callable using the same proto.Message, it'd cause a lot of ambiguity).

Solution 1 was also good IMHO, it was the most straightforward to implement.

@fdymylja fdymylja removed their assignment Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants