Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add ADR 054 semver compatible SDK modules #11802

Merged
merged 36 commits into from
Mar 8, 2023
Merged

Conversation

aaronc
Copy link
Member

@aaronc aaronc commented Apr 27, 2022

Description

Addresses issues related to semantic versioning and protobuf generated code discussed in #10582.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@aaronc aaronc requested a review from alexanderbez as a code owner April 27, 2022 16:07
@github-actions github-actions bot added the T: ADR An issue or PR relating to an architectural decision record label Apr 27, 2022
Copy link
Contributor

@amaury1093 amaury1093 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem section is clear to me.

The decision chosen here is currently my favorite approach, on a high-level. But I'm personally also not 100% confident we thought about all edge/weird cases. Maybe issues & iterations of this ADR might come up during implementation.

@aaronc
Copy link
Member Author

aaronc commented May 4, 2022

This needs more reviews. @alexanderbez @fdymylja ? This is one of the more complicated pieces of using semver with protobuf and needs to be thought through carefully

@alexanderbez
Copy link
Contributor

Will be reviewed by EOD

@aaronc
Copy link
Member Author

aaronc commented May 4, 2022

One thought I have is that maybe in the public api module only gRPC Client interfaces should get generated and in the internal server codegen only gRPC Server interfaces get generated. This can make it more explicit which code is for what and why they are separated. Public Client, internal Server

protoreflect.ProtoMessage
GetFromAddress() string
GetToAddress() string
GetAmount() []v1beta1.CoinI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this returning v1beta1.CoinI. What if the proto type changes between versions of MsgSend?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the idea is to move towards interfaces where possible in this codegen. Changing the proto type (say to v1.CoinI) would be breaking which is one thing we're trying to avoid with semver. It should be in a differente semver

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I'm not following. Isn't this a type of "foo"? So I don't see how you can have a version specific type here.

GetToAddress() string
GetAmount() []v1beta1.CoinI

IsCosmosBankV1MsgSend() // this method distinguishes this interface from other proto types which may otherwise implement the same fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? Can't callers just do an explicit type check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but they don't know which concrete type it is because there's different generated code in the same binary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. These can get pretty verbose if there are many versions, but I guess there's no other way.


type MsgClient interface {
Send(ctx context.Context, in MsgSendI, opts …grpc.CallOption) (MsgSendResponseI, error)
// note that for MsgSendResponseI we would need to have a generated response wrapper type to deal with the case where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following this comment. What does this mean exactly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance if the client version is newer than the server version, then there may be a newer field in the client version of MsgSendResponseI than the server version. A wrapper type will need to gracefully handle the fact that the newer field is missing and return the default value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a distinction in this ADR that describes client vs server version? Or is that something that already exists in the Proto usage within the SDK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is meant by client and server version is simply the version of the proto files that the client and server have. It could be the scenario that the client has newer proto files than the server and vice versa. We need to gracefully handle both.

@blushi
Copy link
Contributor

blushi commented Jun 8, 2022

@aaronc any updates on this?

@aaronc
Copy link
Member Author

aaronc commented Jun 8, 2022

@aaronc any updates on this?

The main thing here is to get more feedback from the SDK team as to whether this is an acceptable solution. I've tried to outline the rationale and solution as best as I can here. If there's still lack of clarity please ask questions or we can discuss in a call. This is not a simple matter so we should make sure we consider carefully

@github-actions
Copy link
Contributor

github-actions bot commented Aug 3, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@JeancarloBarrios
Copy link
Contributor

I've added a section on unit and integration test frameworks here (actually wrote this a little while ago) - see the Testing header in the ADR. There has been some discussion about better testing frameworks in the SDK. Hopefully this may be useful. The general idea is that there is a testing matrix that tests different versions of module against each other. This may be jumping the gun a bit on a decision about approach here, but this does reflect the last conversation the SDK team had internally.

I agree with Aron's addition of a section on unit and integration test frameworks in the ADR. The concept of a testing matrix that tests different versions of modules against each other is a practical approach.

I also believe that ADR-54 and ADR-33 will allow us to integrate with tools built in other ecosystems like for example Rust for ZK, removing the need for some teams to create their own Rust Cosmos SDKs as Zaki pointed out in an interview that the obstacles for ZK in the Cosmos ecosystem can be quite significant without languages like rust.

@robert-zaremba
Copy link
Collaborator

Can you explain more what feels hacky to you about it?

  • compatibility in the "bytes" level, rather than an API level
  • "big changes to how people use protobuf types and would be a substantial rewrite of the protobuf code generator. "

@JeancarloBarrios
Copy link
Contributor

JeancarloBarrios commented Jan 11, 2023

  • compatibility in the "bytes" level, rather than an API level

Idk it feels to me that compatibility is still at the API level and is just protobuff instead of interfaces.

@github-actions github-actions bot removed the stale label Jan 12, 2023
@aaronc
Copy link
Member Author

aaronc commented Jan 12, 2023

Can you explain more what feels hacky to you about it?

  • compatibility in the "bytes" level, rather than an API level

Yeah, I also think this is not true. It's still API level.

  • "big changes to how people use protobuf types and would be a substantial rewrite of the protobuf code generator. "

Code generator changes are optional, although it would be a big change if we go that way.

@aaronc
Copy link
Member Author

aaronc commented Jan 27, 2023

A bit of an update on my thoughts here. I do actually think using an API module (approach A) generally makes sense and is more convenient. I think changing the assumption that there is a single generated file per proto file introduces lots of complexity. We have discussed scoping of API modules/proto files and I think in the SDK and generally there should be one API module per go module. I think otherwise versioning becomes much more complex as we've discussed elsewhere.

I'm not a big fan of approach B currently (code generator changes) unless integrating with something like Rust is a high priority and serialization is benchmarked to be the bottleneck.

One alternative idea which hasn't really been discussed yet but is alluded to in approach C may be viable is actually embracing build time tags and replace directives. For instance, if module foo/v2 was built against the foo/api v1.2, it should be an error for the app to include foo/api v1.3 because then the file descriptors and generated proto code would be for a later version of the foo module (maybe foo/v3). I mentioned embedding file descriptors in approach A and doing runtime checks to deal with this issue. But instead, maybe we could use the embedded file descriptors just to ensure that they match the ones in the generated code at runtime. If not, there could be an error on initialization and the correct thing would be to introduce a replace directive into the app to ensure foo/api v1.2 and nothing later is used. Instead of doing runtime minor version checks - say if bar/v2 wanted to optionally use some features introduced in foo/api v1.3 - the correct way to support both that and foo/api v1.2 would be with a go build tag. If I think about what would happen in another language like say rust, I think it would be something like this, although I think rust features are much more ergonomic than go build tags. Normalizing the use of replace directives and build tags might be the most technically simple way to deal with this. It would avoid weird runtime checks as to whether fields intended for a later version of a module were set and would allow the global protobuf registry to be generally trusted. It would avoid requiring all keeper usage to be refactored to ADR 033 I think. We probably still wouldn't want to define interface methods on generated types but the concern would be more semantic versioning than incorrect behavior. The main safeguard needed I think would be this one startup check to see if the pinned file descriptors (or even a hash of them) matches what's available in the global registry. Anyway, something to consider...

@robert-zaremba
Copy link
Collaborator

One alternative idea which hasn't really been discussed yet but is alluded to in approach C may be viable is actually embracing build time tags and replace directives

Can we do it without replace? It feels that we may abuse replace directive, but using it for all API dependencies. But maybe it is ok? Other idea would be:

  1. if module x/foo requires x/bar/api v1.2 then it can copy it's descriptor, and check during startup that it matches (not sure if it's possible to do it in in compilation time)... that being said compilation time checks are better.

foo/api v1.3 - the correct way to support both that and foo/api v1.2 would be with a go build tag

How is it possible? I thought that go dependency system only allows to link only one minor (here 1.x) version in one binary.

@robert-zaremba
Copy link
Collaborator

Can you explain more what feels hacky to you about it?

  • compatibility in the "bytes" level, rather than an API level

Yeah, I also think this is not true. It's still API level.

In Approach B) Changes to Generated Code you write about a need to translate types, and generate / create new internal types to satisfy binary interface. Hence for module developer it is ABI level:

For instance, if bar needs to talk to foo, it could generate its own version of MsgDoSomething as bar/internal/foo/v1.MsgDoSomething and just pass this to the inter-module router which would somehow convert it to the version which foo needs (ex. foo/internal.MsgDoSomething).
[...] If modules only do ADR 033 message passing then a naive and non-performant solution for converting bar/internal/foo/v1.MsgDoSomething to foo/internal.MsgDoSomething would be marshaling and unmarshaling in the ADR 033 router.

With API level compatibility, a developer would directly use /foo/v1.MsgDoSomething rather than generating the internal types and use router etc...

@peterbourgon
Copy link

One alternative idea which hasn't really been discussed yet but is alluded to in approach C may be viable is actually embracing build time tags and replace directives.

  1. Build tags, or build constraints, are a mechanism to control which source files are included in a compilation directive. They're meant to dictate which OS, platform, and/or build-target specific source code files should be included during compilation. They're not designed as a way to express runtime schema constraints between communicating processes.
  2. Replace directives are a mechanism for end-users to temporarily redefine import paths when building binaries, and are not meant as as a permanent way to express modifications to a dependency graph.

Both of these points are unfortunate, and certainly make life harder for users, like the Cosmos SDK, who want to express stuff that is straightforward in other language ecosystems like C or Rust. But Go simply doesn't let you do it.

@aaronc
Copy link
Member Author

aaronc commented Feb 7, 2023

Despite this approach not being well-supported by go tooling, it might be the least bad short-term way to deal with this.

The replace directives would only happen on the app level, so there is no need for them to apply transitively to any other library consumers. And they could be considered temporary - the only real reason you'd want to do this is if you want to hold off on upgrading to a new state machine version after it has been released. Hopefully, that's a temporary concern.

Build tags being used to express conditions would hopefully be a rare occurrence where the convention could be to hold off on adopting newer API methods until all consumers are ready to adopt the newest state machine version of a dependency. But as long as they apply transitively to dependencies (I'm not sure of this) they could be an option.

At least for the time-being, I'd like to avoid any super radical solutions like requiring all modules to migrate to ADR 033, generated code changes or excessive runtime checks. The speed at which modules are being split up into standalone units is proceeding faster than any of those other solutions will be ready. So having a small startup check seems like the least bad option.

@aaronc aaronc mentioned this pull request Feb 22, 2023
19 tasks
@kocubinski
Copy link
Member

kocubinski commented Mar 8, 2023

This document was originally written for, and shared with, the ADR-54 working group and is repeated here. Internally we now have consensus on moving towards message passing (for cross process composability). There is also agreement that forward compatibility, in the context of the same version message with added fields, is a non-goal, which is the same conclusion the below arrives at regarding sane API compatibility.

Principles of Sane API Development

This section identifies the principles of an API development and maintenance path for the CosmosSDK which is
already well understood in practice, reasonably sane, not too clever, and solves the problems put forward by
ADR-054.

Interface

  1. CosmosSDK APIs are specified by proto IDL.
  2. There is one API go module containing code generated from (1).
  3. Within this API module, SDK module APIs are separated by path.
  4. Semantic versioning of module APIs is maintained by path, e.g. bank/v1/tx.go, bank/v2/tx.go.
  5. API changes within a single module-version must be non buf-breaking (additive)
  6. Old versions of APIs are kept forever.
  7. When an additive API change is state machine breaking for a module a new API version (folder) must be
    introduced.
  8. Buf breaking changes are permitted between module API versions.

Points (1) through (5), and to some extent (6) (as in the case of x/gov), are already the SDK status quo. The
only real departure is (7), with (8) as a refinement. Problem 3: Handling Minor Version Incompatibilities is
solved by (7) by breaking the contract in a way which forces the developer to explicitly handle message
conversion.

Taken as a whole these points also address Problem 2: Circular dependencies since the API is
versioned separately from sate machine modules.

Implementation

  1. Module API types are permitted in a module’s message and query server API
  2. Following the Robustness Principle, modules message servers should support all past versions of API
    messages; a module must know how to map past versions of messages to the latest.
  3. A module is made capable of supporting future versions of API messages (i.e. the SDK module is held back
    while the API advances) through the injection of an adapter with knowledge of mapping between past and
    future API messages.
  4. Therefore, for each module message server API at semantic version n, mapping code must be committed to
    translate to versions n-1 and n+1.
  5. Module API types must not be present in a Keeper API, as presented at the consumer (i.e. expected_keeper.go)

Points (1) through (3) address Problem 1: Semantic Import Versioning Compatibility, (2) for backward, and (3)
for forward compatibility at the message server level.

Point (4) can be implemented without a version-aware message router or inter-module message client, but will
require a fork on either the client or server module to migrate the message from version n to n-1 to fully
support (3), forward compatibility. Backward compatibility (2) can be handled at the server directly, and
therefore no fork is needed. The introduction of a version aware routing layer or inter-module message client
can push this problem up to configuration, thereby removing the need for a fork. I would not recommend either
of these solutions (router or message client) initially until we discover how common the problem of forward
compatibility is in practice.

Point (5) is nearly the status the quo of the SDK with a few (easily fixable) exceptions. By maintaining a
consumer side Keeper API free from the types defined in (Interface/2) we allow for continued Keeper API level
compatibility between modules irrespective of SDK API version, provided we continue to implement non-breaking,
additive changes to Keeper APIs.

Summary

To summarize, it seems possible to achieve the goals of ADR-54, namely SDK modules as semantically versioned
independent go modules, with some strategic refactoring and without an SDK wide rewrite. Solutions (A) and
(B) proposed in ADR-54 also achieve this goal too but with increased scope and features. This proposal seems
to the author the minimum distillation of work required for the goals of a sensible dependency graph and
independent go modules in the SDK.

References

@aaronc
Copy link
Member Author

aaronc commented Mar 8, 2023

Based on our latest working group discussion, we decided to merge this PR as a draft and to update it later once we have a clearer design.

Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the working group call we decided to merge this as a draft.

@aaronc aaronc merged commit 2928fa4 into main Mar 8, 2023
@aaronc aaronc deleted the aaronc/adr-proto-go-module branch March 8, 2023 16:25
@robert-zaremba
Copy link
Collaborator

robert-zaremba commented Mar 17, 2023

@kocubinski thanks for the summary.

Backward compatibility (2) can be handled at the server directly, and therefore no fork is needed.

how this can be achieved?

If our msg server defines the following RPC (in Go):

import "cosmos-sdk/api/cosmos/bar/v3"

func (s MsgSrv) Foo(b api.Bar) {}

How method Foo can handle api/v2 directly? You will have a compile time error. This would have to go either through interfaces (api.Bar being an interface) -- so not as generated by protobuf. Moreover, since api/v3 may not be backward compatible with apiv/3 then even api.Bar as an interface won't work.

@peterbourgon
Copy link

import (
    bar_v1 "cosmos-sdk/api/cosmos/bar/v1"
    bar_v2 "cosmos-sdk/api/cosmos/bar/v2"
    bar_v3 "cosmos-sdk/api/cosmos/bar/v3"
)

func (s MsgServer) FooV1(b bar_v1.Bar) { ... }
func (s MsgServer) FooV2(b bar_v2.Bar) { ... }
func (s MsgServer) FooV3(b bar_v3.Bar) { ... }

@kocubinski
Copy link
Member

Basically the above. I think there could be a DevUX optimization on top with dynamic dispatch like below. Multiple wire protocol implementations could be bound to the same go RPC function too, but it is probably more useful intraprocess as a go API.

import (
    bar    "cosmos-sdk/api/cosmos/bar"
    bar_v1 "cosmos-sdk/api/cosmos/bar/v1"
    bar_v2 "cosmos-sdk/api/cosmos/bar/v2"
    bar_v3 "cosmos-sdk/api/cosmos/bar/v3"
)

// defined in "cosmos-sdk/api/cosmos/bar"
// each version of `Bar` implements this interface, which could be included in code gen.
type BarMessageI interface {
    isBarMessage()
}

// pure boilerplate supporting dynamic dispatch on a interface
func (s MsgServer) Foo(b bar.BarMessageI) {
    switch m := b.(type) {
	case bar_v1.Bar:
		s.FooV1(m)
	case bar_v2.Bar:
		s.FooV2(m)
	case bar_v3.Bar:
		s.FooV3(m)
	default:
		// return error
	}
}

func (s MsgServer) FooV1(b bar_v1.Bar) { s.FooV2(b.ToV2(); }
func (s MsgServer) FooV2(b bar_v2.Bar) { so.FooV3(b.ToV3(); }
func (s MsgServer) FooV3(b bar_v3.Bar) { ... }

Or if we wanted to throw out even more type safety:

func (s MsgServer) Foo(b any) {  ... }

@robert-zaremba
Copy link
Collaborator

But how do you do it with proto? The protobuf compiler don't generate methods for FooV1, FooV2 ... unless you explicitly define them in the Protobuf Service as different RPC methods.

Also Protobuf compiler uses structs in the interfaces, not interfaces (Foo(b bar.BarMessageI)).

So to make it working we would require another custom change in the protobuf compiler, right?

@kocubinski
Copy link
Member

Practically speaking (about implementation) I think it'd make sense to separate FooV[1-3] as Foo in go packages v[1-3] to align with the proto files and generated api types. I don't think this would require a code generator change.

@peterbourgon
Copy link

peterbourgon commented Mar 22, 2023

@robert-zaremba

But how do you do it with proto? The protobuf compiler don't generate methods for FooV1, FooV2 ... unless you explicitly define them in the Protobuf Service as different RPC methods.

No change to the compiler should be necessary. Distinct (and therefore incompatible) proto services BarV1, BarV2, etc. are necessarily defined in distinct proto files, which therefore generate distinct code when run thru protoc, which lives in distinct Go packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: ADR An issue or PR relating to an architectural decision record Type: ADR
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.