Schema Descriptors #7097

Closed
aaronc opened this issue Aug 18, 2020 · 14 comments

Comments

@aaronc
Member

aaronc commented Aug 18, 2020

This is linked to meta-issue #7096.

Summary

This proposes a new x/schema module for reflecting module kv-store schemas to clients.

Problem Definition

Currently modules store state in their kv-store using various prefixed store keys. In order for a client to understand what the layout of a module kv-store is, clients would need to directly inspect keeper code to figure out all of the prefixes.
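To make the problem concrete, here is a minimal sketch of the kind of key construction a keeper performs today. The prefix byte, the helper name `validatorKey`, and the shortened address are hypothetical and only illustrate the pattern; they are not copied from any module.

```go
package main

import "fmt"

// Hypothetical prefix byte; real modules define their own prefixes in their keys.go.
var validatorsPrefix = []byte{0x21}

// validatorKey concatenates the prefix and the operator address. This is the
// layout a client currently has to discover by reading keeper code.
func validatorKey(operatorAddr []byte) []byte {
	key := make([]byte, 0, len(validatorsPrefix)+len(operatorAddr))
	key = append(key, validatorsPrefix...)
	return append(key, operatorAddr...)
}

func main() {
	addr := []byte{0xAA, 0xBB, 0xCC} // shortened placeholder address
	fmt.Printf("%x\n", validatorKey(addr))
}
```

Nothing in the resulting bytes tells a client where the prefix ends or what the remainder means, which is the gap a schema descriptor would fill.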

Clients may want to understand the store layout in order to:

Proposal

An x/schema module which allows clients to query kv-store layouts for registered modules is proposed.

The core of the schema module would be some schema description language which can accurately describe all store layouts in the SDK.

Schemas would be available for clients to use either via runtime query methods or possibly statically using descriptor files.

An appropriate schema for these descriptors would need to be developed based on existing use cases. x/staking is a great example to work off of where effectively a database with complex secondary indexes has been created (see x/staking/types/keys.go).

Here is one potential protobuf layout:

message KeyDescriptor {
    string name = 1;
    string description = 2;
    string store = 4;
    bytes prefix = 5;
    repeated Part parts = 6;
    string value_type = 7;

    message Part {
        oneof sum {
            Bytes bytes = 1;
            String string = 2;
            Separator separator = 3;
        }

        message Bytes {
          string name = 1;
          string description = 2;
          uint32 fixed_width = 3;
        }

        message String {
            string name = 1;
            string description = 2;
        }

        message Separator {
            string separator = 1;
        }
    }
}

Here’s an example using the above KeyDescriptor for part of x/staking:

var StakingSchema = []keeper.KeyDescriptor{
	{
		Name:   "LastValidatorPowerKey",
		Prefix: LastValidatorPowerKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
			}},
		ValueProtoType: &types.UInt64Value{},
	}, {
		Name:           "LastTotalPowerKey",
		Description:    "",
		Prefix:         LastTotalPowerKey,
		KeyParts:       nil,
		ValueProtoType: &sdk.IntProto{},
	}, {
		Name:        "ValidatorsKey",
		Description: "",
		Prefix:      ValidatorsKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
			}},
		ValueProtoType: &Validator{},
	}, {
		Name:        "ValidatorsByConsAddrKey",
		Description: "",
		Prefix:      ValidatorsByConsAddrKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator consensus address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ConsAddress{},
			}},
		ValueProtoType: &types.BytesValue{},
		ValueGoType:    sdk.ValAddress{},
	},
}
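
As a rough illustration of how a client might consume such descriptors, here is a sketch that splits a raw store key into named parts. The `BytesPart` and `KeyDescriptor` types are simplified, hypothetical stand-ins for the ones above (only fixed-width byte parts are handled), and the prefix value and part width are placeholders.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// BytesPart and KeyDescriptor are simplified, hypothetical stand-ins for the
// descriptor types sketched above; only fixed-width byte parts are handled.
type BytesPart struct {
	Name       string
	FixedWidth int
}

type KeyDescriptor struct {
	Name   string
	Prefix []byte
	Parts  []BytesPart
}

// DecodeKey splits a raw store key into named parts according to desc.
func DecodeKey(desc KeyDescriptor, key []byte) (map[string][]byte, error) {
	if !bytes.HasPrefix(key, desc.Prefix) {
		return nil, errors.New("key does not start with descriptor prefix")
	}
	rest := key[len(desc.Prefix):]
	out := make(map[string][]byte, len(desc.Parts))
	for _, p := range desc.Parts {
		if len(rest) < p.FixedWidth {
			return nil, fmt.Errorf("key too short for part %q", p.Name)
		}
		out[p.Name] = rest[:p.FixedWidth]
		rest = rest[p.FixedWidth:]
	}
	if len(rest) != 0 {
		return nil, errors.New("unexpected trailing bytes in key")
	}
	return out, nil
}

func main() {
	desc := KeyDescriptor{
		Name:   "ValidatorsKey",
		Prefix: []byte{0x21},                                   // hypothetical prefix value
		Parts:  []BytesPart{{Name: "Operator", FixedWidth: 4}}, // width shortened for the demo
	}
	parts, err := DecodeKey(desc, []byte{0x21, 1, 2, 3, 4})
	fmt.Println(parts["Operator"], err)
}
```

String and separator parts would need their own handling; they are left out to keep the sketch short.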
@i-norden
Contributor

i-norden commented Dec 2, 2020

Hi @aaronc I've been thinking about this for a while and haven't come up with any way to improve the schema descriptor.

I'm still trying to understand the best way to register other modules' schemas. For backwards compatibility do we want to avoid needing to change a module's Keeper interface or AppModule constructor in order to register its schema? In that case we could use a helper function to generate the schemas (automatically from specified YAML or TOML files, as you had mentioned previously) for modules from within an app's AppCreator, and then load those schemas directly into the x/schema module.

Alternatively, if we don't mind not being backwards compatible, we could define an expected keeper interface in x/schema e.g.

// SchemaKeeper is an expected keeper interface for exporting a module's schema to x/schema
type SchemaKeeper interface {
    GetSchema() []keeper.KeyDescriptor
}

And add support for this interface to modules, whose keepers we could then load into x/schema. This would leave it up to the module to define/generate their schema, instead of the app developer handling that within the AppCreator function.
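
A minimal sketch of what loading keepers into x/schema could look like under this second option. The `Registry` type, its methods, and the toy `stakingKeeper` are hypothetical names introduced only for illustration, and `KeyDescriptor` is pared down to two fields.

```go
package main

import "fmt"

// KeyDescriptor is a pared-down, hypothetical stand-in for the descriptor type.
type KeyDescriptor struct {
	Name   string
	Prefix []byte
}

// SchemaKeeper mirrors the expected keeper interface above.
type SchemaKeeper interface {
	GetSchema() []KeyDescriptor
}

// Registry is a hypothetical in-memory collection keyed by module name.
type Registry struct {
	schemas map[string][]KeyDescriptor
}

func NewRegistry() *Registry {
	return &Registry{schemas: make(map[string][]KeyDescriptor)}
}

// Register stores the schema for a module, rejecting duplicate registrations.
func (r *Registry) Register(module string, k SchemaKeeper) error {
	if _, ok := r.schemas[module]; ok {
		return fmt.Errorf("schema for module %q already registered", module)
	}
	r.schemas[module] = k.GetSchema()
	return nil
}

// Schema returns the descriptors registered for a module, if any.
func (r *Registry) Schema(module string) ([]KeyDescriptor, bool) {
	s, ok := r.schemas[module]
	return s, ok
}

// stakingKeeper is a toy keeper used only for this demo.
type stakingKeeper struct{}

func (stakingKeeper) GetSchema() []KeyDescriptor {
	return []KeyDescriptor{{Name: "ValidatorsKey", Prefix: []byte{0x21}}}
}

func main() {
	reg := NewRegistry()
	if err := reg.Register("staking", stakingKeeper{}); err != nil {
		panic(err)
	}
	s, _ := reg.Schema("staking")
	fmt.Println(s[0].Name)
}
```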

@aaronc
Member Author

aaronc commented Dec 2, 2020

Hi @i-norden, so generally I think it's too early to think about how all of this stuff gets wired up - whether it's in keepers or app modules. The most relevant thing right now I think is the structure of those schemas - i.e. what is the concrete schema of the schema for defining key-value pair layouts.

Also, the schema isn't really relevant for apps - it's more relevant for clients so it doesn't really even need to get registered at runtime. It could be a set of yaml files for instance that are used by clients similar to .proto files.

@aaronc
Member Author

aaronc commented Dec 2, 2020

Maybe I should qualify that a bit and say it's not relevant at runtime right now, but maybe in the future it could be - if for instance there was a contract that wanted to query it. But I don't really think that should be the concern right now. Just having some descriptor file should be the starting point.
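
For a sense of what consuming a descriptor file might involve on the client side, here is a small sketch. It uses JSON rather than YAML only so it can run with the Go standard library; the file shape, the field names (`name`, `prefix_hex`, `value_type`) and the sample values are all hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// FileDescriptor is a hypothetical on-disk shape for one key descriptor.
type FileDescriptor struct {
	Name      string `json:"name"`
	PrefixHex string `json:"prefix_hex"`
	ValueType string `json:"value_type"`
}

// LoadDescriptors parses descriptor-file contents and checks that every
// prefix is valid hex.
func LoadDescriptors(data []byte) ([]FileDescriptor, error) {
	var descs []FileDescriptor
	if err := json.Unmarshal(data, &descs); err != nil {
		return nil, err
	}
	for _, d := range descs {
		if _, err := hex.DecodeString(d.PrefixHex); err != nil {
			return nil, fmt.Errorf("descriptor %q: bad prefix: %w", d.Name, err)
		}
	}
	return descs, nil
}

func main() {
	// Sample contents; the prefix and value type are placeholders.
	file := []byte(`[{"name":"LastTotalPowerKey","prefix_hex":"12","value_type":"sdk.IntProto"}]`)
	descs, err := LoadDescriptors(file)
	fmt.Println(descs, err)
}
```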

@i-norden
Contributor

i-norden commented Dec 2, 2020

Thanks @aaronc! That's very helpful. I don't know how much I'll be able to improve on what you've put forth here, it appears perfectly comprehensive to me. I'll put some more thought into it.

Only thing that comes to mind right now is if it might be beneficial to explicitly define relations between different KeyDescriptors within a schema e.g. signifying that the operator address values from "ValidatorsByConsAddrKey" are the KeyParts of "LastValidatorPowerKey" and "ValidatorsKey". We could do this by adding a new relations field to the KeyParts:

message Part {
    oneof sum {
        Bytes bytes = 1;
        String string = 2;
        Separator separator = 3;
    }

    message Bytes {
        string name = 1;
        string description = 2;
        uint32 fixed_width = 3;
        repeated string relations = 4;
    }

    message String {
        string name = 1;
        string description = 2;
        repeated string relations = 3;
    }

    message Separator {
        string separator = 1;
    }
}

var StakingSchema = []keeper.KeyDescriptor{
	{
		Name:   "LastValidatorPowerKey",
		Prefix: LastValidatorPowerKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
				Relations:   []string{"ValidatorsByConsAddrKey"},
			}},
		ValueProtoType: &types.UInt64Value{},
	}, {
		Name:           "LastTotalPowerKey",
		Description:    "",
		Prefix:         LastTotalPowerKey,
		KeyParts:       nil,
		ValueProtoType: &sdk.IntProto{},
	}, {
		Name:        "ValidatorsKey",
		Description: "",
		Prefix:      ValidatorsKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
				Relations:   []string{"ValidatorsByConsAddrKey"},
			}},
		ValueProtoType: &Validator{},
	}, {
		Name:        "ValidatorsByConsAddrKey",
		Description: "",
		Prefix:      ValidatorsByConsAddrKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator consensus address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ConsAddress{},
			}},
		ValueProtoType: &types.BytesValue{},
		ValueGoType:    sdk.ValAddress{},
	},
}
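
One thing relations would make possible is a simple consistency check over a schema. The sketch below assumes that each relation string names another descriptor in the same schema; the `Descriptor` type and `CheckRelations` function are hypothetical and keep only the fields needed for the check.

```go
package main

import "fmt"

// Descriptor is a reduced, hypothetical form of KeyDescriptor that keeps only
// the descriptor name and the relations declared on its key parts.
type Descriptor struct {
	Name      string
	Relations []string
}

// CheckRelations reports the first relation that names an unknown descriptor.
func CheckRelations(schema []Descriptor) error {
	known := make(map[string]bool, len(schema))
	for _, d := range schema {
		known[d.Name] = true
	}
	for _, d := range schema {
		for _, rel := range d.Relations {
			if !known[rel] {
				return fmt.Errorf("%s: relation to unknown descriptor %q", d.Name, rel)
			}
		}
	}
	return nil
}

func main() {
	schema := []Descriptor{
		{Name: "LastValidatorPowerKey", Relations: []string{"ValidatorsByConsAddrKey"}},
		{Name: "ValidatorsKey", Relations: []string{"ValidatorsByConsAddrKey"}},
		{Name: "ValidatorsByConsAddrKey"},
	}
	fmt.Println(CheckRelations(schema))
}
```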

Would you like me to begin drafting an ADR for this (after giving it some more thought)? And/or would it be helpful for me to finish mocking up the KeyDescriptors for the rest of the staking module, or the other common modules?

@aaronc
Member Author

aaronc commented Dec 2, 2020

Yep, foreign keys/relations is a great idea.

I think it would be good to continue surveying the usage in staking and other modules. The discussion in #8041 is also relevant to this.

@alexanderbez
Contributor

Why is this being proposed as a keeper? Is it part of the state-machine execution model? If not, it shouldn't be a "keeper" (although I guess that definition is also going away and/or changing).

@aaronc
Member Author

aaronc commented Dec 3, 2020

I think I explained that it shouldn't be part of the state machine @alexanderbez. It's just for clients. But... maybe an x/schema module that just serves up schemas for clients to query does make sense. Don't know if it would actually need to keep the schemas in the state store... probably not unless there is a use case.

@alexanderbez
Contributor

Right, no need for it to be a module then, right? Also, will it fulfill the module interface(s)?

@aaronc
Member Author

aaronc commented Dec 3, 2020

Well, if we had schemas do you think clients would ever want to make proofs about them? If not, then there's no need for state storage. But, it might register a gRPC query service which might be easiest to wire up as a module in RegisterServices. There is an alternative though where we just start with .yaml files on disk that clients could consume as they choose...

@robert-zaremba
Collaborator

I'm for starting without state machine integration. The use case is clearly for off-chain applications. If a client needs to validate data, it will need a data proof (a schema won't help). In the current storage design, these proofs change between blocks.

I don't see any advantage at the moment of keeping it in files versus defining it directly in a Go file. In the future, if a single module has implementations in different languages, that could make sense.

@i-norden
Contributor

i-norden commented Dec 10, 2020

After thinking about this some more, one other thing I think we should add is an indication of the underlying data structure (e.g. IAVL) so that we know how to reconstitute (and IPLD-ize) it from all the key-value pairs we listen to.

Somewhat related, looking ahead there are a few approaches to consider for supporting the arbitrary protobuf types in IPLD:

  1. Leave them undefined, e.g. for IAVL just define a multicodec packed content type for IAVL nodes and leave the content type of the stored values unspecified. For eth's state trie, for example, there is a codec to specify trie nodes and there is also a codec to specify state account as the state value type. But for the storage trie there is only a codec for the trie node since the storage value types are indeterminate.
  2. Define a new composite multicodec, where the prefix byte specifies protobuf and suffix bytes specify specific protobuf types that have been registered
  3. Define a new multicodec which specifies that the first x bytes of the hash-linked object are another CID/multihash which references the .proto message definition that the remaining bytes should be unpacked into

3 is the most elegant way of supporting things, I think, but it would get very complicated for proto types which import from other proto packages. 2 is clunky but would be the more idiomatic way to support various types. 1 is the simplest approach and its precedent is set by how the eth storage trie is supported, but it doesn't leverage protobuf like we could.
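
To give a feel for approach 2, here is a toy framing sketch: a varint code meaning "protobuf" followed by a varint type ID from some registry, then the encoded message. This is not the multicodec specification. The value 0x50 is assumed to be the table entry for protobuf, and the type ID registry, `Wrap`, and `Unwrap` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// protobufCode is assumed to be the multicodec code for protobuf; verify it
// against the multicodec table before relying on it.
const protobufCode = 0x50

// Wrap prepends varint(protobufCode) and varint(typeID) to payload.
// typeID comes from a hypothetical registry of known protobuf types.
func Wrap(typeID uint64, payload []byte) []byte {
	head := make([]byte, 2*binary.MaxVarintLen64)
	n := binary.PutUvarint(head, protobufCode)
	n += binary.PutUvarint(head[n:], typeID)
	return append(head[:n], payload...)
}

// Unwrap reverses Wrap, returning the type ID and the remaining payload.
func Unwrap(data []byte) (uint64, []byte, error) {
	code, n := binary.Uvarint(data)
	if n <= 0 || code != protobufCode {
		return 0, nil, errors.New("not a protobuf-prefixed value")
	}
	id, m := binary.Uvarint(data[n:])
	if m <= 0 {
		return 0, nil, errors.New("missing type id")
	}
	return id, data[n+m:], nil
}

func main() {
	wrapped := Wrap(7, []byte("validator-bytes"))
	id, payload, err := Unwrap(wrapped)
	fmt.Println(id, string(payload), err)
}
```

The clunky part is visible here: every reader needs the same type ID registry to make sense of the suffix.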

In that third case, it would be useful to be able to retrieve the complete .proto message definitions for ValueProtoType from the schema api. But I'm not sure how this is tenable for proto types that import other proto packages.

Off in dream land now, but I think in an ideal world there'd be a protobuf compiler plugin that allowed .proto files to import other proto packages by CID, then we could use IPFS as a hash-linked registry for every protobuf type and (3) would be completely feasible.

@robert-zaremba
Collaborator

@i-norden, could you have a look at the Canonical ID section of #7100 (comment)?
This week I was wondering whether we can solve this issue without introducing a schema module, specifically by creating more manageable storage keys. It would be great to hear your feedback.

@robert-zaremba
Collaborator

Also, for a general note, our approach for the future of storage is to get rid of IAVL.

@aaronc
Member Author

aaronc commented Apr 21, 2021

Replaced by #9158


4 participants