Schema Descriptors #7097

aaronc · 2020-08-18T19:11:25Z

This is linked to meta-issue #7096.

Summary

This proposes a new x/schema module for reflecting module kv-store schemas to clients.

Problem Definition

Currently, modules store state in their kv-stores using various prefixed store keys. For a client to understand the layout of a module's kv-store, it would need to inspect keeper code directly to find all of the prefixes.

Clients may want to understand the store layout in order to:

make queries with Merkle proofs, or
synchronize state to an external database (see #7099).

Proposal

We propose an x/schema module that allows clients to query the kv-store layouts of registered modules.

The core of the schema module would be a schema description language that can accurately describe all store layouts in the SDK.

Schemas would be available for clients to use either via runtime query methods or possibly statically using descriptor files.

An appropriate schema for these descriptors would need to be developed based on existing use cases. x/staking is a good example to work from, since it effectively implements a database with complex secondary indexes (see x/staking/types/keys.go).

Here is one potential protobuf layout:

message KeyDescriptor {
    string name = 1;
    string description = 2;
    string store = 4;
    bytes prefix = 5;
    repeated Part parts = 6;
    string value_type = 7;

    message Part {
        oneof sum {
            Bytes bytes = 1;
            String string = 2;
            Separator separator = 3;
        }

        message Bytes {
            string name = 1;
            string description = 2;
            uint32 fixed_width = 3;
        }

        message String {
            string name = 1;
            string description = 2;
        }

        message Separator {
            string separator = 1;
        }
    }
}

Here is an example that uses the KeyDescriptor above to describe part of x/staking:

var StakingSchema = []keeper.KeyDescriptor{
	{
		Name:   "LastValidatorPowerKey",
		Prefix: LastValidatorPowerKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
			}},
		ValueProtoType: &types.UInt64Value{},
	}, {
		Name:           "LastTotalPowerKey",
		Description:    "",
		Prefix:         LastTotalPowerKey,
		KeyParts:       nil,
		ValueProtoType: &sdk.IntProto{},
	}, {
		Name:        "ValidatorsKey",
		Description: "",
		Prefix:      ValidatorsKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
			}},
		ValueProtoType: &Validator{},
	}, {
		Name:        "ValidatorsByConsAddrKey",
		Description: "",
		Prefix:      ValidatorsByConsAddrKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "ConsAddress",
				Description: "Validator consensus address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ConsAddress{},
			}},
		ValueProtoType: &types.BytesValue{},
		ValueGoType:    sdk.ValAddress{},
	},
}
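A sync client (see #7099) would first need to route each raw key it observes to the right descriptor. The sketch below does this by prefix; Descriptor is a pared-down hypothetical stand-in for keeper.KeyDescriptor, and any prefix bytes used with it are assumed values. Preferring the longest matching prefix is a defensive choice in case two layouts ever share a leading byte.

```go
package main

import "bytes"

// Descriptor keeps only what is needed to route a raw key.
type Descriptor struct {
	Name   string
	Prefix []byte
}

// FindDescriptor returns the descriptor whose prefix matches key,
// preferring the longest prefix when several match.
func FindDescriptor(schema []Descriptor, key []byte) (Descriptor, bool) {
	var best Descriptor
	found := false
	for _, d := range schema {
		if bytes.HasPrefix(key, d.Prefix) && (!found || len(d.Prefix) > len(best.Prefix)) {
			best, found = d, true
		}
	}
	return best, found
}
```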


i-norden · 2020-12-02T20:56:24Z

Hi @aaronc, I've been thinking about this for a while and haven't come up with any way to improve the schema descriptor.

I'm still trying to understand the best way to register other modules' schemas. For backwards compatibility, do we want to avoid changing a module's Keeper interface or AppModule constructor in order to register its schema? In that case, we could use a helper function within an app's AppCreator to generate the modules' schemas (automatically from specified YAML or TOML files, as you mentioned previously) and then load those schemas directly into the x/schema module.

Alternatively, if we don't mind breaking backwards compatibility, we could define an expected keeper interface in x/schema, e.g.:

// SchemaKeeper is an expected keeper interface for exporting a module's schema to x/schema
type SchemaKeeper interface {
    GetSchema() []keeper.KeyDescriptor
}

We would then add support for this interface to modules, whose keepers could be loaded into x/schema. This would leave it up to each module to define or generate its schema, instead of the app developer handling that within the AppCreator function.
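Assuming the SchemaKeeper interface above, the x/schema side could be a simple registry keyed by module name. SchemaRegistry, Register, Schema, Modules, and the stakingKeeper type are hypothetical names used only for this sketch, and KeyDescriptor is reduced to a name.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyDescriptor is a minimal placeholder for keeper.KeyDescriptor.
type KeyDescriptor struct {
	Name string
}

// SchemaKeeper is the expected keeper interface proposed above.
type SchemaKeeper interface {
	GetSchema() []KeyDescriptor
}

// SchemaRegistry holds module schemas keyed by module name.
type SchemaRegistry struct {
	schemas map[string][]KeyDescriptor
}

func NewSchemaRegistry() *SchemaRegistry {
	return &SchemaRegistry{schemas: make(map[string][]KeyDescriptor)}
}

// Register loads a module's schema, rejecting duplicate module names.
func (r *SchemaRegistry) Register(module string, k SchemaKeeper) error {
	if _, ok := r.schemas[module]; ok {
		return fmt.Errorf("schema for module %q already registered", module)
	}
	r.schemas[module] = k.GetSchema()
	return nil
}

// Schema returns the descriptors registered for a module.
func (r *SchemaRegistry) Schema(module string) ([]KeyDescriptor, bool) {
	s, ok := r.schemas[module]
	return s, ok
}

// Modules lists registered module names in sorted order.
func (r *SchemaRegistry) Modules() []string {
	names := make([]string, 0, len(r.schemas))
	for n := range r.schemas {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// stakingKeeper is a stand-in keeper that implements SchemaKeeper.
type stakingKeeper struct{}

func (stakingKeeper) GetSchema() []KeyDescriptor {
	return []KeyDescriptor{{Name: "LastValidatorPowerKey"}, {Name: "ValidatorsKey"}}
}
```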

aaronc · 2020-12-02T21:06:48Z

Hi @i-norden, generally I think it's too early to think about how all of this gets wired up, whether in keepers or app modules. The most relevant thing right now, I think, is the structure of those schemas, i.e. the concrete schema used to define key-value pair layouts.

Also, the schema isn't really relevant for apps - it's more relevant for clients so it doesn't really even need to get registered at runtime. It could be a set of yaml files for instance that are used by clients similar to .proto files.

aaronc · 2020-12-02T21:09:13Z

Maybe I should qualify that a bit and say it's not relevant at runtime right now, but maybe in the future it could be - if for instance there was a contract that wanted to query it. But I don't really think that should be the concern right now. Just having some descriptor file should be the starting point.

i-norden · 2020-12-02T23:19:41Z

Thanks @aaronc! That's very helpful. I don't know how much I'll be able to improve on what you've put forth here, it appears perfectly comprehensive to me. I'll put some more thought into it.

The only thing that comes to mind right now is whether it might be beneficial to explicitly define relations between different KeyDescriptors within a schema, e.g. signifying that the operator address values stored under "ValidatorsByConsAddrKey" are the KeyParts of "LastValidatorPowerKey" and "ValidatorsKey". We could do this by adding a new relations field to the key parts:

message Part {
    oneof sum {
        Bytes bytes = 1;
        String string = 2;
        Separator separator = 3;
    }

    message Bytes {
        string name = 1;
        string description = 2;
        uint32 fixed_width = 3;
        repeated string relations = 4;
    }

    message String {
        string name = 1;
        string description = 2;
        repeated string relations = 3;
    }

    message Separator {
        string separator = 1;
    }
}

var StakingSchema = []keeper.KeyDescriptor{
	{
		Name:   "LastValidatorPowerKey",
		Prefix: LastValidatorPowerKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
				Relations:   []string{"ValidatorsByConsAddrKey"},
			}},
		ValueProtoType: &types.UInt64Value{},
	}, {
		Name:           "LastTotalPowerKey",
		Description:    "",
		Prefix:         LastTotalPowerKey,
		KeyParts:       nil,
		ValueProtoType: &sdk.IntProto{},
	}, {
		Name:        "ValidatorsKey",
		Description: "",
		Prefix:      ValidatorsKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "Operator",
				Description: "Validator operator address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ValAddress{},
				Relations:   []string{"ValidatorsByConsAddrKey"},
			}},
		ValueProtoType: &Validator{},
	}, {
		Name:        "ValidatorsByConsAddrKey",
		Description: "",
		Prefix:      ValidatorsByConsAddrKey,
		KeyParts: []keeper.KeyPart{
			keeper.BytesKeyPart{
				Name:        "ConsAddress",
				Description: "Validator consensus address",
				FixedWidth:  sdk.AddrLen,
				GoType:      sdk.ConsAddress{},
			}},
		ValueProtoType: &types.BytesValue{},
		ValueGoType:    sdk.ValAddress{},
	},
}
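One way a client might use the proposed relations field is to expand a value found under one key into the related keys under other prefixes. This sketch assumes each related descriptor has exactly one key part, so the related key is simply prefix followed by the value; the types and RelatedKeys are hypothetical, and the prefix bytes in any usage are assumed.

```go
package main

// KeyPart carries the proposed relations field.
type KeyPart struct {
	Name      string
	Relations []string // names of descriptors whose values populate this part
}

// KeyDescriptor is a pared-down hypothetical mirror of keeper.KeyDescriptor.
type KeyDescriptor struct {
	Name   string
	Prefix []byte
	Parts  []KeyPart
}

// RelatedKeys returns, for each single-part descriptor whose key part is
// related to the descriptor named from, the full key for value.
func RelatedKeys(schema []KeyDescriptor, from string, value []byte) map[string][]byte {
	out := make(map[string][]byte)
	for _, d := range schema {
		if len(d.Parts) != 1 { // multi-part keys are out of scope for this sketch
			continue
		}
		for _, rel := range d.Parts[0].Relations {
			if rel == from {
				key := make([]byte, 0, len(d.Prefix)+len(value))
				key = append(key, d.Prefix...)
				out[d.Name] = append(key, value...)
				break
			}
		}
	}
	return out
}
```

With the staking descriptors above, passing "ValidatorsByConsAddrKey" and an operator address read from that store returns the corresponding LastValidatorPowerKey and ValidatorsKey keys.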

Would you like me to begin drafting an ADR for this (after giving it some more thought)? And/or would it be helpful for me to finish mocking up the KeyDescriptors for the rest of the staking module, or the other common modules?

aaronc · 2020-12-02T23:27:16Z

Yep, foreign keys/relations is a great idea.

I think it would be good to continue surveying the usage in staking and other modules. The discussion in #8041 is also relevant to this.

alexanderbez · 2020-12-03T14:07:16Z

Why is this being proposed as a keeper? Is it part of the state-machine execution model? If not, it shouldn't be a "keeper" (although I guess that definition is also going away and/or changing).

aaronc · 2020-12-03T14:13:20Z

I think I explained that it shouldn't be part of the state machine @alexanderbez. It's just for clients. But... maybe an x/schema module that just serves up schemas for clients to query does make sense. Don't know if it would actually need to keep the schemas in the state store... probably not unless there is a use case.

alexanderbez · 2020-12-03T14:25:14Z

Right, no need for it to be a module then, right? Also, will it fulfill the module interface(s)?

aaronc · 2020-12-03T14:50:54Z

Well, if we had schemas do you think clients would ever want to make proofs about them? If not, then there's no need for state storage. But, it might register a gRPC query service which might be easiest to wire up as a module in RegisterServices. There is an alternative though where we just start with .yaml files on disk that clients could consume as they choose...
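If schemas start out as descriptor files on disk, a client would simply parse them. The sketch below uses encoding/json rather than YAML only to stay within the Go standard library; the field names, the hex encoding of prefixes, and the overall file layout are all assumptions.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// KeyEntry is a hypothetical on-disk description of one key layout.
type KeyEntry struct {
	Name      string `json:"name"`
	Prefix    string `json:"prefix"` // hex-encoded prefix bytes
	ValueType string `json:"value_type"`
}

// ModuleSchemaFile is a hypothetical per-module descriptor file.
type ModuleSchemaFile struct {
	Module string     `json:"module"`
	Keys   []KeyEntry `json:"keys"`
}

// LoadSchema parses a descriptor file and decodes each hex prefix.
func LoadSchema(data []byte) (ModuleSchemaFile, map[string][]byte, error) {
	var f ModuleSchemaFile
	if err := json.Unmarshal(data, &f); err != nil {
		return f, nil, err
	}
	prefixes := make(map[string][]byte, len(f.Keys))
	for _, k := range f.Keys {
		p, err := hex.DecodeString(k.Prefix)
		if err != nil {
			return f, nil, fmt.Errorf("key %s: %w", k.Name, err)
		}
		prefixes[k.Name] = p
	}
	return f, prefixes, nil
}
```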

robert-zaremba · 2020-12-04T09:21:02Z

I'm in favor of starting without state machine integration. The use case is clearly off-chain applications. Anyone who needs to validate data will need a data proof (a schema won't help), and in the current storage design these proofs change between blocks.

At the moment, I don't see any advantage to keeping it in files versus defining it directly in a Go file. In the future, if a single module has implementations in different languages, files could make sense.

i-norden · 2020-12-10T19:43:49Z

After thinking about this some more, one other thing I think we should add is an indication of the underlying data structure (e.g. IAVL), so that we know how to reconstitute (and IPLD-ize) it from all the key-value pairs we listen to.
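A store-type indication could be a single schema-level field. The enum and ModuleSchema type below are hypothetical, and requiring the field to be set is one possible validation rule rather than something the thread decided.

```go
package main

import "fmt"

// StoreType names the structure the key-value pairs were committed into,
// so a client knows how to rebuild it.
type StoreType int

const (
	StoreTypeUnspecified StoreType = iota
	StoreTypeIAVL
)

func (s StoreType) String() string {
	switch s {
	case StoreTypeIAVL:
		return "IAVL"
	default:
		return "UNSPECIFIED"
	}
}

// ModuleSchema pairs a module's key layouts with its store type.
type ModuleSchema struct {
	Module    string
	StoreType StoreType
	Keys      []string // descriptor names, simplified for this sketch
}

// Validate rejects schemas that do not say how to reconstitute the store.
func (m ModuleSchema) Validate() error {
	if m.StoreType == StoreTypeUnspecified {
		return fmt.Errorf("module %s: store type must be specified", m.Module)
	}
	return nil
}
```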

Somewhat relatedly, looking ahead, there are a few approaches to consider for supporting arbitrary protobuf types in IPLD:

1. Leave them undefined, e.g. for IAVL just define a multicodec packed content type for IAVL nodes and leave the content type of the stored values unspecified. For eth's state trie, for example, there is a codec to specify trie nodes and there is also a codec to specify state account as the state value type. But for the storage trie there is only a codec for the trie node, since the storage value types are indeterminate.
2. Define a new composite multicodec, where the prefix byte specifies protobuf and the suffix bytes specify which registered protobuf type is used.
3. Define a new multicodec which specifies that the first x bytes of the hash-linked object are another CID/multihash referencing the .proto message definition that the remaining bytes should be unpacked into.
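To make option 2 concrete: multicodec codes are unsigned varints, so a composite code could be the protobuf codec code followed by a second varint naming a registered message type. In this sketch, 0x50 is assumed to be the protobuf entry in the multicodec table (verify against the table), and the type codes are hypothetical registry indices. Go's binary.PutUvarint and binary.Uvarint use the same base-128 varint layout as the multiformats unsigned varint.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Assumed multicodec code for protobuf; check the multicodec table.
const protobufCodec uint64 = 0x50

func appendUvarint(buf []byte, x uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], x)
	return append(buf, tmp[:n]...)
}

// EncodeComposite prefixes value with the protobuf codec code and a
// hypothetical registered type code, both as unsigned varints.
func EncodeComposite(typeCode uint64, value []byte) []byte {
	buf := make([]byte, 0, 2*binary.MaxVarintLen64+len(value))
	buf = appendUvarint(buf, protobufCodec)
	buf = appendUvarint(buf, typeCode)
	return append(buf, value...)
}

// DecodeComposite reverses EncodeComposite.
func DecodeComposite(data []byte) (typeCode uint64, value []byte, err error) {
	codec, n := binary.Uvarint(data)
	if n <= 0 {
		return 0, nil, errors.New("malformed codec varint")
	}
	if codec != protobufCodec {
		return 0, nil, errors.New("not a protobuf composite")
	}
	data = data[n:]
	typeCode, n = binary.Uvarint(data)
	if n <= 0 {
		return 0, nil, errors.New("malformed type varint")
	}
	return typeCode, data[n:], nil
}
```

The clunkiness noted in the thread shows up here: every new message type needs a code assigned in some shared registry.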

Option 3 is the most elegant way of supporting things, I think, but it would get very complicated for proto types that import from other proto packages. Option 2 is clunky but would be the more idiomatic way to support various types. Option 1 is the simplest approach, with precedent set by how the eth storage trie is supported, but it doesn't leverage protobuf the way we could.

In that third case, it would be useful to be able to retrieve the complete .proto message definitions for ValueProtoType from the schema api. But I'm not sure how this is tenable for proto types that import other proto packages.

Off in dream land now, but I think in an ideal world there'd be a protobuf compiler plugin that allowed .proto files to import other proto packages by CID, then we could use IPFS as a hash-linked registry for every protobuf type and (3) would be completely feasible.

robert-zaremba · 2021-01-08T16:36:03Z

@i-norden, could you have a look at the Canonical ID section of #7100 (comment)?
This week I was thinking about whether we can solve this issue without introducing a schema module, specifically by creating more manageable storage keys. It would be great to hear your feedback.

robert-zaremba · 2021-01-08T16:39:47Z

Also, for a general note, our approach for the future of storage is to get rid of IAVL.

aaronc · 2021-04-21T15:22:48Z

Replaced by #9158

This was referenced Aug 18, 2020:
Store Improvements #7096 (closed)
Synchronize state to an external database #7099 (closed)
Table Store (aka ORM) Package #7098 (closed)

aaronc mentioned this issue Oct 14, 2020 in Store Improvements #7539 (closed).

UnitylChaos mentioned this issue Jan 8, 2021 in What to do about IAVL? #7100 (closed).

aaronc changed the title from "Schema Module" to "Schema Descriptors" Jan 8, 2021.

i-norden mentioned this issue Mar 31, 2021 in feat: ADR-038 Part 2: StreamingService interface, file writing implementation, and configuration #8664 (merged).

aaronc closed this as completed Apr 21, 2021.

i-norden mentioned this issue Sep 6, 2022 in Finalize spec for arbitrary protobuf support vulcanize/go-codec-dagcosmos#43 (open).
