feat: indexer base types #20629

Merged
merged 25 commits into from
Jun 17, 2024
76f6f32
feat: indexer base types
aaronc Jun 11, 2024
63aeb85
WIP on tests
aaronc Jun 11, 2024
216e8f8
update listener
aaronc Jun 12, 2024
b9fb6c9
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/index…
aaronc Jun 13, 2024
663ed17
rename column to field
aaronc Jun 13, 2024
4311357
delete code, simplify
aaronc Jun 13, 2024
c52655a
add error return
aaronc Jun 13, 2024
46669d3
remove ability to filter subscribed modules - this is a bit dangerous
aaronc Jun 13, 2024
0a47c39
add docs about fields
aaronc Jun 13, 2024
7fd604f
update table and entity language to object
aaronc Jun 13, 2024
4a00094
rename to type
aaronc Jun 13, 2024
0c7f529
add CHANGELOG.md
aaronc Jun 13, 2024
408ddc4
add DecodableModule interface
aaronc Jun 13, 2024
599e7cf
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/index…
aaronc Jun 14, 2024
bc98756
make compatible with go 1.12
aaronc Jun 14, 2024
68d0afc
remove CommitCatchupSync - catch-up design in flux, may be premature …
aaronc Jun 17, 2024
dba4b3c
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/index…
aaronc Jun 17, 2024
6bb9c48
WIP on mermaid
aaronc Jun 17, 2024
4d6e54d
mermaid sequence diagram
aaronc Jun 17, 2024
fcaa9b9
update listener docs
aaronc Jun 17, 2024
313a778
cleanup
aaronc Jun 17, 2024
ae6d861
Update indexer/base/README.md
aaronc Jun 17, 2024
8c0ac57
Update indexer/base/README.md
aaronc Jun 17, 2024
78155c6
gofumpt
aaronc Jun 17, 2024
c65a94a
spelling
aaronc Jun 17, 2024
37 changes: 37 additions & 0 deletions indexer/base/CHANGELOG.md
@@ -0,0 +1,37 @@
<!--
Guiding Principles:

Changelogs are for humans, not machines.
There should be an entry for every single version.
The same types of changes should be grouped.
Versions and sections should be linkable.
The latest version comes first.
The release date of each version is displayed.
Mention whether you follow Semantic Versioning.

Usage:

Change log entries are to be added to the Unreleased section under the
appropriate stanza (see below). Each entry should ideally include a tag and
the Github issue reference in the following format:

* (<tag>) \#<issue-number> message

The issue numbers will later be link-ified during the release process so you do
not have to worry about including a link manually, but you can if you wish.

Types of changes (Stanzas):

"Features" for new features.
"Improvements" for changes in existing functionality.
"Deprecated" for soon-to-be removed features.
"Bug Fixes" for any bug fixes.
"Client Breaking" for breaking Protobuf, gRPC and REST routes used by end-users.
"CLI Breaking" for breaking CLI commands.
"API Breaking" for breaking exported APIs used by developers building on SDK.
Ref: https://keepachangelog.com/en/1.0.0/
-->

# Changelog

## [Unreleased]
32 changes: 32 additions & 0 deletions indexer/base/README.md
@@ -0,0 +1,32 @@
# Indexer Base

The indexer base module is designed to provide a stable, zero-dependency base layer for the built-in indexer functionality. Packages that integrate with the indexer should feel free to depend on this package without fear of any external dependencies being pulled in.

The basic types for specifying index sources, targets and decoders are provided here. An indexing source should accept a `Listener` instance and invoke the provided callbacks in the correct order. An indexer should provide a `Listener` instance and perform indexing operations based on the data passed to its callbacks. A module that exposes logical updates in the form of `ObjectUpdate`s should implement the `IndexableModule` interface.

## `Listener` Callback Order

`Listener` callbacks should be called in this order:

```mermaid
sequenceDiagram
actor Source
participant Indexer
Source ->> Indexer: Initialize
Source -->> Indexer: InitializeModuleSchema
loop Block
Source ->> Indexer: StartBlock
Source ->> Indexer: OnBlockHeader
Source -->> Indexer: OnTx
Source -->> Indexer: OnEvent
Source -->> Indexer: OnKVPair
Source -->> Indexer: OnObjectUpdate
Source ->> Indexer: Commit
end
```

`Initialize` must be called before any other method and should only be invoked once. `InitializeModuleSchema` should be called at most once for every module with logical data.

Sources will generally only call `InitializeModuleSchema` and `OnObjectUpdate` if they have native logical decoding capabilities. Usually, the indexer framework will provide this functionality based on `OnKVPair` data and `IndexableModule` implementations.

`StartBlock` and `OnBlockHeader` should each be called only once at the beginning of a block, and `Commit` should be called only once at the end of a block. `OnTx`, `OnEvent`, `OnKVPair` and `OnObjectUpdate` must be called after `OnBlockHeader` and may be called multiple times within a block. Indexers should not assume that the order of these calls is logical unless `InitializationData.HasEventAlignedWrites` is true.
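
The callback order described above can be sketched as a small driver loop. This is only an illustration, not code from this PR: the `Listener` struct below is a trimmed local stand-in (for example, `OnBlockHeader` takes a plain height instead of `BlockHeaderData`), and `kvWrite`, `runBlock` and `recordingListener` are hypothetical helpers. It shows a source skipping nil callbacks and calling the rest in the documented order.

```go
package main

import "fmt"

// Listener is a trimmed local stand-in for indexerbase.Listener.
// Signatures are simplified for this sketch (assumption).
type Listener struct {
	Initialize    func() (lastBlockPersisted int64, err error)
	StartBlock    func(height uint64) error
	OnBlockHeader func(height uint64) error
	OnKVPair      func(moduleName string, key, value []byte, delete bool) error
	Commit        func() error
}

// kvWrite is a hypothetical record of one store write within a block.
type kvWrite struct {
	module     string
	key, value []byte
	isDelete   bool
}

// runBlock drives one block through the listener in the documented order,
// skipping any callback that is nil.
func runBlock(l Listener, height uint64, writes []kvWrite) error {
	if l.StartBlock != nil {
		if err := l.StartBlock(height); err != nil {
			return err
		}
	}
	if l.OnBlockHeader != nil {
		if err := l.OnBlockHeader(height); err != nil {
			return err
		}
	}
	if l.OnKVPair != nil {
		for _, w := range writes {
			if err := l.OnKVPair(w.module, w.key, w.value, w.isDelete); err != nil {
				return err
			}
		}
	}
	if l.Commit != nil {
		return l.Commit()
	}
	return nil
}

// recordingListener returns a listener that appends each callback name to calls.
func recordingListener(calls *[]string) Listener {
	record := func(name string) { *calls = append(*calls, name) }
	return Listener{
		// -1 means "persisting block data, but nothing persisted yet".
		Initialize:    func() (int64, error) { record("Initialize"); return -1, nil },
		StartBlock:    func(uint64) error { record("StartBlock"); return nil },
		OnBlockHeader: func(uint64) error { record("OnBlockHeader"); return nil },
		OnKVPair:      func(string, []byte, []byte, bool) error { record("OnKVPair"); return nil },
		Commit:        func() error { record("Commit"); return nil },
	}
}

func main() {
	var calls []string
	l := recordingListener(&calls)
	if _, err := l.Initialize(); err != nil {
		fmt.Println("initialize failed:", err)
		return
	}
	writes := []kvWrite{{module: "bank", key: []byte("k"), value: []byte("v")}}
	if err := runBlock(l, 1, writes); err != nil {
		fmt.Println("block failed:", err)
		return
	}
	fmt.Println(calls)
}
```

A real source would also emit `OnTx` and `OnEvent` between `OnBlockHeader` and `Commit`; they are left out here to keep the sketch short.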
26 changes: 26 additions & 0 deletions indexer/base/decoder.go
@@ -0,0 +1,26 @@
package indexerbase

// DecodableModule is an interface that modules can implement to provide a ModuleDecoder.
// Usually these modules would also implement appmodule.AppModule, but that is not included
// to keep this package free of any dependencies.
type DecodableModule interface {
// ModuleDecoder returns a ModuleDecoder for the module.
ModuleDecoder() (ModuleDecoder, error)
}

// ModuleDecoder is a struct that contains the schema and a KVDecoder for a module.
type ModuleDecoder struct {
// Schema is the schema for the module.
Schema ModuleSchema

// KVDecoder is a function that decodes a key-value pair into an ObjectUpdate.
// If modules pass logical updates directly to the engine and don't require logical decoding of raw bytes,
// then this function should be nil.
KVDecoder KVDecoder
}

// KVDecoder is a function that decodes a key-value pair into an ObjectUpdate.
// If the KV-pair doesn't represent an object update, the function should return false
// as the second return value. Error should only be non-nil when the decoder expected
// to parse a valid update and was unable to.
type KVDecoder = func(key, value []byte) (ObjectUpdate, bool, error)
Comment on lines +22 to +26
Contributor

Clarify behavior and error handling in KVDecoder function.

The KVDecoder function type's documentation could be improved by explicitly stating what constitutes a "valid update" and under what conditions error should be non-nil. This will enhance understanding and correct usage of this function across different modules.

// KVDecoder is a function that decodes a key-value pair into an ObjectUpdate.
// If the KV-pair doesn't represent an object update, the function should return false
// as the second return value. Error should only be non-nil when the decoder expected
// to parse a valid update and was unable to.
+ // A "valid update" refers to a key-value pair that conforms to the expected schema and data format for the module.
+ // Errors should be returned if the key-value pair is malformed or if essential data is missing, making decoding impossible.
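
To make the three outcomes concrete, here is a minimal sketch of a decoder following the documented contract. It is an assumption-heavy illustration: `ObjectUpdate` is not shown in this diff, so the struct below is a hypothetical stand-in, and the `bal/` key prefix, the 8-byte big-endian value layout, and the `encodeBalance` helper are invented for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// ObjectUpdate is a hypothetical stand-in; the real type is not part of this diff.
type ObjectUpdate struct {
	TypeName string
	Key      interface{}
	Value    interface{}
}

// KVDecoder mirrors the type alias from decoder.go.
type KVDecoder = func(key, value []byte) (ObjectUpdate, bool, error)

// balancePrefix is an invented key prefix for this sketch.
var balancePrefix = []byte("bal/")

// encodeBalance builds the assumed 8-byte big-endian value encoding.
func encodeBalance(n uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, n)
	return b
}

// decodeBalance shows the three documented outcomes of a KVDecoder.
func decodeBalance(key, value []byte) (ObjectUpdate, bool, error) {
	// Not one of our keys: no update and no error.
	if !bytes.HasPrefix(key, balancePrefix) {
		return ObjectUpdate{}, false, nil
	}
	// Our key, but the value is malformed: this is the error case.
	if len(value) != 8 {
		return ObjectUpdate{}, false, fmt.Errorf("balance value has %d bytes, want 8", len(value))
	}
	// A valid update.
	return ObjectUpdate{
		TypeName: "balance",
		Key:      string(key[len(balancePrefix):]),
		Value:    binary.BigEndian.Uint64(value),
	}, true, nil
}

// Compile-time check that decodeBalance satisfies KVDecoder.
var _ KVDecoder = decodeBalance

func main() {
	update, ok, err := decodeBalance([]byte("bal/alice"), encodeBalance(42))
	fmt.Println(update.Key, update.Value, ok, err)

	_, ok, err = decodeBalance([]byte("params/x"), nil)
	fmt.Println(ok, err)

	_, ok, err = decodeBalance([]byte("bal/alice"), []byte{1})
	fmt.Println(ok, err)
}
```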

10 changes: 10 additions & 0 deletions indexer/base/enum.go
@@ -0,0 +1,10 @@
package indexerbase

// EnumDefinition represents the definition of an enum type.
type EnumDefinition struct {
// Name is the name of the enum type.
Name string

// Values is a list of distinct values that are part of the enum type.
Values []string
}
19 changes: 19 additions & 0 deletions indexer/base/field.go
@@ -0,0 +1,19 @@
package indexerbase

// Field represents a field in an object type.
type Field struct {
// Name is the name of the field.
Name string

// Kind is the basic type of the field.
Kind Kind

// Nullable indicates whether null values are accepted for the field.
Nullable bool

// AddressPrefix is the address prefix of the field's kind, currently only used for Bech32AddressKind.
AddressPrefix string

// EnumDefinition is the definition of the enum type and is only valid when Kind is EnumKind.
EnumDefinition EnumDefinition
}
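
The comments on `Field` and `EnumDefinition` imply a few consistency rules: `Bech32AddressKind` fields set `AddressPrefix`, `EnumKind` fields set `EnumDefinition`, and enum values are distinct. A rough sketch of checking them follows. `validateField` is a hypothetical helper and not part of this PR, the `Kind` constants are a reduced local copy whose numeric values do not match `kind.go`, and requiring a non-empty enum name is an assumption.

```go
package main

import "fmt"

// Kind is a reduced local copy of indexerbase.Kind for this sketch;
// the numeric values differ from the real package.
type Kind int

const (
	InvalidKind Kind = iota
	StringKind
	Bech32AddressKind
	EnumKind
)

// EnumDefinition and Field mirror the structs in enum.go and field.go.
type EnumDefinition struct {
	Name   string
	Values []string
}

type Field struct {
	Name           string
	Kind           Kind
	Nullable       bool
	AddressPrefix  string
	EnumDefinition EnumDefinition
}

// validateField is a hypothetical helper that checks the kind-specific
// expectations described in the field comments.
func validateField(f Field) error {
	if f.Name == "" {
		return fmt.Errorf("field name must not be empty")
	}
	switch f.Kind {
	case InvalidKind:
		return fmt.Errorf("field %q has an invalid kind", f.Name)
	case Bech32AddressKind:
		if f.AddressPrefix == "" {
			return fmt.Errorf("field %q needs an AddressPrefix", f.Name)
		}
	case EnumKind:
		if f.EnumDefinition.Name == "" || len(f.EnumDefinition.Values) == 0 {
			return fmt.Errorf("field %q needs an EnumDefinition", f.Name)
		}
		// Values are documented as distinct, so reject repeats.
		seen := make(map[string]bool)
		for _, v := range f.EnumDefinition.Values {
			if seen[v] {
				return fmt.Errorf("enum %q repeats value %q", f.EnumDefinition.Name, v)
			}
			seen[v] = true
		}
	}
	return nil
}

func main() {
	fields := []Field{
		{Name: "owner", Kind: Bech32AddressKind, AddressPrefix: "cosmos"},
		{Name: "status", Kind: EnumKind, EnumDefinition: EnumDefinition{Name: "Status", Values: []string{"ACTIVE", "CLOSED"}}},
		{Name: "sender", Kind: Bech32AddressKind}, // missing prefix, reported as an error
	}
	for _, f := range fields {
		fmt.Println(f.Name, validateField(f))
	}
}
```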
7 changes: 7 additions & 0 deletions indexer/base/go.mod
@@ -0,0 +1,7 @@
module cosmossdk.io/indexer/base
Member Author

Should we maybe just call this cosmossdk.io/indexer? I'm not sure what package would be simply cosmossdk.io/indexer if it's not this one, although maybe good to leave as base to communicate the intent of a base package clearly.

Member

this works, no need to change


// NOTE: this go.mod should have zero dependencies and remain on go 1.12 to stay compatible
// with all known production releases of the Cosmos SDK. This is to ensure that all historical
// apps could be patched to support indexing if desired.

go 1.12
Empty file added indexer/base/go.sum
Empty file.
82 changes: 82 additions & 0 deletions indexer/base/kind.go
@@ -0,0 +1,82 @@
package indexerbase

// Kind represents the basic type of a field in an object.
// Each kind defines the types of go values which should be accepted
// by listeners and generated by decoders when providing object updates.
type Kind int

const (
// InvalidKind indicates an invalid type.
InvalidKind Kind = iota

// StringKind is a string type and values of this type must be of the go type string
// or implement fmt.Stringer.
StringKind

// BytesKind is a bytes type and values of this type must be of the go type []byte.
BytesKind

// Int8Kind is an int8 type and values of this type must be of the go type int8.
Int8Kind

// Uint8Kind is a uint8 type and values of this type must be of the go type uint8.
Uint8Kind

// Int16Kind is an int16 type and values of this type must be of the go type int16.
Int16Kind

// Uint16Kind is a uint16 type and values of this type must be of the go type uint16.
Uint16Kind

// Int32Kind is an int32 type and values of this type must be of the go type int32.
Int32Kind

// Uint32Kind is a uint32 type and values of this type must be of the go type uint32.
Uint32Kind

// Int64Kind is an int64 type and values of this type must be of the go type int64.
Int64Kind

// Uint64Kind is a uint64 type and values of this type must be of the go type uint64.
Uint64Kind

// IntegerKind represents an arbitrary precision integer number. Values of this type must
// be of the go type int64, string or a type that implements fmt.Stringer with the resulting string
// formatted as an integer number.
IntegerKind

// DecimalKind represents an arbitrary precision decimal or integer number. Values of this type
// must be of the go type string or a type that implements fmt.Stringer with the resulting string
// formatted as decimal numbers with an optional fractional part. Exponential E-notation
// is supported but NaN and Infinity are not.
DecimalKind

// BoolKind is a boolean type and values of this type must be of the go type bool.
BoolKind

// TimeKind is a time type and values of this type must be of the go type time.Time.
TimeKind

// DurationKind is a duration type and values of this type must be of the go type time.Duration.
DurationKind

// Float32Kind is a float32 type and values of this type must be of the go type float32.
Contributor

wondering if we want to support this day0 considering they're not used in the state machine

Member Author

I guess I'm thinking better to be thorough, but I don't have a strong opinion

Member

in the case of cosmwasm I believe they have this type and would want to add it if they were to write a decoder

Float32Kind

// Float64Kind is a float64 type and values of this type must be of the go type float64.
Float64Kind

// Bech32AddressKind is a bech32 address type and values of this type must be of the go type string or []byte
// or a type which implements fmt.Stringer. Fields of this type are expected to set the AddressPrefix field
// in the field definition to the bech32 address prefix.
Bech32AddressKind

// EnumKind is an enum type and values of this type must be of the go type string or implement fmt.Stringer.
// Fields of this type are expected to set the EnumDefinition field in the field definition to the enum
// definition.
EnumKind

// JSONKind is a JSON type and values of this type can either be of go type json.RawMessage
// or any type that can be marshaled to JSON using json.Marshal.
JSONKind
)
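
Since each kind documents which Go values are acceptable, a listener or decoder test could check values with a type switch. The following is a sketch under stated assumptions: `checkValue` and `checkIntegerString` are hypothetical helpers, not part of this PR, only a handful of kinds are covered, and the local `Kind` constants do not share numeric values with the real ones. Parsing with `math/big` is one possible way to test that a string is "formatted as an integer number".

```go
package main

import (
	"fmt"
	"math/big"
)

// Kind is a reduced local copy of indexerbase.Kind for this sketch.
type Kind int

const (
	InvalidKind Kind = iota
	StringKind
	Int64Kind
	IntegerKind
	BoolKind
)

// checkValue is a hypothetical helper that reports whether v is an
// acceptable Go value for the given kind, per the kind comments.
func checkValue(kind Kind, v interface{}) error {
	switch kind {
	case StringKind:
		// string or anything implementing fmt.Stringer.
		switch v.(type) {
		case string, fmt.Stringer:
			return nil
		}
	case Int64Kind:
		if _, ok := v.(int64); ok {
			return nil
		}
	case IntegerKind:
		// int64, string, or fmt.Stringer formatted as an integer.
		switch x := v.(type) {
		case int64:
			return nil
		case string:
			return checkIntegerString(x)
		case fmt.Stringer:
			return checkIntegerString(x.String())
		}
	case BoolKind:
		if _, ok := v.(bool); ok {
			return nil
		}
	}
	return fmt.Errorf("value of type %T is not valid for kind %d", v, kind)
}

// checkIntegerString accepts base-10 integers of arbitrary size.
func checkIntegerString(s string) error {
	if _, ok := new(big.Int).SetString(s, 10); !ok {
		return fmt.Errorf("%q is not formatted as an integer", s)
	}
	return nil
}

func main() {
	fmt.Println(checkValue(StringKind, "hello"))
	fmt.Println(checkValue(IntegerKind, "123456789012345678901234567890"))
	fmt.Println(checkValue(IntegerKind, big.NewInt(7))) // *big.Int implements fmt.Stringer
	fmt.Println(checkValue(IntegerKind, "1.5"))
	fmt.Println(checkValue(Int64Kind, int32(1)))
}
```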
119 changes: 119 additions & 0 deletions indexer/base/listener.go
@@ -0,0 +1,119 @@
package indexerbase

import (
"encoding/json"
)

// Listener is a struct of callback functions for listening to both raw and logical blockchain data.
// It is valid for any of the methods to be nil, in which case the listener will not be called for that event.
// Listeners should understand the guarantees that are provided by the source they are listening to and
// understand which methods will or will not be called. For instance, most blockchains will not do logical
// decoding of data out of the box, so the InitializeModuleSchema and OnObjectUpdate methods will not be called.
// These methods will only be called when logical decoding is set up.
type Listener struct {
// Initialize is called when the listener is initialized before any other methods are called.
// The lastBlockPersisted return value should be the last block height the listener persisted if it is
// persisting block data, 0 if it is not interested in persisting block data, or -1 if it is
// persisting block data but has not persisted any data yet. This check allows the indexer
// framework to ensure that the listener has not missed blocks.
Initialize func(InitializationData) (lastBlockPersisted int64, err error)

// StartBlock is called at the beginning of processing a block.
StartBlock func(uint64) error

// OnBlockHeader is called when a block header is received.
OnBlockHeader func(BlockHeaderData) error

// OnTx is called when a transaction is received.
OnTx func(TxData) error

// OnEvent is called when an event is received.
OnEvent func(EventData) error

// OnKVPair is called when a key-value pair has been written to the store for a given module.
OnKVPair func(moduleName string, key, value []byte, delete bool) error

// Commit is called when state is committed, usually at the end of a block. Any
// indexers should commit their data when this is called and return an error if
// they are unable to commit.
Commit func() error

// InitializeModuleSchema should be called whenever the blockchain process starts OR whenever
// logical decoding of a module is initiated. An indexer listening to this event
// should ensure that they have performed whatever initialization steps (such as database
// migrations) required to receive OnObjectUpdate events for the given module. If the
// indexer's schema is incompatible with the module's on-chain schema, the listener should return
// an error.
InitializeModuleSchema func(module string, schema ModuleSchema) error

// OnObjectUpdate is called whenever an object is updated in a module's state. This is only called
// when logical data is available. It should be assumed that the same data in raw form
// is also passed to OnKVPair.
OnObjectUpdate func(module string, update ObjectUpdate) error
}

// InitializationData represents initialization data that is passed to a listener.
type InitializationData struct {
// HasEventAlignedWrites indicates that the blockchain data source will emit KV-pair events
// in an order aligned with transaction, message and event callbacks. If this is true
// then indexers can assume that KV-pair data is associated with these specific transactions, messages
// and events. This may be useful for indexers which store a log of all operations (such as immutable
// or version controlled databases) so that the history log can include fine-grained correlation between
// state updates and transactions, messages and events. If this value is false, then indexers should
// assume that KV-pair data occurs out of order with respect to transaction, message and event callbacks -
// the only safe assumption being that KV-pair data is associated with the block in which it was emitted.
HasEventAlignedWrites bool
}

// BlockHeaderData represents the raw block header data that is passed to a listener.
type BlockHeaderData struct {
// Height is the height of the block.
Height uint64

// Bytes is the raw byte representation of the block header.
Bytes ToBytes

// JSON is the JSON representation of the block header. It should generally be a JSON object.
JSON ToJSON
}

// TxData represents the raw transaction data that is passed to a listener.
type TxData struct {
// TxIndex is the index of the transaction in the block.
TxIndex int32

// Bytes is the raw byte representation of the transaction.
Bytes ToBytes

// JSON is the JSON representation of the transaction. It should generally be a JSON object.
JSON ToJSON
}

// EventData represents event data that is passed to a listener.
type EventData struct {
// TxIndex is the index of the transaction in the block to which this event is associated.
// It should be set to a negative number if the event is not associated with a transaction.
// Canonically -1 should be used to represent begin block processing and -2 should be used to
// represent end block processing.
TxIndex int32

// MsgIndex is the index of the message in the transaction to which this event is associated.
// If TxIndex is negative, this index could correspond to the index of the message in
// begin or end block processing if such indexes exist, or it can be set to zero.
MsgIndex uint32

// EventIndex is the index of the event in the message to which this event is associated.
EventIndex uint32

// Type is the type of the event.
Type string

// Data is the JSON representation of the event data. It should generally be a JSON object.
Data ToJSON
}

// ToBytes is a function that lazily returns the raw byte representation of data.
type ToBytes = func() ([]byte, error)

// ToJSON is a function that lazily returns the JSON representation of data.
type ToJSON = func() (json.RawMessage, error)
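
Because `ToBytes` and `ToJSON` are lazy, a source can defer encoding until a listener actually asks for it. One way a source might build such a value is sketched below. `lazyJSON` and its `marshalCount` parameter are hypothetical and exist only to show that the encoding runs at most once; whether real sources memoize the result is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToJSON mirrors the type alias from listener.go.
type ToJSON = func() (json.RawMessage, error)

// lazyJSON is a hypothetical helper: it returns a ToJSON that marshals v on
// the first call only and then serves the cached result. marshalCount is
// incremented on each real marshal so the laziness can be observed.
func lazyJSON(v interface{}, marshalCount *int) ToJSON {
	var (
		cached    json.RawMessage
		cachedErr error
		done      bool
	)
	return func() (json.RawMessage, error) {
		if !done {
			*marshalCount++
			cached, cachedErr = json.Marshal(v)
			done = true
		}
		return cached, cachedErr
	}
}

func main() {
	count := 0
	header := lazyJSON(map[string]int{"height": 5}, &count)
	fmt.Println("marshals before use:", count)

	bz, err := header()
	fmt.Println(string(bz), err)

	// A second call reuses the cached bytes.
	if _, err := header(); err != nil {
		fmt.Println("unexpected error:", err)
	}
	fmt.Println("marshals after two calls:", count)
}
```

A listener that never calls the function pays no encoding cost at all, which appears to be the motivation for making these fields functions rather than raw bytes.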
7 changes: 7 additions & 0 deletions indexer/base/module_schema.go
@@ -0,0 +1,7 @@
package indexerbase

// ModuleSchema represents the logical schema of a module for purposes of indexing and querying.
type ModuleSchema struct {
// ObjectTypes describe the types of objects that are part of the module's schema.
ObjectTypes []ObjectType
}
23 changes: 23 additions & 0 deletions indexer/base/object_type.go
@@ -0,0 +1,23 @@
package indexerbase

// ObjectType describes an object type in a module schema.
type ObjectType struct {
// Name is the name of the object.
Name string

// KeyFields is a list of fields that make up the primary key of the object.
// It can be empty in which case indexers should assume that this object is
// a singleton and only has one value.
KeyFields []Field

// ValueFields is a list of fields that are not part of the primary key of the object.
// It can be empty in the case where all fields are part of the primary key.
ValueFields []Field

// RetainDeletions is a flag that indicates whether the indexer should retain
// deleted rows in the database and flag them as deleted rather than actually
// deleting the row. For many types of data in state, the data is deleted even
// though it is still valid in order to save space. Indexers will want to have
// the option of retaining such data and distinguishing from other "true" deletions.
RetainDeletions bool
}
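
As a rough illustration of what `RetainDeletions` asks of an indexer, the sketch below uses an in-memory map in place of a database. Everything except the `RetainDeletions` field name is hypothetical: `row`, `table`, `newTable`, `set` and `remove` are invented for the example, and `ObjectType` is trimmed to the two fields the sketch needs.

```go
package main

import "fmt"

// ObjectType is trimmed to the fields this sketch needs.
type ObjectType struct {
	Name            string
	RetainDeletions bool
}

// row is a hypothetical stored record with a soft-delete flag.
type row struct {
	Value   string
	Deleted bool
}

// table is a hypothetical in-memory stand-in for an indexer's database table.
type table struct {
	objectType ObjectType
	rows       map[string]row
}

func newTable(t ObjectType) *table {
	return &table{objectType: t, rows: make(map[string]row)}
}

// set inserts or overwrites a row.
func (t *table) set(key, value string) {
	t.rows[key] = row{Value: value}
}

// remove flags the row as deleted when RetainDeletions is set,
// and drops it entirely otherwise.
func (t *table) remove(key string) {
	r, ok := t.rows[key]
	if !ok {
		return
	}
	if t.objectType.RetainDeletions {
		r.Deleted = true
		t.rows[key] = r
		return
	}
	delete(t.rows, key)
}

func main() {
	// State pruned to save space: keep the row, mark it deleted.
	votes := newTable(ObjectType{Name: "vote", RetainDeletions: true})
	votes.set("proposal-1/alice", "yes")
	votes.remove("proposal-1/alice")
	fmt.Println(votes.rows["proposal-1/alice"])

	// A "true" deletion: the row disappears.
	locks := newTable(ObjectType{Name: "lock"})
	locks.set("alice", "100")
	locks.remove("alice")
	fmt.Println(len(locks.rows))
}
```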