Epic: in-process off chain indexing #20352
tac0turtle added the T: Client UX and T:Epic labels and removed the needs-triage label on May 11, 2024.
Summary
Indexing data from a chain allows teams to build complex front ends that are not limited by the node's performance. We have seen data teams spend countless hours building complex systems just to support their front ends.
State streaming is a good step towards allowing teams to build off-chain indexes, but it has limitations. State streaming is not a first-class citizen, which forces off-chain actors to decode the data themselves. This leads to complex software being built.
Lastly, the state machine performs many additional writes that are needed only for querying. This increases the amount of IO the state machine does. To reduce overhead and create a more performant state machine, it should hold only the state needed to progress to the next block. Extra information for queries should be handled by an in-process off-chain indexer.
This epic proposes changes to the state machine and the creation of an in-process off-chain indexer, allowing users to build more complex applications without being burdened by maintaining complex pieces of software.
The feature should have a plugin-based system allowing teams to extend the indexing functionality and create a richer schema than the default, which will be offered by the Cosmos SDK team.
There are a few things to be aware of. The state machine differentiates between deleted data and pruned data. Deleted data refers to the removal of data due to an action. Pruning refers to the removal of data that the state machine no longer needs in order to continue, but that is still useful for users to know later on.
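To make the plugin idea concrete, here is a minimal sketch of how decoded updates could be fanned out to registered indexers. Everything in it (the `Update`, `Indexer`, and `Registry` names and their methods) is hypothetical and chosen for illustration; it is not the SDK's actual indexer API.

```go
package main

import (
	"fmt"
	"sort"
)

// Update is a hypothetical, already-decoded state change handed to indexers.
type Update struct {
	Module string
	Key    string
	Value  string
}

// Indexer is a hypothetical plugin interface; it is not the SDK's actual API.
type Indexer interface {
	Name() string
	OnUpdate(u Update) error
}

// Registry fans decoded updates out to every registered plugin.
type Registry struct {
	indexers map[string]Indexer
}

func NewRegistry() *Registry {
	return &Registry{indexers: make(map[string]Indexer)}
}

// Register adds a plugin and rejects duplicate names.
func (r *Registry) Register(i Indexer) error {
	if _, ok := r.indexers[i.Name()]; ok {
		return fmt.Errorf("indexer %q already registered", i.Name())
	}
	r.indexers[i.Name()] = i
	return nil
}

// Dispatch delivers u to all plugins in name order so runs are deterministic.
func (r *Registry) Dispatch(u Update) error {
	names := make([]string, 0, len(r.indexers))
	for n := range r.indexers {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if err := r.indexers[n].OnUpdate(u); err != nil {
			return fmt.Errorf("indexer %q: %w", n, err)
		}
	}
	return nil
}

// memIndexer is a toy plugin that keeps the latest value per key in memory.
type memIndexer struct {
	name string
	data map[string]string
}

func (m *memIndexer) Name() string { return m.name }

func (m *memIndexer) OnUpdate(u Update) error {
	m.data[u.Module+"/"+u.Key] = u.Value
	return nil
}

func main() {
	reg := NewRegistry()
	idx := &memIndexer{name: "memory", data: map[string]string{}}
	if err := reg.Register(idx); err != nil {
		panic(err)
	}
	if err := reg.Dispatch(Update{Module: "bank", Key: "alice", Value: "100"}); err != nil {
		panic(err)
	}
	fmt.Println(idx.data["bank/alice"])
}
```

The point of the sketch is that plugins receive typed, decoded updates, so a team adding a richer schema only implements the interface and never touches raw store bytes.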
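One way an indexer might honor the deleted/pruned distinction is sketched below. The `RemovalKind`, `Row`, and `Index` types are hypothetical, and the policy shown (drop deleted rows, keep pruned rows and flag them) is an assumption about what an indexer could do, not something this epic specifies.

```go
package main

import "fmt"

// RemovalKind is a hypothetical tag telling the indexer why a key left state.
type RemovalKind int

const (
	Deleted RemovalKind = iota // removed because of an action
	Pruned                     // no longer needed by the state machine
)

// Row is a hypothetical indexed record.
type Row struct {
	Value  string
	Pruned bool
}

// Index is a toy in-memory index keyed by state key.
type Index struct {
	rows map[string]Row
}

func NewIndex() *Index {
	return &Index{rows: make(map[string]Row)}
}

// Set records the latest value for key.
func (ix *Index) Set(key, value string) {
	ix.rows[key] = Row{Value: value}
}

// Remove drops deleted keys but keeps pruned keys, marking them as pruned
// so users can still query them later (an assumed policy for this sketch).
func (ix *Index) Remove(key string, kind RemovalKind) {
	switch kind {
	case Deleted:
		delete(ix.rows, key)
	case Pruned:
		if row, ok := ix.rows[key]; ok {
			row.Pruned = true
			ix.rows[key] = row
		}
	}
}

// Get returns the row for key and whether it is still indexed.
func (ix *Index) Get(key string) (Row, bool) {
	row, ok := ix.rows[key]
	return row, ok
}

func main() {
	ix := NewIndex()
	ix.Set("proposal/1", "passed")
	ix.Set("vote/1/alice", "yes")

	ix.Remove("vote/1/alice", Pruned) // history stays queryable
	ix.Remove("proposal/1", Deleted)  // gone from the index too

	_, proposalIndexed := ix.Get("proposal/1")
	vote, voteIndexed := ix.Get("vote/1/alice")
	fmt.Println(proposalIndexed, voteIndexed, vote.Pruned)
}
```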
Problem Definition
Indexing of state, events and blocks is a complex process with countless steps needed to get enough information to build complex applications.
State streaming is not a first-class citizen within the software, forcing users to decode the data they receive.
The state machine stores more data than it needs to because of queries. Reducing the amount of data the state machine stores lets it do less IO and therefore be more performant.
Work Breakdown
BaseApp/Server integration
collections Integration
HasSchemaCodec with each collections KeyCodec and ValueCodec in both collections and the SDK. There is a fallback implementation if HasSchemaCodec isn't implemented but we should avoid relying on that
schema.ReferenceTypes being returned in collections/codec.SchemaCodec (basically we just need to add a ReferencedTypes []schema.ReferenceType field and then integrate that into Schema.ModuleCodec)
Collection.isSecondaryIndex method but we're not implementing it anywhere)
Module Integration
Every module should have schema.HasModuleCodec implemented starting with:
Non-collections Modules
orm integration
Events
cosmossdk.io/schema/testing/appdatasim
Indexing Framework Support
The indexer.Start method has missing support for
@aaronc has unmerged code for the above in the aaronc/indexer-manager-impl branch (only tests are missing)
Migration Support
Some of this stuff could be considered phase 2.
cosmossdk.io/schema/diff allows diffing of different versions of schemas. In Postgres we should add support for:
When InitializeModuleData is called, we should save the JSON of the schema in a Postgres table for that module if we are seeing the module for the first time; otherwise we should retrieve the existing schema and compare it with the new one for changes. Initially we should reject changes.
ALTER TABLE statements for all compatible changes in the diff from one schema version to the next
schema.ModuleCodec type should have some field which allows specifying indexer-specific options as either interface{} or string
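The save-and-compare flow and the ALTER TABLE idea above could look roughly like the sketch below. It is an illustration under stated assumptions: `Field`, `ModuleSchema`, `SchemaStore`, and `AddColumnStatements` are hypothetical stand-ins (not the `cosmossdk.io/schema` or `cosmossdk.io/schema/diff` types), an in-memory map replaces the Postgres table, `Field.Kind` is assumed to already hold a SQL type name, and only added columns are treated as a compatible change.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Field and ModuleSchema are hypothetical stand-ins for the real schema types.
type Field struct {
	Name string `json:"name"`
	Kind string `json:"kind"` // assumed to be a SQL type name in this sketch
}

type ModuleSchema struct {
	Tables map[string][]Field `json:"tables"`
}

// ErrSchemaChanged is returned when a stored schema differs from the new one.
var ErrSchemaChanged = errors.New("schema changed")

// SchemaStore stands in for the Postgres table holding one schema JSON per module.
type SchemaStore struct {
	saved map[string][]byte
}

func NewSchemaStore() *SchemaStore {
	return &SchemaStore{saved: make(map[string][]byte)}
}

// CheckOrSave saves the schema JSON the first time a module is seen and
// rejects any later difference. json.Marshal sorts map keys, so a byte
// comparison is stable for identical schemas.
func (s *SchemaStore) CheckOrSave(module string, sch ModuleSchema) error {
	bz, err := json.Marshal(sch)
	if err != nil {
		return err
	}
	old, seen := s.saved[module]
	if !seen {
		s.saved[module] = bz
		return nil
	}
	if !bytes.Equal(old, bz) {
		return fmt.Errorf("module %q: %w", module, ErrSchemaChanged)
	}
	return nil
}

// quoteIdent wraps an identifier in double quotes, doubling embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// AddColumnStatements emits one ALTER TABLE per field present in newFields
// but absent from oldFields. Other kinds of change are ignored here.
func AddColumnStatements(table string, oldFields, newFields []Field) []string {
	have := make(map[string]bool, len(oldFields))
	for _, f := range oldFields {
		have[f.Name] = true
	}
	var stmts []string
	for _, f := range newFields {
		if !have[f.Name] {
			stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s",
				quoteIdent(table), quoteIdent(f.Name), f.Kind))
		}
	}
	return stmts
}

func main() {
	store := NewSchemaStore()
	v1 := ModuleSchema{Tables: map[string][]Field{
		"balances": {{Name: "address", Kind: "TEXT"}},
	}}
	v2 := ModuleSchema{Tables: map[string][]Field{
		"balances": {{Name: "address", Kind: "TEXT"}, {Name: "denom", Kind: "TEXT"}},
	}}
	fmt.Println(store.CheckOrSave("bank", v1)) // first sighting: saved
	fmt.Println(store.CheckOrSave("bank", v2)) // changed: rejected for now
	for _, stmt := range AddColumnStatements("balances", v1.Tables["balances"], v2.Tables["balances"]) {
		fmt.Println(stmt)
	}
}
```

Rejecting every change first and only later translating compatible diffs into statements keeps the initial implementation safe while leaving room for real migrations.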
Phase 2
A mirror of cosmossdk.io/schema is being built in Rust in ixc_schema. Some additional types are being added there, proposed on the Golang side in #21482. Once that work is further along we will want to make sure the Rust and Golang schema packages have parity. There will need to be a native schema wire format for indexing cross-language modules (also proposed in #21482). We are also talking about a native proto -> schema mapping on the Rust side which we may (or may not) also want to port over to Go. We have also talked about and specified a native JSON encoding which could be used for genesis or even signing.
ixc_schema (additional Struct, OneOf, etc. types)