Update "how it works" article with an explanation about historical backfill #431

Closed
bucanero opened this issue Nov 28, 2023 · 1 comment · Fixed by near/docs#1599
Labels
documentation Improvements or additions to documentation

Comments

@bucanero
Contributor

bucanero commented Nov 28, 2023

related to #395

Explain the historical backfill process of QueryAPI

@bucanero bucanero self-assigned this Nov 28, 2023
@bucanero bucanero added the documentation Improvements or additions to documentation label Nov 28, 2023
@bucanero bucanero linked a pull request Nov 28, 2023 that will close this issue
@morgsmccauley
Collaborator

morgsmccauley commented Nov 29, 2023

When an indexer is created, two processes are triggered:

  • real-time - starts from the block at which the indexer was registered (X), and executes the indexer function on every matching block from there on
  • historical - starts from the configured start_from_block height, and executes the indexer function on all matching blocks up until X
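
As a rough illustration of how the two processes divide the chain, here is a minimal sketch. The names `BlockRange` and `plan_processes` are hypothetical and do not come from the QueryAPI codebase, and treating X as the exclusive end of the historical range is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockRange:
    start: int           # first block height, inclusive
    end: Optional[int]   # end block height, exclusive; None means open-ended


def plan_processes(start_from_block: int, registered_at: int) -> dict:
    """Split the chain into the historical and real-time ranges.

    Assumption: the historical range stops just before the registration
    block X, and the real-time range begins at X.
    """
    if start_from_block > registered_at:
        raise ValueError("start_from_block must not be after the registration block")
    return {
        "historical": BlockRange(start_from_block, registered_at),
        "real_time": BlockRange(registered_at, None),
    }
```

For example, `plan_processes(100, 150)` gives a historical range covering heights 100 to 149 and an open-ended real-time range starting at 150.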

The historical process can be broken down into two parts: indexed and unindexed blocks. These may not be the most suitable names, but they are what the code uses.

Indexed blocks come from the near-delta-lake bucket in S3. This bucket is populated by a Databricks job which streams blocks from NEAR Lake and, for every account, stores the block heights which contain transactions made against it. This data allows us to quickly fetch a list of block heights which match the contract ID defined on the indexer, rather than filtering through all blocks ourselves.

NEAR Delta Lake is not updated in real time, so for the historical process to close the gap between it and the starting point of the real-time process, it must also manually process the remaining blocks. This is the 'unindexed' portion of the backfill.
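
The indexed and unindexed portions described above could be combined roughly as follows. This is only a sketch under stated assumptions: `historical_block_heights`, `indexed_heights`, `last_indexed_height`, and `block_matches` are hypothetical names standing in for the S3 lookup, the most recent height covered by NEAR Delta Lake, and the manual per-block filter. None of them are taken from the actual implementation.

```python
from typing import Callable, Iterable, List


def historical_block_heights(
    start_from_block: int,
    registered_at: int,
    indexed_heights: Iterable[int],
    last_indexed_height: int,
    block_matches: Callable[[int], bool],
) -> List[int]:
    """Return the block heights the historical process would execute, in order."""
    # Indexed portion: heights already known to match, taken from the
    # precomputed list (hypothetically fetched from near-delta-lake).
    indexed = [
        h
        for h in sorted(indexed_heights)
        if start_from_block <= h <= last_indexed_height and h < registered_at
    ]

    # Unindexed portion: the precomputed list stops at last_indexed_height,
    # so every later block up to (but excluding) X is checked manually.
    unindexed_start = max(start_from_block, last_indexed_height + 1)
    unindexed = [h for h in range(unindexed_start, registered_at) if block_matches(h)]

    return indexed + unindexed
```

The indexed portion costs one lookup regardless of range size, while the unindexed portion inspects every block in the gap, so the gap between the last indexed height and X determines how much manual filtering is needed.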


This will change slightly with the introduction of the control plane work. Rather than having separate real-time and historical processes which run concurrently, we will have a single sequential process. It is essentially the 'historical' process, except that the 'unindexed' portion does not stop and instead continues indefinitely.
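
One way to picture the single sequential process is as a generator that drains the indexed heights and then keeps filtering blocks without an upper bound. This is a hedged sketch, not the control plane design: all names are hypothetical, and a real process would wait for new blocks to be produced rather than count through heights unconditionally.

```python
from itertools import count
from typing import Callable, Iterable, Iterator


def sequential_block_heights(
    start_from_block: int,
    indexed_heights: Iterable[int],
    last_indexed_height: int,
    block_matches: Callable[[int], bool],
) -> Iterator[int]:
    """Yield matching block heights: indexed first, then unindexed forever."""
    # Indexed portion: replay the precomputed matching heights.
    for h in sorted(indexed_heights):
        if start_from_block <= h <= last_indexed_height:
            yield h

    # Unindexed portion: there is no registration block X to stop at,
    # so this loop never terminates on its own.
    for h in count(max(start_from_block, last_indexed_height + 1)):
        if block_matches(h):
            yield h
```

Because the generator is unbounded, a caller has to limit it explicitly, for example with `itertools.islice(sequential_block_heights(...), 5)`.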

2 participants