chaoyaji-cb/chainstorage

 
 

Overview

ChainStorage is inspired by the Change Data Capture paradigm, commonly used in the big data world. It continuously replicates the changes (i.e. new blocks) on the blockchain, and acts like a distributed file system for the blockchain.

It aims to provide an efficient and flexible way to access the on-chain data:

  • Efficiency is optimized by storing data in horizontally-scalable storage with a key-value schema. At Coinbase's production environment, ChainStorage can serve up to 1,500 blocks per second, enabling teams to build various indexers in a cost-effective manner.
  • Flexibility is improved by decoupling data interpretation from data ingestion. ChainStorage stores the raw data and the parsing is deferred until the data is consumed. The parsers are shipped as part of the SDK and run on the consumer side. Thanks to the ELT (Extract, Load, Transform) architecture, we can easily iterate on the parser without ingesting the data from blockchain again.

Quick Start

Make sure your local Go version is 1.18 by running the following commands:

brew install go@1.18
brew unlink go
brew link go@1.18

brew install [email protected]
brew unlink protobuf
brew link protobuf

To set up for the first time (only done once):

make bootstrap

Rebuild everything:

make build

Configuration

Environment Variables

ChainStorage depends on the following environment variables to resolve the path of the configuration. The directory structure is as follows: config/{namespace}/{blockchain}/{network}/{environment}.yml.

  • CHAINSTORAGE_NAMESPACE: A {namespace} is a logical grouping of several services, each of which manages its own blockchain and network. The default namespace is chainstorage. To deploy a different namespace, set the env var to the name of a subdirectory of ./config.
  • CHAINSTORAGE_CONFIG: This env var, in the format of {blockchain}-{network}, determines the blockchain and network managed by the service. The naming is defined in c3/common.
  • CHAINSTORAGE_ENVIRONMENT: This env var controls the {environment} in which the service is deployed. Possible values include production, development, and local (which is also the default value).
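
As a rough illustration of how the three variables above map onto the directory structure, here is a minimal sketch. The helper names (getenvDefault, configPath) and the demo value ethereum-mainnet are assumptions made for this example and are not taken from the ChainStorage code base; the sketch also assumes the config name is split on its first hyphen.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// getenvDefault returns the value of key, or fallback when the variable is unset or empty.
func getenvDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// configPath builds config/{namespace}/{blockchain}/{network}/{environment}.yml.
// configName is expected in the {blockchain}-{network} format (hypothetical helper).
func configPath(namespace, configName, environment string) (string, error) {
	blockchain, network, ok := strings.Cut(configName, "-")
	if !ok || blockchain == "" || network == "" {
		return "", fmt.Errorf("config name %q is not in {blockchain}-{network} format", configName)
	}
	return path.Join("config", namespace, blockchain, network, environment+".yml"), nil
}

func main() {
	namespace := getenvDefault("CHAINSTORAGE_NAMESPACE", "chainstorage")
	configName := getenvDefault("CHAINSTORAGE_CONFIG", "ethereum-mainnet") // demo value, assumed
	environment := getenvDefault("CHAINSTORAGE_ENVIRONMENT", "local")

	resolved, err := configPath(namespace, configName, environment)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(resolved)
}
```

With none of the variables set, the sketch resolves to config/chainstorage/ethereum/mainnet/local.yml.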

Creating New Configurations

Every new asset in ChainStorage consists of ChainStorage configuration files. These configuration files are generated from .template.yml template files using:

make config

These templates live under config_templates, a directory whose structure mirrors that of the final config directory. Every template in it is rendered into the corresponding location under config.

Template Format and Inheritance

Configuration templates are composable and inherit configuration properties from "parent templates", which can be defined in base.template.yml, local.template.yml, development.template.yml, and production.template.yml. These parent templates are merged into the final blockchain and network specific base.template.yml, local.template.yml, development.template.yml, production.template.yml configurations respectively.

In the following example, config/chainstorage/ethereum/mainnet/base.yml inherits from config_templates/base.template.yml and config_templates/chainstorage/ethereum/mainnet/base.template.yml, with the latter taking precedence over the former.

config
  chainstorage
    ethereum
      mainnet
        base.yml
        development.yml
        local.yml
        production.yml
config_templates
  chainstorage
    ethereum
      mainnet
        base.template.yml
        development.template.yml
        local.template.yml
        production.template.yml
    base.template.yml
    development.template.yml
    local.template.yml
    production.template.yml
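
The precedence rule described above (the blockchain- and network-specific template overrides the parent template) can be pictured as a recursive map merge. The following is only a sketch of that idea, not the generator behind make config; the merge function and the example keys are made up for illustration.

```go
package main

import "fmt"

// merge returns a new top-level map in which values from child override values from parent.
// Nested maps are merged recursively; all other values are replaced wholesale.
func merge(parent, child map[string]any) map[string]any {
	out := make(map[string]any, len(parent))
	for k, v := range parent {
		out[k] = v
	}
	for k, v := range child {
		childMap, childIsMap := v.(map[string]any)
		parentMap, parentIsMap := out[k].(map[string]any)
		if childIsMap && parentIsMap {
			out[k] = merge(parentMap, childMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Stand-ins for config_templates/base.template.yml and
	// config_templates/chainstorage/ethereum/mainnet/base.template.yml; the keys are made up.
	parent := map[string]any{"aws": map[string]any{"region": "us-east-1", "bucket": "default"}}
	child := map[string]any{"aws": map[string]any{"bucket": "eth-mainnet"}}
	fmt.Println(merge(parent, child))
}
```

In this sketch the merged result keeps region from the parent and takes bucket from the child.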

The template language supports string substitution for the config name and environment using the {{ and }} tags.

Example:

foo: {{blockchain}}-{{network}}-{{environment}}

The blockchain ({{blockchain}}), network ({{network}}), and environment ({{environment}}) template variables are derived from the directory and file naming schemes associated with cloud and ChainStorage configurations.
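
To make the substitution concrete, here is a minimal sketch that expands the three tags with a plain string replacer. The render function is hypothetical; the actual template engine used by make config may behave differently (for example, around whitespace inside the tags).

```go
package main

import (
	"fmt"
	"strings"
)

// render substitutes the {{blockchain}}, {{network}}, and {{environment}} tags in a template line.
func render(tmpl, blockchain, network, environment string) string {
	return strings.NewReplacer(
		"{{blockchain}}", blockchain,
		"{{network}}", network,
		"{{environment}}", environment,
	).Replace(tmpl)
}

func main() {
	fmt.Println(render("foo: {{blockchain}}-{{network}}-{{environment}}", "ethereum", "mainnet", "local"))
	// prints "foo: ethereum-mainnet-local"
}
```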

Endpoint Group

An endpoint group is an abstraction for one or more JSON-RPC endpoints. EndpointProvider uses the endpoint_group config to implement client-side routing to the node provider.

ChainStorage utilizes two endpoint groups to speed up data ingestion:

  • master: This endpoint group is used to resolve the canonical chain and determine what blocks to ingest next. Typically, sticky sessions are turned on for this group to ensure stronger data consistency between requests.
  • slave: This endpoint group is used to ingest the data from the blockchain. During data ingestion, the new blocks are ingested in parallel and out of order. Typically, the endpoints are selected in a round-robin fashion, but you may increase the weights to send more traffic to certain endpoints.

If your node provider, e.g. QuickNode, already has built-in load balancing, your endpoint group may contain only one endpoint, as illustrated by the following configuration:

chain:
  client:
    master:
      endpoint_group: |
        {
          "endpoints": [
            {
              "name": "quicknode-foo-bar-sticky",
              "url": "https://foo-bar.matic.quiknode.pro/****",
              "weight": 1
            }
          ],
          "sticky_session": {
            "header_hash": "x-session-hash"
          }
        }
    slave:
      endpoint_group: |
        {
          "endpoints": [
            {
              "name": "quicknode-foo-bar-round-robin",
              "url": "https://foo-bar.matic.quiknode.pro/****",
              "weight": 1
            }
          ]
        }
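
The weighted round-robin selection mentioned for the slave group can be sketched as follows. This is an illustration of the general technique under simple assumptions, not the EndpointProvider implementation: the picker type is hypothetical, only the endpoints field of the JSON is parsed, and sticky sessions are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type endpoint struct {
	Name   string `json:"name"`
	URL    string `json:"url"`
	Weight int    `json:"weight"`
}

type endpointGroup struct {
	Endpoints []endpoint `json:"endpoints"`
}

// picker cycles through endpoints, repeating each one according to its weight.
type picker struct {
	ring []endpoint
	next int
}

// newPicker parses an endpoint_group JSON document and expands it into a rotation.
func newPicker(raw string) (*picker, error) {
	var group endpointGroup
	if err := json.Unmarshal([]byte(raw), &group); err != nil {
		return nil, err
	}
	p := &picker{}
	for _, e := range group.Endpoints {
		for i := 0; i < e.Weight; i++ {
			p.ring = append(p.ring, e)
		}
	}
	if len(p.ring) == 0 {
		return nil, fmt.Errorf("endpoint group has no endpoint with a positive weight")
	}
	return p, nil
}

// pick returns the next endpoint in the rotation. Not safe for concurrent use.
func (p *picker) pick() endpoint {
	e := p.ring[p.next%len(p.ring)]
	p.next++
	return e
}

func main() {
	// Placeholder endpoints; the names and URLs are made up.
	raw := `{"endpoints": [
		{"name": "node-a", "url": "https://a.example.com", "weight": 2},
		{"name": "node-b", "url": "https://b.example.com", "weight": 1}
	]}`
	p, err := newPicker(raw)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 6; i++ {
		fmt.Println(p.pick().Name)
	}
}
```

With weights 2 and 1, node-a receives two of every three requests in this sketch.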

Overriding the Configuration

You may override any configuration using an environment variable. The environment variable should be prefixed with "CHAINSTORAGE_". For nested dictionaries, use underscores to separate the keys.

For example, you may override the endpoint group config at runtime by injecting the following environment variables:

  • master: CHAINSTORAGE_CHAIN_CLIENT_MASTER_ENDPOINT_GROUP
  • slave: CHAINSTORAGE_CHAIN_CLIENT_SLAVE_ENDPOINT_GROUP
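
The naming rule can be expressed as a one-line transformation. The sketch below assumes config keys are written as dotted paths (e.g. chain.client.master.endpoint_group), which is a notation chosen for this example; envVarName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName maps a dotted config key such as "chain.client.master.endpoint_group"
// to the environment variable that overrides it.
func envVarName(key string) string {
	return "CHAINSTORAGE_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

func main() {
	fmt.Println(envVarName("chain.client.master.endpoint_group"))
	// prints "CHAINSTORAGE_CHAIN_CLIENT_MASTER_ENDPOINT_GROUP"
}
```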

Alternatively, you may override the configuration by creating secrets.yml within the same directory. Its attributes will be merged into the runtime configuration and take the highest precedence. Note that this file may contain credentials and is excluded from check-in by .gitignore.

Command Line

The cmd/admin tool consists of multiple sub-commands.

admin is a utility for managing chainstorage

Usage:
  admin [command]

Available Commands:
  backfill    Backfill a block
  block       Fetch a block
  completion  Generate the autocompletion script for the specified shell
  event       tool for managing events storage
  help        Help about any command
  sdk
  validator
  workflow    tool for managing chainstorage workflows

Flags:
      --blockchain string   blockchain full name (e.g. ethereum)
      --env string          one of [local, development, production]
  -h, --help                help for admin
      --meta                output metadata only
      --network string      network name (e.g. mainnet)
      --out string          output filepath: default format is json; use a .pb extension for protobuf format
      --parser string       parser type: one of native, rosetta, or raw (default "native")

Use "admin [command] --help" for more information about a command.

All sub-commands require the blockchain, env, and network flags.

Block Command

Fetch a block from ethereum mainnet:

go run ./cmd/admin block --blockchain ethereum --network mainnet --env local --height 46147

Fetch a block from ethereum goerli:

go run ./cmd/admin block --blockchain ethereum --network goerli --env local --height 46147

Backfill Command (development)

Backfill a block from BSC mainnet:

go run ./cmd/admin backfill --blockchain bsc --network mainnet --env development --start-height 10408613 --end-height 10408614

Stream Command

Stream block events from a specific event sequence id:

go run ./cmd/admin sdk stream --blockchain ethereum --network mainnet --env development --sequence 2228575 --event-tag 1

Testing

Unit Test

# Run everything
make test

# Run the blockchain package only
make test TARGET=internal/blockchain/...

Integration Test

# Run everything
make integration

# Run the storage package only
make integration TARGET=internal/storage/...

Functional Test

Before running the functional tests, you need to provide the endpoint group config by creating secrets.yml. See Overriding the Configuration for more details.

# Run everything
make functional

# Run the workflow package only
make functional TARGET=internal/workflow/...

# Run TestIntegrationEthereumGetBlock only
make functional TARGET=internal/blockchain/... TEST_FILTER='TestIntegrationEthereumGetBlock$$'

# If the test is implemented in a test suite, add the suite name before the test name
make functional TARGET=internal/blockchain/... TEST_FILTER=TestIntegrationPolygonTestSuite/TestPolygonGetBlock

Development

Running Server

Start the Docker containers using the docker-compose file from the project root folder:

make localstack

The next step is to start the server locally:

# Ethereum Mainnet
# Use aws local stack
make server

# To start the testnet (goerli) server instead
# Use aws local stack
make server CHAINSTORAGE_CONFIG=ethereum_goerli

AWS localstack

Check S3 files:

aws s3 --no-sign-request --region local --endpoint-url http://localhost:4566 ls --recursive cba-chainstore-eth-dev/

Check DynamoDB rows:

aws dynamodb --no-sign-request --region local --endpoint-url http://localhost:4566 scan --table-name cba_chainstore_blocks_eth_main

Check DLQ:

aws sqs --no-sign-request --region local --endpoint-url http://localhost:4566/000000000000/cba_chainstore_blocks_eth_main_dlq receive-message --queue-url "http://localhost:4566/000000000000/cba_chainstore_blocks_eth_main_dlq" --max-number-of-messages 10 --visibility-timeout 2

Temporal Workflow

Open Temporal UI in a browser by entering the URL: http://localhost:8088/namespaces/chainstorage-ethereum-mainnet/workflows

Start the backfill workflow:

go run ./cmd/admin workflow start --workflow backfiller --input '{"StartHeight": 11000000, "EndHeight": 11000100, "NumConcurrentExtractors": 24}' --blockchain ethereum --network mainnet --env local

Start the benchmarker workflow:

go run ./cmd/admin workflow start --workflow benchmarker --input '{"StartHeight": 1, "EndHeight": 12000000, "NumConcurrentExtractors": 24, "StepSize":1000000, "SamplesToTest":500}' --blockchain ethereum --network mainnet --env local

Start the monitor workflow:

go run ./cmd/admin workflow start --workflow monitor --input '{"StartHeight": 0, "Tag": 0}' --blockchain ethereum --network mainnet --env local

Start the poller workflow:

go run ./cmd/admin workflow start --workflow poller --input '{"Tag": 0, "MaxBlocksToSync": 200, "Parallelism":32}' --blockchain ethereum --network mainnet --env local

NOTE: The recommended value for "Parallelism" depends on the capacity of your node provider. If you are not sure what value to use, just drop it from the command.

Start the streamer workflow:

go run ./cmd/admin workflow start --workflow streamer --input '{}' --blockchain ethereum --network goerli --env local

Stop the monitor workflow:

go run ./cmd/admin workflow stop --workflow monitor --input '' --blockchain ethereum --network mainnet --env local

Checking Workflow Statuses

Install tctl, a command-line tool for interacting with a Temporal cluster. More info can be found here: https://docs.temporal.io/tctl/

brew install tctl

APIs

# local
grpcurl --plaintext localhost:9090 coinbase.chainstorage.ChainStorage/GetLatestBlock
grpcurl --plaintext -d '{"start_height": 0, "end_height": 10}' localhost:9090 coinbase.chainstorage.ChainStorage/GetBlockFilesByRange
grpcurl --plaintext -d '{"sequence_num": 2223387}' localhost:9090 coinbase.chainstorage.ChainStorage/StreamChainEvents
grpcurl --plaintext -d '{"initial_position_in_stream": "EARLIEST"}' localhost:9090 coinbase.chainstorage.ChainStorage/StreamChainEvents
grpcurl --plaintext -d '{"initial_position_in_stream": "LATEST"}' localhost:9090 coinbase.chainstorage.ChainStorage/StreamChainEvents
grpcurl --plaintext -d '{"initial_position_in_stream": "13222054"}' localhost:9090 coinbase.chainstorage.ChainStorage/StreamChainEvents

SDK

ChainStorage also provides an SDK, and you can find the supported methods here.

Note:

  • GetBlocksByRangeWithTag is not equivalent to a batch version of GetBlockWithTag, because it gives you no way to specify the block hash. If the requested range goes beyond the current tip of the chain because of a reorg, GetBlocksByRangeWithTag returns a FailedPrecondition error because the range exceeds the latest watermark.

    In conclusion, it's safe to use GetBlocksByRangeWithTag for backfilling, since reorgs do not happen for past blocks. For recent blocks (e.g. the streaming case), use GetBlockWithTag instead.

Examples

See below for a few examples of implementing a simple indexer using the SDK. The examples are listed in order of increasing complexity.

Batch

In this example, we use the blocks API to fetch the confirmed blocks as follows:

  1. Fetch the maximum reorg distance (irreversibleDistance).
  2. Fetch the latest block height (latest).
  3. Poll for new blocks from the checkpoint up to the latest confirmed block (latest - irreversibleDistance) using GetBlocksByRange.
  4. Update the checkpoint.
  5. Repeat the above steps periodically.

export CHAINSTORAGE_SDK_AUTH_HEADER=cb-nft-api-token
export CHAINSTORAGE_SDK_AUTH_TOKEN=****
go run ./examples/batch
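
The range arithmetic behind the batch steps can be sketched without the SDK. The nextRange function below is hypothetical. It assumes the checkpoint is the next height to fetch and that the range is half-open, [start, end); neither convention is confirmed by this document, and maxBatch is an extra parameter added for the example.

```go
package main

import "fmt"

// nextRange returns the half-open range [start, end) of confirmed blocks to fetch next.
// checkpoint is the next height to fetch; blocks above latest-irreversibleDistance are
// skipped because they may still be reorged. ok is false when nothing is ready yet.
func nextRange(checkpoint, latest, irreversibleDistance, maxBatch uint64) (start, end uint64, ok bool) {
	if maxBatch == 0 || latest < irreversibleDistance {
		return 0, 0, false
	}
	confirmed := latest - irreversibleDistance // highest height considered final
	if checkpoint > confirmed {
		return 0, 0, false
	}
	start = checkpoint
	end = confirmed + 1
	if end-start > maxBatch {
		end = start + maxBatch
	}
	return start, end, true
}

func main() {
	checkpoint := uint64(100)
	for {
		start, end, ok := nextRange(checkpoint, 200, 12, 50)
		if !ok {
			break // wait for new blocks, then poll again
		}
		fmt.Printf("fetch [%d, %d)\n", start, end)
		checkpoint = end // update the checkpoint after the batch is processed
	}
}
```

In a real indexer, latest and irreversibleDistance would come from the SDK on every iteration rather than being fixed values.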

Stream

This example demonstrates how to stream the latest blocks and handle chain reorgs. The worker processes the events sequentially and relies on BlockchainEvent_Type to construct the canonical chain. For example, given +1, +2, +3, -3, -2, +2', +3' as the events, the canonical chain would be +1, +2', +3'.

export CHAINSTORAGE_SDK_AUTH_HEADER=cb-nft-api-token
export CHAINSTORAGE_SDK_AUTH_TOKEN=****
go run ./examples/stream
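
The event sequence from the paragraph above can be replayed with a small stack-based sketch. The types here are simplified stand-ins rather than the SDK's: eventType mimics BlockchainEvent_Type, and the constant names blockAdded and blockRemoved are hypothetical.

```go
package main

import "fmt"

// eventType is a stand-in for the SDK's BlockchainEvent_Type; the constant names are hypothetical.
type eventType int

const (
	blockAdded eventType = iota
	blockRemoved
)

// event is a simplified stand-in for a chain event.
type event struct {
	Type   eventType
	Height uint64
	Hash   string
}

// canonical replays events in order and returns the hashes of the resulting canonical chain.
// An added block is pushed onto the tip; a removed block must match the current tip and is popped.
func canonical(events []event) ([]string, error) {
	var chain []string
	for _, e := range events {
		switch e.Type {
		case blockAdded:
			chain = append(chain, e.Hash)
		case blockRemoved:
			if len(chain) == 0 || chain[len(chain)-1] != e.Hash {
				return nil, fmt.Errorf("cannot remove block %d (%s): it is not the current tip", e.Height, e.Hash)
			}
			chain = chain[:len(chain)-1]
		default:
			return nil, fmt.Errorf("unknown event type %d", e.Type)
		}
	}
	return chain, nil
}

func main() {
	// +1, +2, +3, -3, -2, +2', +3'
	events := []event{
		{blockAdded, 1, "1"}, {blockAdded, 2, "2"}, {blockAdded, 3, "3"},
		{blockRemoved, 3, "3"}, {blockRemoved, 2, "2"},
		{blockAdded, 2, "2'"}, {blockAdded, 3, "3'"},
	}
	chain, err := canonical(events)
	if err != nil {
		panic(err)
	}
	fmt.Println(chain) // prints [1 2' 3']
}
```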

Unified

The last example showcases how to turn the data processing into an embarrassingly parallel problem by leveraging the monotonically increasing sequence number. In this example, though the events are processed in parallel and out of order, the logical ordering guarantee is preserved.

  1. Download a batch of events (say, 10k) using GetChainEvents. Note that this API is non-blocking: if fewer events are available than requested, it returns all the available ones. This enables us to unify batch and stream processing.
  2. Break down the 10k events into small batches, e.g. 20 events/batch.
  3. Distribute those batches to a number of workers for parallel processing. Note that this step is not part of the example.
  4. The events in each batch can be processed either sequentially or in parallel using GetBlockWithTag.
  5. Implement versioning using the monotonically increasing sequence numbers provided by the events. See here for more details.
  6. Update the watermark once all the batches have been processed.
  7. Repeat the above steps.

export CHAINSTORAGE_SDK_AUTH_HEADER=cb-nft-api-token
export CHAINSTORAGE_SDK_AUTH_TOKEN=****
go run ./examples/unified
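
Two pieces of the steps above lend themselves to a short sketch: splitting the events into batches, and using the sequence number as a version so that out-of-order writes cannot clobber newer data. Everything below (chunk, record, store, upsert) is hypothetical and in-memory; a real indexer would typically use a conditional write in its database instead, and would also need to handle removal events.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

// record is what the hypothetical indexer stores per block height.
type record struct {
	Sequence int64 // sequence number of the event that produced this record
	Hash     string
}

// store maps block height to the record written by the newest event seen so far.
// It is not safe for concurrent use.
type store map[uint64]record

// upsert writes r only if it carries a higher sequence number than the stored record,
// so replaying or reordering events cannot overwrite newer data with older data.
func (s store) upsert(height uint64, r record) bool {
	if cur, ok := s[height]; ok && cur.Sequence >= r.Sequence {
		return false
	}
	s[height] = r
	return true
}

func main() {
	seqs := make([]int64, 45)
	for i := range seqs {
		seqs[i] = int64(i + 1)
	}
	fmt.Println(len(chunk(seqs, 20))) // prints 3

	s := store{}
	s.upsert(100, record{Sequence: 7, Hash: "2'"}) // newer event processed first
	s.upsert(100, record{Sequence: 5, Hash: "2"})  // older event arrives late and is ignored
	fmt.Println(s[100].Hash)                       // prints 2'
}
```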

Public APIs

The ChainStorage APIs are in beta preview. Note that the APIs are currently exposed as RESTful APIs through gRPC transcoding. Please refer to the proto file for the data schema.

See below for a few examples.

export CHAINSTORAGE_SDK_AUTH_TOKEN=****

curl -s -X POST \
  -H "content-type: application/json" \
  -H "cb-nft-api-token: ${CHAINSTORAGE_SDK_AUTH_TOKEN}" \
  https://nft-api.coinbase.com/api/exp/chainstorage/ethereum/mainnet/v1/coinbase.chainstorage.ChainStorage/GetLatestBlock | jq

curl -s -X POST \
  -H "content-type: application/json" \
  -H "cb-nft-api-token: ${CHAINSTORAGE_SDK_AUTH_TOKEN}" \
  -d '{"height": 16000000}' \
  https://nft-api.coinbase.com/api/exp/chainstorage/ethereum/mainnet/v1/coinbase.chainstorage.ChainStorage/GetNativeBlock | jq

curl -s -X POST \
  -H "content-type: application/json" \
  -H "cb-nft-api-token: ${CHAINSTORAGE_SDK_AUTH_TOKEN}" \
  -d '{"start_height": 16000000, "end_height": 16000005}' \
  https://nft-api.coinbase.com/api/exp/chainstorage/ethereum/mainnet/v1/coinbase.chainstorage.ChainStorage/GetNativeBlocksByRange | jq
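
The same calls can be made from Go with the standard library. The sketch below mirrors the GetNativeBlock curl command above; newRequest is a hypothetical helper, and the program only sends the request when CHAINSTORAGE_SDK_AUTH_TOKEN is set.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// baseURL is copied from the curl examples above.
const baseURL = "https://nft-api.coinbase.com/api/exp/chainstorage/ethereum/mainnet/v1/coinbase.chainstorage.ChainStorage/"

// newRequest builds a POST request for the given method name with a JSON-encoded payload.
func newRequest(method string, payload any, token string) (*http.Request, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+method, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("content-type", "application/json")
	req.Header.Set("cb-nft-api-token", token)
	return req, nil
}

func main() {
	token := os.Getenv("CHAINSTORAGE_SDK_AUTH_TOKEN")
	if token == "" {
		fmt.Println("set CHAINSTORAGE_SDK_AUTH_TOKEN to send the request")
		return
	}
	req, err := newRequest("GetNativeBlock", map[string]uint64{"height": 16000000}, token)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
	io.Copy(os.Stdout, resp.Body) // raw JSON response
}
```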

Contact Us

We will set up a Discord server soon. Stay tuned!
