From c983ff133f915f464cf77677614625f9e8fdf55e Mon Sep 17 00:00:00 2001 From: Ian Norden Date: Thu, 27 Aug 2020 11:19:52 -0500 Subject: [PATCH] update docs --- README.md | 137 +++++++----------- documentation/apis.md | 258 ---------------------------------- documentation/architecture.md | 132 ----------------- documentation/resync.md | 70 --------- documentation/watcher.md | 16 --- 5 files changed, 54 insertions(+), 559 deletions(-) delete mode 100644 documentation/apis.md delete mode 100644 documentation/architecture.md delete mode 100644 documentation/resync.md delete mode 100644 documentation/watcher.md diff --git a/README.md b/README.md index 306ed2a13..2cbac0ff3 100644 --- a/README.md +++ b/README.md @@ -1,26 +1,22 @@ -# ipfs-blockchain-watcher +# ipld-eth-indexer -[![Go Report Card](https://goreportcard.com/badge/github.com/vulcanize/ipfs-blockchain-watcher)](https://goreportcard.com/report/github.com/vulcanize/ipfs-blockchain-watcher) +[![Go Report Card](https://goreportcard.com/badge/github.com/vulcanize/ipld-eth-indexer)](https://goreportcard.com/report/github.com/vulcanize/ipld-eth-indexer) -> ipfs-blockchain-watcher is used to extract, transform, and load all eth or btc data into an IPFS-backing Postgres datastore while generating useful secondary indexes around the data in other Postgres tables +> ipld-eth-indexer is used to extract, transform, and load all eth IPLD data into an IPFS-backing Postgres datastore while generating useful secondary indexes around the data in other Postgres tables ## Table of Contents 1. [Background](#background) -1. [Architecture](#architecture) 1. [Install](#install) 1. [Usage](#usage) 1. [Contributing](#contributing) 1. [License](#license) ## Background -ipfs-blockchain-watcher is a collection of interfaces that are used to extract, process, store, and index -all blockchain data in Postgres-IPFS. The raw data indexed by ipfs-blockchain-watcher serves as the basis for more specific watchers and applications. 
+ipld-eth-indexer is a collection of interfaces that are used to extract, transform, store, and index +all Ethereum IPLD data in Postgres. The raw data indexed by ipld-eth-indexer serves as the basis for more specific watchers and applications. Currently the service supports complete processing of all Bitcoin and Ethereum data. -## Architecture -More details on the design of ipfs-blockchain-watcher can be found in [here](./documentation/architecture.md) - ## Dependencies Minimal build dependencies * Go (1.13) @@ -37,9 +33,8 @@ Potential external dependencies ## Install 1. [Goose](#goose) 1. [Postgres](#postgres) -1. [IPFS](#ipfs) -1. [Blockchain](#blockchain) -1. [Watcher](#watcher) +1. [Ethereum](#ethereum) +1. [Indexer](#indexer) ### Goose [goose](https://github.com/pressly/goose) is used for migration management. While it is not necessary to use `goose` for manual setup, it @@ -49,7 +44,7 @@ is required for running the automated tests and is used by the `make migrate` co 1. [Install Postgres](https://wiki.postgresql.org/wiki/Detailed_installation_guides) 1. Create a superuser for yourself and make sure `psql --list` works without prompting for a password. 1. `createdb vulcanize_public` -1. `cd $GOPATH/src/github.com/vulcanize/ipfs-blockchain-watcher` +1. `cd $GOPATH/src/github.com/vulcanize/ipld-eth-indexer` 1. Run the migrations: `make migrate HOST_NAME=localhost NAME=vulcanize_public PORT=5432` - There are optional vars `USER=username:password` if the database user is not the default user `postgres` and/or a password is present - To rollback a single step: `make rollback NAME=vulcanize_public` @@ -63,24 +58,6 @@ localhost. To allow access on Ubuntu, set localhost connections via hostname, ip (It should be noted that trusted auth should only be enabled on systems without sensitive data in them: development and local test databases) -### IPFS -Data is stored in an [IPFS-backing Postgres datastore](https://github.com/ipfs/go-ds-sql). 
-By default data is written directly to the ipfs blockstore in Postgres; the public.blocks table. -In this case no further IPFS configuration is needed at this time. - -Optionally, ipfs-blockchain-watcher can be configured to function through an internal ipfs node interface using the flag: `-ipfs-mode=interface`. -Operating through the ipfs interface provides the option to configure a block exchange that can search remotely for IPLD data found missing in the local datastore. -This option is irrelevant in most cases and this mode has some disadvantages, namely: - -1. Environment must have IPFS configured -1. Process will contend with the lockfile at `$IPFS_PATH` -1. Publishing and indexing of data must occur in separate db transactions - -More information for configuring Postgres-IPFS can be found [here](./documentation/ipfs.md) - -### Blockchain -This section describes how to setup an Ethereum or Bitcoin node to serve as a data source for ipfs-blockchain-watcher - #### Ethereum For Ethereum, [a special fork of go-ethereum](https://github.com/vulcanize/go-ethereum/tree/statediff_at_anyblock-1.9.11) is currently *requirde*. This can be setup as follows. @@ -121,81 +98,71 @@ Also in the output will be the endpoints that will be used to interface with the The default ws url is "127.0.0.1:8546" and the default http url is "127.0.0.1:8545". These values will be used as the `ethereum.wsPath` and `ethereum.httpPath` in the config, respectively. -#### Bitcoin -For Bitcoin, ipfs-blockchain-watcher is able to operate entirely through the universally exposed JSON-RPC interfaces. -This means any of the standard full nodes can be used (e.g. bitcoind, btcd) as the data source. - -Point at a remote node or set one up locally using the instructions for [bitcoind](https://github.com/bitcoin/bitcoin) and [btcd](https://github.com/btcsuite/btcd). +### Indexer +Finally, setup the indexer process itself. -The default http url is "127.0.0.1:8332". 
We will use the http endpoint as both the `bitcoin.wsPath` and `bitcoin.httpPath` -(bitcoind does not support websocket endpoints, the watcher currently uses a "subscription" wrapper around the http endpoints) +Start by downloading ipld-eth-indexer and moving into the repo: -### Watcher -Finally, setup the watcher process itself. +`GO111MODULE=off go get -d github.com/vulcanize/ipld-eth-indexer` -Start by downloading ipfs-blockchain-watcher and moving into the repo: - -`GO111MODULE=off go get -d github.com/vulcanize/ipfs-blockchain-watcher` - -`cd $GOPATH/src/github.com/vulcanize/ipfs-blockchain-watcher` +`cd $GOPATH/src/github.com/vulcanize/ipld-eth-indexer` Then, build the binary: `make build` ## Usage -After building the binary, run as +After building the binary, three commands are available + +* Sync: Streams raw chain data at the head, transforms it into IPLD objects, and indexes the resulting set of CIDs in Postgres with useful metadata. + +`./ipld-eth-indexer sync --config=` + +* Backfill: Automatically searches for and detects gaps in the DB; fetches, converts, publishes, and indexes the data to fill these gaps. + +`./ipld-eth-indexer backfill --config=` + +* Resync: Manually define block ranges within which to (re)fill in data over HTTP; can be run in parallel with non-overlapping regions to scale historical data processing + +`./ipld-eth-indexer resync --config=` -`./ipfs-blockchain-watcher watch --config=` ### Configuration -Below is the set of universal config parameters for the ipfs-blockchain-watcher command, in .toml form, with the respective environmental variables commented to the side. -This set of parameters needs to be set no matter the chain type. +Below is the set of parameters for the ipld-eth-indexer command, in .toml form, with the respective environmental variables commented to the side. +The corresponding CLI flags can be found with the `./ipld-eth-indexer {command} --help` command.
```toml [database] name = "vulcanize_public" # $DATABASE_NAME hostname = "localhost" # $DATABASE_HOSTNAME port = 5432 # $DATABASE_PORT - user = "vdbm" # $DATABASE_USER + user = "postgres" # $DATABASE_USER password = "" # $DATABASE_PASSWORD -[watcher] - chain = "bitcoin" # $SUPERNODE_CHAIN - server = true # $SUPERNODE_SERVER - ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH - wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH - httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH - sync = true # $SUPERNODE_SYNC - workers = 1 # $SUPERNODE_WORKERS - backFill = true # $SUPERNODE_BACKFILL - frequency = 45 # $SUPERNODE_FREQUENCY - batchSize = 1 # $SUPERNODE_BATCH_SIZE - batchNumber = 50 # $SUPERNODE_BATCH_NUMBER - timeout = 300 # $HTTP_TIMEOUT - validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL -``` +[log] + level = "info" # $LOGRUS_LEVEL -Additional parameters need to be set depending on the specific chain. +[sync] + workers = 4 # $SYNC_WORKERS -For Bitcoin: - -```toml -[bitcoin] - wsPath = "127.0.0.1:8332" # $BTC_WS_PATH - httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH - pass = "password" # $BTC_NODE_PASSWORD - user = "username" # $BTC_NODE_USER - nodeID = "ocd0" # $BTC_NODE_ID - clientName = "Omnicore" # $BTC_CLIENT_NAME - genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK - networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID -``` - -For Ethereum: +[backfill] + frequency = 15 # $BACKFILL_FREQUENCY + batchSize = 2 # $BACKFILL_BATCH_SIZE + workers = 4 # $BACKFILL_WORKERS + timeout = 300 # $HTTP_TIMEOUT + validationLevel = 1 # $BACKFILL_VALIDATION_LEVEL + +[resync] + type = "full" # $RESYNC_TYPE + start = 0 # $RESYNC_START + stop = 0 # $RESYNC_STOP + batchSize = 2 # $RESYNC_BATCH_SIZE + workers = 4 # $RESYNC_WORKERS + timeout = 300 # $HTTP_TIMEOUT + clearOldCache = false # $RESYNC_CLEAR_OLD_CACHE + resetValidation = false # $RESYNC_RESET_VALIDATION -```toml [ethereum] wsPath = "127.0.0.1:8546" # $ETH_WS_PATH httpPath = 
"127.0.0.1:8545" # $ETH_HTTP_PATH @@ -206,8 +173,12 @@ For Ethereum: chainID = "1" # $ETH_CHAIN_ID ``` +`sync`, `backfill`, and `resync` parameters are only applicable to their respective commands. + +`backfill` and `resync` requires only an `ethereum.httpPath` while `sync` requires only an `ethereum.wsPath`. + ### Exposing the data -A number of different APIs for remote access to ipfs-blockchain-watcher data can be exposed, these are discussed in more detail [here](./documentation/apis.md) +See [ipld-eth-server](https://github.com/vulcanize/ipld-eth-server) ### Testing `make test` will run the unit tests diff --git a/documentation/apis.md b/documentation/apis.md deleted file mode 100644 index 131be915b..000000000 --- a/documentation/apis.md +++ /dev/null @@ -1,258 +0,0 @@ -## ipfs-blockchain-watcher APIs -We can expose a number of different APIs for remote access to ipfs-blockchain-watcher data - - -### Table of Contents -1. [Postgraphile](#postgraphile) -1. [RPC Subscription Interface](#rpc-subscription-interface) -1. [Native API Recapitulation](#native-api-recapitulation) - - -### Postgraphile -ipfs-blockchain-watcher stores all processed data in Postgres using PG-IPFS, this includes all of the IPLD objects. -[Postgraphile](https://www.graphile.org/postgraphile/) can be used to expose GraphQL endpoints for the Postgres tables. - -e.g. - -`postgraphile --plugins @graphile/pg-pubsub --subscriptions --simple-subscriptions -c postgres://localhost:5432/vulcanize_public?sslmode=disable -s public,btc,eth -a -j` - - -This will stand up a Postgraphile server on the public, eth, and btc schemas- exposing GraphQL endpoints for all of the tables contained under those schemas. -All of their data can then be queried with standard [GraphQL](https://graphql.org) queries. 
- - -### RPC Subscription Interface -A direct, real-time subscription to the data being processed by ipfs-blockchain-watcher can be established over WS or IPC through the [Stream](../pkg/sync/api.go#L53) RPC method. -This method is not chain-specific and each chain-type supports it, it is accessed under the "vdb" namespace rather than a chain-specific namespace. An interface for -subscribing to this endpoint is provided [here](../pkg/client/client.go). - -When subscribing to this endpoint, the subscriber provides a set of RLP-encoded subscription parameters. These parameters will be chain-specific, and are used -by ipfs-blockchain-watcher to filter and return a requested subset of chain data to the subscriber. (e.g. [BTC](../pkg/btc/subscription_config.go), [ETH](../../pkg/eth/subscription_config.go)). - -#### Ethereum RPC Subscription -An example of how to subscribe to a real-time Ethereum data feed from ipfs-blockchain-watcher using the `Stream` RPC method is provided below - -```go - package main - - import ( - "github.com/ethereum/go-ethereum/rlp" - "github.com/ethereum/go-ethereum/rpc" - "github.com/spf13/viper" - - "github.com/vulcanize/ipfs-blockchain-watcher/pkg/client" - "github.com/vulcanize/ipfs-blockchain-watcher/pkg/eth" - "github.com/vulcanize/ipfs-blockchain-watcher/pkg/watch" - ) - - config, _ := eth.NewEthSubscriptionConfig() - rlpConfig, _ := rlp.EncodeToBytes(config) - vulcPath := viper.GetString("watcher.ethSubscription.path") - rpcClient, _ := rpc.Dial(vulcPath) - subClient := client.NewClient(rpcClient) - payloadChan := make(chan watch.SubscriptionPayload, 20000) - subscription, _ := subClient.Stream(payloadChan, rlpConfig) - for { - select { - case payload := <- payloadChan: - // do something with the subscription payload - case err := <- subscription.Err(): - // do something with the subscription error - } - } -``` - -The .toml file being used to fill the Ethereum subscription config would look something like this: - -```toml -[watcher] - 
[watcher.ethSubscription] - historicalData = false - historicalDataOnly = false - startingBlock = 0 - endingBlock = 0 - wsPath = "ws://127.0.0.1:8080" - [watcher.ethSubscription.headerFilter] - off = false - uncles = false - [watcher.ethSubscription.txFilter] - off = false - src = [] - dst = [] - [watcher.ethSubscription.receiptFilter] - off = false - contracts = [] - topic0s = [] - topic1s = [] - topic2s = [] - topic3s = [] - [watcher.ethSubscription.stateFilter] - off = false - addresses = [] - intermediateNodes = false - [watcher.ethSubscription.storageFilter] - off = true - addresses = [] - storageKeys = [] - intermediateNodes = false -``` - -These configuration parameters are broken down as follows: - -`ethSubscription.wsPath` is used to define the watcher ws url OR ipc endpoint to subscribe to - -`ethSubscription.historicalData` specifies whether or not ipfs-blockchain-watcher should look up historical data in its cache and -send that to the subscriber, if this is set to `false` then only newly synced/incoming data is streamed - -`ethSubscription.historicalDataOnly` will tell ipfs-blockchain-watcher to only send historical data with the specified range and -not stream forward syncing data - -`ethSubscription.startingBlock` is the starting block number for the range to receive data in - -`ethSubscription.endingBlock` is the ending block number for the range to receive data in; -setting to 0 means the process will continue streaming indefinitely. - -`ethSubscription.headerFilter` has two sub-options: `off` and `uncles`. - -- Setting `off` to true tells ipfs-blockchain-watcher to not send any headers to the subscriber -- setting `uncles` to true tells ipfs-blockchain-watcher to send uncles in addition to normal headers. - -`ethSubscription.txFilter` has three sub-options: `off`, `src`, and `dst`. 
- -- Setting `off` to true tells ipfs-blockchain-watcher to not send any transactions to the subscriber -- `src` and `dst` are string arrays which can be filled with ETH addresses to filter transactions for, -if they have any addresses then ipfs-blockchain-watcher will only send transactions that were sent or received by the addresses contained -in `src` and `dst`, respectively. - -`ethSubscription.receiptFilter` has four sub-options: `off`, `topics`, `contracts` and `matchTxs`. - -- Setting `off` to true tells ipfs-blockchain-watcher to not send any receipts to the subscriber -- `topic0s` is a string array which can be filled with event topics to filter for, -if it has any topics then ipfs-blockchain-watcher will only send receipts that contain logs which have that topic0. -- `contracts` is a string array which can be filled with contract addresses to filter for, if it contains any contract addresses the watcher will -only send receipts that correspond to one of those contracts. -- `matchTrxs` is a bool which when set to true any receipts that correspond to filtered for transactions will be sent by the watcher, regardless of whether or not the receipt satisfies the `topics` or `contracts` filters. - -`ethSubscription.stateFilter` has three sub-options: `off`, `addresses`, and `intermediateNodes`. - -- Setting `off` to true tells ipfs-blockchain-watcher to not send any state data to the subscriber -- `addresses` is a string array which can be filled with ETH addresses to filter state for, -if it has any addresses then ipfs-blockchain-watcher will only send state leafs (accounts) corresponding to those account addresses. -- By default ipfs-blockchain-watcher only sends along state leafs, to receive branch and extension nodes as well `intermediateNodes` can be set to `true`. - -`ethSubscription.storageFilter` has four sub-options: `off`, `addresses`, `storageKeys`, and `intermediateNodes`. 
- -- Setting `off` to true tells ipfs-blockchain-watcher to not send any storage data to the subscriber -- `addresses` is a string array which can be filled with ETH addresses to filter storage for, -if it has any addresses then ipfs-blockchain-watcher will only send storage nodes from the storage tries at those state addresses. -- `storageKeys` is another string array that can be filled with storage keys to filter storage data for. It is important to note that the storage keys need to be the actual keccak256 hashes, whereas -the addresses in the `addresses` fields are pre-hashed ETH addresses. -- By default ipfs-blockchain-watcher only sends along storage leafs, to receive branch and extension nodes as well `intermediateNodes` can be set to `true`. - -### Bitcoin RPC Subscription: -An example of how to subscribe to a real-time Bitcoin data feed from ipfs-blockchain-watcher using the `Stream` RPC method is provided below - -```go - package main - - import ( - "github.com/ethereum/go-ethereum/rlp" - "github.com/ethereum/go-ethereum/rpc" - "github.com/spf13/viper" - - "github.com/vulcanize/ipfs-blockchain-watcher/pkg/btc" - "github.com/vulcanize/ipfs-blockchain-watcher/pkg/client" - "github.com/vulcanize/ipfs-blockchain-watcher/pkg/watch" - ) - - config, _ := btc.NewBtcSubscriptionConfig() - rlpConfig, _ := rlp.EncodeToBytes(config) - vulcPath := viper.GetString("watcher.btcSubscription.path") - rpcClient, _ := rpc.Dial(vulcPath) - subClient := client.NewClient(rpcClient) - payloadChan := make(chan watch.SubscriptionPayload, 20000) - subscription, _ := subClient.Stream(payloadChan, rlpConfig) - for { - select { - case payload := <- payloadChan: - // do something with the subscription payload - case err := <- subscription.Err(): - // do something with the subscription error - } - } -``` - -The .toml file being used to fill the Bitcoin subscription config would look something like this: - -```toml -[watcher] - [watcher.btcSubscription] - historicalData = false - 
historicalDataOnly = false - startingBlock = 0 - endingBlock = 0 - wsPath = "ws://127.0.0.1:8080" - [watcher.btcSubscription.headerFilter] - off = false - [watcher.btcSubscription.txFilter] - off = false - segwit = false - witnessHashes = [] - indexes = [] - pkScriptClass = [] - multiSig = false - addresses = [] -``` - -These configuration parameters are broken down as follows: - -`btcSubscription.wsPath` is used to define the ipfs-blockchain-watcher ws url OR ipc endpoint to subscribe to - -`btcSubscription.historicalData` specifies whether or not ipfs-blockchain-watcher should look up historical data in its cache and -send that to the subscriber, if this is set to `false` then ipfs-blockchain-watcher only streams newly synced/incoming data - -`btcSubscription.historicalDataOnly` will tell ipfs-blockchain-watcher to only send historical data with the specified range and -not stream forward syncing data - -`btcSubscription.startingBlock` is the starting block number for the range to receive data in - -`btcSubscription.endingBlock` is the ending block number for the range to receive data in; -setting to 0 means the process will continue streaming indefinitely. - -`btcSubscription.headerFilter` has one sub-option: `off`. - -- Setting `off` to true tells ipfs-blockchain-watcher to -not send any headers to the subscriber. -- Additional header-filtering options will be added in the future. - -`btcSubscription.txFilter` has seven sub-options: `off`, `segwit`, `witnessHashes`, `indexes`, `pkScriptClass`, `multiSig`, and `addresses`. - -- Setting `off` to true tells ipfs-blockchain-watcher to not send any transactions to the subscriber. -- Setting `segwit` to true tells ipfs-blockchain-watcher to only send segwit transactions. -- `witnessHashes` is a string array that can be filled with witness hash string; if it contains any hashes ipfs-blockchain-watcher will only send transactions that contain one of those hashes. 
-- `indexes` is an int64 array that can be filled with tx index numbers; if it contains any integers ipfs-blockchain-watcher will only send transactions at those indexes (e.g. `[0]` will send only coinbase transactions) -- `pkScriptClass` is an uint8 array that can be filled with pk script class numbers; if it contains any integers ipfs-blockchain-watcher will only send transactions that have at least one tx output with one of the specified pkscript classes; -possible class types are 0 through 8 as defined [here](https://github.com/btcsuite/btcd/blob/master/txscript/standard.go#L52). -- Setting `multisig` to true tells ipfs-blockchain-watcher to send only multi-sig transactions- to send only transaction that have at least one tx output that requires more than one signature to spend. -- `addresses` is a string array that can be filled with btc address strings; if it contains any addresses ipfs-blockchain-watcher will only send transactions that have at least one tx output with at least one of the provided addresses. - - -### Native API Recapitulation: -In addition to providing novel Postgraphile and RPC-Subscription endpoints, we are working towards complete recapitulation of the -standard chain APIs. This will allow direct compatibility with software that already makes use of the standard interfaces. - -#### Ethereum JSON-RPC API -ipfs-blockchain-watcher currently faithfully recapitulates portions of the Ethereum JSON-RPC api standard. - -The currently supported endpoints include: -`eth_blockNumber` -`eth_getLogs` -`eth_getHeaderByNumber` -`eth_getBlockByNumber` -`eth_getBlockByHash` -`eth_getTransactionByHash` - -Additional endpoints will be added in the near future, with the immediate goal of recapitulating the largest set of "eth_" endpoints which can be provided as a service. - -#### Bitcoin JSON-RPC API: -In the near future, the standard Bitcoin JSON-RPC interfaces will be implemented. 
diff --git a/documentation/architecture.md b/documentation/architecture.md deleted file mode 100644 index ec38858ae..000000000 --- a/documentation/architecture.md +++ /dev/null @@ -1,132 +0,0 @@ -# ipfs-blockchain-watcher architecture -1. [Processes](#processes) -1. [Command](#command) -1. [Configuration](#config) -1. [Database](#database) -1. [APIs](#apis) -1. [Resync](#resync) -1. [IPFS Considerations](#ipfs-considerations) - -## Processes -ipfs-blockchain-watcher is a [service](../pkg/sync/service.go#L61) comprised of the following interfaces: - -* [Payload Fetcher](../pkg/shared/interfaces.go#L29): Fetches raw chain data from a half-duplex endpoint (HTTP/IPC), used for historical data fetching. ([BTC](../pkg/btc/payload_fetcher.go), [ETH](../pkg/eth/payload_fetcher.go)). -* [Payload Streamer](../pkg/shared/interfaces.go#L24): Streams raw chain data from a full-duplex endpoint (WebSocket/IPC), used for syncing data at the head of the chain in real-time. ([BTC](../pkg/btc/http_streamer.go), [ETH](../pkg/eth/streamer.go)). -* [Payload Converter](../pkg/shared/interfaces.go#L34): Converters raw chain data to an intermediary form prepared for IPFS publishing. ([BTC](../pkg/btc/converter.go), [ETH](../pkg/eth/converter.go)). -* [IPLD Publisher](../pkg/shared/interfaces.go#L39): Publishes the converted data to IPFS, returning their CIDs and associated metadata for indexing. ([BTC](../pkg/btc/publisher.go), [ETH](../pkg/eth/publisher.go)). -* [CID Indexer](../pkg/shared/interfaces.go#L44): Indexes CIDs in Postgres with their associated metadata. This metadata is chain specific and selected based on utility. ([BTC](../pkg/btc/indexer.go), [ETH](../pkg/eth/indexer.go)). -* [CID Retriever](../pkg/shared/interfaces.go#L54): Retrieves CIDs from Postgres by searching against their associated metadata, is used to lookup data to serve API requests/subscriptions. ([BTC](../pkg/btc/retriever.go), [ETH](../pkg/eth/retriever.go)). 
-* [IPLD Fetcher](../pkg/shared/interfaces.go#L62): Fetches the IPLDs needed to service API requests/subscriptions from IPFS using retrieved CIDS; can route through a IPFS block-exchange to search for objects that are not directly available. ([BTC](../pkg/btc/ipld_fetcher.go), [ETH](../pkg/eth/ipld_fetcher.go)) -* [Response Filterer](../pkg/shared/interfaces.go#L49): Filters converted data payloads served to API subscriptions; filters according to the subscriber provided parameters. ([BTC](../pkg/btc/filterer.go), [ETH](../pkg/eth/filterer.go)). -* [API](https://github.com/ethereum/go-ethereum/blob/master/rpc/types.go#L31): Expose RPC methods for clients to interface with the data. Chain-specific APIs should aim to recapitulate as much of the native API as possible. ([VDB](../pkg/api.go), [ETH](../pkg/eth/api.go)). - - -Appropriating the service for a new chain is done by creating underlying types to satisfy these interfaces for -the specifics of that chain. - -The service uses these interfaces to operate in any combination of three modes: `sync`, `serve`, and `backfill`. -* Sync: Streams raw chain data at the head, converts and publishes it to IPFS, and indexes the resulting set of CIDs in Postgres with useful metadata. -* BackFill: Automatically searches for and detects gaps in the DB; fetches, converts, publishes, and indexes the data to fill these gaps. -* Serve: Opens up IPC, HTTP, and WebSocket servers on top of the ipfs-blockchain-watcher DB and any concurrent sync and/or backfill processes. - - -These three modes are all operated through a single vulcanizeDB command: `watch` - -## Command - -Usage: `./ipfs-blockchain-watcher watch --config={config.toml}` - -Configuration can also be done through CLI options and/or environmental variables. -CLI options can be found using `./ipfs-blockchain-watcher watch --help`. 
- -## Config - -Below is the set of universal config parameters for the ipfs-blockchain-watcher command, in .toml form, with the respective environmental variables commented to the side. -This set of parameters needs to be set no matter the chain type. - -```toml -[database] - name = "vulcanize_public" # $DATABASE_NAME - hostname = "localhost" # $DATABASE_HOSTNAME - port = 5432 # $DATABASE_PORT - user = "vdbm" # $DATABASE_USER - password = "" # $DATABASE_PASSWORD - -[watcher] - chain = "bitcoin" # $SUPERNODE_CHAIN - server = true # $SUPERNODE_SERVER - ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH - wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH - httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH - sync = true # $SUPERNODE_SYNC - workers = 1 # $SUPERNODE_WORKERS - backFill = true # $SUPERNODE_BACKFILL - frequency = 45 # $SUPERNODE_FREQUENCY - batchSize = 1 # $SUPERNODE_BATCH_SIZE - batchNumber = 50 # $SUPERNODE_BATCH_NUMBER - timeout = 300 # $HTTP_TIMEOUT - validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL -``` - -Additional parameters need to be set depending on the specific chain. 
- -For Bitcoin: - -```toml -[bitcoin] - wsPath = "127.0.0.1:8332" # $BTC_WS_PATH - httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH - pass = "password" # $BTC_NODE_PASSWORD - user = "username" # $BTC_NODE_USER - nodeID = "ocd0" # $BTC_NODE_ID - clientName = "Omnicore" # $BTC_CLIENT_NAME - genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK - networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID -``` - -For Ethereum: - -```toml -[ethereum] - wsPath = "127.0.0.1:8546" # $ETH_WS_PATH - httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH - nodeID = "arch1" # $ETH_NODE_ID - clientName = "Geth" # $ETH_CLIENT_NAME - genesisBlock = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3" # $ETH_GENESIS_BLOCK - networkID = "1" # $ETH_NETWORK_ID -``` - -## Database - -Currently, ipfs-blockchain-watcher persists all data to a single Postgres database. The migrations for this DB can be found [here](../db/migrations). -Chain-specific data is populated under a chain-specific schema (e.g. `eth` and `btc`) while shared data- such as the IPFS blocks table- is populated under the `public` schema. -Subsequent watchers which act on the raw chain data should build and populate their own schemas or separate databases entirely. - -In the future, the database architecture will be moving to a foreign table based architecture wherein a single db is used for shared data while each watcher uses -its own database and accesses and acts on the shared data through foreign tables. Isolating watchers to their own databases will prevent complications and -conflicts between watcher db migrations. - - -## APIs - -ipfs-blockchain-watcher provides mutliple types of APIs by which to interface with its data. -More detailed information on the APIs can be found [here](apis.md). - -## Resync - -A separate command `resync` is available for directing the resyncing of data within specified ranges. 
-This is useful if there is a need to re-validate a range of data using a new source or clean out bad/deprecated data. -More detailed information on this command can be found [here](resync.md). - -## IPFS Considerations - -Currently the IPLD Publisher and Fetcher can either use internalized IPFS processes which interface with a local IPFS repository, or can interface -directly with the backing Postgres database. -Both these options circumvent the need to run a full IPFS daemon with a [go-ipld-eth](https://github.com/ipfs/go-ipld-eth) or [go-ipld-btc](https://github.com/ipld/go-ipld-btc) plugin. -The former approach can lead to issues with lock-contention on the IPFS repo if another IPFS process is configured and running at the same $IPFS_PATH, it also necessitates the need for -a locally configured IPFS repository. The later bypasses the need for a configured IPFS repository/$IPFS_PATH and allows all Postgres write operations at a given block height -to occur in a single transaction, the only disadvantage is that by avoiding moving through an IPFS node intermediary the direct ability to reach out to the block -exchange for data not found locally is lost. - -Once go-ipld-eth and go-ipld-btc have been updated to work with a modern version of PG-IPFS, an additional option will be provided to direct -all publishing and fetching of IPLD objects through a remote IPFS daemon. \ No newline at end of file diff --git a/documentation/resync.md b/documentation/resync.md deleted file mode 100644 index b0de3c2e5..000000000 --- a/documentation/resync.md +++ /dev/null @@ -1,70 +0,0 @@ -## ipfs-blockchain-watcher resync -The `resync` command is made available for directing the resyncing of ipfs-blockchain-watcherdata within specified ranges. -It also contains a utility for cleaning out old data, and resetting the validation level of data. - -### Rational - -Manual resyncing of data can be used to re-validate data within specific ranges using a new source. 
- -Option to remove data may be needed for bad/deprecated data or to prepare for breaking changes to the db schemas. - -Resetting the validation level of data is useful for designating ranges of data for resyncing by an ongoing ipfs-blockchain-watcher -backfill process. - -### Command - -Usage: `./ipfs-blockchain-watcher resync --config={config.toml}` - -Configuration can also be done through CLI options and/or environmental variables. -CLI options can be found using `./ipfs-blockchain-watcher resync --help`. - -### Config - -Below is the set of universal config parameters for the resync command, in .toml form, with the respective environmental variables commented to the side. -This set of parameters needs to be set no matter the chain type. - -```toml -[database] - name = "vulcanize_public" # $DATABASE_NAME - hostname = "localhost" # $DATABASE_HOSTNAME - port = 5432 # $DATABASE_PORT - user = "vdbm" # $DATABASE_USER - password = "" # $DATABASE_PASSWORD - -[resync] - chain = "ethereum" # $RESYNC_CHAIN - type = "state" # $RESYNC_TYPE - start = 0 # $RESYNC_START - stop = 1000 # $RESYNC_STOP - batchSize = 10 # $RESYNC_BATCH_SIZE - batchNumber = 100 # $RESYNC_BATCH_NUMBER - timeout = 300 # $HTTP_TIMEOUT - clearOldCache = true # $RESYNC_CLEAR_OLD_CACHE - resetValidation = true # $RESYNC_RESET_VALIDATION -``` - -Additional parameters need to be set depending on the specific chain. 
- -For Bitcoin: - -```toml -[bitcoin] - httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH - pass = "password" # $BTC_NODE_PASSWORD - user = "username" # $BTC_NODE_USER - nodeID = "ocd0" # $BTC_NODE_ID - clientName = "Omnicore" # $BTC_CLIENT_NAME - genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK - networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID -``` - -For Ethereum: - -```toml -[ethereum] - httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH - nodeID = "arch1" # $ETH_NODE_ID - clientName = "Geth" # $ETH_CLIENT_NAME - genesisBlock = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3" # $ETH_GENESIS_BLOCK - networkID = "1" # $ETH_NETWORK_ID -``` diff --git a/documentation/watcher.md b/documentation/watcher.md deleted file mode 100644 index c7748f6dc..000000000 --- a/documentation/watcher.md +++ /dev/null @@ -1,16 +0,0 @@ -These are the components of a VulcanizeDB Watcher: -* Data Fetcher/Streamer sources: - * go-ethereum - * bitcoind - * btcd - * IPFS -* Transformers contain: - * converter - * publisher - * indexer -* Endpoints contain: - * api - * backend - * filterer - * retriever - * ipld_server \ No newline at end of file