KYVE - DataLoadTool

Load datasets from KYVE pools to BigQuery and Postgres

Build from Source

git clone https://github.com/KYVENetwork/kyve-dlt.git
cd kyve-dlt
make build

This will build dlt in /build directory. Afterward, you may want to put it into your machine's PATH like as follows:

cp build/dlt ~/go/bin/dlt

Initialization

To set up the dlt config, run:

dlt init

This will either guide you to set up a first source and destination or create a config with default values. A connection consisting of a source and a destination is required to start any sync process.

Usage

Depending on what you want to achieve with dlt there are two commands available. A quick summary of what they do and when to use them can be found below:

	Description	Recommendation
load	Loads data from a KYVE source into a destination.	Generally recommended to load a dataset into a destination.
sync	Runs a supervised incremental loading process in cronjob to keep the data destination in sync with the data source.	Recommended to keep an already loaded dataset updated with incrementally added data.

`load`

Usage:

dlt load --connection connection_1

To start the loading process from or to a certain bundle, simply use the --from-bundle-id or --to-bundle-id flags.

dlt always checks if a bundle was already loaded into a destination. By default, the to-bundle-id is set to the latest found bundle ID + 1. To force the loading of the specified range of bundle, simple use the --force flag.

`sync`

Usage:

dlt sync --connection connection_1

sync executes the loading process in a cronjob. Therefore, specify the connection -> cron field in the config. The following cron syntax is expected: * * * * *. The supervising process always checks if an underlying loading process is currently running to prevent parallelized syncs. It waits until all processes are finished to proceed with the sync.

Manage config

With the following commands, sources, destinations, and connections can be added, removed or listed:

dlt sources      {add|remove|list}
dlt destinations {add|remove|list}
dlt connections  {add|remove|list}

Schemas

Base

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "_dlt_raw_id": { "type": "string" },
    "_dlt_extracted_at": { "type": "string" },
    "key": { "type": "string" },
    "value": { "type": "string" },
    "bundle_id": { "type": "integer" }
  },
  "required": ["_dlt_raw_id", "_dlt_extracted_at", "key", "value", "bundle_id"]
}

This schema supports all KYVE datasets by default and consists of the core key and value structure that is required by the KYVE protocol. The key is the unique identifier of the data item in a data pool (e.g. height of a block, timestamp), whereas the value includes the actual data.

Height

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "_dlt_raw_id": { "type": "string" },
    "_dlt_extracted_at": { "type": "string" },
    "height": { "type": "integer" },
    "value": { "type": "string" },
    "bundle_id": { "type": "integer" }
  },
  "required": ["_dlt_raw_id", "_dlt_extracted_at", "key", "value", "bundle_id"]
}

This schema is supported for all KYVE pools with an INTEGER as key (runtime: @kyvejs/tendermint, @kyvejs/tendermint-bsync, @kyvejs/evm). Instead of using the raw key, it converts it to a height as integer.

TendermintPreprocessed

This schema is supported for all Tendermint pools (runtime: @kyvejs/tendermint) and preprocesses the events in individual rows. It is recommended to use for datasets with big data items (e.g. Osmosis). This is the schema:

{
  "type": "object",
  "properties": {
    "_dlt_raw_id": { "type": "string" },
    "_dlt_extracted_at": { "type": "string" },
    "type": { "type": "string" },
    "value": { "type": "string" },
    "height": { "type": "string" },
    "array_index": { "type": "integer" },
    "bundle_id": { "type": "integer" }
  },
  "required": ["_dlt_raw_id", "_dlt_extracted_at", "type", "value", "height", "array_index", "bundle_id"]
}

For this schema, the height itself is no unique identifier, because more than one rows are written for a single data item. The first row includes the block without events in the value field (type = "block"). The events with type begin_block_event, tx_result, and end_block_event follow, including the event value in value and an array_index. This structure allows everyone to reconstruct the data completely.

Supported Destinations

BigQuery
Postgres

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github		.github
cmd/dlt		cmd/dlt
destinations		destinations
loader		loader
schema		schema
utils		utils
.gitignore		.gitignore
Makefile		Makefile
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KYVE - DataLoadTool

Build from Source

Initialization

Usage

`load`

`sync`

Manage config

Schemas

Base

Height

TendermintPreprocessed

Supported Destinations

About

Releases 1

Packages

Contributors 2

Languages

KYVENetwork/kyve-dlt

Folders and files

Latest commit

History

Repository files navigation

KYVE - DataLoadTool

Build from Source

Initialization

Usage

load

sync

Manage config

Schemas

Base

Height

TendermintPreprocessed

Supported Destinations

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

`load`

`sync`

Packages