Malte Möser edited this page Aug 14, 2019 · 17 revisions

Frequently Asked Questions

General

Where can I find BlockSci's documentation?

Documentation for the Python interface is available here. Most users will want to use this interface.

Does BlockSci support cryptocurrency XYZ?

BlockSci supports many cryptocurrencies that are similar to Bitcoin (e.g., those that forked Bitcoin's codebase and made no modifications to the data model). BlockSci comes with a disk parser that is highly optimized for Bitcoin, and an RPC parser that should work with most forks of Bitcoin (but is much slower than the disk parser).

The disk parser can break when a cryptocurrency changes the data format, adds new consensus rules or otherwise changes the rules of how blocks and transactions are created.

Does BlockSci support Monero?

No. Monero's data model is different from Bitcoin's and thus doesn't currently work with BlockSci. It would be possible to extend BlockSci to support Monero, but this is currently not on our roadmap.

Does BlockSci support Ethereum?

No. Ethereum's design is fundamentally different from Bitcoin's and thus incompatible with BlockSci.

Does BlockSci support Omni Layer / Colored Coins / etc.?

BlockSci only handles parsing of the core blockchain layer (layer 1), but exposes any special data stored in the blockchain. Thus, for most protocols that build upon layer 1, you can write your own analysis code.

Related issues:

What software do you use to develop BlockSci?

We're developing BlockSci on macOS using Xcode. You can easily generate an Xcode project using CMake:

mkdir xcode && cd xcode
cmake -G Xcode -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl ..

We don't have any recommendations for IDEs on other platforms, though we are using gdb to debug BlockSci on Linux.

Does BlockSci run on CentOS / Windows / etc.?

We only provide support for Ubuntu and macOS. It may be possible to run BlockSci on other platforms by manually compiling the various dependencies.

Common Issues

Open files limit: Addresses are missing transactions

The default open files limit of many Linux distributions (e.g., Ubuntu) is too small for BlockSci. Among other things, this can make transactions appear to be missing from addresses (e.g., when calling addr.txes()). After you have increased the open files limit, reparse the chain and the missing transactions should show up.
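The limit that applies to a running Python session can be inspected from within Python. The following is a minimal sketch using the standard-library resource module (Unix only). The helper names and the target value are illustrative and not part of BlockSci, and this FAQ does not state a required minimum. A limit raised this way only affects the current process; a parser started from a shell needs the limit raised there (e.g., with ulimit -n).

```python
import resource

def open_files_limit():
    """Return the (soft, hard) open-files limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def raise_soft_limit(target):
    """Raise the soft limit to `target`, capped at the hard limit.

    `target` is a value you choose; it is an assumption, not a BlockSci setting.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = open_files_limit()
print(f"open files limit: soft={soft}, hard={hard}")
```

Some platforms reject very large targets even when the hard limit is reported as unlimited, so treat a failing setrlimit call as a signal to pick a smaller value.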

Clustering

Does BlockSci provide state-of-the-art clustering?

BlockSci provides the fundamental building blocks of address clustering: multi-input clustering with CoinJoin detection and change address clustering with support for various different change address heuristics.

There are, however, many corner cases (e.g., MtGox allowing users to import their private keys, breaking the multi-input heuristic) that require special treatment to prevent the occurrence of "superclusters". Superclusters are extremely large clusters that occur when different clusters collapse into each other due to over-eager address linking. To some degree, address clustering today is more art than science, and building a highly accurate clustering module, while possible, is not on the current roadmap for BlockSci. You'll need to implement anything that goes beyond the basic address clustering described above yourself.

Here's some helpful literature on address clustering:

How do I use BlockSci's clustering module?

We recommend using the clustering module available through the Python interface.

If you haven't used the clusterer before, you'll need to first create a clustering:

import blocksci
chain = blocksci.chain("/path/to/blocksci/data/") # in v0.6 this needs to point to the config file

cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain)

If you already created such a clustering, you can simply load it:

cm = blocksci.cluster.ClusterManager("/directory/where/cluster/files/can/be/stored", chain)
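The two cases above can be combined into a small helper. This is only a sketch: load_or_create_clustering is a hypothetical name, and it assumes that a non-empty directory holds a previously created clustering, which BlockSci itself does not verify here. The class is passed in as an argument so that the helper uses only the two calls shown above.

```python
import os

def load_or_create_clustering(cluster_manager_cls, cluster_dir, chain):
    """Load the clustering stored in `cluster_dir`, or create it if absent.

    Assumption: a non-empty directory contains a finished clustering.
    `cluster_manager_cls` is expected to be blocksci.cluster.ClusterManager.
    """
    if os.path.isdir(cluster_dir) and os.listdir(cluster_dir):
        return cluster_manager_cls(cluster_dir, chain)
    return cluster_manager_cls.create_clustering(cluster_dir, chain)

# Usage (requires BlockSci and a parsed chain):
# cm = load_or_create_clustering(
#     blocksci.cluster.ClusterManager,
#     "/directory/where/cluster/files/can/be/stored",
#     chain,
# )
```

If a previous run was interrupted while writing the cluster files, the directory check would wrongly report an existing clustering, so delete the directory before retrying.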

Which heuristic is the clusterer using by default?

By default, the clusterer uses the following two heuristics:

  • Multi-Input: Inputs that are co-spent in the same transaction are clustered together, unless the transaction looks like a CoinJoin transaction.
  • Legacy Change: If there is an output that has less value than any of the inputs and was the first output to send coins to the associated address, it is clustered as the change address.
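To illustrate the idea behind the multi-input heuristic, here is a self-contained toy sketch using a union-find structure. It is not BlockSci's implementation: it works on plain lists of address strings, performs no CoinJoin detection, and ignores change addresses entirely. The function name and the input format are made up for this example.

```python
def multi_input_clusters(transactions):
    """Group addresses that are co-spent as inputs of the same transaction.

    `transactions` is a list of lists; each inner list holds the input
    addresses of one transaction (a toy format, not a BlockSci type).
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        for addr in inputs:
            union(inputs[0], addr)  # link every input to the first one

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())

# "B" appears in two transactions, so A, B and C end up in one cluster.
print(multi_input_clusters([["A", "B"], ["B", "C"], ["D"]]))
```

The example also shows how clusters merge transitively: a single transaction that co-spends addresses from two otherwise unrelated clusters joins them, which is the mechanism by which over-eager linking can produce very large clusters.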

How do I use a different change address heuristic?

BlockSci provides a number of different change address heuristics.

You can use a different change address heuristic by passing it to the create_clustering function. For example:

reuse_change_heuristic = blocksci.heuristics.change.address_reuse()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, reuse_change_heuristic)

How do I disable change address clustering?

In versions before v0.6, you need to use the following workaround to disable change address clustering:

no_change_heuristic = blocksci.heuristics.change.legacy() - blocksci.heuristics.change.legacy()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, no_change_heuristic)

In v0.6, you can use the none heuristic:

cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, blocksci.heuristics.change.none)
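If the same script has to run against both older versions and v0.6, the two variants above can be wrapped in one helper. This is a sketch under the assumption that the none heuristic is simply absent from blocksci.heuristics.change in older versions; the helper name is hypothetical.

```python
def no_change_heuristic(change_module):
    """Return a change heuristic that disables change address clustering.

    `change_module` is expected to be blocksci.heuristics.change.
    """
    none_heuristic = getattr(change_module, "none", None)
    if none_heuristic is not None:
        return none_heuristic  # the v0.6 variant shown above
    # Workaround for older versions, exactly as shown above.
    return change_module.legacy() - change_module.legacy()

# Usage (requires BlockSci):
# heuristic = no_change_heuristic(blocksci.heuristics.change)
# cm = blocksci.cluster.ClusterManager.create_clustering(
#     "/directory/where/cluster/files/can/be/stored", chain, heuristic)
```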

Why do some clusters appear to be empty?

Clusters may appear to be empty (with cluster.size() == 0 and cluster.transactions() == []) while cluster.type_equiv_size is greater than 0. This is not a bug, but an artifact of BlockSci's internal deduplication.

For example, assume there is a multisig address with three pubkeys. BlockSci keeps track of the three pubkeys independently of their combined use in a multisig address. During clustering, each of these four addresses (the multisig as well as the three pubkeys) starts in its own cluster. If the individual pubkeys are never used on their own, they'll remain in their single-address clusters. If a method such as .size() or .transactions() is called for such a cluster, BlockSci will check whether the addresses in the cluster have actually been used. If an address has never been used individually (as in the example above), BlockSci will tell you that the cluster is empty.
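When iterating over clusters, such apparently empty clusters can be recognized with a simple check. The sketch below relies only on cluster.size() and cluster.type_equiv_size as described above; the function name is hypothetical.

```python
def is_deduplication_artifact(cluster):
    """True if the cluster holds equiv addresses that were never used on their own."""
    # Check the cheap attribute first; size() is only called when needed.
    return cluster.type_equiv_size > 0 and cluster.size() == 0

# Usage (requires BlockSci and a loaded clustering):
# used_clusters = [c for c in some_clusters if not is_deduplication_artifact(c)]
```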

Why is cluster.size() slow?

Clustering works based on equiv addresses (see above). When you call cluster.size(), BlockSci first needs to query a database to determine with which address types each equiv address is actually used on chain.

Instead, you can use cluster.type_equiv_size, which skips the database lookups and simply returns the number of equiv addresses in the cluster.
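For example, ranking clusters by size does not need the exact on-chain address count. The sketch below picks the largest clusters using only type_equiv_size; the function name is hypothetical, and how you obtain the iterable of clusters (e.g., cm.clusters()) is an assumption that may differ between versions.

```python
import heapq

def largest_clusters(clusters, n=10):
    """Return the `n` clusters with the most equiv addresses, largest first."""
    return heapq.nlargest(n, clusters, key=lambda c: c.type_equiv_size)

# Usage (requires BlockSci and a loaded clustering; cm.clusters() is assumed):
# for cluster in largest_clusters(cm.clusters(), n=5):
#     print(cluster.type_equiv_size)
```

Because heapq.nlargest keeps only n candidates in memory at a time, this also works when the iterable of clusters is very large.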

Analysis / How do I ...?

How can I map addresses to exchanges or pools?

BlockSci allows you to tag address clusters with names, but we don't provide any such tags ourselves. There are a few public sources such as WalletExplorer or Blockchain.info, but they may not be reliable or complete.

BlockSci can map blocks to pools by looking at the information contained in the coinbase transaction, but the data we use to identify pools does not cover all pools/coinbase transactions. Furthermore, there's no guarantee that miners report their identity correctly in the coinbase transaction.

blocksci.get_miner(chain[300005])
>>> 'SlushPool'
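To get an overview of pool shares, the same call can be applied to a range of blocks and tallied. This is a sketch: the function name is hypothetical, slicing the chain as shown in the usage comment is an assumption, and the label returned for blocks that cannot be attributed to a pool is not documented here.

```python
from collections import Counter

def count_blocks_per_pool(blocks, get_miner):
    """Count how many of `blocks` are attributed to each pool name."""
    return Counter(get_miner(block) for block in blocks)

# Usage (requires BlockSci; chain slicing is assumed to work):
# shares = count_blocks_per_pool(chain[300000:300100], blocksci.get_miner)
# print(shares.most_common(5))
```

Keep in mind that the counts inherit the caveats above: coverage is incomplete and miners may misreport their identity.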

Related Issues: #160, #250

How do I extract the full scriptPubKey and scriptSig of an output/input?

For most standard scripts, BlockSci does not store the full scriptSig and scriptPubKey but instead extracts the important information and stores it as an Address. The Addresses page of the documentation (Docs » Reference » Address Classes » Addresses) describes in more detail what is stored.

The actual scriptSig and scriptPubKey are stored only for non-standard scripts. For example:

myout = chain.tx_with_hash("15c2b9bc3b93e0c0a037c5fa8402d0e34e13d3bb0ce7fca65888e5d24e597dcc").outputs[0]

myout.address_type == blocksci.address_type.nonstandard
>> True

myout.address.out_script
>> 'OP_DEPTH OP_1SUB OP_IF OP_RETURN 737069746861736820616e6420796d6f64652c2062726f6772616d6d657273346c796665 OP_ENDIF 0 OP_TOALTSTACK OP_DUP OP_HASH256 efb81cd930d56703304f63d7f94575c4cd17f0985ed2fd126aabf1d866471d2f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 9ddd5c986827e8bc5848b4fdc1f8152f597b852ed2429ae7ee2baf7a14096a8f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 fda5bd74925349ba07de25db126b9148a7a508e48475c33d2abe7c81a341a3ab OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_FROMALTSTACK'
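The same check can be applied to every output of a transaction. The helper below is a sketch that uses only the attributes shown above (outputs, address_type, address.out_script); the function name is hypothetical.

```python
def nonstandard_out_scripts(tx, nonstandard_type):
    """Return the stored scripts of all non-standard outputs of `tx`.

    `nonstandard_type` is expected to be blocksci.address_type.nonstandard.
    """
    return [
        out.address.out_script
        for out in tx.outputs
        if out.address_type == nonstandard_type
    ]

# Usage (requires BlockSci and a parsed chain):
# tx = chain.tx_with_hash("15c2b9bc3b93e0c0a037c5fa8402d0e34e13d3bb0ce7fca65888e5d24e597dcc")
# scripts = nonstandard_out_scripts(tx, blocksci.address_type.nonstandard)
```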

How can I extract balances of all addresses?

See Faster way to get all address balances #264

How can I plot the UTXO Age Distribution over time?

See Updating the UTXO set at each block #108
